danepeterson
  • Threads: 11
  • Posts: 43
Joined: Apr 23, 2012
September 29th, 2019 at 2:27:20 PM permalink
Hey everyone,

Ok so I was trying to understand Standard Deviation as it relates to the probability of a Single Zero Roulette wheel. Not the Standard Deviation of a data set or a population of results from the wheel but before you get any data. I think it's called Binomial Standard Deviation. Not sure, I've had trouble understanding as it seems there are "two types" of Standard Deviation. One that you calculate from a set of numbers to see how spread out that data is from the Mean. And the other is calculated before any data by setting "n" trials and "p" odds. Hopefully someone could clear that up for me?

I know Mean is (n * p)
I know Variance is (n * p * q)
I know the Standard Deviation is Sqrt(n * p * q)

Where,
"n" is number of spins or trials.
"p" is probability of success.
"q" is probability of failure. (1-p)

So for example, in 37 spins playing an even chance bet the probability of success is (18/37) and the probability of failure is (19/37).

The Mean would be 37 * (18/37) = 18
Real world, as I approach infinite spins I would average 18 wins in 37 spins.

The Standard Deviation would be Sqrt(37 * (18/37) * (19/37)) = 3.04027025826
Real world, +3 or -3 successes would be 1 Standard Deviation from the Mean in either direction. So 21 or 15 successes in 37 spins would be 1 Standard Deviation from the Mean. 24 or 12 successes in 37 spins would be 2 StD from the Mean. And so on.
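
For anyone who wants to check these figures, here is a minimal Python sketch of the three formulas above. The function name binomial_stats is only an illustrative choice, not something from a library.

```python
import math

def binomial_stats(n, p):
    """Return (mean, variance, standard deviation) for the number of successes in n trials."""
    q = 1 - p                      # probability of failure
    mean = n * p
    variance = n * p * q
    std_dev = math.sqrt(variance)
    return mean, variance, std_dev

# 37 spins on an even-chance bet of a single-zero wheel
mean, variance, std_dev = binomial_stats(37, 18 / 37)
print(f"mean     = {mean:.4f}")
print(f"variance = {variance:.4f}")
print(f"std dev  = {std_dev:.4f}")
```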

Obviously the sample is too small to have it fall within Normal Distribution but I'm just using this example so I can understand the math. Hopefully I am right so far as I think I understand it.

Now this is the number that I can't seem to understand. The "Variance" and how it relates.

Variance = 37 * (18/37) * (19/37) = 9.24324324324

What does that number 9 tell me? I thought Variance was the average of the squared differences from the Mean. So how can you have a number for Variance without any spins recorded? I don't know why I can't seem to wrap my head around what a Variance of 9 means in my example above. And that number just scales by a factor of 10 each time I increase "n" by a factor of 10.

Variance = 37 * (18/37) * (19/37) = 9.24324324324
Variance = 370 * (18/37) * (19/37) = 92.4324324324
Variance = 3,700 * (18/37) * (19/37) = 924.324324324
Variance = 37,000 * (18/37) * (19/37) = 9,243.24324324
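
The pattern in the four lines above can be reproduced with a short Python sketch. It also prints the standard deviation, which grows only with the square root of n, so ten times the spins gives about 3.16 times the standard deviation. The helper name spread is hypothetical.

```python
import math

def spread(n, p):
    """Return (variance, standard deviation) of the number of successes in n trials."""
    variance = n * p * (1 - p)
    return variance, math.sqrt(variance)

p = 18 / 37
for n in (37, 370, 3_700, 37_000):
    variance, std_dev = spread(n, p)
    # Variance is proportional to n; standard deviation is proportional to sqrt(n).
    print(f"n = {n:>6}: variance = {variance:10.4f}   std dev = {std_dev:8.4f}")
```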

Some guesses would be that it says in 37 spins I should expect a "Variance" of + or - 9 from the Mean? Or maybe it's just a total of 9 around the Mean so + or - 4.5 from the Mean?

Thank you.
Dane Peterson
KevinAA
  • Threads: 18
  • Posts: 283
Joined: Jul 6, 2017
Thanked by
danepeterson
September 30th, 2019 at 1:21:04 AM permalink
It's because the units are squared. Assume a $1 even-money bet at roulette, and do this 37 times. Start with 37 one dollar chips and stack up the winners separately. On average, you'll have 18*2=36 chips in the winning stack.

The variance is 37*($1)^2*18/37*19/37=9.24 square dollars. That's why it's not really useful by itself. What are square dollars?

When you take the square root of that to get standard deviation, the square root of ($1)^2=$1, and $1 is real whereas a square dollar is not.
Ajaxx
  • Threads: 4
  • Posts: 36
Joined: Sep 15, 2019
Thanked by
danepeterson, djtehch34t
September 30th, 2019 at 8:02:21 AM permalink
Dane, it seems like you've really made an effort to understand this yourself first, so kudos on a solid start. I am by no means a stats expert, as you can probably tell by how long-winded this explanation ended up being, but hopefully it still helps a bit.

Quote: danepeterson

Ok so I was trying to understand Standard Deviation as it relates to the probability of a Single Zero Roulette wheel. Not the Standard Deviation of a data set or a population of results from the wheel but before you get any data. I think it's called Binomial Standard Deviation. Not sure, I've had trouble understanding as it seems there are "two types" of Standard Deviation. One that you calculate from a set of numbers to see how spread out that data is from the Mean. And the other is calculated before any data by setting "n" trials and "p" odds. Hopefully someone could clear that up for me?


There is only one type of standard deviation. It's always just the square root of the variance, which in turn is exactly as you said: the average of the squared differences from the mean. In the real world, the only way to come up with a standard deviation is to do a bunch of trials, calculate the mean, calculate the difference between each data point and that mean, and then average the squares of all those numbers to get the variance, the square root of which is your sample standard deviation, s. And even that is just the s of our limited data; if we wanted to know the "true" standard deviation (what we call the population standard deviation, σ) we would have to do an experiment with an infinite number of trials.

Luckily, in the abstract world of math, it's possible to know ahead of time the exact probability p of something happening, and when we do, mathematical formulas can act as shortcuts that save us the work of an infinitely-long experiment. The formula you gave, √(n * p * q), is not a "second type" of standard deviation; it's just one of those shortcuts. Specifically, we use it to calculate the "true" standard deviation (the σ we would get from an infinitely-long experiment) when the thing we want to know about is a binomial distribution.

Binomial distributions are what you get from doing an experiment in which each trial consists of looking at several (in your case, exactly 37) yes-no/win-lose binary choices and counting the wins. If you Google "Binomial Distribution Visualization" you'll find some great interactive tools to play around with. I would also highly recommend Crash Course Statistics on YouTube (episode #15 is on Binomial Distributions).
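
To see the "shortcut" idea in action, here is a rough Python simulation sketch. It plays many 37-spin blocks, records the wins in each block, and compares the standard deviation of those counts with √(n * p * q). The function name, the seed, and the choice of 20,000 blocks are arbitrary assumptions for illustration; a finite simulation only approximates the formula.

```python
import random
import statistics

def simulate_win_counts(num_blocks, n=37, p=18 / 37, seed=0):
    """Simulate num_blocks blocks of n spins; return the number of wins in each block."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) for _ in range(num_blocks)]

wins = simulate_win_counts(20_000)
sample_mean = statistics.mean(wins)
sample_sd = statistics.pstdev(wins)               # divides by the number of blocks
formula_sd = (37 * (18 / 37) * (19 / 37)) ** 0.5  # sqrt(n * p * q)

print(f"sample mean = {sample_mean:.3f}   (formula n * p = 18)")
print(f"sample sd   = {sample_sd:.3f}   (formula sqrt(n * p * q) = {formula_sd:.3f})")
```

With more blocks the sample figures should drift closer to the formula values, which is the sense in which the formula stands in for an infinitely long experiment.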

If such a shortcut sounds daunting, remember that this is exactly the same kind of mathematical leap that allowed you to write "I know Mean is (n * p)", and to figure out that the expected wins in your 37-spin example would be 18, without going to a casino infinitely many times to count the number of wins in a set of 37 spins.

To illustrate with a simpler example, say we define one "trial" to be flipping a fair coin 2 times and counting the total number of heads that come up. In this case, n=2, and p=0.5 because there's a 50% chance of "success" (heads) on each flip. Imagine we do one of these trials every day for a month - some days getting 0 heads, some days getting 1, some days 2. If we tally up the data from all 30 days in a bar graph which ranges from "0 heads" to "2 heads" on the horizontal axis, and "0% of the days" to "100% of the days" on the vertical axis, the graph will look like a binomial distribution (specifically, the distribution B(2, 0.5), if you're curious about notation). That is, it will look roughly like a bell curve, since the bars for 0 and 2 will likely be shorter than the bar in the middle for 1; as we know, there's only a 25% chance each of getting zero heads or two heads, while there's a 50% chance of getting exactly one head in 2 flips. But since we only did 30 trials, it won't match those percentages exactly; maybe two heads came up 9 times (30% of the days), and zero heads only 3 times (10% of the days).

Now, suppose a friend asks me, "What's the expected number of heads for one of these trials?". What they are really asking is "If I did an infinite number of these 2-flip trials instead of just 30, what would be the average number of heads per trial (μ)?". If I didn't know already that the true odds of the coin were p=0.5, I could try to come up with a prediction by calculating the mean of our bar graph (0 * 3/30 + 1 * 18/30 + 2 * 9/30 = 36/30 = 1.2) and using that (which we call the sample mean, x̄) as my guess for μ. But if we know the coin is fair, it would be silly to use our bar graph, because we already have enough information to draw the graph for an ideal, infinite-day experiment; the bars for 0 heads and 2 heads would be at exactly 25%, and the bar for 1 head in the middle would be at exactly 50%, giving a mean (a.k.a. expected value, a.k.a. μ) of 0 * 0.25 + 1 * 0.5 + 2 * 0.25 = 1. It turns out that the mean we'll get from such an idealized experiment will always be n * p, so we can also arrive at the same answer without even figuring out what the height of each idealized bar would be. That's what the formula "μ = n * p" really is: a shortcut that tells us the mean of an infinitely-long experiment without actually having to do it.
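
The two means in this paragraph can be worked out side by side in a short Python sketch. The tally dictionary simply restates the 30-day example (3 days with 0 heads, 18 with 1, 9 with 2); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Tally from the 30-day example: number of heads -> number of days
tally = {0: 3, 1: 18, 2: 9}
days = sum(tally.values())
sample_mean = Fraction(sum(heads * count for heads, count in tally.items()), days)

# Ideal distribution B(2, 0.5): P(k heads) = C(n, k) * p^k * (1 - p)^(n - k)
n, p = 2, Fraction(1, 2)
ideal = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
true_mean = sum(k * prob for k, prob in ideal.items())

print("sample mean:", float(sample_mean))
print("ideal probabilities:", {k: float(prob) for k, prob in ideal.items()})
print("true mean:", float(true_mean), "| n * p =", float(n * p))
```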

(Note: this doesn't imply that getting x̄ = 1.2 for our graph was incorrect; 1.2 is, in fact, precisely the sample mean of our distribution, and if the friend had asked "What was the average number of heads per trial in your experiment?" then 1.2 would be the exactly correct answer. It's just that in statistics, we almost never care about the mean or standard deviation of an experiment that already happened for its own sake. When we ask "what was the average earnings per share last quarter?" it's not because we're curious about the lives of last quarter's shareholders; it's because we want to know the "true" earnings per share - the number we'd get if we held on to this stock in these conditions for an infinite amount of time - and we have a hunch that last quarter's numbers can give us a clue.)

Now, to answer your question...
Quote: danepeterson

So how can you have a number for Variance without any spins recorded?

... suppose our friend now asks "What's the variance for this type of 2-flip trial?" For our 30-trial experiment it happens to be about 0.37, but they probably don't care; what they're really asking for is the "true" variance, i.e. "If I did an infinite number of these trials, and each time noted the difference (a.k.a. deviation) between that trial's outcome and the expected value, what's the average of the squares of all those deviations?" Since they're asking a question about the idealized distribution, and we know exactly what that ideal bar graph looks like, it would be silly to use our very unideal experimental graph to answer it. Instead, we forget about 1.2 and use the idealized "true" mean of μ=1 to calculate the squared deviations for each of our three possible outcomes: for 0 heads, (0-1)^2 = 1; for 1 head, (1-1)^2 = 0; and for 2 heads, (2-1)^2 = 1. Then we average those squared deviations based on our ideal 25%-50%-25% distribution to get that the variance is 1 * 0.25 + 0 * 0.5 + 1 * 0.25 = 0.5. Notice that, again, this agrees with the formula you gave: n * p * q = 2 * 0.5 * 0.5 = 0.5. That "shortcut" saves us a lot of time when n is much bigger than 2.
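
Here is a small Python sketch of both variance calculations, under the assumption that the 30-day tally is 3, 18 and 9 days for 0, 1 and 2 heads. One detail worth flagging: the "about 0.37" figure corresponds to dividing by 29 (the common n - 1 convention for sample variance), while a plain average over all 30 days gives 0.36.

```python
from fractions import Fraction

# Ideal B(2, 0.5) distribution: number of heads -> probability
ideal = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
mu = sum(k * prob for k, prob in ideal.items())
true_variance = sum((k - mu) ** 2 * prob for k, prob in ideal.items())

# The 30-day tally from the example: number of heads -> number of days
tally = {0: 3, 1: 18, 2: 9}
days = sum(tally.values())
x_bar = Fraction(sum(k * count for k, count in tally.items()), days)
sum_sq = sum((k - x_bar) ** 2 * count for k, count in tally.items())
var_over_n = sum_sq / days                  # plain average of squared deviations
var_over_n_minus_1 = sum_sq / (days - 1)    # "n - 1" sample variance

print("true variance:", float(true_variance))
print("sample variance (divide by 30):", float(var_over_n))
print("sample variance (divide by 29):", round(float(var_over_n_minus_1), 4))
```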

So, when you said about your 37-spin trial example...
Quote: danepeterson

Obviously the sample is too small to have it fall within Normal Distribution

... hopefully you can see now that that doesn't really make sense; the "distribution" we're talking about when we use the binomial formulas you introduced doesn't come from plotting 37 separate spins as 37 separate data points, but from repeatedly observing 37 spins at a time, and recording the total number of wins from each 37-spin observation as one data point. The resulting distribution will look very normal if we repeat this observation many times, with a nice peak around the number 18, and look ugly if we only do it a few times. The number 37 being relatively small has nothing to do with how smooth the graph is or is not. (How closely the ideal binomial shape follows a normal curve does depend on n and p, but with n = 37 and p close to 0.5 it is already a good match.)
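
As a rough check of how bell-shaped the ideal 37-spin distribution is, the Python sketch below computes the exact binomial probability of each win count and prints it next to a normal curve with the same mean and standard deviation. The helper names binom_pmf and normal_pdf are illustrative, not library functions.

```python
from math import comb, exp, pi, sqrt

n, p = 37, 18 / 37
mu = n * p
sigma = sqrt(n * p * (1 - p))

def binom_pmf(k):
    """Exact probability of exactly k wins in n spins."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x):
    """Density of a normal curve with the same mean and standard deviation."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

for k in range(9, 28, 3):
    print(f"{k:2d} wins: binomial {binom_pmf(k):.4f}   normal {normal_pdf(k):.4f}")

# Probability of landing within about one standard deviation of the mean
within_one_sd = sum(binom_pmf(k) for k in range(15, 22))   # 15 through 21 wins
print(f"P(15 <= wins <= 21) = {within_one_sd:.3f}")
```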

As for what the variance actually means, I'm afraid, as KevinAA showed, it doesn't really have a satisfying intuitive meaning in the same way that standard deviation can be described as (roughly) "the typical distance from a data point to the mean." In fact, standard deviation is so much more tangible that variance doesn't even get its own Greek letter (we just represent it as σ^2).

Just think of variance as an intermediate step on the way to the standard deviation. If you're really curious why it's around at all: when coming up with this stuff, mathematicians needed a way to make all the deviations positive (in our coin example, 0-1 is -1 and 2-1 is +1, but since they are equally far from the mean of 1 they should contribute identically to our calculation of how spread out the data is). They basically had two options, use the absolute value or square it, and they chose squaring because it cooperates better with calculus (the absolute value function is pointy and so not differentiable everywhere).
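
To make the two options concrete, this Python sketch applies both to the ideal 2-flip coin distribution. The two give different numbers for the same data, so the standard deviation is a "typical" distance from the mean and not literally the average absolute distance.

```python
from math import sqrt

# Ideal B(2, 0.5) distribution: number of heads -> probability
ideal = {0: 0.25, 1: 0.5, 2: 0.25}
mu = sum(k * prob for k, prob in ideal.items())

# Option 1: average the absolute deviations
mean_abs_dev = sum(abs(k - mu) * prob for k, prob in ideal.items())

# Option 2: average the squared deviations, then take the square root
variance = sum((k - mu) ** 2 * prob for k, prob in ideal.items())
std_dev = sqrt(variance)

print(f"mean absolute deviation = {mean_abs_dev}")
print(f"variance = {variance}, standard deviation = {std_dev:.4f}")
```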

If you're interested in things that are harder to calculate than an ideal unbiased wheel (like the variance for a wheel with one or more biased pockets), feel free to PM me; I've built software which answers those kinds of questions.
Last edited by: Ajaxx on Sep 30, 2019
"Not only [does] God play dice... he sometimes confuses us by throwing them where they can't be seen." ~ Stephen Hawking
danepeterson
  • Threads: 11
  • Posts: 43
Joined: Apr 23, 2012
September 30th, 2019 at 5:38:38 PM permalink
Thank you for the response KevinAA. I really appreciate you taking the time to help.
Dane Peterson
danepeterson
  • Threads: 11
  • Posts: 43
Joined: Apr 23, 2012
Thanked by
Ajaxx
September 30th, 2019 at 5:40:12 PM permalink
Thank you for the detailed response Ajaxx. You were very thorough and easy to understand. I appreciate the step by step help. It makes perfect sense now. I've been teaching myself statistics and I just couldn't find a good explanation online. I got hung up on something so small like this and it's crazy how simple it turns out to be once I fully understand. I will definitely check out that youtube video Crash Course Statistics ep. 15. Thank you again!
Dane Peterson