I'm new to the forum, but have followed it for years. I have a question that I know someone here can help me with.
I was helping my son with a science project on probability, and he wanted to know the chances that a major league hitter who, on average, gets a hit one of every 3 times (i.e. a .333 lifetime average) hits .400 over a "season" of 500 at bats.
I couldn't figure the math easily, so we decided to do it by trial and error. I wrote a quick BASIC program using a TRS-80 emulator, and ran 1000 trials (or seasons) at a time, each with 500 theoretical at bats. I programmed a random number selection between 0 and 999, and scored a "hit" if the number selected was 333 or below, and an out otherwise.
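For anyone who wants to repeat the experiment without a TRS-80 emulator, here is a minimal Python sketch of the same kind of simulation. It is not the original BASIC program: the function names and the seed are made up for illustration, and it scores a hit when a uniform random number falls below 1/3 rather than testing an integer from 0 to 999. (Scoring "333 or below" on 0 to 999 counts 334 of the 1000 values, so that version simulates a .334 hitter.)

```python
import random

def simulate_season(at_bats=500, p_hit=1/3, rng=random):
    """Return the number of hits in one simulated season."""
    return sum(1 for _ in range(at_bats) if rng.random() < p_hit)

def run_seasons(n_seasons=1000, at_bats=500, p_hit=1/3, seed=None):
    """Simulate n_seasons and return (lowest, highest) batting average."""
    rng = random.Random(seed)  # private generator so runs are repeatable
    averages = [simulate_season(at_bats, p_hit, rng) / at_bats
                for _ in range(n_seasons)]
    return min(averages), max(averages)

if __name__ == "__main__":
    low, high = run_seasons(seed=1)
    print(f"lowest {low:.3f}, highest {high:.3f}")
```

Calling run_seasons repeatedly with different seeds plays the role of the separate 1000-season runs described above.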
The first 25 times I ran 1000 trials (i.e. 25,000 trials), I generally found the highest average of the 1000 trials was anywhere from .370 to a high of .394. The lowest ranged from .260 at the low end to .286 at the high end. This seemed about right to me, as I had run 25,000 seasons and still not got a .400 season.
Then, on the 26th 1000 trial series, I got a low of .274, but a high of .500 even! That blew me away, and it was so surprising that it seemed like it had to be in error. Obviously, there is some theoretical chance that the .333 hitter would hit .500 over 500 "at bats", but the odds seemed astronomical. I ran another 10 trials of 1000, i.e. 10,000 "seasons" and never got a .400.
So, I am wondering, can anyone tell me what the odds are of my computer .333 hitter hitting .500 over 500 at bats? How about .400?
Thanks in advance, and I look forward to seeing the responses.
About 0.1% for hitting .400 or more over 500 at-bats.
This is a binomial distribution. It's easy to calculate the probability of getting an exact number of hits. But then you need to add them all up, i.e. the probability of getting 200 hits + the probability of getting 201 hits + ... all the way up to 500 hits.
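The adding-up can be left to a computer. Below is a rough Python sketch of that sum, assuming a hit probability of exactly 1/3 and 500 at bats; the function names are made up for illustration. It uses exact fractions so nothing is lost to rounding until the final conversion to a decimal.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k hits in n at bats with hit probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n, p):
    """Probability of k or more hits: add up the exact terms from k to n."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

p = Fraction(1, 3)  # assumed: a hitter who gets a hit exactly 1 time in 3
print("P(.400 or better):", float(prob_at_least(200, 500, p)))
print("P(.500 or better):", float(prob_at_least(250, 500, p)))
```

The same function answers the .500 question by starting the sum at 250 hits instead of 200.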
There is a normal approximation of the binomial that can be used, which is more or less accurate (though not exact). I calculated the 0.1% probability with this calculator: http://stattrek.com/online-calculator/binomial.aspx . It gives exact numbers when the number of trials is under 1000 (as it is here) and uses the normal approximation for more than 1000 trials.
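To show what the normal approximation looks like in practice, here is a small Python sketch. It is only an illustration, not what the calculator above does internally: the function name is made up, and it assumes the usual continuity correction (treating "200 or more" as "199.5 or more" on the smooth curve).

```python
from math import erfc, sqrt

def normal_tail_at_least(k, n, p):
    """Approximate P(X >= k) for X ~ Binomial(n, p) with a normal curve."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    z = (k - 0.5 - mean) / sd      # continuity correction: start at k - 0.5
    return 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal

print(normal_tail_at_least(200, 500, 1/3))
```

With 500 at bats and p = 1/3 the mean is about 166.7 hits and the standard deviation about 10.5, so 200 hits sits a little over three standard deviations above the mean.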
I would suggest that you and your son look at the Wikipedia page for the binomial distribution.
ZCore13
A batter does not generally have the same batting abilities over their entire career. It will generally peak very quickly, stay constant for a while, and then gradually diminish as their skills physically deteriorate.
Tony Gwynn was a lifetime .338 hitter for the 20 years he played. But clearly he was at his peak for only about 10 of those years and maybe was closer to a 0.350 hitter during this time. And in fact he very nearly batted 0.400 with 0.394 in 1994.
Another consideration is that to qualify for a batting title a player only needs to have 502 plate appearances. This means that the number of at bats could be a lot less than 500, making it more probable that a batter could qualify as a 0.400 hitter.
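In Excel, use =BINOMDIST(200,500,1/3,TRUE) to get .9992307, the probability of 200 hits or fewer. That leaves about 0.077% for more than 200 hits. To count exactly 200 hits as a .400 season too, use 199 as the first argument and subtract the result from 1.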
I'll put in a pretty graph for you. The .5 line is the most likely scenario, at 167 hits. Read the graph as "the probability of getting n hits or fewer".
Quote: udflyer87
I was helping my son with a science project on probability, and he wanted to know what the chances a major league hitter who, on average gets a hit one of every 3 times (i.e. .333 lifetime average) hits .400 over a "season" of 500 at bats.
So you are looking for 200 hits on 500 tries, where each try has a hit probability of p = 1/3. The exact answer to that question comes from a binomial distribution; in your case of 200 hits it is C(500, 200) * p^200 * (1-p)^(500-200).
But that gives you only the (exact) probability of exactly 200 hits, and nothing more. To get the chance of 200 hits or more, you have to add up these terms for every hit count from 200 to 500, which is tedious by hand.
For a science project, a simpler picture is more useful. A common shortcut is to replace the binomial distribution with a Poisson distribution. That approximation becomes exact in the limit of an infinite number of tries with a hit probability shrinking toward zero, so it works best when p is small. With p = 1/3 it is only a rough guide.
Anyway, the resulting Poisson distribution has a mean of 500*p (which is pretty obvious). The variance of a Poisson distribution is the same as its mean, so the variance is also 500/3, which yields a standard deviation of sqrt(500/3), about 12.9. The binomial distribution itself has a variance of 500*p*(1-p) = 1000/9, so its true standard deviation is sqrt(1000/9), about 10.5, and the Poisson picture overstates the spread. The appeal of the Poisson distribution is that it depends on only a single parameter (its mean), which makes it much simpler than a binomial distribution (which depends on 2 parameters, the number of tries and the hit probability), so you can do a lot more with it.
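As a check on how close the Poisson picture is for this particular problem, here is a short Python sketch that puts the two distributions side by side. The function names are made up for illustration, and it assumes p = 1/3 and 500 tries as in the question. The Poisson terms are computed through logarithms because 166.7 raised to a few hundred is too large for ordinary floating point.

```python
from math import comb, exp, lgamma, log, sqrt

n, p = 500, 1/3
lam = n * p  # Poisson mean, matched to the binomial mean

def poisson_pmf(k, lam):
    """Poisson probability of exactly k hits, computed via logs."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))

def binom_pmf(k, n, p):
    """Binomial probability of exactly k hits in n tries."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

poisson_sd = sqrt(lam)                # Poisson: variance equals the mean
binomial_sd = sqrt(n * p * (1 - p))   # binomial: variance is n*p*(1-p)

poisson_tail = 1 - sum(poisson_pmf(k, lam) for k in range(200))
binomial_tail = sum(binom_pmf(k, n, p) for k in range(200, n + 1))

print(f"standard deviation: Poisson {poisson_sd:.2f}, binomial {binomial_sd:.2f}")
print(f"P(200 or more hits): Poisson {poisson_tail:.5f}, binomial {binomial_tail:.5f}")
```

The Poisson version has the wider spread, so it puts noticeably more probability on a .400 season than the binomial does.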