Thread Rating:

Wonko33
Wonko33
  • Threads: 15
  • Posts: 122
Joined: Aug 29, 2015
January 24th, 2017 at 8:59:27 AM permalink
Hi

I am trying to figure out a problem similar to this one:

"If there are 5 red marbles in a bag containing 300. On average how many pulls do I need to get all 5 (or 4 or 3 etc)"

I haven't done serious prob calculations in a while and I am not sure how to phrase this when I use search engines.

I don't need you guys to solve it for me, just a push in the right direction or a link to a page where they explain this type of problems would be great,

Thanks for your time.
So Wizard, still no basic strategy for strip poker huh?
Romes
Romes
  • Threads: 29
  • Posts: 5602
Joined: Jul 22, 2014
January 24th, 2017 at 9:20:34 AM permalink
Just look at your odds on the individual draws...

Draw 1: Odds of getting a red marble = 5/300 = 1/60 = .0167

Draw 2 (given draw 1 red marbel): Odds of getting red marble = 4/299 = .0134

etc.

A small 'cheat' or way to think about probabilities... if you can use the word "OR" to describe the events, it's additive of the individual odds... if you use the word "AND" then it's multiplicative.

ex.
If you want to know what are the odds of drawing a red marble on the first draw AND on the second draw... then it's multiplicative. So the odds of getting 2 red on the first 2 draws would be .0167 * .0134 = .0002238...

What are the odds of getting a red marble on the first draw OR the second draw (assuming you don't draw a red marble on the first)... it's additive... 5/300 + 5/299 = .0334.
Playing it correctly means you've already won.
ThatDonGuy
ThatDonGuy
  • Threads: 117
  • Posts: 6286
Joined: Jun 22, 2011
Thanked by
ChesterDog
January 24th, 2017 at 9:31:45 AM permalink

The average number of pulls = 5 x (the probability of getting all 5 in the first 5 pulls) + 6 x (the probability of needing 6 pulls) + 7 x (the probability of needing 7 pulls) + ... + 300 x (the probability of needing to pull all 300 marbles out).

The probability of needing, say, 8 pulls = the probability of having 4 red and 3 white in the first 7 pulls (if you have all 5 red in the first 7, then you need fewer than 8 pulls) x the probability that the 8th pull is the remaining red marble. Of the (300)C(7) ways to select 7 marbles, there are (5)C(4) = 5 ways of having 4 red marbles and (295)C(2) = 43,365 of having 2 of the 295 white marbles; for the eighth marble, there are 293 marbles remaining, only one of which is red.

ChesterDog
ChesterDog
  • Threads: 8
  • Posts: 1511
Joined: Jul 26, 2010
Thanked by
mamatWonko33
January 24th, 2017 at 10:05:48 AM permalink
Quote: Wonko33

... I am not sure how to phrase this when I use search engines...



Try searching for "coupon problem."
mamat
mamat
  • Threads: 3
  • Posts: 494
Joined: Jul 13, 2015
Thanked by
Wonko33ChesterDog
January 24th, 2017 at 10:48:24 AM permalink
Assume all marbles have equal probability (no clumpiness in the original marbles), every draw is the same, no draw depends on any other draw, and no “restarting” (e.g. someone sees a partial result, causes an error, and drawing is restarted).
- Mathematicians call this “i.i.d”

------------------------------------------------------------------------------------------------------------------
“Unordered sampling without replacement” is the mathematician's name for this general area.
...but googling "keno math" will give you better webpages for practical calculations.
------------------------------------------------------------------------------------------------------------------

5 red marbles in bag of 300. How many draws do I need for >50% chance to have at least X red marbles?

Quick guess:
50% chance of 5 marbles, Look for 4.5 marbles based on 1/60 red marbles
5 guess: 270 (guess 4.5 marbles)
4 guess: 210 (3.5)
3 guess: 150 (2.5)
2 guess: 90 (1.5)
1 guess: 30 (0.5)

Brute-force (ala Romes): 3 is midpoint (150). 1-5 & 2-4 are mirror-image answers.
Calculating 1 and 5 are easier. 2 & 4 are harder.
5: 262 = 50.56% Also combin(262,5)/combin(300,5) or p(0 red balls in 38 draws).
4: 206 = 50.01%
3: 150 = 50.00%
2: 95 = 50.67%
1: 39 = 50.41%

https://en.wikipedia.org/wiki/Coupon_collector's_problem
"Coupon collector's problem" is a related problem (Thanks ChesterDog) WITH replacement.
Your problem is WITHOUT replacement.

These days I'm often too lazy to think of elegant solutions. I didn't see anything obvious in google, so I brute-forced Excel.

-----------------------------------------------------------------------------------------------
The technical way (making a "closed-form expression", e.g. a simple formula):
-----------------------------------------------------------------------------------------------
(number of ways to have at least X red balls when drawing Y balls out of Z balls) / (number of ways to draw Y balls out of Z balls)
e.g.
(number of ways to have at least 3 red balls when drawing 220 balls out of 300 balls) / (number of ways to draw 220 balls out of 300 balls)

The denominator is combin(Z,Y) ... or combin(300,220), but I can't recall a simple expression for the numerator.

Usually n(ways to have at least 3 red balls) = n(ways with 3 red balls) + n(ways with 4 red balls) + n(ways with 5 red balls). And then you have to figure out when the total crosses 50%. Pain.

Easier for me to brute-force a big table with all probabilities for 0-5 red balls in 1-300 draws...and just scan for 50% crossing-points.

-----
P(5 red balls in 262 draws) = P(0 red balls in 38 draws) = 295/300*...*258/263 = 295…258/300…263 = (295!/257!) / (300!/262!) = (262!/257!) / (300!/297!) = 262*261*260*259*258/300*299*298*297*296 = 50.56%

combin(n,k) = "n choose k" = n!/(n-k)!k! ... so in this notation, P(5 red balls in 262 draws) = combin(262,5)/combin(300,5) = 50.56%

-----
P.S. There are closed-form expressions for keno math, but in practice, I just brute-force an Excel spreadsheet & I can muck it up however I want. Easier for me to spot errors because I can do all kinds of cross-checks on Excel.

In technical language, it's similar to an Markov Chain, but transition probabilities change slightly each time a ball is collected.
You could make a "chain" with 6 states (0-5 red balls collected) and a "remaining balls" counter.
Instead, I create a massive unrolled Markov Chain with 301x6 - 30 = 1,776 states.
https://en.wikipedia.org/wiki/Markov_chain
Last edited by: mamat on Jan 24, 2017
ChesterDog
ChesterDog
  • Threads: 8
  • Posts: 1511
Joined: Jul 26, 2010
Thanked by
Wonko33
January 24th, 2017 at 12:54:49 PM permalink
Thanks to Mamat for pointing out that the coupon collector's problem assumes replacement of each ball after each pull.

I like ThatDonGuy's hint. Using his method, I find the expected number of pulls to get the 5 red marbles from the bag of 300 marbles without replacement is:
1505/6 = 250.833... But I also agree with Mamat that the probability of getting all 5 red balls within the first 262 spins is 50.56%.
Last edited by: ChesterDog on Jan 24, 2017
mamat
mamat
  • Threads: 3
  • Posts: 494
Joined: Jul 13, 2015
January 24th, 2017 at 4:51:28 PM permalink
Quote: ChesterDog

Thanks to Mamat for pointing out that the coupon collector's problem assumes replacement of each ball after each pull.

I like ThatDonGuy's hint. Using his method, I find the expected number of pulls to get the 5 red marbles from the bag of 300 marbles without replacement is:

The difference between these two approaches is similar to "mean" vs. "median" (or what kind of estimator do you use...to make an "average" or "expected number").

(1) How many draws do I need for >50% chance to have at least X red marbles? Sum(F(1)toF(x))>.5, where x is # of draws
(2) What is the expected value of the number of draws required to get the Xth red marble? E[xF(x)]

The OP's question was: "If there are 5 red marbles in a bag containing 300. On average how many pulls do I need to get all 5 (or 4 or 3 etc)".

Both interpretations are valid. It's an open question what the OP intended.





-----
There are similar confusions with "confidence intervals" and "credible intervals"; e.g. 100 +/- 20 ... or (80, 120) @66%.
Most scientists and non-statisticians are probably treating Frequentist "confidence intervals" as Bayesian "credible intervals".

Probability can get really tricky, and the terminology is not what non-statisticians expect.

"fair" and "unbiased" are NOT the same thing to statisticians. And neither one means 50/50.

There are "likelihoods" (which don't have to add up to 1) and "probabilities" (which sum to 1).

Some estimators are called "maximum likelihood estimators". But "maximum likelihood" might not be "unbiased".

---------------------------------------------------------------------------------------------------------------------------------
Trivia question from 1st year probability/statistics (Try to answer this before reading the wikipedia pages):
---------------------------------------------------------------------------------------------------------------------------------

"Standard deviation" is a popular measure of "statistical dispersion" (or how much a distribution spreads out) from the pre-1960s pre-computer pre-Excel era of statistics which many "old fogies" like me learned in high school or college. It's often easy to calculate with a slide rule, calculator, abacus, or simply by hand on paper.

In the formula, s = sqrt (sum of squares / X), when & why would you use a denominator of
(a) n
(b) n-1
(c) n+1
(d) n-1.5
(e) even more complicated stuff

https://en.wikipedia.org/wiki/Standard_deviation
https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation
https://en.wikipedia.org/wiki/Bessel%27s_correction

P.S. If you studied applied statistics in a non-math science class, you may never have been given this exercise. It's a "math wonk" thing...




-------
The main issue is what kind of estimator do you want? Biased or unbiased? Sample or population? etc...
There's a TON of "standard deviations" and "standard deviation estimators"...so which one is someone using? And why?
And what about "confidence/credible intervals" on "standard deviations"? Just to add more layers...to this Russian doll. :-)

Lots of really obscure distinctions which lay-people generally don't think about....until a presidential poll goes wrong (Then everyone wants to know why...)!

-----
For purposes of this gambling discussion board, you might think about what do "quoted SDs" actually mean to you if you spend X amount of time, playing a game with X EV & Y SD (or Z variance).

A lot of what people have learned about SDs are simplified facts based on "bell curves" (or univariate normal distributions).
People often equate 1 SD = 68%, 2 SD = 95%, 3 SD = 99.7%, but this is a drastic simplification.
They come from the "cumulative distribution function of the univariate normal distribution"...and are not true for all distributions.
https://en.wikipedia.org/wiki/68–95–99.7_rule

In real life, we play games for a finite time, which are based on non-random PRNGs (Pseudo-random-number generators), and we make mistakes compared to idealized strategies.

We are often curious about the "statistical dispersion" of our real-world gambling sessions.
The "SD" of the those sessions is NOT the same as an idealized SD of infinite time, true i.i.d. (independent identically distributed), and perfect play.

Often the RoR is more useful than an SD.
What bankroll do I need to have a less than <1% chance of going broke (with 95% confidence)?
Or what chance do I have of making $X,000 starting with $Y,000 in Z hours (with A% confidence)?

https://en.wikipedia.org/wiki/Statistical_dispersion
Last edited by: mamat on Jan 24, 2017
ChesterDog
ChesterDog
  • Threads: 8
  • Posts: 1511
Joined: Jul 26, 2010
Thanked by
Wonko33
January 25th, 2017 at 12:47:09 AM permalink
Using ThatDonGuy's hint, a surprisingly easy formula for the expected number of pulls results:

Suppose a bag has R red balls and B black balls. Then the expected number of pulls needed, N, to get all R of the red balls is: N = R * ( R + B + 1 ) / ( R + 1 ).

Proof left to the reader. (lol)
ThatDonGuy
ThatDonGuy
  • Threads: 117
  • Posts: 6286
Joined: Jun 22, 2011
January 25th, 2017 at 9:31:23 AM permalink
Quote: ChesterDog

Using ThatDonGuy's hint, a surprisingly easy formula for the expected number of pulls results:

Proof left to the reader. (lol)



The expected number of pulls =
R x (the probability of getting all R in the first R pulls)
+ (R + 1) x (the probability of needing R + 1 pulls)
+ (R + 2) x (the probability of needing R + 2 pulls)
+ ...
+ (R + B) x (the probability of needing to pull all R + B balls)

There are (R+B)C(N-1) combinations of N-1 balls
Of these, R-1 of them must be red, and N-R must be black
There are (R)C(R-1) = R combinations of R-1 red balls out of R, and (B)C(N-R) combinations of N-R black balls out of B
The probability that the first N-1 balls have R-1 red balls and N-R black balls is:
R (B)C(N-R) / (R+B)C(N-1)
This leaves one red ball and B-(N-R) = R+B-N black balls
The probability that ball N is red is 1 / (R+B-N+1)

R * (B)C(N-R) / ((R+B)C(N-1) * (R+B-N+1)) = R B! (N-1)! / ((N-R)! (R+B)!)
The expected number = the sum of (N * the probability of doing it in exactly N balls) for each N from R to R + B
This will be proven to equal R (R + B + 1) / (R + 1) by induction for all positive integers B for any particular R

For B = 1, N can be R or R + 1:
R / (R+1)! * (R! / 0! + (R+1)! / 1!)
= R * R! / (R+1)! + R * (R+1)! / (R+1)!
= R / (R+1) + R
= R(1 + (R+1)) / (R+1)
= R (R+2) / (R+1)

Assume it is true for some positive integer B:
X = R / (R + 1) * (R + B + 1)

We are assuming that B! R / (R+B)! * (R! + (R+1)! + (R+2)! / 2! + (R+3)! / 3! + ... + (R+B)! / B!) = R / (R+1) * (R+B+1)
(B+1) / (R+B+1) * B! R / (R+B)! * (R! + (R+1)! + (R+2)! / 2! + (R+3)! / 3! + ... + (R+B)! / B!) = R / (R+1) * (R+B+1) * (B+1) / (R+B+1)
(B+1)! R / (R+B+1)! * (R! + (R+1)! + (R+2)! / 2! + (R+3)! / 3! + ... + (R+B)! / B!) = R / (R+1) * (B+1)

(B+1)! R / (R+B+1)! * (R! + (R+1)! + (R+2)! / 2! + (R+3)! / 3! + ... + (R+B)! / B!) + (B+1)! R / (R+B+1)! * (R+B+1)! / (B+1)!
= R / (R+1) * (B+1) + (B+1)! R / (R+B+1)! * (R+B+1)! / (B+1)!
= R / (R+1) * (B+1) + R
= R / (R+1) * (B+1) + R (R+1) / (B+1)
= R (R+1+B+1) / (R+1)
= R (R + B + 2) / (R + 1) QED

It is true for B = 1 and it is true for B = N + 1 if it is true for B = N; therefore, it is true for all positive integers B

mustangsally
mustangsally
  • Threads: 25
  • Posts: 2463
Joined: Mar 29, 2011
Thanked by
Wonko33
January 25th, 2017 at 2:10:01 PM permalink
Quote: Wonko33

Hi
I am trying to figure out a problem similar to this one:

so you have a different problem. hope this all helps.
fun question. I play a game with a deck of cards kind of like this. Just a few red cards shuffled in.
Quote: Wonko33

I don't need you guys to solve it for me, just a push in the right direction or a link to a page where they explain this type of problems would be great,

Thanks for your time.

I like to set up a simple recursion sheet (transition matrix) in Excel or Google Sheets. Just add and multiply and subtract.
that way one can SEE the probs to the paths to all the different states the game (experiment) could be in after each draw.

Here is my link to me Excel in Google Sheets
https://goo.gl/f7pNZc

I think it is easy to understand and follow
and just a few minutes to make ;)

I too get
250.8333333
for all 5 reds

here R the others
1: 50.16666667
2: 100.3333333
3: 150.5
4: 200.6666667

looks to be a simple pattern
I did not look that up

also
agree
the median
is 262 draws at a probability of 0.505565872
the distribution for the # of Reds at 262 draws:
0: 2.56317E-05
1: 0.000987575
2: 0.014728983
3: 0.106375986
4: 0.372315952

as easily seen in the spreadsheet

there might be an easier way for the distribution to B calculated
but I love to see
what's UP

Sally
I Heart Vi Hart
  • Jump to: