beachbumbabs Joined: May 21, 2013
• Posts: 14232
June 29th, 2018 at 4:50:50 PM permalink
Quote: Gabes22

The only problem I see in the McDonald's problem is don't they "release" the different toys at different dates. i.e. releasing toy A on June 1st and toy B on June 8th and so on and so forth, They might do it in waves to like release A, B and C on one date then D, E and F on another and so on and so forth. Knowing that information may drastically change the odds.

I have seen them do that, yes.
If the House lost every hand, they wouldn't deal the game.
mustangsally Joined: Mar 29, 2011
• Posts: 2463
July 1st, 2018 at 9:51:07 AM permalink
Quote: Ace2

Has anyone ever worked the coupon collector problem, but with 2 hits required? So, for instance, what is the expected number of roulette spins for every number to have appeared at least twice ?

the simple answer is, of course, just less than 2x that of collecting at least 1 each...

I have some help with this question now,
as the pdf I was using had the transition matrix populated correctly
but how to do it was by far not correct. (the main diagonal is easy using a helper column, the others are elusive right now)

I found a formula for this
in this pdf on page 14 that looks good
"By termwise integration it is easy to obtain the following exact
formula, which is a finite sum, for T2,
the average number of trials needed to collect
at least two of each of the d kinds of coupons
:"

http://www.brynmawr.edu/math/people/anmyers/PAPERS/SIGEST_Coupons.pdf

but do not understand where and what m and j is (I think m=2 and j=number of states) in the formula they arrive at
(the results from the formula are correct so far)

but I never took calculus and the calculus to inclusion exclusion conversion
lost me
at this point in time

Sally
Last edited by: mustangsally on Jul 1, 2018
I Heart Vi Hart
ThatDonGuy Joined: Jun 22, 2011
• Posts: 4838
July 1st, 2018 at 12:56:34 PM permalink
Quote: mustangsally

I found a formula for this in this pdf on page 14 that looks good
"By termwise integration it is easy to obtain the following exact formula, which is a finite sum, for T2, the average number of trials needed to collect
at least two of each of the d kinds of coupons
:"

http://www.brynmawr.edu/math/people/anmyers/PAPERS/SIGEST_Coupons.pdf

but do not understand where and what m and j is (I think m=2 and j=number of states) in the formula they arrive at
(the results from the formula are correct so far)

I think m and j under the sigma sign usually means "for all nonnegative integers m and j," but in this case, it appears to be 0 <= j <= m <= d - 1, as otherwise (d-1)C(m) and (m)C(j) are undefined.
However, while I get results of 2, 11/2, 347/36, and 11259/864, I get them for d = 1 through 4 instead of 2 through 5, which makes sense, as if d = 1, it will always take 2 draws to get "each" coupon twice, while for d = 2, getting all of the coupons in a total of 2 draws happens only half of the time.
mustangsally Joined: Mar 29, 2011
• Posts: 2463
July 1st, 2018 at 8:28:17 PM permalink
Quote: ThatDonGuy

I think m and j under the sigma sign usually means "for all nonnegative integers m and j," but in this case, it appears to be 0 <= j <= m <= d - 1, as otherwise (d-1)C(m) and (m)C(j) are undefined.

ok, I have time now to look at this again.
nice observation
Quote: ThatDonGuy

However, while I get results of 2, 11/2, 347/36, and 11259/864,

what values did you use for m and j?
Quote: ThatDonGuy

I get them for d = 1 through 4 instead of 2 through 5, <snip>

I saw that too. I think that is a typo as the rows of results below that shows the correct values 1:4

For the 38 numbers on a 00 Roulette wheel,
Excel says (once I was shown how to populate the transition matrix fast)
mean spins = 234.8326629 for 'at least 2' of each number
that is just over a 40% increase for the 'at least 1' value.
(time for a fast simulation)

now, to see if I can too get the formula to work (from the pdf I linked to)
The Markov chain method may take some time to get into R or even Pari GP
I want the distribution (as always)

Sally
I Heart Vi Hart
Ace2 Joined: Oct 2, 2017
• Posts: 973
July 1st, 2018 at 9:26:42 PM permalink
Can someone run the number for 1000 coupons, getting at least 2 of each ? I�d be interested to know if there�s a limit. Based on what I�ve seen the limit could be the natural log of 2. So if the average is about 7,485 tries to get all of them once, it would be that divided by 0.693 = 10,800 to get all of them twice. Just a guess.
It�s all about making that GTA
mustangsally Joined: Mar 29, 2011
• Posts: 2463
July 2nd, 2018 at 7:16:29 AM permalink
Quote: Ace2

Based on what I�ve seen the limit could be the natural log of 2. So if the average is about 7,485 tries to get all of them once, it would be that divided by 0.693 = 10,800 to get all of them twice. Just a guess.

Here is a small table of data so far
ABCDEF
couponsat least 1at least 2B/CC-BE/B
1120.50011
235.50.5452.50.833333333
35.59.63890.5714.13890.752527273
48.333314.1890.5875.85570.702686811
511.41719.0410.6007.6240.667776123
614.724.1340.6099.4340.641768707
718.1529.4250.61711.2750.621212121
821.74334.8850.62313.1420.604424412
925.4640.4920.62915.0320.590416339
1029.2946.230.63416.940.578354387
38160.66234.8330.68474.1730.461676833

Sally
I Heart Vi Hart
mustangsally Joined: Mar 29, 2011
• Posts: 2463
July 4th, 2018 at 8:59:17 AM permalink
Quote: mustangsally

Here is a small table of data so far

extended the data as I finished the R code for the Markov chain solution (my computer is so small and old as 8GB ram is all)
couponsat least 1at least 2B/CC-BE/B
70338.299481.71050.702143.41150.423919373
80397.238562.72220.706165.48420.416587033
90457.431645.14010.709187.70910.410355004
100518.7378728.80520.712210.06740.404958729
110581.046813.59070.714232.54470.400217367
120644.264899.3930.716255.1290.396000708
125676.1905942.6480.717266.45750.394056852
130708.317986.12610.718277.80910.392210126
140773.141073.7170.720300.5770.388774349
2001175.6061614.3130.728438.3940.372908951
3001884.7992557.8840.737673.0850.35711235
4002627.9723538.970.7439110.346654378

to see what the 'at least 1' and 'at least 2' look like together
Oh, they look to be going apart ok,
math is done 4 today
Happy 4th of July 2018 USA!

Sally
Last edited by: mustangsally on Jul 4, 2018
I Heart Vi Hart
ThatDonGuy Joined: Jun 22, 2011
• Posts: 4838
July 4th, 2018 at 10:37:49 AM permalink
Quote: mustangsally

what values did you use for m and j?

For d = 3, I used the (m,j) pairs (0,0), (0,1), (0,2), (1,1), (1,2), and (2,2).
For d = 4, I used the (m,j) pairs (0,0), (0,1), (0,2), (1,1), (1,2), (2,2), (0,3), (1,3), (2,3), and (3,3).

In other words, replace the Sigma with "m,j" at the bottom with a Sigma with "m=0" at the bottom and "d-1" at the top, followed by another Sigma with "j=0" at the bottom and "m" at the top.
mustangsally Joined: Mar 29, 2011
• Posts: 2463
July 5th, 2018 at 8:12:56 AM permalink
Quote: ThatDonGuy

In other words, replace the Sigma with "m,j" at the bottom with a Sigma with "m=0" at the bottom and "d-1" at the top, followed by another Sigma with "j=0" at the bottom and "m" at the top.

thank you.
I found another pdf that shows what m and j are
problem I see is this
(j + 2)!
in the formula

Excel does 170! and below but requires the gamma() for higher factorials
R takes too long over that (and precision suffers)
I used a recursion method in R to make this
couponsat least 1at least 2at least 3at least 4at least 5
112345
235.57.87510.187512.46094
35.59.638913.4868817.1920220.80842
48.333314.18919.5659124.7102329.71024
511.41719.04125.9868432.6027939.01481
614.724.1338732.6771640.7894348.63511
718.1529.42539.5895449.2179658.51471
821.74334.88546.6906857.8517568.61409
925.4640.49253.9558966.663778.90403
1029.2946.2361.3661475.6329689.36209
1133.2186552.0842568.906384.7429999.97046
1237.2385358.0450376.564193.98035110.7147
1341.3417464.1028484.32933103.3339121.5829
1445.5218770.2500392.19342112.7944132.5651
1549.7734376.48005100.149122.3537143.6525
1654.0916682.78725108.1899132.005154.838
1758.4723989.1667116.3104141.7422166.115
1862.9119595.61405124.5058151.5602177.4778
1967.40705102.1254132.7717161.4543188.9216
2071.955108.6974141.1043171.4202200.4418
2176.55253115.3269149.5002181.4542.
2281.19789122.0111157.9563191.553.
2385.8887128.7475166.4696201.7135.
2490.623135.5337175.0378211.9329.
2595.39895142.3677183.6583222.2086.
26100.2149149.2474192.329232.5384.
27105.0693156.1711201.0481242.92.
28109.9608163.1372209.8135253.3515.
29114.888170.144218.6236263.831.
30119.85177.1902227.4768274.3569.
31124.8446184.2745236.3717284.9274.
32129.8718191.3955245.3068295.5411.
33134.9303198.552254.2809..
34140.0191205.7431263.2927..
35145.1373212.9677272.3412..
36150.2841220.2247281.4251..
37155.4587227.5133290.5436..
38160.6603234.833299.6956..
39165.8882242.1819308.8803..
40171.142249.5602318.0967..

this looks good for now
Sally
Last edited by: mustangsally on Jul 5, 2018
I Heart Vi Hart
ThatDonGuy Joined: Jun 22, 2011
• Posts: 4838
Thanks for this post from: July 5th, 2018 at 4:23:09 PM permalink
Quote: Ace2

Can someone run the number for 1000 coupons, getting at least 2 of each ? I�d be interested to know if there�s a limit. Based on what I�ve seen the limit could be the natural log of 2. So if the average is about 7,485 tries to get all of them once, it would be that divided by 0.693 = 10,800 to get all of them twice. Just a guess.

I can't use the formula for anything above about 50, as there's either an overflow or an underflow (i.e. the number is so small that it is treated as zero) that's throwing the results off.
For those of you interested in details, it's something like this: for 51 coupons, with M = 50 and J = 25, the value involved in the sum is on the order of 1018. However, 64-bit floating point numbers as implemented by Microsoft are only accurate to about 15 digits, so it's going to be off by up to 1000, which is not good when the final sum (over all J from 0 to 50) is supposed to be around 400.

However, when I run a simulation on various numbers of coupons up to 1000, and divide the simulated average number of draws needed to get two of each by the expected number needed to get one of each, it approaches something like 1.319.
EDIT: At 5000, it's more like 1.278.
As a matter of fact, I wouldn't be surprised if the limit of the ratio is 1. If you have, say, Googolplex (that's 1 followed by googol zeroes) different coupons, there's a very, very, very good chance you will be very close to having chosen all of the coupons at least twice by the time you choose every coupon once.
Last edited by: ThatDonGuy on Jul 5, 2018