thecesspit
thecesspit
Joined: Apr 19, 2010
  • Threads: 53
  • Posts: 5936
November 7th, 2010 at 10:53:06 PM permalink
I've been updating my coding skills over the last week or so, and have taken on a toy project to help myself explore Groovy scripting further. However, this question is not really limited to groovy, but a general algorithm.design pattern question.

Say I have an ordered hash map of items, each with a weighted value: ( Nothing - 50, Pair - 25, Two Pair - 10, Trips - 5).

How would you code a method that returns a random item from that list so the frequency distribution of each item would equal to the weighted value. Ideally the method must work for any ordered map. Or even any map, as I'm not sure how strong the native map ordering is in Groovy.

I've got a solution that works, but it doesn't seem to be the most efficient way of doing it.

I'm imaging there's a couple of programmers who've hit this before and sailed through it. I'm happy to share my code if it helps.
"Then you can admire the real gambler, who has neither eaten, slept, thought nor lived, he has so smarted under the scourge of his martingale, so suffered on the rack of his desire for a coup at trente-et-quarante" - Honore de Balzac, 1829
mkl654321
mkl654321
Joined: Aug 8, 2010
  • Threads: 65
  • Posts: 3412
November 7th, 2010 at 10:58:24 PM permalink
Quote: thecesspit

I've been updating my coding skills over the last week or so, and have taken on a toy project to help myself explore Groovy scripting further. However, this question is not really limited to groovy, but a general algorithm.design pattern question.

Say I have an ordered hash map of items, each with a weighted value: ( Nothing - 50, Pair - 25, Two Pair - 10, Trips - 5).

How would you code a method that returns a random item from that list so the frequency distribution of each item would equal to the weighted value. Ideally the method must work for any ordered map. Or even any map, as I'm not sure how strong the native map ordering is in Groovy.

I've got a solution that works, but it doesn't seem to be the most efficient way of doing it.

I'm imaging there's a couple of programmers who've hit this before and sailed through it. I'm happy to share my code if it helps.



Well, twenty years ago, a programmer might have simply set up an array: nothing, nothing, nothing (x10), pair, pair (x5), two pair, two pair, trips, etc. That might still be a decent brute-force way to do it, but I'm sure there's a much more elegant algorithm available.

Do you know if video poker machines (I assume that's what you're experimenting with here) still use that same ancient C program that was running the 486 processors in 1993? I know that IGT was kind of reluctant to upgrade their programming as long as the old stuff worked, since a VP program isn't exactly taxing for the CPU.
The fact that a believer is happier than a skeptic is no more to the point than the fact that a drunken man is happier than a sober one. The happiness of credulity is a cheap and dangerous quality.---George Bernard Shaw
thecesspit
thecesspit
Joined: Apr 19, 2010
  • Threads: 53
  • Posts: 5936
November 7th, 2010 at 11:03:44 PM permalink
Quote: mkl654321

Well, twenty years ago, a programmer might have simply set up an array: nothing, nothing, nothing (x10), pair, pair (x5), two pair, two pair, trips, etc. That might still be a decent brute-force way to do it, but I'm sure there's a much more elegant algorithm available.



The weighted numbers are much larger than what I've used here, so that would rapidly get ugly. However, the hash map is small.

Quote:

Do you know if video poker machines (I assume that's what you're experimenting with here) still use that same ancient C program that was running the 486 processors in 1993? I know that IGT was kind of reluctant to upgrade their programming as long as the old stuff worked, since a VP program isn't exactly taxing for the CPU.



No idea... but I'd suspect a lot of the core code has been the same for years... writing bullet proof code is hard, and testing it to ensure it's bullet proof is why they pay me. I'd avoid paying me if I was IGT and use something that was "known good".
"Then you can admire the real gambler, who has neither eaten, slept, thought nor lived, he has so smarted under the scourge of his martingale, so suffered on the rack of his desire for a coup at trente-et-quarante" - Honore de Balzac, 1829
MathExtremist
MathExtremist
Joined: Aug 31, 2010
  • Threads: 88
  • Posts: 6526
November 7th, 2010 at 11:40:29 PM permalink
Quote: thecesspit

Say I have an ordered hash map of items, each with a weighted value: ( Nothing - 50, Pair - 25, Two Pair - 10, Trips - 5).



Abstractly, you're talking about the same concept as a weighted slot machine reel. You want to "spin" the array and display the chosen "symbol". The algorithm choice comes down to space vs. time. You can do it in constant-time if you pre-generate an array of size N = total weight, as mkl suggested, and then just do a random draw between 0 and N-1. But if you're looking for a more space-efficient algorithm then you can probably auto-generate a tree and do a log-n traverse somehow. How big is N, anyway?
"In my own case, when it seemed to me after a long illness that death was close at hand, I found no little solace in playing constantly at dice." -- Girolamo Cardano, 1563
Dween
Dween
Joined: Jan 24, 2010
  • Threads: 66
  • Posts: 339
November 8th, 2010 at 5:07:27 AM permalink
I have used weighted choices for a slot simulation as well. Come to think of it, hasn't the Wizard done the same things for his Atkins slot simulation? Nonetheless, my solution follows in some kind of pseudo-code. I tried to make everything use 1-index versus zero-index:

hands = new Array(50,25,10,5);          // This initializes an array with each of the weights.

possibilities = 0; // Initialize the total number of possibilities to zero
for x=1 to hands.length // Count through the number of different outcomes (in this case, 4)
possibilities += hands(x); // Add up all the weights.
next x

function pickhand {
randnum = int(rand()*possibilities)+1; // pick a random number from 1 to total possibilities
pick = 0; // Initialize pick to zero
do {
pick++; // Add one to the pick variable
randnum -= hands(pick); // subtract the weighted chance from the random number
} while (randnum >=0); // if the random number drops to zero or below, keep the pick
return pick;
}


In English:
Add up all the weights.
Pick a number between 1 and that total.
Subtract the first weight from your picked number.
- If you hit zero or below, that's your pick.
- If not, go to the next weight and repeat.

Example:
50, 25, 10, 5 are the weights for picks 1, 2, 3, 4.
Total of weights: 90.
Pick a number between 1 and 90. (example: 75)
Subtract weight of pick 1. (75 - 50 = 25)
Drop to zero? (No)
Subtract weight of pick 2 (25 - 25 = 0)
Drop to zero? (Yes)
Pick 2 is your weighted pick.

Example 2:
Pick a number between 1 and 90. (example: 88)
Subtract weight of pick 1 (88 - 50 = 38)
Subtract weight of pick 2 (38 - 25 = 13)
Subtract weight of pick 3 (13 - 10 = 3)
Subtract weight of pick 4 (3 - 5 = -2)
Drop to zero or below? (Yes)
Pick 4 is your weighted pick.
-Dween!
weaselman
weaselman
Joined: Jul 11, 2010
  • Threads: 20
  • Posts: 2349
November 8th, 2010 at 5:35:19 AM permalink
The question is what exactly you mean be "efficient". The naive and most straightforward algorithm (scan the map once to get the sum of weights, then get a random number, and scan again to pick a value) is constant memory , and linear time, which is pretty good.

If you want to trade memory for speed, there are things you can do to make selection in constant time. For example, use the "alias method", described here: http://prxq.wordpress.com/2006/04/17/the-alias-method/. It is really really elegant, runs in constant time, and requires linear memory.

Interestingly, the map being sorted does not help the efficiency (but hurts the memory footprint). I can't think of any good approach that would make use of the sort order (apart from naive scanning of the map, that could save half the scan on average, but would still be linear).
"When two people always agree one of them is unnecessary"
thecesspit
thecesspit
Joined: Apr 19, 2010
  • Threads: 53
  • Posts: 5936
November 8th, 2010 at 7:12:15 AM permalink
X is one million.
"Then you can admire the real gambler, who has neither eaten, slept, thought nor lived, he has so smarted under the scourge of his martingale, so suffered on the rack of his desire for a coup at trente-et-quarante" - Honore de Balzac, 1829
MathExtremist
MathExtremist
Joined: Aug 31, 2010
  • Threads: 88
  • Posts: 6526
November 8th, 2010 at 8:04:24 AM permalink
Quote: thecesspit

X is one million.


Yeah, then your best bet is to use something smarter with memory than the single-array pick. You've got a few choices - both the algorithm Dween described and the tree-traversal I suggested are listed here:

http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python/

More on the Alias method, including a comment section describing how it works:

http://code.activestate.com/recipes/576564-walkers-alias-method-for-random-objects-with-diffe/
"In my own case, when it seemed to me after a long illness that death was close at hand, I found no little solace in playing constantly at dice." -- Girolamo Cardano, 1563
MathExtremist
MathExtremist
Joined: Aug 31, 2010
  • Threads: 88
  • Posts: 6526
November 8th, 2010 at 8:41:37 AM permalink
One more link, this time to a Springer book:
http://cg.scs.carleton.ca/~luc/rnbookindex.html

Chapter 3 probably has a solution for what you want. I haven't read it yet, but I'll probably take the PDFs to a printer and make myself a hard copy (the author says it's okay).
"In my own case, when it seemed to me after a long illness that death was close at hand, I found no little solace in playing constantly at dice." -- Girolamo Cardano, 1563
thecesspit
thecesspit
Joined: Apr 19, 2010
  • Threads: 53
  • Posts: 5936
November 8th, 2010 at 8:59:33 AM permalink
Thanks all... Dween's algorithm makes a lot of sense... my coding skills are so out of shape I'd forgotten that a WHILE loop would cut down my steps. My solution was was iterating across the entire cumalative frequency map to find the range which matched the random number. Working the other way is much smoother.

I'm going to take a look at the other algorithms as well.

I'm peeling off the whole thing into a seperate object, so the pay table, frequency and other items will be methods and variables I can call at any time.

Cheers.
"Then you can admire the real gambler, who has neither eaten, slept, thought nor lived, he has so smarted under the scourge of his martingale, so suffered on the rack of his desire for a coup at trente-et-quarante" - Honore de Balzac, 1829

  • Jump to: