socks
socks
  • Threads: 15
  • Posts: 364
Joined: Jul 13, 2011
June 21st, 2014 at 1:21:37 PM permalink
Quote: AxiomOfChoice

Well, yeah. If we are talking about an analysis running in 2s or 5s it really doesn't matter -- it's not like you can't wait 5s. Of course for games with more cards the difference might be a lot larger.


Not only that, but my goal in achieving good times at this level is to allow for another layer of abstraction to really minimize programmer time. Early results make it look like this could add an order of magnitude to run times.

Also, thanks JB for sharing your code. I think I'd already seen it in an old post, but it is pretty clever and it's nice to have more tools in the chest, since they often represent different tradeoffs. If you'd care to elaborate on why the specialK algorithm has a lot of fat, I'd be interested. It looks like he's indexing a 7 card unsorted hand with 6 additions and maybe a comparison; doing it in a way that works well on a 32-bit machine; and not using obscene amounts of memory. Maybe you only meant that he's not minimizing memory usage?
JB
Administrator
JB
  • Threads: 334
  • Posts: 2089
Joined: Oct 14, 2009
June 21st, 2014 at 3:55:07 PM permalink
Quote: socks

If you'd care to elaborate on why the specialK algorithm has a lot of fat, I'd be interested. It looks like he's indexing a 7 card unsorted hand with 6 additions and maybe a comparison; doing it in a way that works well on a 32-bit machine; and not using obscene amounts of memory.


It looked like his approach uses a unique prime number for each card rank, and multiplies the primes together to obtain a unique integer corresponding to a specific 7-card rank. So for AKQJT98, the result would be 41*37*31*29*23*19*17 = 10,131,543,907. That value is larger than what a 32-bit integer can support.

I would say this is pointless, since it's highly unusual for a game to use actual 7-card hands; it's always the best 5-card hand of the 7 cards. In other words, every 7-card combination still has a (best) 5-card hand rank.

I use a 32-bit integer and store the type of hand in it, so that you can still determine the better hand by simply comparing the two integers; the higher integer represents the better hand.

I do it in two phases:

1) Determine the hand rank of every 5-card combination.

2) Loop through every 7-card combination, and pick the best 5-card hand rank of the 21 ways to make a 5-card hand. Set this best 5-card hand rank as the hand rank of the 7-card combination.

As far as storing the 5-card hand rank, I use a 32-bit integer as well, coded as follows:

highest 8 bits are unused (00000000)
next 4 bits for the hand type (0000=high card, 0001=one pair, 0010=two pair, 0011=three of a kind, 0100=straight, 0101=flush, 0110=full house, 0111=four of a kind, 1000=straight flush)
next 4 bits for the highest card rank (0000=deuce, 0001=three, ..., 1100=ace)
next 4 bits for the second-highest card rank
next 4 bits for the third-highest card rank
next 4 bits for the fourth-highest card rank
lowest 4 bits for the lowest card rank

Most hand types don't need to use all 5 ranks. With a straight or straight flush, you only need to use the highest rank, which would be 5 through Ace. With Four of a Kind or Full House, you only need to use two of the ranks (quad rank + kicker, or trip rank + pair rank). With Two Pair and 3 of a Kind, you only need to use 3 of the ranks (high pair + low pair + kicker, or trip rank + high kicker + low kicker). With one pair you need 4 ranks (pair rank + high kicker + middle kicker + low kicker). With flushes and high-card hands, you need all 5 ranks.
socks
socks
  • Threads: 15
  • Posts: 364
Joined: Jul 13, 2011
June 21st, 2014 at 5:16:17 PM permalink
Quote: JB

It looked like his approach uses a unique prime number for each card rank, and multiplies the primes together to obtain a unique integer corresponding to a specific 7-card rank. So for AKQJT98, the result would be 41*37*31*29*23*19*17 = 10,131,543,907. That value is larger than what a 32-bit integer can support.


No. The prime number discussion was background and was the inspiration for the technique used. Instead of multiplication and primes he used addition and numbers chosen (by brute force) to not produce collisions when added together. The largest number produced fits into 23 bits. In part 2 he expands the approach, using the same addition method, in the remaining 9 bits, to take care of the flush check.
JB
Administrator
JB
  • Threads: 334
  • Posts: 2089
Joined: Oct 14, 2009
June 21st, 2014 at 5:18:55 PM permalink
Quote: socks

No. The prime number discussion was background and was the inspiration for the technique used. Instead of multiplication and primes he used addition and numbers chosen (by brute force) to not produce collisions when added together. The largest number produced fits into 23 bits. In part 2 he expands the approach, using the same addition method, in the remaining 9 bits, to take care of the flush check.


That's still a lot of extra effort when bit-shifting is simpler and faster, especially when decoding the integer. Simpler is better in my opinion.
AxiomOfChoice
AxiomOfChoice
  • Threads: 32
  • Posts: 5761
Joined: Sep 12, 2012
June 22nd, 2014 at 10:26:13 AM permalink
Quote: JB

I use a 32-bit integer and store the type of hand in it, so that you can still determine the better hand by simply comparing the two integers; the higher integer represents the better hand.

I do it in two phases:

1) Determine the hand rank of every 5-card combination.

2) Loop through every 7-card combination, and pick the best 5-card hand rank of the 21 ways to make a 5-card hand. Set this best 5-card hand rank as the hand rank of the 7-card combination.

As far as storing the 5-card hand rank, I use a 32-bit integer as well, coded as follows:

highest 8 bits are unused (00000000)
next 4 bits for the hand type (0000=high card, 0001=one pair, 0010=two pair, 0011=three of a kind, 0100=straight, 0101=flush, 0110=full house, 0111=four of a kind, 1000=straight flush)
next 4 bits for the highest card rank (0000=deuce, 0001=three, ..., 1100=ace)
next 4 bits for the second-highest card rank
next 4 bits for the third-highest card rank
next 4 bits for the fourth-highest card rank
lowest 4 bits for the lowest card rank

Most hand types don't need to use all 5 ranks. With a straight or straight flush, you only need to use the highest rank, which would be 5 through Ace. With Four of a Kind or Full House, you only need to use two of the ranks (quad rank + kicker, or trip rank + pair rank). With Two Pair and 3 of a Kind, you only need to use 3 of the ranks (high pair + low pair + kicker, or trip rank + high kicker + low kicker). With one pair you need 4 ranks (pair rank + high kicker + middle kicker + low kicker). With flushes and high-card hands, you need all 5 ranks.



This is exactly what I did when I did an analysis of a 7-card game where hand ranks had to be compared (player vs dealer)
socks
socks
  • Threads: 15
  • Posts: 364
Joined: Jul 13, 2011
June 25th, 2014 at 5:59:13 PM permalink
I finally got around to implementing JB's approach so I could try it out...

But before that, thinking about optimizations had led me to revise a 5 card hand evaluator I'd coded from the ground up last year. I stuck in 5 loops to run it through all the combos to test, and it evaluated the 2.6M 5c combos in 2sec (1 core). After optimization, it ran in 355ms. I was feeling pretty good, but realized, post optimization, that I was assuming a sorted hand, so I threw in a sorting network. Sorting 5 cards can be accomplished in 9 compare and swaps if you use this order 1-2, 3-4, 1-3, 0-2, 2-4, 0-3, 0-1, 2-3, 1-2; see: http://www.angelfire.com/blog/ronz/Articles/999SortingNetworksReferen.html

Sorting appears to add about 140msec to my times, for a total of 495msec

Then I tried JB's setup. using the same loops to test, I first got 150 msec, and 290 msec when adding time to sort. I was using my default of 64-bit integers to store results in since clojure defaults to that and if you want to use anything else, you have to be careful you're not having anything auto boxed, which can slow things down to a crawl. But then I decided to throw the main array into something smaller, a 16-bit array and I gained another 40msec.

Finally I wanted a standard to compare against, so I tried my clojure implementation of the suffe algorithm and got 650msec on an unsorted run (no sorting required for suffe).

tl;dr - The algo performance I achieved on 52c5 unsorted hands:

JB's - 290 msec (edit: 250msec if using a hand strength representation that fits into shorts)
My Algo - 495msec
Suffe - 650msec


Thanks again JB.


edit: also, I couldn't leave 3CP well enough alone. I got my time down to 300msec there as well.
JB
Administrator
JB
  • Threads: 334
  • Posts: 2089
Joined: Oct 14, 2009
June 25th, 2014 at 6:58:48 PM permalink
Quote: socks

Sorting 5 cards can be accomplished in 9 compare and swaps if you use this order 1-2, 3-4, 1-3, 0-2, 2-4, 0-3, 0-1, 2-3, 1-2; see: http://www.angelfire.com/blog/ronz/Articles/999SortingNetworksReferen.html


At least in C#, the 10-step unraveled bubble sort I use is faster than the 9-step optimal sorting network, otherwise I would have promoted the sorting network. Try it yourself. Here are my results for 10 million sorts of 5 random cards in C#:

10-step sort: 1,446,602 ticks (145 ms)
9-step sorting network: 1,531,054 ticks (153 ms)

The difference is small, but noticeable. I think fewer swaps are needed with the bubble sort than with the sorting network. Sorting networks are great for hardware but not necessarily optimal for software.

I found the same to be true for 6 inputs as well - an unraveled bubble sort (15 comparators) is faster on average than the optimal sorting network (12 comparators). For 7 cards however, the optimal sorting network is faster than the unraveled bubble sort.

Also, the XOR swap technique I listed earlier is not as fast (in C#) as just using a temporary variable. So this is the fastest 5-card sorting code I know of for C#:

int t;

if (c1 > c2) { t = c1; c1 = c2; c2 = t; }
if (c1 > c3) { t = c1; c1 = c3; c3 = t; }
if (c1 > c4) { t = c1; c1 = c4; c4 = t; }
if (c1 > c5) { t = c1; c1 = c5; c5 = t; }
if (c2 > c3) { t = c2; c2 = c3; c3 = t; }
if (c2 > c4) { t = c2; c2 = c4; c4 = t; }
if (c2 > c5) { t = c2; c2 = c5; c5 = t; }
if (c3 > c4) { t = c3; c3 = c4; c4 = t; }
if (c3 > c5) { t = c3; c3 = c5; c5 = t; }
if (c4 > c5) { t = c4; c4 = c5; c5 = t; }
DRich
DRich
  • Threads: 89
  • Posts: 12855
Joined: Jul 6, 2012
June 25th, 2014 at 7:09:46 PM permalink
I just want to thank you all for sharing. This is the type of thread I love to see.
At my age, a "Life In Prison" sentence is not much of a deterrent.
AxiomOfChoice
AxiomOfChoice
  • Threads: 32
  • Posts: 5761
Joined: Sep 12, 2012
June 25th, 2014 at 7:20:08 PM permalink
I have not used C# in a long, long time (many, many jobs ago) so I don't really know what I'm talking about, but couldn't you get better performance for this sort of thing using C or C++?

(I honestly have no idea what compilation tricks C# uses, but isn't it basically running over top of a RTE? That has to be slow)
socks
socks
  • Threads: 15
  • Posts: 364
Joined: Jul 13, 2011
June 25th, 2014 at 7:34:33 PM permalink
Quote: AxiomOfChoice

I have not used C# in a long, long time (many, many jobs ago) so I don't really know what I'm talking about, but couldn't you get better performance for this sort of thing using C or C++?

(I honestly have no idea what compilation tricks C# uses, but isn't it basically running over top of a RTE? That has to be slow)


I don' t know either, but the current benchmark game results would suggest this, http://benchmarksgame.alioth.debian.org/u32/benchmark.php?test=all&lang=csharp&lang2=gcc&data=u32
JB
Administrator
JB
  • Threads: 334
  • Posts: 2089
Joined: Oct 14, 2009
June 25th, 2014 at 11:20:47 PM permalink
Quote: AxiomOfChoice

I have not used C# in a long, long time (many, many jobs ago) so I don't really know what I'm talking about, but couldn't you get better performance for this sort of thing using C or C++?

(I honestly have no idea what compilation tricks C# uses, but isn't it basically running over top of a RTE? That has to be slow)


I'm not a Linux fan; I use Microsoft's C#.NET implementation, not the Mono implementation. C# on Windows is probably a little slower than C or C++ on Windows, but I don't know because I don't write in C or C++.
socks
socks
  • Threads: 15
  • Posts: 364
Joined: Jul 13, 2011
June 26th, 2014 at 1:50:47 AM permalink
Quote: JB

At least in C#, the 10-step unraveled bubble sort I use is faster than the 9-step optimal sorting network, otherwise I would have promoted the sorting network. Try it yourself. Here are my results for 10 million sorts of 5 random cards in C#:

10-step sort: 1,446,602 ticks (145 ms)
9-step sorting network: 1,531,054 ticks (153 ms)

The difference is small, but noticeable. I think fewer swaps are needed with the bubble sort than with the sorting network. Sorting networks are great for hardware but not necessarily optimal for software.

I tried it. I ran both over the full 311875201 permutations. The optimal sorting network was about 16% faster. The actual # of swaps performed was:

unrolled bubble sort(10-step): 1559376000
sorting network(9-step): 1372250880
JB
Administrator
JB
  • Threads: 334
  • Posts: 2089
Joined: Oct 14, 2009
June 26th, 2014 at 2:32:19 AM permalink
Quote: socks

I tried it. I ran both over the full 311875201 permutations. The optimal sorting network was about 16% faster. The actual # of swaps performed was:

unrolled bubble sort(10-step): 1559376000
sorting network(9-step): 1372250880


What language did you use? I just ran the same and the optimal sorting network was a little faster over the entire set, but I show a different total number of swaps for the sorting network:

Bubble sort: time = 25,653,402 ticks (8,776 ms); swaps = 1,559,376,000
Optimal sort: time = 24,953,106 ticks (8,536 ms); swaps = 1,497,000,960

How did you implement the sorting network? I used the one on this page.
socks
socks
  • Threads: 15
  • Posts: 364
Joined: Jul 13, 2011
June 26th, 2014 at 11:24:25 AM permalink
Quote: JB

What language did you use? I just ran the same and the optimal sorting network was a little faster over the entire set, but I show a different total number of swaps for the sorting network:

Bubble sort: time = 25,653,402 ticks (8,776 ms); swaps = 1,559,376,000
Optimal sort: time = 24,953,106 ticks (8,536 ms); swaps = 1,497,000,960

How did you implement the sorting network? I used the one on this page.


I'm still using clojure. I duplicated your results using the network from that page 1497000960 swaps. Then I tested 1M random hands on ea algorithm to make sure they were actually sorting. I don't know enough about sorting to explain why the 2 networks use different #'s of swaps, but they both seem to work. Looks like maybe I just got lucky and stumbled onto a better one early on? Again, for reference, I'm swapping 1-2, 3-4, 1-3, 0-2, 2-4, 0-3, 0-1, 2-3, 1-2
AcesAndEights
AcesAndEights
  • Threads: 67
  • Posts: 4300
Joined: Jan 5, 2012
June 27th, 2014 at 10:37:43 PM permalink
Quote: JB

I'm not a Linux fan; I use Microsoft's C#.NET implementation, not the Mono implementation. C# on Windows is probably a little slower than C or C++ on Windows, but I don't know because I don't write in C or C++.


BUT WHO PAYS FOR YOUR .NET LICENCES AND STUFF
"So drink gamble eat f***, because one day you will be dust." -ontariodealer
  • Jump to: