Wizard
Administrator
Joined: Oct 14, 2009
  • Threads: 1332
  • Posts: 21938
November 24th, 2013 at 5:46:59 AM permalink
I'm in the process of analyzing Heads Up Hold 'Em. This is an Ultimate Texas Hold 'Em variant by Galaxy Gaming that was seen at G2E and Raving, and is already spreading fast in Washington state. There are COMBIN(52,2)*COMBIN(50,3)*COMBIN(47,2)*COMBIN(45,2) = 27,813,810,024,000 pairings of seven-card hands to work through and score. Even with all the shortcuts I put in, it will take my program three to four weeks to go through them all.

Meanwhile, JB has been working on pai gow poker, and we hope to publish some ground-breaking stuff on that soon. That game involves combin(53,7)*combin(46,7) = 8,250,459,031,214,400 combinations, or 297 times as many as Heads Up Hold 'Em. JB wrote to me that he is using multi-core processing to cut down on the time it takes.

What is that, you might ask? Both my computers, and his, actually contain four processors, or cores, so each is in effect four computers in one. That means it isn't a big sacrifice in speed to break the program into four parts, each doing 25% of the situations, and run them simultaneously. How do you do that? I wrote to JB to ask, and I'm going to copy and paste his answer below, so give JB full credit for the following. I'm posting this for the edification of the forum, and to open discussion on whether there might be even better ways.

Meanwhile, my Heads Up Hold 'Em results should be finished on Nov. 30.

****

First you should determine how many cores your CPU has, as this will tell you how many threads you should run. If you right-click an empty area of your taskbar, a menu should appear; click on Start Task Manager.

Go to the Processes tab, right-click any entry in the list, and choose "Set Affinity..." (if you get an Access Denied message, keep trying different entries until one lets you).



Something like this will appear, indicating how many cores your CPU has:



The above indicates that there are four cores.

Cancel that and then close the Task Manager.

Next you need to figure out how to split up the work among the cores you have available. I wrote a multi-threaded program with four threads, but multi-threaded programming takes a lot of time to learn and get right. If you've never done it before, the easier approach is to compile multiple versions of your program, with each version working on its own specific range of hands, and start each version independently. Have them display their combination counts when they finish running, and manually add the counts together once all copies have finished.

Since you have two computers, figure out how many cores you have on each, and make that many copies. So if one computer has 2 cores and the other has 4 cores, divide the hands into 6 ranges and make 6 separate versions of your analyzer, each working on its own 1/6th of the hands. Launch each copy after all of them have been compiled. If the single program (before being split up) reads or writes any files, make sure either that each copy works with its own set of files, or that they open the files in shared-access mode, so that none of them conflict with each other.

On the computer that you will be using for day-to-day work, after you have launched the programs, set them to run at low priority using the Task Manager: right-click the taskbar and select Start Task Manager, then right-click the program name, choose Set Priority, and select Low. It will ask you to confirm that you really want to change the priority; click Yes.



If you don't set them to run on low priority, your computer will become frustratingly sluggish when trying to do normal stuff like open a web browser.
It's not whether you win or lose; it's whether or not you had a good bet.
CRMousseau
Joined: Oct 19, 2009
  • Threads: 2
  • Posts: 117
November 24th, 2013 at 6:07:59 AM permalink
This is more or less what I've been doing for a couple of years, except that I use a program to act as overhead and divide up the tasks, and I rely on Mac OS's C++ extensions that make multi-threading an absolute breeze.

An even more extreme solution is elastic cloud computing, where you use basically the approach Mike has described, except that you farm the processing out to several thousand virtual machines, say one for each starting hand. It costs money, of course. Depending on the plan you choose and whether you pay up front or on an ad-hoc basis, you can rent an 8-core machine for between $0.50 and $10 per hour.

Divide up your work among enough of them, and even some of the most ludicrously complex games (double draw poker, anyone?) could be analyzed for a cost.

I have yet to explore the cloud solution in much detail, other than that Microsoft Azure bills by the minute while Amazon's EC2 bills by the hour, and that there is some extra overhead in setting up a common storage area to collect and collate the individual results. Also, how I would pass the parameters to each spawned instance isn't entirely obvious to me yet, short of entering them manually, which might be problematic.
s2dbaker
Joined: Jun 10, 2010
  • Threads: 51
  • Posts: 3259
November 24th, 2013 at 6:47:12 AM permalink
I'm back down to a quad core, but I used to have a hex-core Intel monster. I do all of my calculatin' in MS/SQL, and I know exactly what you're doing here. My first approach was to divide the work among threads by having one thread work on hands that ended with an A, K, or Q, another on J, T, or 9, and so on, but I wasn't happy with the efficiency, since you still had to wait for whichever thread was slowest to finish.

Then I went to a different system where one table held information about which hands were being worked on and another tracked the SPIDs doing the work. I could allocate work from whatever was next in the queue without reprogramming, and I could shut down a core without losing data. That way I could start and stop as many threads as I wanted. In MS/SQL, each thread can be allocated an idle core. With 12 threads, I could crunch some numbers!

I found that the law of diminishing returns kicks in at around 8 or 9 processes. MS/SQL uses one core for its internal cleanup work, and that work can't be divided. Once you max out the central cleanup core, adding more threads just slows the whole thing down. That's okay, though; I still had two or three idle cores to use for other things.

The advantage of a work-allocation table is that if you start a calculation thread and it is allocated to a core that is already in use, you can stop it, kill the SPID, and restart it to try to get an unallocated core. Because you stop it cleanly, you don't lose any information.

I hope that was readable.
Someday, joor goin' to see the name of Googie Gomez in lights and joor goin' to say to joorself, "Was that her?" and then joor goin' to answer to joorself, "That was her!" But you know somethin' mister? I was always her yuss nobody knows it! - Googie Gomez
thecesspit
Joined: Apr 19, 2010
  • Threads: 53
  • Posts: 5936
November 24th, 2013 at 8:53:07 AM permalink
I would suggest that if there's a REALLY big problem, one that 6 cores for 4 weeks wouldn't handle, you crowd-source it across many machines. Especially if the problem can be divided by starting hand, or by distinct shuffles.

Of course, processing power might catch up before it's worth doing it that way over using cloud sources.
"Then you can admire the real gambler, who has neither eaten, slept, thought nor lived, he has so smarted under the scourge of his martingale, so suffered on the rack of his desire for a coup at trente-et-quarante" - Honore de Balzac, 1829
CRMousseau
Joined: Oct 19, 2009
  • Threads: 2
  • Posts: 117
November 24th, 2013 at 9:12:17 AM permalink
Quote: thecesspit

I would suggest that if there's a REALLY big problem, one that 6 cores for 4 weeks wouldn't handle, you crowd-source it across many machines. Especially if the problem can be divided by starting hand, or by distinct shuffles.

Of course, processing power might catch up before it's worth doing it that way over using cloud sources.



Cloud computing is so cheap and reliable right now that I doubt any advance in processor power could ever compare.
beachbumbabs
Administrator
Joined: May 21, 2013
  • Threads: 99
  • Posts: 14229
November 24th, 2013 at 9:57:55 AM permalink
Quote: Wizard

I'm in the process of analyzing Heads Up Hold 'Em. This is an Ultimate Texas Hold 'Em variant by Galaxy Gaming that was seen at G2E and Raving, and is already spreading fast in Washington state. There are COMBIN(52,2)*COMBIN(50,3)*COMBIN(47,2)*COMBIN(45,2) = 27,813,810,024,000 pairings of seven-card hands to work through and score. Even with all the shortcuts I put in, it will take my program three to four weeks to go through them all.

Meanwhile, JB has been working on pai gow poker, and we hope to publish some ground-breaking stuff on that soon. That game involves combin(53,7)*combin(46,7) = 8,250,459,031,214,400 combinations, or 297 times as many as Heads Up Hold 'Em. JB wrote to me that he is using multi-core processing to cut down on the time it takes.

What is that, you might ask? Both my computers, and his, actually contain four processors, or cores, so each is in effect four computers in one. That means it isn't a big sacrifice in speed to break the program into four parts, each doing 25% of the situations, and run them simultaneously. How do you do that? I wrote to JB to ask, and I'm going to copy and paste his answer below, so give JB full credit for the following. I'm posting this for the edification of the forum, and to open discussion on whether there might be even better ways.

Meanwhile, my Heads Up Hold 'Em results should be finished on Nov. 30.

****

First you should determine how many cores your CPU has, as this will tell you how many threads you should run. If you right-click an empty area of your taskbar, a menu should appear; click on Start Task Manager.

Go to the Processes tab, right-click any entry in the list, and choose "Set Affinity..." (if you get an Access Denied message, keep trying different entries until one lets you).



Something like this will appear, indicating how many cores your CPU has:



The above indicates that there are four cores.

Cancel that and then close the Task Manager.

Next you need to figure out how to split up the work among the cores you have available. I wrote a multi-threaded program with four threads, but multi-threaded programming takes a lot of time to learn and get right. If you've never done it before, the easier approach is to compile multiple versions of your program, with each version working on its own specific range of hands, and start each version independently. Have them display their combination counts when they finish running, and manually add the counts together once all copies have finished.

Since you have two computers, figure out how many cores you have on each, and make that many copies. So if one computer has 2 cores and the other has 4 cores, divide the hands into 6 ranges and make 6 separate versions of your analyzer, each working on its own 1/6th of the hands. Launch each copy after all of them have been compiled. If the single program (before being split up) reads or writes any files, make sure either that each copy works with its own set of files, or that they open the files in shared-access mode, so that none of them conflict with each other.

On the computer that you will be using for day-to-day work, after you have launched the programs, set them to run at low priority using the Task Manager: right-click the taskbar and select Start Task Manager, then right-click the program name, choose Set Priority, and select Low. It will ask you to confirm that you really want to change the priority; click Yes.



If you don't set them to run on low priority, your computer will become frustratingly sluggish when trying to do normal stuff like open a web browser.



This is great information for me, thanks! I was able to solve a couple of long-running irritating problems using your methods on cleaning up processes and priorities, and I REALLY appreciate it!
If the House lost every hand, they wouldn't deal the game.
Nareed
Joined: Nov 11, 2009
  • Threads: 373
  • Posts: 11413
November 24th, 2013 at 10:26:11 AM permalink
If your PC has a high-end video card, you can use its GPU to run other programs as well (assuming you're not taxing it by running graphics-intensive programs at the same time). I've no idea how to do this, but programs like BOINC use it (and made my video card crash, too).
Donald Trump is a fucking criminal
teliot
Joined: Oct 19, 2009
  • Threads: 37
  • Posts: 1956
November 24th, 2013 at 10:53:48 AM permalink
I run all my code in a Linux environment. I am currently analyzing Texas Hold'em Bonus Poker, which has a similar cycle size. The full basic-strategy cycle took 84 hours using 4 processors. I am now running the hole-card cycle, which will take about 82 hours. Using "the cloud" for this (200 processors) would take under 2 hours, which is what Stephen How does. But Stephen also confessed to me that he uses generic classes, like "map" in C++, in his code, which is highly inefficient. When I asked Stephen about this issue, he stated that the "cloud makes efficiency unnecessary for small games" like THBP and UTH. Maybe he was a bit more efficient with Lunar Poker and Double Draw Poker.

I also compile with the -Ofast flag under gcc or g++. You might consider optimization flags in your own compilation.

You can see in the image that I am about 864 minutes into the computation. This is equivalent to 13.8 days to run the cycle on a single processor.

MathExtremist
Joined: Aug 31, 2010
  • Threads: 88
  • Posts: 6526
November 24th, 2013 at 10:57:46 AM permalink
Quote: Nareed

If your PC has a high-end video card, you can use its GPU to run other programs as well (assuming you're not taxing it by running graphics-intensive programs at the same time). I've no idea how to do this, but programs like BOINC use it (and made my video card crash, too).

Start here: http://w8isms.blogspot.com/2013/04/gpgpu-performance-tests.html?m=1

Even a 2-3 year old discrete GPU has massive parallel capacity compared to a new x86 chip. GPUs were built to handle parallel numeric computations, and that's exactly what large poker analyses are. Anyone with an upgradeable desktop can often plug in a discrete GPU and significantly speed up custom analysis code. If you do a lot of brute-force looping, it could be a very worthwhile $300 or so.
"In my own case, when it seemed to me after a long illness that death was close at hand, I found no little solace in playing constantly at dice." -- Girolamo Cardano, 1563
Wizard
Administrator
Joined: Oct 14, 2009
  • Threads: 1332
  • Posts: 21938
November 24th, 2013 at 3:16:06 PM permalink
Great comments so far!

Quote: CRMousseau

Cloud computing is so cheap and reliable right now that I doubt any advance in processor power could ever compare.



How does one go about purchasing time on the cloud? I might have analyzed Lunar Poker years ago if I knew about this option.
It's not whether you win or lose; it's whether or not you had a good bet.
