MangoJ
MangoJ
  • Threads: 10
  • Posts: 905
Joined: Mar 12, 2011
March 22nd, 2013 at 3:05:19 PM permalink
I have a small problem, kind of combinatorial statistics, for a computer algorithm...

There is given an alphabet of C characters, (i.e. "abcde"), and I sample up to S characters from this alphabet (with replacement), i.e. getting words like "cacb".
From the sampled characters the order is unimportant (i.e. "cacd" is the same as "dacc", but different from "daac" or "acd").

I want to "hash" each possible word, i.e. give them a unique number (disregarding to the ordering). As the hash will be used as a memory address, the hash should be efficient to calculate (from word->hash, the other direction is never used) - and more important as dense as possible (i.e. after all possible words of S characters are hashed, all numbers should fill an interval completely - to save memory).


The unimportant ordering can be dealt with, as one could normalize each word in a sequence of raising characters (i..e. "cacd" is normalized to "accd") or counting the occurences of each character (i.e. "cacd" would be then the tuple (1,0,2,1,0) or something similar).

Does someone know of some simple scheme like this ?
MathExtremist
MathExtremist
  • Threads: 88
  • Posts: 6526
Joined: Aug 31, 2010
March 22nd, 2013 at 6:22:28 PM permalink
It's just a base C number system. 8 letters would map to octal numbers, etc., using the tuple construct you already have, and then you can convert to decimal to work with (for array indexing, etc.) Each base C number is S digits.

Edit: oops, I missed the efficient hash and limited length requirement. Look into combinatorial number systems and partitions.

Edit 2: And look into multisets. I recall this was mentioned recently here somewhere. See:

http://en.wikipedia.org/wiki/Multiset
http://en.wikipedia.org/wiki/Combinatorial_number_system

I think what you want to do is (a) select your string, which is really a multiset, (b) map the multiset to a combination (see the cites), then (c) use a combinatorial number system to convert the combination to an integer. That should generate a perfect hash.
"In my own case, when it seemed to me after a long illness that death was close at hand, I found no little solace in playing constantly at dice." -- Girolamo Cardano, 1563
s2dbaker
s2dbaker
  • Threads: 51
  • Posts: 3259
Joined: Jun 10, 2010
March 22nd, 2013 at 7:32:49 PM permalink
This may be way above my pay grade but as a database programmer, I created a table called HASH with the columns HASH_PK and LETTER and populated 26 rows with a through z in the LETTER column and HASH_PK as an identity column, basically 1 though 26. I then created a Cartesian product using the following query:
Select	row_number() over(order by a.letter,b.letter,c.letter,d.letter) rownum,
a.letter c1,
b.letter c2,
c.letter c3,
d.letter c4
from HASH a
inner join HASH b
on a.letter <= b.letter
inner join HASH c
on b.letter <= c.letter
inner join HASH d
on c.letter <= d.letter
Where d.letter < 'f'
Order By 1,2,3,4
with the following results:
Rownumc1c2c3c4
1aaaa
2aaab
3aaac
4aaad
5aaae
6aabb
7aabc
8aabd
9aabe
10aacc
11aacd
12aace
13aadd
14aade
15aaee
16abbb
17abbc
18abbd
19abbe
20abcc
21abcd
22abce
23abdd
24abde
25abee
26accc
27accd
28acce
29acdd
30acde
31acee
32addd
33adde
34adee
35aeee
36bbbb
37bbbc
38bbbd
39bbbe
40bbcc
41bbcd
42bbce
43bbdd
44bbde
45bbee
46bccc
47bccd
48bcce
49bcdd
50bcde
51bcee
52bddd
53bdde
54bdee
55beee
56cccc
57cccd
58ccce
59ccdd
60ccde
61ccee
62cddd
63cdde
64cdee
65ceee
66dddd
67ddde
68ddee
69deee
70eeee
Someday, joor goin' to see the name of Googie Gomez in lights and joor goin' to say to joorself, "Was that her?" and then joor goin' to answer to joorself, "That was her!" But you know somethin' mister? I was always her yuss nobody knows it! - Googie Gomez
teliot
teliot
  • Threads: 43
  • Posts: 2871
Joined: Oct 19, 2009
March 22nd, 2013 at 7:38:34 PM permalink
Quote: MangoJ

I have a small problem

Does your problem have anything to do with quick poker-hand evaluation?
Climate Casino: https://climatecasino.net/climate-casino/
MangoJ
MangoJ
  • Threads: 10
  • Posts: 905
Joined: Mar 12, 2011
March 22nd, 2013 at 7:58:21 PM permalink
Quote: MathExtremist

It's just a base C number system. 8 letters would map to octal numbers, etc., using the tuple construct you already have, and then you can convert to decimal to work with (for array indexing, etc.) Each base C number is S digits.



Yes this would be the brute force method, but will require space of C^S, which is ... large, and most of the space will be empty due to the ordering identity when S is comparable to C.

I'm currently thinking about some kind of tree with C links at each node. One would traverse the tree following the lowest characters in the given word, discard the character from the word, and then proceed from there recursively until the word is empty. the good point would be: access time would be linear to S, but addressing is slow, with S reads access across the whole memory. Space efficiency would depend on C compared to the size of the data stored for each word.
Not sure if I like it enough to settle for this approach - need to sleep over it.

Edit:

@MathExtremist: Wow, this is spot on. The word is indeed a mutiset. I didn't fully understood your second article as there are many unfamiliar terms used - Seems I will just play around with the stated bijective formula.

Edit: Argh.... the ordering bijection is only for distinct characters....
MangoJ
MangoJ
  • Threads: 10
  • Posts: 905
Joined: Mar 12, 2011
March 22nd, 2013 at 8:40:40 PM permalink
Quote: teliot

Does your problem have anything to do with quick poker-hand evaluation?


Not sure if it could be applied to addressing poker hands.

The project is about money management in tournaments for a game of chance, where your performance would depends on how well you bet in relation to each others current bankrolls. It is pretty simple to calculate the optimum betsize for the final round of the tournament, and giving you good estimate of success.

The possible bankrolls form the alphabet, each opponent is thus represented by a character, and all opponents left form the word. "character ordering" is discarded as you don't care about the specific identity of your opponents. I just picked "words", "alphabet" and "characters" for a clearer concept of what the problem is.

Anyway, once you know all EV estimates of success for the final round for all (reasonable) bankroll configurations, you can calculate optimum betsize for the round before the final. Do it again for any bankroll configuration - and in principle you could calculate the overall tournament strategy down to the very first bet.
Fast caching is crucial here, but once you have a good cache routine you can calculate any round from the round following. Cache must also be small enough so you can make the bankroll binning finer (i.e. more accurate). The last issue is calculating time preparing the caches...
MathExtremist
MathExtremist
  • Threads: 88
  • Posts: 6526
Joined: Aug 31, 2010
March 22nd, 2013 at 9:07:34 PM permalink
Quote: MangoJ

@MathExtremist: Wow, this is spot on. The word is indeed a mutiset. I didn't fully understood your second article as there are many unfamiliar terms used - Seems I will just play around with the stated bijective formula.

Edit: Argh.... the ordering bijection is only for distinct characters....


No, distinct characters makes it a regular set, not a multiset. You're going from right-to-left per the Wikipedia picture. The bijection is from a first multiset (which you already have) to a second set of combinations (of same size, just from a larger range). So you just need to write a conversion function from your multiset item to a corresponding combination item, then run the combination through the combinadic to return its index.
"In my own case, when it seemed to me after a long illness that death was close at hand, I found no little solace in playing constantly at dice." -- Girolamo Cardano, 1563
Buzzard
Buzzard
  • Threads: 90
  • Posts: 6814
Joined: Oct 28, 2012
March 22nd, 2013 at 9:23:14 PM permalink
Quote: MathExtremist

No, distinct characters makes it a regular set, not a multiset. You're going from right-to-left per the Wikipedia picture. The bijection is from a first multiset (which you already have) to a second set of combinations (of same size, just from a larger range). So you just need to write a conversion function from your multiset item to a corresponding combination item, then run the combination through the combinadic to return its index.




I thought only English was to be spoken in the forum ?
Shed not for her the bitter tear Nor give the heart to vain regret Tis but the casket that lies here, The gem that filled it Sparkles yet
MangoJ
MangoJ
  • Threads: 10
  • Posts: 905
Joined: Mar 12, 2011
March 25th, 2013 at 10:11:18 PM permalink
Quote: MathExtremist

No, distinct characters makes it a regular set, not a multiset. You're going from right-to-left per the Wikipedia picture. The bijection is from a first multiset (which you already have) to a second set of combinations (of same size, just from a larger range). So you just need to write a conversion function from your multiset item to a corresponding combination item, then run the combination through the combinadic to return its index.



Thanks, I missed the bijection between the multiset and the combination. In the meanwhile I found a much simpler approach in some research paper. If someone cares, here is the relevant code (for an alphabet of 9 characters and words up to 6 characters).


const unsigned long int magic[6][9] = {
{3003, 1726, 924, 462, 210, 84, 28, 7, 1},
{ 1287, 792, 462, 252, 126, 56, 21, 6, 1},
{ 495, 330, 210, 126, 70, 35, 15, 5, 1},
{ 165, 120, 84, 56, 35, 20, 10, 4, 1},
{ 45, 36, 28, 21, 15, 10, 6, 3, 1},
{ 9, 8, 7, 6, 5, 4, 3, 2, 1}
};

unsigned long int address = 0;
for(int i=0, int k=0; k<9; k++) {
for(int c = 0; c < word[k]; c++) {
address += magic[i++][k];
}
}


where word[k] is the word as a normalized tuple, i.e. "bacc" is the array with elements {1,1,2,0,0,0}, and "address" will be the resulting natural number.

The magic numbers table is constructed quite simple: The last row and rightmost column is trivial, and all other entries follow magic[n][m] = magic[n][m+1] + magic[n+1][m]. (Of course this can be initialized to any given fixed dimensions in a loop.)
MathExtremist
MathExtremist
  • Threads: 88
  • Posts: 6526
Joined: Aug 31, 2010
March 25th, 2013 at 10:22:23 PM permalink
Those are the binomial coefficients. I recall the combinadic method for mapping a combination to an ordinal uses a similar routine. Do you have a link to the paper?
"In my own case, when it seemed to me after a long illness that death was close at hand, I found no little solace in playing constantly at dice." -- Girolamo Cardano, 1563
socks
socks
  • Threads: 15
  • Posts: 364
Joined: Jul 13, 2011
April 8th, 2013 at 6:53:17 AM permalink
Quote: MangoJ

Not sure if it could be applied to addressing poker hands.



I suspect he asked because that is a common usage: http://www.codingthewheel.com/archives/poker-hand-evaluator-roundup
dwheatley
dwheatley
  • Threads: 25
  • Posts: 1246
Joined: Nov 16, 2009
April 8th, 2013 at 6:53:29 PM permalink
I just want to interject to say I have a Master's degree in Combinatorics & Optimization, and I have never seen the word combinadic before. In fact, I have always been amused by our ability to just make up words, like "Combinatorics", and then give out degrees in those topics.
Wisdom is the quality that keeps you out of situations where you would otherwise need it
CrystalMath
CrystalMath
  • Threads: 8
  • Posts: 1909
Joined: May 10, 2011
April 9th, 2013 at 9:54:23 AM permalink
Quote: MangoJ



{3003, 1726, 924, 462, 210, 84, 28, 7, 1},


The 1726 should be 1716.

I love when this site opens my eyes to new things. Thanks MangoJ and ME.
I heart Crystal Math.
CrystalMath
CrystalMath
  • Threads: 8
  • Posts: 1909
Joined: May 10, 2011
April 9th, 2013 at 10:52:50 AM permalink
I've reworked this to use just combinatorials.


int calcMultiSetAddress2(int[] word, int maxWordLength)
{
int address = 0;
int alphabetSize = word.length;

for(int k=0; k<alphabetSize; k++)
{
int i = maxWordLength;
for(int c = word[k]-1; c >=0; c--)
{
address += combin(alphabetSize-k+i-1,i--);
}
}
return address;
}
I heart Crystal Math.
MangoJ
MangoJ
  • Threads: 10
  • Posts: 905
Joined: Mar 12, 2011
April 9th, 2013 at 11:32:33 AM permalink
Thanks for the correction!

Did you test your code ? I see no problem iterating i and c backwards, but it seems in your code i gets initialized for every iteration of k, while the original code doesn't do it. Also, I'm more of a fan of precomputed lookup-tables than live function calls when it comes to performance, especially when the parameter space is well defined and small enough. Of course unless there are mistakes in the table^^

Since you understand the algorithm much better than I do, is it possible to have addiitional properties like: for n<m the address of a n-character word is always less than the address of am m-character word ? I.e. in address space address 0 would be assigned to empty word, address range 1..S would be assigned by the single-character word, address (S+1)...(something) is assigned by all 2-character words ?

This way the "maxWordLength" would not be fixed beforehand, and the increase in maxWordLength would only increase the total address space, but no rearrangement. This could be a very handy property....
CrystalMath
CrystalMath
  • Threads: 8
  • Posts: 1909
Joined: May 10, 2011
April 9th, 2013 at 12:12:35 PM permalink
Quote: MangoJ

Thanks for the correction!

Did you test your code ?


I messed up the code with declaring i as I did. It should be as you had it. The results of my function match yours.

Quote:


I see no problem iterating i and c backwards, but it seems in your code i gets initialized for every iteration of k, while the original code doesn't do it.


see above

Quote:


Also, I'm more of a fan of precomputed lookup-tables than live function calls when it comes to performance, especially when the parameter space is well defined and small enough. Of course unless there are mistakes in the table^^


I completely agree. If you know the size of the alphabet and the max word size, you should precompute.

Quote:


Since you understand the algorithm much better than I do, is it possible to have addiitional properties like: for n<m the address of a n-character word is always less than the address of am m-character word ? I.e. in address space address 0 would be assigned to empty word, address range 1..S would be assigned by the single-character word, address (S+1)...(something) is assigned by all 2-character words ?

This way the "maxWordLength" would not be fixed beforehand, and the increase in maxWordLength would only increase the total address space, but no rearrangement. This could be a very handy property....


I'll look at this.
I heart Crystal Math.
CrystalMath
CrystalMath
  • Threads: 8
  • Posts: 1909
Joined: May 10, 2011
April 9th, 2013 at 4:15:54 PM permalink
OK, here's a function that does what you want. I ran it with all 0-3 letter words from a 6 letter alphabet, and the results are shown below.


int calcMultiSetAddress(int[] word)
{
int address = 0;
int alphabetSize = word.length-1;
int wordSize = 0;
for(int i = 0; i < word.length; i++) wordSize+=word;
for(int i = 0; i < wordSize; i++) address += combin(alphabetSize+i,i);
for(int i = wordSize, k=0; k<alphabetSize; k++)
{
for(int c = 0; c < word[k]; c++)
{
address += combin(alphabetSize-k+i-1,i--);
}
}
return address;
}






word address
[0, 0, 0, 0, 0, 0] 0
[0, 0, 0, 0, 0, 1] 1
[0, 0, 0, 0, 1, 0] 2
[0, 0, 0, 1, 0, 0] 3
[0, 0, 1, 0, 0, 0] 4
[0, 1, 0, 0, 0, 0] 5
[1, 0, 0, 0, 0, 0] 6
[0, 0, 0, 0, 0, 2] 7
[0, 0, 0, 0, 1, 1] 8
[0, 0, 0, 0, 2, 0] 9
[0, 0, 0, 1, 0, 1] 10
[0, 0, 0, 1, 1, 0] 11
[0, 0, 0, 2, 0, 0] 12
[0, 0, 1, 0, 0, 1] 13
[0, 0, 1, 0, 1, 0] 14
[0, 0, 1, 1, 0, 0] 15
[0, 0, 2, 0, 0, 0] 16
[0, 1, 0, 0, 0, 1] 17
[0, 1, 0, 0, 1, 0] 18
[0, 1, 0, 1, 0, 0] 19
[0, 1, 1, 0, 0, 0] 20
[0, 2, 0, 0, 0, 0] 21
[1, 0, 0, 0, 0, 1] 22
[1, 0, 0, 0, 1, 0] 23
[1, 0, 0, 1, 0, 0] 24
[1, 0, 1, 0, 0, 0] 25
[1, 1, 0, 0, 0, 0] 26
[2, 0, 0, 0, 0, 0] 27
[0, 0, 0, 0, 0, 3] 28
[0, 0, 0, 0, 1, 2] 29
[0, 0, 0, 0, 2, 1] 30
[0, 0, 0, 0, 3, 0] 31
[0, 0, 0, 1, 0, 2] 32
[0, 0, 0, 1, 1, 1] 33
[0, 0, 0, 1, 2, 0] 34
[0, 0, 0, 2, 0, 1] 35
[0, 0, 0, 2, 1, 0] 36
[0, 0, 0, 3, 0, 0] 37
[0, 0, 1, 0, 0, 2] 38
[0, 0, 1, 0, 1, 1] 39
[0, 0, 1, 0, 2, 0] 40
[0, 0, 1, 1, 0, 1] 41
[0, 0, 1, 1, 1, 0] 42
[0, 0, 1, 2, 0, 0] 43
[0, 0, 2, 0, 0, 1] 44
[0, 0, 2, 0, 1, 0] 45
[0, 0, 2, 1, 0, 0] 46
[0, 0, 3, 0, 0, 0] 47
[0, 1, 0, 0, 0, 2] 48
[0, 1, 0, 0, 1, 1] 49
[0, 1, 0, 0, 2, 0] 50
[0, 1, 0, 1, 0, 1] 51
[0, 1, 0, 1, 1, 0] 52
[0, 1, 0, 2, 0, 0] 53
[0, 1, 1, 0, 0, 1] 54
[0, 1, 1, 0, 1, 0] 55
[0, 1, 1, 1, 0, 0] 56
[0, 1, 2, 0, 0, 0] 57
[0, 2, 0, 0, 0, 1] 58
[0, 2, 0, 0, 1, 0] 59
[0, 2, 0, 1, 0, 0] 60
[0, 2, 1, 0, 0, 0] 61
[0, 3, 0, 0, 0, 0] 62
[1, 0, 0, 0, 0, 2] 63
[1, 0, 0, 0, 1, 1] 64
[1, 0, 0, 0, 2, 0] 65
[1, 0, 0, 1, 0, 1] 66
[1, 0, 0, 1, 1, 0] 67
[1, 0, 0, 2, 0, 0] 68
[1, 0, 1, 0, 0, 1] 69
[1, 0, 1, 0, 1, 0] 70
[1, 0, 1, 1, 0, 0] 71
[1, 0, 2, 0, 0, 0] 72
[1, 1, 0, 0, 0, 1] 73
[1, 1, 0, 0, 1, 0] 74
[1, 1, 0, 1, 0, 0] 75
[1, 1, 1, 0, 0, 0] 76
[1, 2, 0, 0, 0, 0] 77
[2, 0, 0, 0, 0, 1] 78
[2, 0, 0, 0, 1, 0] 79
[2, 0, 0, 1, 0, 0] 80
[2, 0, 1, 0, 0, 0] 81
[2, 1, 0, 0, 0, 0] 82
[3, 0, 0, 0, 0, 0] 83
I heart Crystal Math.
MangoJ
MangoJ
  • Threads: 10
  • Posts: 905
Joined: Mar 12, 2011
April 9th, 2013 at 5:03:46 PM permalink
Hey, that's pretty awesome man!

As far as I see, the second loop could be eliminated by "address += combin(alphabetSize + wordSize, wordSize - 1)".
Then it would be efficient to keep track of wordSize in the word array itself (i.e. as [0] index, with the character frequencies starting at [1])

Then the code could look like

int calcMultiSetAddress(int[] word) {
int address = combin(word.length + word[0], word[0]-1);
for(int i = word[0], k=1; k<=word.length; k++) {
for(int c = 0; c < word[k]; c++) {
address += combin(word.length-k+i,i--);
}
}
return address;
}


What do you think ?
CrystalMath
CrystalMath
  • Threads: 8
  • Posts: 1909
Joined: May 10, 2011
April 10th, 2013 at 9:03:30 AM permalink
Quote: MangoJ

Hey, that's pretty awesome man!

What do you think ?



Thanks!

I like removing the extra loops. Your comment about removing the address+= loop jogged my memory that I don't need to do the loop at all (but it ended up a little different from your suggestion), I just had to realize that c(n,m) + c(n-1,m-1) + c(n-2,m-2) ... = c(n+1,m).

So, here is what I came up with and tested, with identical results as before:


public static int calcMultiSetAddress(int[] word) {
int address = (int) combin(word.length - 2 + word[0], word[0] - 1);
for (int i = word[0], k = 1; k < word.length - 1; k++) {
for (int c = 0; c < word[k]; c++) {
address += combin(word.length - 2 - k + i, i--);
}
}
return address;
}


I'm happy to help, but I'm sure it will help me out some day too.
I heart Crystal Math.
  • Jump to: