Tuesday, February 19, 2008

Google dice question

A fellow grad student in my lab is determined to figure out the dice question Google asked me in my phone interview. The basic idea is to accurately simulate a seven-sided die using any number of five-sided dice.

His first attempt was to roll seven five-sided dice, take the modulus by seven and then add one to the result. During simulation this comes close to a uniform distribution, but the actual math works out to be non-uniform. I wrote this program for testing:

#include
#include

#define COUNT 1000

int real[7];
int jun[7];

// Couldn't see anything wrong from this:
int test () {
int i;
srand( time(NULL) );
for (i = 0; i < COUNT; i++) {
float max = RAND_MAX;
int sum=0;
int j;
for (j=0; j < 7; j++) {
float r = rand();
int d5 = ((r/max)*5) +1;
sum+=d5;
}
int calculate = (sum % 7) +1;

float r = rand();
int d7 = ((r/max)*7) +1;
//printf("%d %d\n", d7, calculate);
real[d7-1]++;
jun[calculate-1]++;
}
int suma=0;
int sumb=0;
for (i=0;i<7;i++){
printf("%d) real %10d jun %10d\n", (i+1), real[i], jun[i]);
suma+=real[i];
sumb+=jun[i];
}
printf("%d %d\n", suma, sumb);
return 0;
}

// This comes from http://en.wikipedia.org/wiki/Dice
float probability(int s, int i, int k) {
int n;

if (i==1)
if ((k>=1) && (k<=s))
return 1/((float) s);
else
return 0;

float sum = 0;
for (n=1;n<=s;n++) {
sum += probability(s,1,n) * probability(s,i-1,(k-n));
}
return sum;
}

float sums[7];

int main () {
int x;
for (x=7;x<=35;x++){
float prob = probability (5,7,x);
//printf("%d %f\n", x, prob);
sums[x%7]+=prob;
}
for (x=0;x<7;x++)
printf ("%d %f\n",x+1,sums[x]);
}
// since probabilities are not all even, your algorithm does not work


He countered with an argument using six-sided die that was actually a degenerate case. My probability skills are fairly weak, so I had to answer with an enumeration of the possibilities in an example using three-sided dice to emulate a five-sided die.

His final answer was to look at sequences of die-rolls. First prime the pump by rolling the five-sided die twice and taking the modulos seven of the result. Then, for each seven-sided die roll we want after that, roll a five-sided die add it to the previous result, and mululo seven. His argument is that, in the long run, this produces a uniform distribution.

Simulating this algorithm seems to produce fairly uniform distributions, but so did the last one as well. Unfortunately, the probability math is beyond me. Well, it is beyond the time I have to put into it. I still do not believe this is a correct answer, but I cannot prove it. My best argument is a proof by contradiction.

Assume the algorithm is right.
After a certain number of rolls, the last roll accurately simulates a seven-sided die.
To get the next roll of the seven-sided die, we add a five-sided roll and modulo seven.
That means seven-sided equals seven-sided plus five-sided modulo seven.
Which is a contradiction.


Any other thoughts? How can I prove the algorithm wrong? Or is it right?

4 comments:

Anonymous said...

Here is an idea: Throw seven dice, numbered from one to seven, and pick the one with maximum number face-up. If there is a tie, throw tied ones again to break the tie. Since all dice are identical, the winner die is distributed uniformly from 1 to 7.

Nathan said...

That makes sense to me, although it is not how I did it in my interview. In the interview I took three five sided dice and turned them into bits. The binary interpretation is a uniformly random number from 1 to 7.

Anonymous said...

Nathan your answer in the interview was wrong: 3 random bits would give a uniform distribution from 1 to 8.

Nathan said...

The answer in the interview was right, but your summary of it was wrong. See http://ncooprider.blogspot.com/2008/02/google-phone-interview.html
for my answer. The key to change it from 0-7 to 1-7 is to have 0 be a re-roll.