In the previous IMAP upload post, I asked about the probability of each possible length of the resulting string in the random word generator. The loop had this structure:

    for ($i = 0; $i < 7 + mt_rand(0, 4); $i++) /* add one character */;

The key is that the loop exit condition is checked against (probably) different values each time. It's different from a similar loop in which the random number is fixed beforehand, because here the intersection of events comes into play. Let's see why and how.
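For concreteness, here is a minimal sketch of the two loop shapes being compared. The function names `lengthRechecked` and `lengthFixed` are made up for this illustration and do not come from the original generator.

```php
<?php
// Hypothetical helper: the bound is re-drawn on every check of the condition,
// as in the loop from the original post.
function lengthRechecked(): int {
    for ($i = 0; $i < 7 + mt_rand(0, 4); $i++) {
        // add one character
    }
    return $i;
}

// Hypothetical helper: the bound is drawn once, before the loop starts.
function lengthFixed(): int {
    $limit = 7 + mt_rand(0, 4);
    for ($i = 0; $i < $limit; $i++) {
        // add one character
    }
    return $i;
}
```

Both functions return a value between 7 and 11. In the fixed version each of the five lengths has probability 1/5; the rest of this post is about the re-checked version.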

As soon as `$i` reaches 7, if the value of `mt_rand` is 0 then the loop finishes; if it's any other value, it goes on. It's obvious that the probability of the length being 7 is 1/5 = 20%.

When the value of `$i` is 8, `mt_rand` must return either 0 or 1 for the loop to end, which will happen two out of five times. However, for the length to reach 8 in the first place, the previous call to `mt_rand` must have returned a value between 1 and 4, which will happen 4 out of 5 times. Both conditions must hold, and since each call to `mt_rand` is independent of the others, the probabilities multiply: 4/5 · 2/5 = 8/25 = 32%.

For `$i` to reach 9, `mt_rand` must return a value between 1 and 4 the first time and a value between 2 and 4 the second time. Furthermore, the loop will end when `mt_rand` is between 0 and 2, which will happen 3 out of 5 times. Combining the probabilities, we have P(length 9) = 4/5 · 3/5 · 3/5 = 36/125 = 28.8%.

By the same reasoning, the probability of the length being 10 is 4/5 · 3/5 · 2/5 · 4/5 = 96/625 = 15.36%.

The remaining one can be calculated either following the same reasoning, now that we got the trick, or by subtracting all the previous probabilities from 1: 4/5 · 3/5 · 2/5 · 1/5 · 5/5 = 24/625 = 3.84% = 100% - 20% - 32% - 28.8% - 15.36%.
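The pattern in these products can be written compactly. As a sketch, with k denoting the number of characters beyond 7 (a symbol introduced here for convenience, not used elsewhere in the post):

```latex
P(\text{length} = 7 + k) \;=\; \frac{k+1}{5} \prod_{j=0}^{k-1} \frac{4-j}{5},
\qquad k = 0, 1, \dots, 4
```

Each factor (4 - j)/5 is the chance that the check at `$i` = 7 + j lets the loop continue, and (k + 1)/5 is the chance that the check at `$i` = 7 + k stops it. For k = 2 this gives 3/5 · 4/5 · 3/5 = 36/125, matching the value for length 9.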

To sum up:

- P(length 7) = 20%
- P(length 8) = 32%
- P(length 9) = 28.8%
- P(length 10) = 15.36%
- P(length 11) = 3.84%

That is, it will be 8 almost one out of three times, closely followed by 9, then 7, then 10, and a few times it will reach 11. As I have always said, probability is not intuitive.
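The same reasoning can be carried out mechanically. The following is a minimal sketch, not part of the original generator; the function name `lengthDistribution` and its parameters are invented for this example. It tracks the probability of reaching each value of `$i` and multiplies it by the probability that the check stops there.

```php
<?php
// Exact distribution of the final length when the bound $base + mt_rand(0, $maxExtra)
// is re-drawn at every loop check. Names are illustrative only.
function lengthDistribution(int $base = 7, int $maxExtra = 4): array {
    $outcomes = $maxExtra + 1;   // mt_rand(0, 4) has 5 equally likely values
    $reach = 1.0;                // probability that $i gets as far as $base + $k
    $dist = [];
    for ($k = 0; $k <= $maxExtra; $k++) {
        $stop = ($k + 1) / $outcomes;   // the draw is <= $k, so the condition fails
        $dist[$base + $k] = $reach * $stop;
        $reach *= 1 - $stop;            // the draw was > $k, so the loop continues
    }
    return $dist;
}

foreach (lengthDistribution() as $length => $p) {
    printf("%2d -> %.2f%%\n", $length, $p * 100);
}
```

With the default arguments this prints 20.00%, 32.00%, 28.80%, 15.36% and 3.84% for lengths 7 through 11, the same figures as in the list above.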

The following program helps verify the numbers by simulation:

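    <?php
    $MAX = 10000000;
    $arr = array(0, 0, 0, 0, 0);
    for ($n = 0; $n < $MAX; $n++) {
        // Same loop as above, shifted down by 7: $i ends between 0 and 4.
        for ($i = 0; $i < mt_rand(0, 4); $i++);
        $arr[$i]++;
    }
    $l = strlen((string) $MAX);
    for ($i = 0; $i < 5; $i++) {
        printf("%2d -> %{$l}d/$MAX = %f%%\n", $i + 7, $arr[$i], $arr[$i] / $MAX * 100);
    }
    ?>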

Sample output:

    7 -> 2000626/10000000 = 20.006260%
    8 -> 3198181/10000000 = 31.981810%
    9 -> 2879481/10000000 = 28.794810%
    10 -> 1538148/10000000 = 15.381480%
    11 -> 383564/10000000 = 3.835640%
