In the previous IMAP upload post, I asked about the probability of each possible length of the resulting string in the random words generator. The loop had this structure:
for ($i = 0; $i < 7 + mt_rand(0, 4); $i++) /* add one character */ ;
The key is that the loop exit condition is checked against (probably) different values each time. It's different from a similar loop in which the random number is fixed beforehand, because here the intersection of events comes into play, Let's see why and how.
As soon as $i
reaches 7, if the value of mt_rand
is 0 then the loop finishes; if it's any other value, it goes on. It's obvious that the probability of the length being 7 is 1/5 = 20%.
When the value of $i
is 8, it's necessary for mt_rand
to return either 0 or 1 for the loop to end, which will happen one two out of five times. However, for the length to reach the value 8 it's a condition that first mt_rand
returns a value between 1 and 4, which will happen 4 out of 5 times. They are not independent events. Therefore, the combined probability is 4/5 · 2/5 = 8/25 = 32%.
For $i
to reach 9, it must happen simultaneously that mt_rand
returns a value between 1 and 4 the first time, and a value between 2 and 4 the second time. Furthermore, the loop will end when mt_rand
is between 0 and 2, which wil happen 3 out of 5 times. Combining the probabilities, we have P(length 9) = 4/5 · 3/5 · 3/5 = 36/125 = 28.8%.
By the same reasoning, the probability for it to reach 10 is 4/5 · 3/5 · 2/5 · 4/5 = 96/625 = 15.36%.
The remaining one can be calculated either following the same reasoning, now that we got the trick, or by subtracting all the previous probabilities from 1: 4/5 · 3/5 · 2/5 · 1/5 · 5/5 = 24/625 = 3.84% = 100% - 20% - 32% - 28.8% - 15.36%.
To sum up:
- P(length 7) = 20%
- P(length 8) = 32%
- P(length 9) = 28.8%
- P(length 10) = 15.36%
- P(length 11) = 3.84%
That is, it will be 8 almost one out of three times, closely followed by 9, then 7, then 10, and a few times it will reach 11. As I have always said, probability is not intuitive.
This program helps verifying the correctness of the numbers:
<?php $MAX = 10000000; $arr = array(0, 0, 0, 0, 0); for ($n = 0; $n < $MAX; $n++) { for ($i = 0; $i < mt_rand(0, 4); $i++) ; $arr[$i]++; } $l = strlen($MAX); for ($i = 0; $i < 5; $i++) { printf("%2d -> %{$l}d/$MAX = %f%%\n", $i+7, $arr[$i], $arr[$i]/$MAX*100); } ?>
Sample output:
7 -> 2000626/10000000 = 20.006260% 8 -> 3198181/10000000 = 31.981810% 9 -> 2879481/10000000 = 28.794810% 10 -> 1538148/10000000 = 15.381480% 11 -> 383564/10000000 = 3.835640%
No comments:
Post a Comment