[latexpage]
Nowadays, in most countries, everyone has a surname. Some names are very common, like “de Jong” in the Netherlands, or “Nguyễn” in Vietnam, where almost 40% of the population has this surname. Other surnames, on the other hand, are extremely rare. In Thailand, for instance, surnames were largely introduced in 1913, and no family was allowed to duplicate an existing surname. Perhaps you have a very rare surname yourself. Then you might have asked yourself this question: will my family name ever go extinct? In this article, I will present a mathematical model to answer this question.
In this mathematical model, we assume that family names are passed on only through men and that the number of sons a father has is random. We can model this randomness as rolling a die. Consider, for example, the following die: one face shows a zero, two faces show a one, two faces show a two, and one face shows a three. The outcome of a roll is the number of sons that the father has, and each of these sons in turn rolls the die to determine his own number of sons. We can now use this die to generate a sample family tree. An example is shown below.
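A family tree like this can be generated with a few lines of code. The sketch below is a minimal Python version, assuming (as the model does) that every man rolls the same fair die independently of everyone else. It only tracks how many men carry the name in each generation rather than drawing the full tree; the names `DIE` and `simulate_generations` are hypothetical, chosen for this illustration.

```python
import random

# The die from the text: one 0, two 1s, two 2s and one 3.
DIE = (0, 1, 1, 2, 2, 3)


def simulate_generations(faces, max_generations, rng):
    """Return the number of men carrying the name in each generation.

    Generation 0 is a single founding father. Every man rolls the die once,
    independently, and the outcome is his number of sons. The simulation
    stops when the name has died out or after max_generations generations.
    """
    sizes = [1]
    for _ in range(max_generations):
        current = sizes[-1]
        if current == 0:
            break  # nobody is left to pass on the name
        sons = sum(rng.choice(faces) for _ in range(current))
        sizes.append(sons)
    return sizes


rng = random.Random(2024)  # fixed seed so a run can be reproduced
print(simulate_generations(DIE, 10, rng))
```

Running it with different seeds gives different trees: some die out after a few generations, others keep growing.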
In the sample family tree shown above, the family name did in fact go extinct. With the die of this example, however, this will not always happen. The reason is that the expected value of a roll is $0 \cdot 1/6 + 1 \cdot 2/6 + 2 \cdot 2/6 + 3 \cdot 1/6 = 9/6$. When this expected value is larger than one, each generation contains on average more men carrying the name than the previous one, so the name can survive forever, although extinction is still possible, as we saw in the example. When the expected value is smaller than one, on the other hand, the family name goes extinct with certainty: the expected number of men carrying the name shrinks by a constant factor every generation and tends to zero, and the probability that anyone carrying the name is left in generation $n$ can be no larger than this expected number. (The same conclusion holds when the expected value is exactly one, unless every father has exactly one son.)
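The expected value can be checked exactly with Python's `fractions` module. The sketch below, with a hypothetical helper `expected_sons`, computes the mean of a fair die from its faces and the expected number of men carrying the name in generation $n$, which in this model (one founding father, independent rolls) is the mean raised to the power $n$.

```python
from fractions import Fraction

# The die from the text: one 0, two 1s, two 2s and one 3.
DIE = (0, 1, 1, 2, 2, 3)


def expected_sons(faces):
    """Exact mean of a fair die whose faces are the possible numbers of sons."""
    return Fraction(sum(faces), len(faces))


m = expected_sons(DIE)
print(m)  # 3/2, i.e. 9/6 in lowest terms

# Expected number of men carrying the name in generation n is m**n.
for n in range(4):
    print(n, m ** n)
```

For this die the expected generation size grows as $(3/2)^n$; for a die with mean below one the same power shrinks towards zero.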
For the die of our example, whose expected value is larger than one, extinction is possible but not guaranteed, which raises the question of how likely it is. Suppose that, to start with, we want to calculate the probability of extinction by generation $n$, where $n$ is an arbitrary number. To compute this, we take a closer look at the four situations that can occur in the first generation: the father has zero, one, two or three sons.
By conditioning on these four situations, we can split this probability into a sum of four terms:
\begin{align*}
P(\text{extinct by gen. } n) = {}&1/6 + 2/6\,P(\text{extinct by gen. } n-1) \\
&+ 2/6\,P(\text{extinct by gen. } n-1)^2 \\
&+ 1/6\,P(\text{extinct by gen. } n-1)^3.
\end{align*}
This can be explained as follows. First, it may happen that no son is born: in that case the family name immediately goes extinct. For our die, this contributes $1/6$ to the probability of going extinct by generation $n$. Second, exactly one son may be born. The important thing to notice is that for this son the same process starts afresh; the only difference is that his line now has $n-1$ generations left in which to die out. We multiply the probability of that happening by $2/6$, the probability that exactly one son is born. Third, two sons may be born. In that case the process starts afresh twice, and for the name to go extinct by generation $n$, both lines have to die out within $n-1$ generations. Because the two lines are independent, we can simply multiply their probabilities, which yields the $P(\text{extinct by gen. } n-1)^2$ term. Finally, the term for three sons follows in the same way: all three lines have to die out, which gives the cube.
To write the formula above more compactly, we introduce the function $G(x) = 1/6 + 2/6x + 2/6x^2 + 1/6x^3$, known as the probability generating function of the number of sons. With this function, the formula can be written as:
$$
P(\text{extinct by gen. } n) = G(P(\text{extinct by gen. } n-1)).
$$
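The recurrence is easy to evaluate numerically. The sketch below is a minimal Python version with hypothetical helper names `G` and `extinct_by_generation`. It assumes the starting value $P(\text{extinct by gen. } 0) = 0$, since in generation 0 the founding father himself is alive, and then applies $G$ once per generation.

```python
# The die from the text: one 0, two 1s, two 2s and one 3.
DIE = (0, 1, 1, 2, 2, 3)


def G(x, faces=DIE):
    """G(x) = 1/6 + 2/6 x + 2/6 x^2 + 1/6 x^3 for the example die."""
    return sum(x ** k for k in faces) / len(faces)


def extinct_by_generation(n, faces=DIE):
    """P(extinct by gen. n), obtained by applying the recurrence n times.

    Starts from P(extinct by gen. 0) = 0: in generation 0 the founding
    father is alive, so the name cannot be extinct yet.
    """
    p = 0.0
    for _ in range(n):
        p = G(p, faces)
    return p


for n in (1, 2, 3, 5, 10, 20, 40):
    print(n, round(extinct_by_generation(n), 4))
```

The printed probabilities grow with $n$ and level off; the value they settle on is the probability that the name ever goes extinct.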
What we are most interested in, however, is the probability that the name ever goes extinct, which is the limit of these probabilities as $n$ goes to infinity. Taking this limit simplifies the formula even further, because $n$ and $n-1$ both go to infinity:
$$
P(\text{ultimate extinction}) = G(P(\text{ultimate extinction})).
$$
This leaves a very simple problem: to find the probability of ultimate extinction, we only have to solve the equation $x = 1/6 + 2/6x + 2/6x^2 + 1/6x^3$ for $x$. This can be done with a simple numerical root finder. On the left of the figure below, we have also depicted this graphically.
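A simple root finder for this equation could look like the Python sketch below. It is an illustration, not the method used to make the figure: it scans $[0, 1]$ on a grid for the first point where $G(x) - x$ drops to zero or below and then bisects, so it returns the smallest root in $[0, 1]$. The helper names are hypothetical, and `SMALL_DIE` (faces 0, 0, 0, 1, 1, 2, expected value $4/6$) is a made-up die with expected value below one.

```python
DIE = (0, 1, 1, 2, 2, 3)  # the example die from the text
SMALL_DIE = (0, 0, 0, 1, 1, 2)  # hypothetical die with expected value 4/6 < 1


def pgf(faces):
    """Return G, where G(x) averages x**k over the faces k of a fair die."""
    def G(x):
        return sum(x ** k for k in faces) / len(faces)
    return G


def extinction_probability(faces, grid=1000, tol=1e-12):
    """Smallest root of G(x) = x in [0, 1]: scan a grid, then bisect.

    f(x) = G(x) - x starts at f(0) = P(no sons) >= 0. The first grid point
    where f drops to zero or below brackets the smallest root. If there is
    none below 1, the smallest root is x = 1. A root lying within one grid
    step of 1 can be missed by this coarse scan.
    """
    G = pgf(faces)

    def f(x):
        return G(x) - x

    if f(0.0) <= 0.0:
        return 0.0  # no father is ever sonless: the name never dies out
    lo = 0.0
    for i in range(1, grid):
        hi = i / grid
        if f(hi) <= 0.0:
            # f(lo) > 0 >= f(hi), so a root lies in [lo, hi]: bisect it.
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if f(mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            return (lo + hi) / 2
        lo = hi
    return 1.0


print(round(extinction_probability(DIE), 4))  # 0.3028
print(extinction_probability(SMALL_DIE))  # 1.0
```

Scanning upwards from 0 is what makes the sketch return the smaller root rather than the root at $x = 1$; once a bracket is known, a library routine such as SciPy's `scipy.optimize.brentq` could replace the hand-written bisection loop.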
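For this equation, $x = 1$ is always a solution, because $G(1) = 1/6 + 2/6 + 2/6 + 1/6 = 1$: the probabilities of all faces add up to one. In the figure on the left, we also see a second root at $x \approx 0.30$. When there are two such roots, the probability of ultimate extinction is the smaller one: the probabilities $P(\text{extinct by gen. } n)$ start at zero and, because $G$ is increasing, can never step past the smallest root. This tells us that the probability that the family name of the example ever goes extinct is about 30%. On the right, we have depicted the equation for a die with an expected value smaller than one. Here there is no root smaller than one, so the probability of extinction is one: the family name is certain to die out.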
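For the example die, the roots can also be found by hand. Multiplying $x = G(x)$ by 6, moving everything to one side and factoring out the known root $x = 1$ gives:

\begin{align*}
6x &= 1 + 2x + 2x^2 + x^3, \\
0 &= x^3 + 2x^2 - 4x + 1 = (x - 1)(x^2 + 3x - 1), \\
x &= 1 \quad \text{or} \quad x = \frac{-3 \pm \sqrt{13}}{2}.
\end{align*}

The root $(-3 - \sqrt{13})/2$ is negative and cannot be a probability, so the extinction probability is $(\sqrt{13} - 3)/2 \approx 0.303$, in line with the value read off the figure.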
Of course, this model rests on strong assumptions that can be questioned, such as every father's number of sons following the same die, independently of everyone else, so it is not entirely realistic. Nonetheless, it provides some interesting insights. Perhaps the next time you talk with your family about your family name, you can take a critical look at how many sons people in your family usually have and use this model!
This article was written by Stan Koobs.