Abstract
Construct recursively a long string of words $w_1,\dots,w_n$ such that, at each step $k$, the word $w_{k+1}$ is a new word with a fixed probability $p\in(0,1)$ and repeats some preceding word with the complementary probability $1-p$. More precisely, given that a repetition occurs, $w_{k+1}$ repeats the $j$th word with probability proportional to $j^\alpha$, for $j=1,\dots,k$. We show that, as the length $n$ of the string tends to infinity, the proportion of distinct words occurring exactly $\ell$ times converges to a probability mass function in the variable $\ell\ge1$, whose tail decays as a power function when $p<1/(1+\alpha)$ and exponentially fast when $p>1/(1+\alpha)$.
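The construction above can be simulated directly. What follows is a minimal Python sketch, not the authors' code. It assumes that words are labelled by consecutive integers, that $w_1$ is always a new word, and that $\alpha$ is any real number (the weights $j^\alpha$ are positive for $j\ge1$). The function names `simulate_words` and `occupancy_proportions` are hypothetical. At finite $n$ the empirical proportions only approximate the limiting mass function.

```python
import bisect
import random
from collections import Counter


def simulate_words(n, p, alpha, seed=None):
    """Hypothetical helper: generate w_1, ..., w_n as integer labels 0, 1, 2, ..."""
    rng = random.Random(seed)
    words = [0]            # w_1 is the first new word (assumption)
    cum = [1.0]            # cum[j-1] = 1**alpha + ... + j**alpha
    next_label = 1
    for k in range(1, n):  # build w_{k+1} from w_1, ..., w_k
        if rng.random() < p:
            # New word with probability p.
            words.append(next_label)
            next_label += 1
        else:
            # Repeat w_j with probability j**alpha / (1**alpha + ... + k**alpha),
            # found by binary search on the cumulative weights.
            u = rng.random() * cum[k - 1]
            i = bisect.bisect_right(cum, u, 0, k)  # 0-based index, j = i + 1
            words.append(words[i])
        cum.append(cum[-1] + (k + 1) ** alpha)     # add weight of position k + 1
    return words


def occupancy_proportions(words):
    """Hypothetical helper: map ell -> proportion of distinct words seen exactly ell times."""
    counts = Counter(words)                    # word -> number of occurrences
    multiplicities = Counter(counts.values())  # ell -> number of words occurring ell times
    total = len(counts)
    return {ell: m / total for ell, m in sorted(multiplicities.items())}


if __name__ == "__main__":
    props = occupancy_proportions(simulate_words(100_000, p=0.3, alpha=0.5, seed=1))
    for ell in range(1, 6):
        print(ell, props.get(ell, 0.0))
```

Keeping a running list of cumulative weights means each repetition is drawn in $O(\log k)$ time by binary search, instead of renormalizing all $k$ weights at every step. Varying `p` on either side of $1/(1+\alpha)$ gives a rough empirical view of the two tail regimes.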