Generating values that fit a probability distribution

Jan 31, 2023  

Constrained randomness is useful for adding a dash of variety to computer-composed music. Picking values that fit a parametric probability distribution, whose parameters serve as controls, is therefore a common need. Here I describe how to do that given access to only a uniform random number generator on the unit interval.

This is standard fare in probability theory, but for some reason does not come easy to students.

So you have a random number generator that can produce a value in the interval \([0,1]\) with a uniform distribution - i.e. the likelihood of picking a value in any sub-interval of this interval is proportional to the size of the sub-interval.

\[ P(x_1 \leq x \leq x_2) = x_2 - x_1 \text{ for } x_1, x_2 \in [0,1] \text{ and } x_2 \geq x_1 \]

Suppose I want to generate a number \(x\) that is instead distributed according to the density function \(h(x)\). I’m using \(h\) here instead of the conventional \(p\) to suggest that this could be given by a histogram in the discrete case (implying \(h(x) \geq 0\)). While it is conventional to assume that such an \(h(x)\) is normalized – i.e. \(\int_{-\infty}^{+\infty}h(x)dx = 1\) – that is not strictly needed for this algorithm. We can just call that integral \(N\) for “normalization” and carry it through the calculations.

Now consider how we expressed the idea of a “uniform distribution” above – that the likelihood of picking a value in a sub-interval is proportional to the size of the subinterval. This idea actually generalizes to the case of \(h(x)\) too, where we calculate the “size” of an interval \([x_1,x_2]\) using \(P(x_1,x_2) = \int_{x_1}^{x_2}h(x)dx\).

So if we have the function \(H(a) = \int_{-\infty}^{a}h(x)dx\) available to us, we can calculate \(P\) using \(P(x_1,x_2) = H(x_2) - H(x_1)\). Note an important property of \(H\): \(x_2 \geq x_1 \implies H(x_2) \geq H(x_1)\).

Furthermore, we know that \(H(+\infty) = N\). Therefore \(H(a)\) is always in the range \([0,N]\) since \(h(x) \geq 0\). If we then pick a uniformly distributed random number \(r\) in the interval \([0,1]\), we need to set things up so that

\[ P(r_1 \leq r \leq r_2) = r_2 - r_1 = \frac{H(x_2) - H(x_1)}{N} \]

That is another way to say \(H(x(r)) = rN\) or equivalently –

\[x(r) = H^{-1}(rN)\]

Which directly expresses how to compute \(x\) given a particular \(r\) drawn at random from the uniform distribution in \([0,1]\).
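For densities whose \(H\) inverts in closed form, this recipe is a one-liner. Here is a small sketch (my own code, not from the post) for the exponential density \(h(x) = \lambda e^{-\lambda x}\), where \(H(x) = 1 - e^{-\lambda x}\) and \(N = 1\):

```python
import math
import random

def sample_exponential(lam, rng=random.random):
    """Draw x from h(x) = lam * exp(-lam * x) on [0, +inf).

    Here H(x) = 1 - exp(-lam * x) and N = 1, so the inverse is
    x(r) = H^{-1}(r) = -ln(1 - r) / lam. (Illustrative helper, not
    from the post.)"""
    r = rng()                        # uniform in [0, 1)
    return -math.log(1.0 - r) / lam

random.seed(42)
samples = [sample_exponential(2.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)   # should approach 1/lam = 0.5
```

Feeding the uniform \(r\) through \(H^{-1}\) is all there is to it; the sample mean converging to \(1/\lambda\) is a quick sanity check.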

If we differentiate both sides w.r.t. \(r\), using \(H'(a) = h(a)\), we get –

\[ h(x)\frac{\text{d}x}{\text{d}r} = N \]

or written another way –

\[\frac{\text{d}x}{\text{d}r} = \frac{N}{h(x)} \]

Reading that out, if we are to run through the domain of \(r\) in small steps of \(\text{d} r\), then the amount by which we’ll have to step \(x\) for a step from \(r\) to \(r+\text{d}r\) is given by \(\text{d} x = {N\text{d} r}/{h(x)}\).
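When no closed-form inverse is available, that stepping rule can be used to tabulate \(x(r)\) numerically. A sketch under assumed choices – the density \(h(x) = 1 + x\) on \([0,1]\) and a plain forward-Euler walk are mine, purely for illustration:

```python
import math

# Hypothetical worked example: tabulate x(r) by stepping dx = N dr / h(x),
# using h(x) = 1 + x on [0, 1], for which N = 3/2 and the exact inverse is
# x(r) = sqrt(1 + 3r) - 1 (from H(x) = x + x^2/2 = r * N).
steps = 10_000
dr = 1.0 / steps
N = 1.5
h = lambda x: 1.0 + x

xs = [0.0]                  # x(0) = 0, the left edge of the support
for _ in range(steps):      # forward-Euler walk through r in [0, 1]
    x = xs[-1]
    xs.append(x + N * dr / h(x))

exact = lambda r: math.sqrt(1.0 + 3.0 * r) - 1.0
```

The tabulated `xs` can then be indexed by a uniform \(r\) to produce samples. Note the division by \(h(x)\), which is why this naive walk breaks down wherever the density vanishes.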

Going back to our “interval” way of thinking about this, we can also write (for sufficiently small intervals) –

\[ x_2 - x_1 = N\frac{r_2 - r_1}{\left(H(x_2) - H(x_1)\right)/(x_2 - x_1)} \]

… where the denominator is the average density over \([x_1, x_2]\), standing in for the pointwise \(h(x)\). This helps us deal with cases where \(h(x) = 0\), since the accumulated \(H\) simply stays flat over such stretches instead of producing a division by zero.
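One way to act on that observation is to tabulate \(H\) on a grid and invert it by binary search; flat stretches of \(H\) (where \(h(x) = 0\)) are then skipped over automatically. A sketch – the helper name and grid choices are my own assumptions:

```python
import bisect
import random

def make_sampler(h, x_min, x_max, steps=1000):
    """Tabulate H for an (unnormalized) density h on [x_min, x_max] and
    invert it by binary search. Flat stretches of H – where h(x) = 0 –
    are skipped over automatically. (Illustrative helper, not from the
    post.)"""
    dx = (x_max - x_min) / steps
    xs = [x_min + i * dx for i in range(steps + 1)]
    H = [0.0]
    for i in range(steps):  # trapezoidal running integral of h
        H.append(H[-1] + 0.5 * (h(xs[i]) + h(xs[i + 1])) * dx)
    N = H[-1]               # normalization; h need not be pre-normalized

    def sample(rng=random.random):
        target = rng() * N                 # rN in the notation above
        i = bisect.bisect_left(H, target)  # first grid point with H >= rN
        i = min(max(i, 1), steps)
        step = H[i] - H[i - 1]
        t = (target - H[i - 1]) / step if step > 0 else 0.0
        return xs[i - 1] + t * dx          # linear interpolation in the cell

    return sample

# A density with a "dead zone": h(x) = 0 on (1, 2).
h = lambda x: 1.0 if x < 1.0 or x >= 2.0 else 0.0
draw = make_sampler(h, 0.0, 3.0)
random.seed(7)
samples = [draw() for _ in range(20_000)]
```

Because the binary search jumps straight past any flat run of \(H\), no sample ever lands inside the dead zone, and the two live regions each receive their proportional share of the draws.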

This way of looking at it also lets us interpret it in the discrete case, where we’ll be working with histograms instead of continuous probability distributions. In the case of histograms we can use \(x_2 - x_1 = 1\), since the \(x\)s are just bin indices, to get –

\[ N(r_{k+1} - r_k) = H(x_{k+1}) - H(x_k) = h(x_{k+1}) \]
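In code, the discrete case reduces to a cumulative sum and a search for the first bin whose running total reaches \(rN\). A minimal sketch (the function and the example weights are mine, not from the post):

```python
import bisect
import random
from itertools import accumulate

def sample_bin(hist, rng=random.random):
    """Pick a bin index k with probability hist[k] / N, where N = sum(hist).

    H is the running sum of the histogram; we look for the first k whose
    cumulative count reaches r * N. (Illustrative helper.)"""
    H = list(accumulate(hist))   # H[k] = h[0] + ... + h[k]
    N = H[-1]
    return bisect.bisect_left(H, rng() * N)

# E.g. unnormalized weights for four notes; bin 2 holds half the total mass.
random.seed(3)
hist = [1, 1, 4, 2]
counts = [0] * len(hist)
for _ in range(80_000):
    counts[sample_bin(hist)] += 1
```

Note that `hist` is never normalized up front – dividing \(rN\) into the raw cumulative counts takes care of it, exactly as in the continuous derivation.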

Illustration of how to sample from a given probability distribution