Hypergeometric Distribution
This post covers Introduction to probability from Statistics for Engineers and Scientists by William Navidi.
Basic Ideas
The Hypergeometric Distribution
When a finite population contains two types of items, which may be called successes and failures, and items are sampled from it without replacement, each sampled item is a Bernoulli trial. As each item is selected, the proportion of successes in the remaining population decreases or increases, depending on whether the sampled item was a success or a failure. For this reason the trials are not independent, so the number of successes in the sample does not follow a binomial distribution.
The distribution that properly describes the number of successes in this situation is called the hypergeometric distribution.
Assume a finite population contains $N$ items, of which $R$ are classified as successes and $N - R$ are classified as failures. Assume that $n$ items are sampled from this population, and let $X$ represent the number of successes in the sample. Then $X$ has the hypergeometric distribution with parameters $N$, $R$, and $n$, which can be denoted $X \sim H(N, R, n)$.
The probability mass function of $X$ is
- \[p(x) = P(X = x) = \left\{ \begin{array}{ll} \dfrac{ \binom{R}{x} \binom{N-R}{n-x} }{ \binom{N}{n} } & \mbox{if } \max(0, R + n - N) \leq x \leq \min(n, R)\\ 0 & \mbox{otherwise} \end{array} \right.\]
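The probability mass function translates almost directly into code. The following is a minimal sketch using only Python's standard library. The function name `hypergeom_pmf` and the small population used to exercise it are illustrative assumptions, not part of the text.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, R: int, n: int) -> float:
    """P(X = x) for X ~ H(N, R, n). The function name is illustrative."""
    lo = max(0, R + n - N)  # smallest possible number of successes
    hi = min(n, R)          # largest possible number of successes
    if x < lo or x > hi:
        return 0.0
    return comb(R, x) * comb(N - R, n - x) / comb(N, n)


# Hypothetical small population: N = 10 items, R = 4 successes, n = 3 drawn.
probs = [hypergeom_pmf(x, 10, 4, 3) for x in range(0, 4)]
total = sum(probs)  # probabilities over the support should sum to 1
```

Values of `x` outside the range $\max(0, R + n - N) \leq x \leq \min(n, R)$ return zero, matching the second branch of the formula.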
Of $50$ buildings in an industrial park, $12$ have electrical code violations. If $10$ buildings
are selected at random for inspection, what is the probability that exactly $3$ of
the $10$ have code violations?
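One way to work this out is to treat a building with a violation as a success, so that $N = 50$, $R = 12$, $n = 10$, and $X \sim H(50, 12, 10)$. Substituting $x = 3$ into the probability mass function gives

\[P(X = 3) = \frac{ \binom{12}{3} \binom{38}{7} }{ \binom{50}{10} } = \frac{220 \times 12{,}620{,}256}{10{,}272{,}278{,}170} \approx 0.2703\]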
Mean and Variance of the Hypergeometric Distribution
- If $X \sim H(N, R, n)$, then
- $\mu_{X} = \frac{nR}{N}$
- $\sigma^{2}_{X} = n \left( \frac{R}{N} \right) \left( 1 - \frac{R}{N} \right) \left( \frac{N - n}{N - 1} \right)$
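As a quick sanity check, the closed-form mean and variance can be compared against values computed directly from the probability mass function. The sketch below assumes the building-inspection numbers ($N = 50$, $R = 12$, $n = 10$). The variable names are illustrative.

```python
from math import comb

N, R, n = 50, 12, 10  # population size, successes in population, sample size

# Closed-form mean and variance of H(N, R, n).
mean = n * R / N
variance = n * (R / N) * (1 - R / N) * (N - n) / (N - 1)

# Cross-check by summing over the support of the probability mass function.
support = range(max(0, R + n - N), min(n, R) + 1)
pmf = {x: comb(R, x) * comb(N - R, n - x) / comb(N, n) for x in support}
mean_check = sum(x * p for x, p in pmf.items())
var_check = sum((x - mean_check) ** 2 * p for x, p in pmf.items())

print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

For these inputs the mean is $2.4$ and the variance is approximately $1.489$. The factor $\frac{N - n}{N - 1}$ is what makes the variance smaller than that of a binomial with the same $n$ and success probability $\frac{R}{N}$.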
Comparison with the Binomial Distribution
Consider instead sampling with replacement, with each sampled item being returned to the population after it is drawn. Then the sampled items result from a sequence of independent Bernoulli trials, and the number of successes follows the binomial distribution $Bin(n, \frac{R}{N})$.
- When the sample size $n$ is small compared to the population size $N$ (a common rule of thumb is no more than 5%), the difference between sampling with and without replacement is slight, and the binomial distribution $Bin(n, \frac{R}{N})$ is a good approximation to the hypergeometric distribution $H(N, R, n)$.
A lot of $20$ items contains $6$ that are defective. If $5$ items are sampled from this lot at random, what is the probability that exactly $2$ of them are defective?
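To see how the two distributions compare on this example, the exact hypergeometric probability can be computed alongside the binomial value with $p = \frac{R}{N} = 0.3$. This is a sketch under the assumption that a defective item counts as a success. The variable names are illustrative.

```python
from math import comb

N, R, n, x = 20, 6, 5, 2  # lot size, defectives in lot, sample size, target count

# Exact probability: sampling without replacement, X ~ H(20, 6, 5).
hyper = comb(R, x) * comb(N - R, n - x) / comb(N, n)

# Binomial value: treats the draws as independent with p = R / N.
p = R / N
binom = comb(n, x) * p**x * (1 - p) ** (n - x)

print(f"hypergeometric: {hyper:.4f}")
print(f"binomial:       {binom:.4f}")
```

The exact answer is about $0.3522$, while the binomial formula gives about $0.3087$. The sample here is a quarter of the lot, so $n$ is not small compared to $N$ and the binomial approximation is noticeably off.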