Hypergeometric Distribution

This post covers Introduction to probability from Statistics for Engineers and Scientists by William Navidi.

Basic Ideas

  • The Hypergeometric Distribution

    • When items are sampled without replacement from a finite population that contains two types of items (successes and failures), the proportion of successes in the remaining population decreases or increases after each draw, depending on whether the sampled item was a success or a failure. For this reason the trials are not independent, so the number of successes in the sample does not follow a binomial distribution.

    • The distribution that properly describes the number of successes in this situation is called the hypergeometric distribution.

    • Assume a finite population contains $N$ items, of which $R$ are classified as successes and $N-R$ are classified as failures. Assume that $n$ items are sampled from this population, and let $X$ represent the number of successes in the sample. Then $X$ has the hypergeometric distribution with parameters $N$, $R$, and $n$, denoted $X \sim H(N, R, n)$.

    • The probability mass function of $X$ is

    • \[p(x) = P(X = x) = \left\{ \begin{array}{ll} \frac{\binom{R}{x} \binom{N-R}{n-x}}{\binom{N}{n}} & \mbox{if}~ \max(0, R + n - N) \leq x \leq \min(n, R)\\ 0 & \mbox{otherwise} \end{array} \right.\]
    • Of $50$ buildings in an industrial park, $12$ have electrical code violations. If $10$ buildings are selected at random for inspection, what is the probability that exactly $3$ of the $10$ have code violations?
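A minimal Python sketch of the probability mass function above, applied to the building example ($N = 50$, $R = 12$, $n = 10$, $x = 3$). The helper name `hypergeom_pmf` is hypothetical (it is not from the book); the sketch assumes Python 3.8+ and uses only `math.comb` from the standard library.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, R: int, n: int) -> float:
    """P(X = x) for X ~ H(N, R, n): N items, R successes, n drawn without replacement."""
    # Outside the support max(0, R + n - N) <= x <= min(n, R) the pmf is 0.
    if x < max(0, R + n - N) or x > min(n, R):
        return 0.0
    # Ways to choose x successes from R and n - x failures from N - R,
    # divided by the number of ways to choose n items from N.
    return comb(R, x) * comb(N - R, n - x) / comb(N, n)


# Building example: 50 buildings, 12 with violations, 10 inspected.
p = hypergeom_pmf(3, N=50, R=12, n=10)
print(round(p, 4))  # 0.2703
```

The support check keeps `comb` from receiving a negative argument, and it mirrors the "otherwise" branch of the piecewise definition.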

  • Mean and Variance of the Hypergeometric Distribution

    • If $X \sim H(N, R, n)$, then
      • $\mu_{X} = \frac{nR}{N}$
      • $\sigma^{2}_{X} = n \left(\frac{R}{N}\right) \left(1 - \frac{R}{N}\right) \left(\frac{N - n}{N - 1}\right)$
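One way to sanity-check these formulas is to compute the mean and variance directly from the probability mass function, by summing over the support, and compare them with the closed forms. This is a sketch; the function names are hypothetical, and the pmf helper is defined inline so the block runs on its own.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, R: int, n: int) -> float:
    # Zero outside the support max(0, R + n - N) <= x <= min(n, R).
    if x < max(0, R + n - N) or x > min(n, R):
        return 0.0
    return comb(R, x) * comb(N - R, n - x) / comb(N, n)


def moments_from_pmf(N: int, R: int, n: int) -> tuple[float, float]:
    """Mean and variance of H(N, R, n) by direct summation over the support."""
    support = range(max(0, R + n - N), min(n, R) + 1)
    mean = sum(x * hypergeom_pmf(x, N, R, n) for x in support)
    var = sum((x - mean) ** 2 * hypergeom_pmf(x, N, R, n) for x in support)
    return mean, var


def moments_from_formula(N: int, R: int, n: int) -> tuple[float, float]:
    """Closed-form mean nR/N and variance n p (1 - p) (N - n)/(N - 1), with p = R/N."""
    p = R / N
    return n * p, n * p * (1 - p) * (N - n) / (N - 1)


# Building example: both pairs should agree up to floating-point rounding
# (mean 2.4, variance about 1.489).
print(moments_from_pmf(50, 12, 10))
print(moments_from_formula(50, 12, 10))
```

The factor $\frac{N - n}{N - 1}$ is what separates this variance from the binomial variance $n \frac{R}{N}(1 - \frac{R}{N})$.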
  • Comparison with the Binomial Distribution

    • If each sampled item were returned to the population after it is drawn (sampling with replacement), the sampled items would result from a sequence of independent Bernoulli trials, and the number of successes would follow the binomial distribution $Bin(n, \frac{R}{N})$ exactly.

      • When the population size $N$ is large compared with the sample size $n$, the difference between sampling with and without replacement is slight, and the binomial distribution $Bin(n, \frac{R}{N})$ is a good approximation to the hypergeometric distribution $H(N, R, n)$.
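To see the approximation improve as the population grows, the sketch below holds the success proportion $\frac{R}{N} = 0.3$ and the sample size $n = 5$ fixed, increases $N$, and reports the largest pointwise gap between the $H(N, R, n)$ and $Bin(n, \frac{R}{N})$ probability mass functions. The function names and the chosen values of $N$ are illustrative assumptions, not taken from the book.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, R: int, n: int) -> float:
    # Sampling without replacement; zero outside the support.
    if x < max(0, R + n - N) or x > min(n, R):
        return 0.0
    return comb(R, x) * comb(N - R, n - x) / comb(N, n)


def binom_pmf(x: int, n: int, p: float) -> float:
    # Sampling with replacement: n independent Bernoulli(p) trials.
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def max_abs_gap(N: int, R: int, n: int) -> float:
    """Largest |P_H(X = x) - P_Bin(X = x)| over x = 0, ..., n."""
    p = R / N
    return max(abs(hypergeom_pmf(x, N, R, n) - binom_pmf(x, n, p)) for x in range(n + 1))


# Same proportion R/N = 0.3 and n = 5; only the population size changes.
for N in (20, 200, 2000, 20000):
    R = 3 * N // 10
    print(N, round(max_abs_gap(N, R, 5), 5))
```

With $N = 20$ the sample is a quarter of the population and the gap is large; by $N = 20000$ the two distributions are nearly indistinguishable.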
  • A lot of $20$ items contains $6$ that are defective. If $5$ items are sampled from this lot at random, what is the probability that exactly $2$ of them are defective?
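Letting $X$ be the number of defectives in the sample, $X \sim H(20, 6, 5)$, and the probability mass function gives the following worked calculation:

\[P(X = 2) = \frac{\binom{6}{2} \binom{14}{3}}{\binom{20}{5}} = \frac{15 \times 364}{15504} = \frac{5460}{15504} \approx 0.3522\]

For comparison, the binomial approximation $Bin(5, 0.3)$ would give $\binom{5}{2}(0.3)^2(0.7)^3 = 0.3087$. Because the sample is a quarter of the lot, sampling without replacement matters here and the approximation is poor.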