inference-of-normal-proportions

This post covers confidence intervals for proportions, from Statistics for Engineers and Scientists by William Navidi.

Basic Ideas

  • Confidence Intervals for Proportions

    • Let $p$ represent the proportion of successes in the population, i.e. the success probability: here, the proportion of items that meet the specification. We wish to find a $95\%$ confidence interval for $p$.

      • Let $X$ be the number of successes in $n$ independent Bernoulli trials with success probability $p$, so that $X ∼ Bin(n, p)$. The estimate for $p$ is $ \hat p = \frac{X}{n}$.

      • The uncertainty, or standard deviation, of $\hat p$ is $\sigma_{\hat p} = \sqrt{\frac{p(1 - p)}{n}}$.

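      • Since the sample size is large, it follows from the Central Limit Theorem that $\hat p$ is approximately normally distributed, with mean $p$ and standard deviation $\sqrt{p(1 - p)/n}$:

        $\hat p \sim N\left(p, \sqrt{\frac{p(1 - p)}{n}}\right)$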

      • Because $\hat p$ is approximately normal, for about $95\%$ of all possible samples the population proportion $p$ satisfies the following inequality:

        $\hat p - 1.96 \sqrt{\frac{p(1 - p)}{n}} < p < \hat p + 1.96 \sqrt{\frac{p(1 - p)}{n}}$

      • The limits $ \hat p \pm 1.96 \sqrt{\frac{ p(1 − p)} {n}} $ contain the unknown $p$, and so cannot be computed.

      • To make the interval computable, $p$ is replaced by an estimate. Define $\hat{n} = n + 4$ and $\overline{p} = \frac{X + 2}{\hat n}$. Then a level $100(1 - \alpha)\%$ confidence interval for $p$ is $\overline{p} \pm z_{\alpha/2} \sqrt{\frac{\overline{p}(1 - \overline{p})}{\hat n}}$. If the lower limit is less than $0$, replace it with $0$. If the upper limit is greater than $1$, replace it with $1$.
      • We note that even for large samples, the distribution of $X$ is only approximately normal, rather than exactly normal. Therefore the levels stated for confidence intervals are approximate.
    • Example: A sample of $144$ microdrills is tested, and $120$, or $83.3\%$, meet the specification. In this example, $X = 120$, so $\hat{p} = \frac{120}{144} = 0.833$.

      • $\hat n = 148$ and $\overline{p} = 122/148 = 0.8243$, so the $95\%$ confidence interval is $0.8243 \pm 0.0613$, or $(0.763, 0.886)$.
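
The adjusted interval and the microdrill example can be reproduced with a short Python sketch. The function name `adjusted_proportion_ci` is an illustrative choice, not something from the book, and the critical value $z_{\alpha/2}$ is assumed to come from the standard library's `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_proportion_ci(x, n, level=0.95):
    """Adjusted confidence interval for a proportion (illustrative helper).

    Uses n_hat = n + 4 and p_bar = (x + 2) / n_hat, then clips the
    limits to the range [0, 1].
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    n_hat = n + 4
    p_bar = (x + 2) / n_hat
    margin = z * sqrt(p_bar * (1 - p_bar) / n_hat)
    return max(0.0, p_bar - margin), min(1.0, p_bar + margin)


# Microdrill example: 120 successes in 144 trials
lower, upper = adjusted_proportion_ci(120, 144)
print(f"({lower:.3f}, {upper:.3f})")  # (0.763, 0.886)
```

Clipping with `max` and `min` implements the rule that a lower limit below $0$ is replaced with $0$ and an upper limit above $1$ is replaced with $1$.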
  • The Traditional Method for Computing Confidence Intervals for a Proportion (widely used but not recommended). Let $\hat p$ be the proportion of successes in a large number $n$ of independent Bernoulli trials with success probability $p$. Then the traditional level $100(1 - \alpha)\%$ confidence interval for $p$ is $\hat p \pm z_{\alpha/2} \sqrt{\frac{\hat p(1 - \hat p)}{n}}$.

  • Many people still use this traditional method, which uses the actual sample size $n$ in place of $\hat n$ and the sample proportion $\hat p$ in place of $\overline{p}$.

  • Although this method is still widely used, it fails to achieve its stated coverage probability even for some fairly large values of $n$.

  • The traditional method cannot be used for small samples at all. As a rule of thumb, it should not be used unless the sample contains at least $10$ successes and $10$ failures.
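
The coverage problem can be checked directly, since for a given $n$ and $p$ the exact coverage probability is a finite sum over the binomial distribution. The sketch below is an illustration under stated assumptions: the helper names (`traditional_ci`, `adjusted_ci`, `coverage`) and the chosen $(n, p)$ pairs are hypothetical, and the critical value is fixed at $1.96$.

```python
from math import comb, sqrt

Z95 = 1.96  # critical value for a 95% interval


def traditional_ci(x, n, z=Z95):
    """Traditional interval: p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def adjusted_ci(x, n, z=Z95):
    """Adjusted interval based on n_hat = n + 4 and p_bar = (x + 2) / n_hat."""
    n_hat = n + 4
    p_bar = (x + 2) / n_hat
    margin = z * sqrt(p_bar * (1 - p_bar) / n_hat)
    return max(0.0, p_bar - margin), min(1.0, p_bar + margin)


def coverage(ci, n, p):
    """Exact probability that ci(X, n) contains p when X ~ Bin(n, p)."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = ci(x, n)
        if lower < p < upper:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


# Compare the two methods for a few illustrative (n, p) pairs
for n, p in [(10, 0.1), (40, 0.1), (100, 0.05)]:
    trad = coverage(traditional_ci, n, p)
    adj = coverage(adjusted_ci, n, p)
    print(f"n={n:3d} p={p:.2f} traditional={trad:.3f} adjusted={adj:.3f}")
```

For $n = 10$ and $p = 0.1$, the traditional interval collapses to the single point $0$ whenever $X = 0$, an outcome with probability $0.9^{10} \approx 0.35$, so its coverage is necessarily far below the stated $95\%$.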

  • How large a sample is needed to guarantee that a $95\%$ confidence interval will specify the proportion to within $\pm 0.08$, if no preliminary sample has been taken?
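
One way to answer this question, assuming the adjusted interval is used: the margin $z_{\alpha/2} \sqrt{\overline{p}(1 - \overline{p})/\hat n}$ is largest when $\overline{p} = 0.5$, so with no preliminary sample the worst case is $1.96 \sqrt{0.25/(n + 4)} = 0.08$. Solving gives $n + 4 \approx 150.06$, so $n = 147$ is enough. The sketch below carries out the same arithmetic; the function name `sample_size_for_margin` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_margin(margin, level=0.95):
    """Smallest n whose adjusted interval has half-width <= margin for any p_bar."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    # Worst case: p_bar = 0.5, so p_bar * (1 - p_bar) = 0.25
    n_hat = (z / margin) ** 2 * 0.25  # required value of n + 4
    return ceil(n_hat - 4)


print(sample_size_for_margin(0.08))  # 147
```

Because the worst case $\overline{p} = 0.5$ is assumed, the resulting sample size is conservative: if the true proportion is far from $0.5$, the interval will be narrower than required.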