inference-of-normal-proportions
Published:
This post covers Introduction to probability from Statistics for Engineers and Scientists by William Navidi.
Basic Ideas
Confidence Intervals for Proportions
Let $p$ represent the population proportion of successes, i.e. the success probability (for instance, the proportion of manufactured items that meet a specification). We wish to find a $95\%$ confidence interval for $p$.
Let $X$ be the number of successes in $n$ independent Bernoulli trials with success probability $p$, so that $X ∼ Bin(n, p)$. The estimate for $p$ is $ \hat p = \frac{X}{n}$.
The uncertainty, or standard deviation, of $ \hat p$ is $ \sigma_{\hat p} = \sqrt { \frac {p(1 - p)}{n}}$.
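Since the sample size is large, it follows from the Central Limit Theorem that $ \hat p$ is approximately normally distributed with mean $p$ and variance $\frac{p(1 - p)}{n}$:
$ \hat p \sim N \left(p, \frac{p(1 - p)} {n}\right)$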
The reasoning shows that for $95\%$ of all possible samples, the population proportion $p$ satisfies the following inequality:
$\hat p − 1.96 \sqrt{\frac{ p(1 − p)} {n}} < p < \hat p + 1.96 \sqrt{\frac{ p(1 − p)} {n}} $
The limits $ \hat p \pm 1.96 \sqrt{\frac{ p(1 - p)} {n}} $ depend on the unknown $p$, and so cannot be computed. An adjusted estimate of $p$ is used in its place:
- Define $ \hat{n} = n + 4$, and $ \overline{p} = \frac {X + 2}{\hat n}$. Then a level $100(1 - \alpha)\%$ confidence interval for $p$ is $ \overline{p} \pm z_{\alpha/2} \sqrt{\frac{ \overline{p} (1 - \overline{p} )} {\hat n}}$. If the lower limit is less than $0$, replace it with $0$. If the upper limit is greater than $1$, replace it with $1$.
- We note that even for large samples, the distribution of $X$ is only approximately normal, rather than exactly normal. Therefore the levels stated for confidence intervals are approximate.
Example: A sample of $144$ microdrills is tested, and $120$, or $ 83.3\%$, meet this specification. In this example, $X = 120$, so $ \hat{p} = \frac{120} {144} = 0.833$.
- $ \hat n = 148$ and $ \overline{p} = 122/148 = 0.8243$, so the $95\%$ confidence interval is $0.8243 \pm 0.0613$, or $(0.763, 0.886)$.
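The adjusted interval is easy to check numerically. The following is a minimal sketch, not code from the book: the function name `adjusted_proportion_ci` and its parameters are assumptions made for illustration, and $z_{\alpha/2}$ is taken from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_proportion_ci(successes, n, level=0.95):
    """Adjusted interval: add 2 successes and 2 failures, then clip to [0, 1].

    Hypothetical helper written for this post.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    n_adj = n + 4                                  # n-hat = n + 4
    p_adj = (successes + 2) / n_adj                # p-bar = (X + 2) / n-hat
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Replace limits outside [0, 1] with 0 or 1.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Microdrill example: X = 120 successes out of n = 144.
lower, upper = adjusted_proportion_ci(120, 144)
print(f"({lower:.3f}, {upper:.3f})")  # (0.763, 0.886)
```

The printed limits agree with the hand calculation for the microdrill sample.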
The Traditional Method for Computing Confidence Intervals for a Proportion (widely used but not recommended). Let $ \hat p$ be the proportion of successes in a large number $n$ of independent Bernoulli trials with success probability $p$. Then the traditional level $100(1 - \alpha)\%$ confidence interval for $p$ is $\hat p \pm z_{\alpha/2} \sqrt{\frac{ \hat p(1 - \hat p)} { n}}$.
The traditional method uses the actual sample size $n$ in place of $ \hat n$, and the sample proportion $ \hat p$ in place of $ \overline{p}$.
Although this method is still widely used, it fails to achieve its stated coverage probability even for some fairly large values of $n$.
The traditional method cannot be used for small samples at all: as a rule of thumb, it should not be used unless the sample contains at least $10$ successes and $10$ failures.
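To see what "fails to achieve its stated coverage probability" means in practice, the coverage of an interval procedure can be computed exactly by summing binomial probabilities over every outcome whose interval contains $p$. The sketch below is an illustration only: the function names are hypothetical, and the values $n = 40$, $p = 0.1$ are arbitrary choices that are not from the book.

```python
from math import comb, sqrt
from statistics import NormalDist


def traditional_ci(successes, n, level=0.95):
    """Traditional interval: p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def exact_coverage(ci_func, n, p, level=0.95):
    """Probability, under X ~ Bin(n, p), that the interval from X contains p."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = ci_func(x, n, level)
        if lower < p < upper:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


# Traditional interval for the microdrill data (X = 120, n = 144).
trad_lower, trad_upper = traditional_ci(120, 144)
print(f"({trad_lower:.3f}, {trad_upper:.3f})")

# Exact coverage of the nominal 95% traditional interval for an
# illustrative (assumed) case: n = 40 trials, p = 0.1.
coverage = exact_coverage(traditional_ci, n=40, p=0.1)
print(f"coverage = {coverage:.3f}")
```

For this assumed case the expected number of successes is only $4$, so the rule of thumb is violated, and the computed coverage falls below the nominal $0.95$.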
How large a sample is needed to guarantee that the width of the $95\%$ confidence interval will be no greater than $ \pm 0.08$, if no preliminary sample has been taken?
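One way to answer this, assuming the adjusted interval $ \overline{p} \pm 1.96 \sqrt{\frac{ \overline{p}(1 - \overline{p})}{n + 4}}$ is the one being used: with no preliminary sample, $ \overline{p}$ is unknown, so we take the worst case. The product $ \overline{p}(1 - \overline{p})$ is largest at $ \overline{p} = 0.5$, so it is enough to solve $1.96 \sqrt{\frac{0.25}{n + 4}} \le 0.08$ for $n$. The sketch below does that arithmetic; the function name `worst_case_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def worst_case_sample_size(half_width, level=0.95):
    """Smallest n with z * sqrt(0.25 / (n + 4)) <= half_width.

    Uses p-bar = 0.5, the value that maximizes p-bar * (1 - p-bar).
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    n_adj = (z * 0.5 / half_width) ** 2            # required value of n + 4
    return ceil(n_adj - 4)                         # round up to a whole sample


print(worst_case_sample_size(0.08))  # 147
```

By hand: $n + 4 \ge (1.96 \times 0.5 / 0.08)^2 \approx 150.06$, so $n \ge 146.06$, which rounds up to $n = 147$.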