Inference for regression coefficients

3 minute read

Published:

This post covers inference for the least-squares regression coefficients from Statistics for Engineers and Scientists by William Navidi.

Basic Idea

  • Uncertainties in the Least-Squares Coefficients

    • $\epsilon_i$ can be affected by errors in measuring the length of the spring, errors in measuring the weights of the loads placed on the spring, variations in the elasticity of the spring due to changes in ambient temperature or metal fatigue, and so on.

    • If there were no error, the points would lie exactly on the least-squares line, and the slope $ \hat \beta_1$ and intercept $ \hat \beta_0$ of the least-squares line would equal the true values $ \beta_1$ and $ \beta_0$.

    • The errors $\epsilon_i$ create uncertainty in the estimates $ \hat \beta_0$ and $ \hat \beta_1$.

    • Assume we have $n$ data points $(x_1, y_1),…, (x_n, y_n)$, and we plan to fit the least-squares line. In order for the estimates $ \hat \beta_1$ and $ \hat \beta_0$ to be useful, we need to estimate just how large their uncertainties are.

    • Therefore, in order to estimate the uncertainties in $ \hat \beta_0$ and $ \hat \beta_1$, we must first estimate the error variance $\sigma^2$.

    • The spread of the points around the line can be measured by the sum of the squared residuals $ \Sigma^n_{i=1} e^2_i$.

    • The estimate of the error variance $ \sigma^2$ is the quantity $s^2$ given by

      \[s^2 = \frac {\Sigma^n_{i=1} e^2_i}{n - 2} = \frac {\Sigma^n_{i=1}(y_i - \hat y_i)^2}{n - 2}\]

      Equivalently, in terms of the correlation coefficient $r$,

      \[s^2 = \frac {(1 - r^2)\Sigma^n_{i=1}(y_i - \overline y)^2}{n - 2}\]
      • The estimate of the error variance is thus, in essence, an average of the squared residuals.

      • Because the least-squares line minimizes the sum $ \Sigma^n_{i=1} e^2_i$, the residuals tend to be a little smaller than the errors $ \epsilon_i$.

      • It turns out that dividing by $n - 2$ rather than $n$ appropriately compensates for this.

      • Both $ \hat \beta_0$ and $ \hat \beta_1$ can be expressed as linear combinations of the $y_i$,

\[\hat \beta_1 = \Sigma^n_{i=1}\left[ \frac {x_i - \overline x}{\Sigma^n_{i=1}(x_i - \overline x)^2}\right]y_i\] \[\hat \beta_0 = \Sigma^n_{i=1}\left[\frac{1}{n} - \frac {\overline x(x_i - \overline x)}{\Sigma^n_{i=1}(x_i - \overline x)^2}\right]y_i\]
  • Using the fact that each of the $y_i$ has mean $\beta_0 + \beta_1 x_i$ and variance $\sigma^2$, Equations (2.51) and (2.55) yield the following results, after further manipulation:
\[\mu_{\hat \beta_0} = \beta_0 \qquad \mu_{\hat \beta_1} = \beta_1\] \[\sigma_{\hat \beta_0} = \sigma \sqrt{ \frac{1}{n} + \frac{ \overline x^2}{ \Sigma^n_{i=1}(x_i - \overline x)^2}} \qquad \sigma_{\hat \beta_1} = \frac {\sigma}{ \sqrt{ \Sigma^n_{i=1}(x_i - \overline x)^2}}\]
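As a sanity check on the formulas above, here is a short numerical sketch in Python. The synthetic dataset and all variable names are my own, not from the text:

```python
import numpy as np

# Synthetic data: y = 2 + 0.5 x + noise (illustrative values only)
rng = np.random.default_rng(0)
x = np.arange(1.0, 11.0)                       # n = 10 points
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, x.size)
n = x.size

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)

# Least-squares coefficients
beta1_hat = np.sum((x - xbar) * (y - ybar)) / Sxx
beta0_hat = ybar - beta1_hat * xbar

# beta1_hat is the same linear combination of the y_i given above
weights = (x - xbar) / Sxx
assert np.isclose(beta1_hat, np.sum(weights * y))

# Estimate of the error variance: s^2 = sum of squared residuals / (n - 2)
residuals = y - (beta0_hat + beta1_hat * x)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))

# Estimated standard deviations of the coefficients
s_beta0 = s * np.sqrt(1 / n + xbar ** 2 / Sxx)
s_beta1 = s / np.sqrt(Sxx)
print(beta0_hat, beta1_hat, s, s_beta0, s_beta1)
```

With only mild noise, the fitted slope lands close to the true value 0.5, and the linear-combination identity for $\hat \beta_1$ holds exactly.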
  • Inferences on the Slope and Intercept

    • Given a scatterplot with points $(x_1, y_1),…, (x_n, y_n)$, we can compute the slope $ \hat \beta_1$ and intercept $ \hat \beta_0$ of the least-squares line.

    • Confidence intervals for $\beta_0$ and $\beta_1$ can be derived in exactly the same way as the Student’s t based confidence interval for a population mean. Let $t_{n-2,\alpha/2}$ denote the point on the Student’s t curve with $n-2$ degrees of freedom that cuts off an area of $\alpha/2$ in the right-hand tail.

    • Level $100(1 − \alpha)\%$ confidence intervals for $\beta_0$ and $ \beta_1$ are given by

​ $ \hat \beta_0 \pm t_{n-2,\alpha/2} \cdot s_{\hat \beta_0}$

​ $ \hat \beta_1 \pm t_{n-2,\alpha/2} \cdot s_{\hat \beta_1}$

where

\[s_{\hat \beta_0} = s \sqrt{ \frac{1}{n} + \frac{ \overline x^2}{ \Sigma^n_{i=1}(x_i - \overline x)^2}} \qquad s_{\hat \beta_1} = \frac {s}{ \sqrt{ \Sigma^n_{i=1}(x_i - \overline x)^2}}\]
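The interval construction can be sketched as a small helper function. This is a hedged sketch: the function name and the example data are mine, not from the text.

```python
import numpy as np
from scipy import stats

def slope_confidence_interval(x, y, level=0.95):
    """Level 100(1 - alpha)% confidence interval for the slope beta_1 (sketch)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    xbar = x.mean()
    Sxx = np.sum((x - xbar) ** 2)
    beta1 = np.sum((x - xbar) * (y - y.mean())) / Sxx     # least-squares slope
    beta0 = y.mean() - beta1 * xbar
    resid = y - (beta0 + beta1 * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))             # estimate of sigma
    s_beta1 = s / np.sqrt(Sxx)                            # estimated sd of beta1_hat
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)   # t_{n-2, alpha/2}
    return beta1 - t_crit * s_beta1, beta1 + t_crit * s_beta1

# Hypothetical paired data (illustrative only)
lo, hi = slope_confidence_interval([1, 2, 3, 4, 5, 6],
                                   [2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
print(f"95% CI for the slope: ({lo:.3f}, {hi:.3f})")
```

The interval is centered at $\hat \beta_1$ by construction; the half-width is $t_{n-2,\alpha/2} \cdot s_{\hat \beta_1}$.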
  • Find a 95% confidence interval for the spring constant in the Hooke’s law data. We have previously computed $ \hat \beta_1 = 0.2046$ and $s_{\hat \beta_1} = 0.0111$.
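Plugging in the numbers: $\hat \beta_1$ and $s_{\hat \beta_1}$ come from the text, while the sample size $n = 20$ (hence $n - 2 = 18$ degrees of freedom) is my assumption about the Hooke’s law dataset:

```python
from scipy import stats

beta1_hat = 0.2046   # from the text
s_beta1 = 0.0111     # from the text
n = 20               # assumed sample size for the Hooke's law data

t_crit = stats.t.ppf(0.975, df=n - 2)   # t_{18, 0.025}
lo = beta1_hat - t_crit * s_beta1
hi = beta1_hat + t_crit * s_beta1
print(f"95% CI for beta_1: ({lo:.4f}, {hi:.4f})")
```

Under this assumption, $t_{18,0.025} \approx 2.101$ and the interval works out to roughly $(0.181, 0.228)$ in./lb.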

  • The manufacturer of the spring in the Hooke’s law data claims that the spring constant $\beta_1$ is at least $0.215$ in./lb. We have estimated the spring constant to be $ \hat \beta_1 = 0.2046$ in./lb. Can we conclude that the manufacturer’s claim is false?
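One way to check the claim is a one-tailed t test of $H_0: \beta_1 \ge 0.215$ versus $H_1: \beta_1 < 0.215$. As before, the sample size $n = 20$ is my assumption about the dataset:

```python
from scipy import stats

beta1_hat = 0.2046   # estimated spring constant (from the text)
s_beta1 = 0.0111     # its estimated standard deviation (from the text)
claim = 0.215        # manufacturer's claimed lower bound on beta_1
n = 20               # assumed sample size for the Hooke's law data

# H0: beta_1 >= 0.215  vs  H1: beta_1 < 0.215
t_stat = (beta1_hat - claim) / s_beta1     # standardized distance from the claim
p_value = stats.t.cdf(t_stat, df=n - 2)    # left-tail P-value
print(f"t = {t_stat:.3f}, P-value = {p_value:.3f}")
```

The t statistic is about $-0.94$, and the left-tail P-value is well above any common significance level, so we cannot conclude that the manufacturer’s claim is false.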