M Checking Regression Assumptions

by Doc P, 10 Jun 2020



Reference: Marin Video 5.2

While we will not be doing much with this information, there are several easy tests we can perform to determine whether a linear regression is a proper approach for explaining a relationship.

First, import and attach the LungCapData data set.
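If you need a reminder of how the import step goes, here is a minimal sketch. The file name `LungCapData.txt`, its tab-delimited layout, and the helper name `read_lungcap` are all assumptions made for illustration; adjust them to match your own copy of the data.

```r
# Hypothetical path (assumption): change it to wherever your copy of the data lives.
lungcap_path <- "LungCapData.txt"

# Hypothetical helper: read a tab-delimited text file whose first row holds
# the column names.
read_lungcap <- function(path) {
  read.table(path, header = TRUE, sep = "\t")
}

# Only import and attach if the file is actually there.
if (file.exists(lungcap_path)) {
  LungCapData <- read_lungcap(lungcap_path)
  attach(LungCapData)  # lets us type Age and LungCap directly
  str(LungCapData)     # quick check of the variable names and types
}
```

After attaching, the variables can be used by name, which is why the code below can say `Age` rather than `LungCapData$Age`.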

Next, run the code we used in the last Crib Sheet to generate the regression analysis.

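plot(Age, LungCap)        # scatterplot of lung capacity against age
mod <- lm(LungCap ~ Age)  # fit the simple linear regression
summary(mod)              # coefficients, R-squared, and F-test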
## 
## Call:
## lm(formula = LungCap ~ Age)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7799 -1.0203 -0.0005  0.9789  4.2650 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.14686    0.18353   6.249 7.06e-10 ***
## Age          0.54485    0.01416  38.476  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.526 on 723 degrees of freedom
## Multiple R-squared:  0.6719, Adjusted R-squared:  0.6714 
## F-statistic:  1480 on 1 and 723 DF,  p-value: < 2.2e-16
abline(mod)               # add the fitted line to the scatterplot

At this point, “plot(mod)” will produce the diagnostic plots. Hit Return to cycle through them.

plot(mod)
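If you would rather see all four plots on one page than cycle through them, one option is to split the plot window first. This is a sketch, not part of the original analysis: the data here are simulated stand-ins (made-up numbers, not LungCapData) so that the block runs on its own. With the real data attached, you would skip the simulation lines and use your existing `mod`.

```r
# Simulated stand-in for LungCapData (assumption: made-up numbers, not the real data).
set.seed(1)
Age <- sample(3:19, 200, replace = TRUE)
LungCap <- 1.1 + 0.55 * Age + rnorm(200, sd = 1.5)
mod <- lm(LungCap ~ Age)

old_par <- par(mfrow = c(2, 2))  # split the plot window into a 2 x 2 grid
plot(mod)                        # all four diagnostic plots on one page
par(old_par)                     # restore the previous layout
```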

The 1st plot is a residual plot (residuals vs. fitted values) - the red line should be roughly straight and flat, and the residuals should be scattered randomly around it.

The 2nd plot is a Q-Q plot (Quantile - Quantile); the points will fall along the diagonal straight line if the residuals are approximately normally distributed.

The 3rd and 4th plots (Scale-Location and Residuals vs Leverage) identify other problems with which we are not concerned.
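To look at just one of these plots, the `which` argument of `plot()` picks it out by number. The sketch below also builds rough hand-made versions of the first two plots from the raw residuals, which may help show what is actually being drawn. Again the data are simulated stand-ins (an assumption, not LungCapData), and note that the built-in Q-Q plot uses standardized residuals, so the hand-made one will not match it exactly.

```r
# Simulated stand-in for LungCapData (assumption: made-up numbers, not the real data).
set.seed(1)
Age <- sample(3:19, 200, replace = TRUE)
LungCap <- 1.1 + 0.55 * Age + rnorm(200, sd = 1.5)
mod <- lm(LungCap ~ Age)

plot(mod, which = 1)  # residual plot only
plot(mod, which = 2)  # Q-Q plot only

# Similar checks built by hand from the raw residuals
res <- resid(mod)
plot(fitted(mod), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # residuals should scatter evenly around this line
qqnorm(res)             # sample quantiles against normal quantiles
qqline(res)             # reference line for normally distributed residuals
```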

Non-constant variance will show up in the residual plot as a “megaphone” shape, while non-linearity will show up as a curved red line. Our plots look good, suggesting that a linear regression analysis is a reasonable approach for these data.
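Since our own plots look fine, it can help to see what the two problems look like when they are present. The sketch below simulates one data set of each kind. All of the numbers and variable names (`y_fan`, `y_curve`, and so on) are invented for illustration and have nothing to do with LungCapData.

```r
set.seed(42)
x <- runif(300, 1, 20)

# Megaphone: the noise grows with x (non-constant variance).
y_fan <- 2 + 0.5 * x + rnorm(300, sd = 0.2 * x)
mod_fan <- lm(y_fan ~ x)

# Curve: the true relationship is quadratic, but we fit a straight line.
y_curve <- 2 + 0.1 * x^2 + rnorm(300, sd = 1)
mod_curve <- lm(y_curve ~ x)

old_par <- par(mfrow = c(1, 2))  # two residual plots side by side
plot(mod_fan, which = 1, main = "Megaphone")
plot(mod_curve, which = 1, main = "Curve")
par(old_par)                     # restore the previous layout
```

In the first panel the residuals should fan out as the fitted values increase. In the second the red line should bend into a U shape instead of staying flat.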