This notebook demonstrates how to plot data and fit a linear regression.

Linear Regression

Step 1: Download the Advertising data set used in “An Introduction to Statistical Learning” (you may need to do this manually on non-Linux operating systems)

cd /tmp
wget http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
## --2017-01-25 13:34:12--  http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
## Resolving www-bcf.usc.edu (www-bcf.usc.edu)... 68.181.201.24
## Connecting to www-bcf.usc.edu (www-bcf.usc.edu)|68.181.201.24|:80... connected.
## HTTP request sent, awaiting response... 200 OK
## Length: 5166 (5.0K) [text/csv]
## Saving to: ‘Advertising.csv.12’
## 
##      0K .....                                                 100% 26.9K=0.2s
## 
## 2017-01-25 13:34:14 (26.9 KB/s) - ‘Advertising.csv.12’ saved [5166/5166]
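Note that the log above saves to ‘Advertising.csv.12’: wget does not overwrite an existing file but writes a numbered copy, so Step 2 reads a copy of /tmp/Advertising.csv that was downloaded earlier. As a rough, portable alternative, the download can be done from R itself and skipped when the file is already present. This is only a sketch: `fetch_once` is a hypothetical helper name, and the path should be adjusted on systems without a /tmp directory.

```r
# Download `url` to `dest` unless a file is already there.
# `fetch_once` is a hypothetical helper, not part of the original notebook.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Intended use with the notebook's URL and path (needs network access):
# fetch_once("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv",
#            "/tmp/Advertising.csv")
```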

Step 2: Load the dataset from the CSV

ads <- read.csv("/tmp/Advertising.csv")
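Before fitting, it can help to confirm that the columns used later (TV, Radio, Newspaper, Sales) were actually read. A minimal sketch, assuming a hypothetical helper named `check_columns`:

```r
# Stop early with a clear message if a required column is missing.
# `check_columns` is a hypothetical helper, not part of the original notebook.
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# With the notebook's data frame:
# check_columns(ads, c("TV", "Radio", "Newspaper", "Sales"))
# str(ads)   # column types and the first few values
```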

Step 3: Fit a linear regression model

lm.fit <- lm(Sales ~ TV, data = ads)
summary(lm.fit)
## 
## Call:
## lm(formula = Sales ~ TV, data = ads)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3860 -1.9545 -0.1913  2.0671  7.2124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.032594   0.457843   15.36   <2e-16 ***
## TV          0.047537   0.002691   17.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.259 on 198 degrees of freedom
## Multiple R-squared:  0.6119, Adjusted R-squared:  0.6099 
## F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16
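The columns of the summary are related: each t value is the estimate divided by its standard error (for TV, 0.047537 / 0.002691 ≈ 17.67), and with a single predictor the F-statistic is the square of the slope's t value. The sketch below checks both relations and shows how to pull out confidence intervals and predictions. So that it runs without the download, it uses R's built-in `cars` data (`dist ~ speed`) as a stand-in for `Sales ~ TV`; that substitution is an assumption of the example, not part of the notebook.

```r
# Stand-in model on the built-in `cars` data; swap in lm(Sales ~ TV, data = ads).
fit <- lm(dist ~ speed, data = cars)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))

# t value = Estimate / Std. Error
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]
all.equal(unname(t_manual), unname(coefs[, "t value"]))

# With one predictor, F equals the squared t value of the slope
f_stat <- summary(fit)$fstatistic["value"]
all.equal(unname(f_stat), unname(coefs["speed", "t value"]^2))

# 95% confidence intervals for intercept and slope
ci <- confint(fit, level = 0.95)

# Predicted response, with confidence bounds, at new predictor values
pred <- predict(fit, newdata = data.frame(speed = c(10, 20)),
                interval = "confidence")
```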

Step 4: Plot the fit

#pdf("/tmp/sales_tv_reg.pdf",7,5)
plot(ads$TV, ads$Sales,col='red',pch=20,xlab = "TV", ylab = "Sales")
abline(lm.fit)

#dev.off()
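The commented-out `pdf(...)` and `dev.off()` lines are the switch for writing the figure to a file instead of the screen. A small sketch of that pattern with the lines enabled, using the built-in `cars` data and a temporary output path (both are assumptions made so the example runs on its own):

```r
out_file <- file.path(tempdir(), "fit_plot.pdf")  # hypothetical output path
fit <- lm(dist ~ speed, data = cars)              # stand-in for lm(Sales ~ TV, data = ads)

pdf(out_file, width = 7, height = 5)  # open a PDF device, size in inches
plot(cars$speed, cars$dist, col = "red", pch = 20,
     xlab = "speed", ylab = "dist")
abline(fit)                           # overlay the fitted regression line
dev.off()                             # close the device so the file is written

file.exists(out_file)
```

Forgetting `dev.off()` leaves the device open, and later plots keep going to the same file.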

Step 5: Plot diagnostics of the fit (residuals vs. fitted, normal Q-Q, scale-location, and residuals vs. leverage):

plot(lm.fit)
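Calling `plot()` on a fitted model draws several diagnostic plots in sequence. One way to see them together is a 2 x 2 grid, and the `which` argument selects a single plot. The sketch uses the built-in `cars` data as a stand-in for the notebook's model, which is an assumption of the example:

```r
fit <- lm(dist ~ speed, data = cars)  # stand-in for lm(Sales ~ TV, data = ads)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 plotting grid; remember the old settings
plot(fit)                        # the four default diagnostic plots
par(old_par)                     # restore the previous layout

plot(fit, which = 1)             # only the residuals vs. fitted plot
```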

Residuals for Advertising Sales

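Now let's look at the regression of sales as a function of TV, Radio, and Newspaper advertising, one predictor at a time. For a simple linear regression with a single predictor, R-squared equals the squared correlation between the response and the predictor, which the output below confirms.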

TV

#pdf("/tmp/sales_tv.pdf", 7,5)
plot(ads$TV, ads$Sales,col='red',pch=20,xlab = "TV", ylab = "Sales")
lm.fit <- lm(Sales ~ TV, data=ads)
abline(lm.fit)

#dev.off()
sprintf('R-squared: %f', summary(lm.fit)$r.squared)
## [1] "R-squared: 0.611875"
sprintf('Correlation^2: %f', cor(ads$Sales,ads$TV)^2)
## [1] "Correlation^2: 0.611875"
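The two numbers agree because R-squared is defined as 1 - RSS/TSS, where RSS is the residual sum of squares and TSS is the total sum of squares around the mean, and for one predictor this reduces to the squared correlation. A sketch that computes it by hand, again on the built-in `cars` data as a stand-in (an assumption of the example):

```r
fit <- lm(dist ~ speed, data = cars)  # stand-in for lm(Sales ~ TV, data = ads)

rss <- sum(residuals(fit)^2)                 # residual sum of squares
tss <- sum((cars$dist - mean(cars$dist))^2)  # total sum of squares
r2_manual <- 1 - rss / tss

# All three values should agree
c(manual  = r2_manual,
  summary = summary(fit)$r.squared,
  cor_sq  = cor(cars$dist, cars$speed)^2)
```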

Radio

#pdf("/tmp/sales_radio.pdf", 7,5)
plot(ads$Radio, ads$Sales,col='red',pch=20,xlab = "Radio", ylab = "Sales")
lm.fit <- lm(Sales ~ Radio, data=ads)
abline(lm.fit)

#dev.off()
sprintf('R-squared: %f', summary(lm.fit)$r.squared)
## [1] "R-squared: 0.332032"
sprintf('Correlation^2: %f', cor(ads$Sales,ads$Radio)^2)
## [1] "Correlation^2: 0.332032"

Newspaper

#pdf("/tmp/sales_newspaper.pdf", 7,5)
plot(ads$Newspaper, ads$Sales,col='red',pch=20,xlab = "Newspaper", ylab = "Sales")
lm.fit <- lm(Sales ~ Newspaper, data=ads)
abline(lm.fit)

#dev.off()
sprintf('R-squared: %f', summary(lm.fit)$r.squared)
## [1] "R-squared: 0.052120"
sprintf('Correlation^2: %f', cor(ads$Sales,ads$Newspaper)^2)
## [1] "Correlation^2: 0.052120"
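The three blocks above differ only in the predictor, so the comparison can also be written as a loop. This is one possible sketch, not the notebook's own method: `r2_by_predictor` is a hypothetical helper, demonstrated on the built-in `mtcars` data so that it runs without the download.

```r
# Fit one simple regression per predictor and return each R-squared.
# `r2_by_predictor` is a hypothetical helper, not part of the original notebook.
r2_by_predictor <- function(data, response, predictors) {
  sapply(predictors, function(p) {
    # reformulate() builds the formula `response ~ p` from strings
    fit <- lm(reformulate(p, response = response), data = data)
    summary(fit)$r.squared
  })
}

# Demonstration on built-in data
r2 <- r2_by_predictor(mtcars, "mpg", c("wt", "hp", "qsec"))

# Intended use with the notebook's data frame:
# r2_by_predictor(ads, "Sales", c("TV", "Radio", "Newspaper"))
```

The result is a named numeric vector, which makes it easy to sort or print the predictors side by side.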