This notebook demonstrates how to plot data and fit a linear regression.
Step 1: Download the Advertising data set from “An Introduction to Statistical Learning” (ISL). The commands below use wget; on operating systems where it is not installed, you may need to download the file manually.
cd /tmp
wget -O Advertising.csv http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
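On systems without wget, the download can also be done from within R. The sketch below uses base R's `download.file()` and reuses an existing copy instead of downloading again. The function name `fetch_advertising` and the default destination under `tempdir()` are illustrative choices, not part of the original notebook.

```r
# Hypothetical helper (not part of the original notebook): download the ISL
# Advertising data from within R using utils::download.file(), which is
# available on every platform R supports.
fetch_advertising <- function(
    url = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv",
    dest = file.path(tempdir(), "Advertising.csv")) {
  # Reuse an existing copy instead of downloading again
  if (!file.exists(dest)) {
    # mode = "wb" writes the file byte-for-byte (this matters on Windows)
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

# Usage (needs network access):
# ads <- read.csv(fetch_advertising())
```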
Step 2: Load the data set from the CSV file
ads <- read.csv("/tmp/Advertising.csv")
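If the downloaded file begins with an unnamed index column (as files written by R's `write.csv()` do), `read.csv()` imports it as an extra column named `X`; passing `row.names = 1` turns it into row names instead. Whether your copy has such a column is easy to check with `names(ads)`. The sketch below demonstrates the behaviour on a small simulated file, not on the real Advertising data.

```r
# Simulated stand-in for the Advertising data (random values, not the real file)
set.seed(1)
toy <- data.frame(TV = runif(5, 0, 300), Radio = runif(5, 0, 50),
                  Newspaper = runif(5, 0, 100), Sales = runif(5, 1, 27))

# write.csv() stores the row names as an unnamed first column
path <- tempfile(fileext = ".csv")
write.csv(toy, path)

raw <- read.csv(path)                   # index column comes back as "X"
clean <- read.csv(path, row.names = 1)  # index column becomes row names

print(names(raw))
print(names(clean))
```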
Step 3: Fit a simple linear regression of Sales on TV advertising
lm.fit <- lm(Sales ~ TV, data = ads)
summary(lm.fit)
##
## Call:
## lm(formula = Sales ~ TV, data = ads)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -8.3860 -1.9545 -0.1913  2.0671  7.2124
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.032594   0.457843   15.36   <2e-16 ***
## TV          0.047537   0.002691   17.67   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.259 on 198 degrees of freedom
## Multiple R-squared: 0.6119, Adjusted R-squared: 0.6099
## F-statistic: 312.1 on 1 and 198 DF, p-value: < 2.2e-16
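The quantities in this summary follow from closed-form least-squares formulas. The sketch below recomputes the intercept, slope, their standard errors and t values, the residual standard error, R-squared, and the F-statistic by hand, and checks them against `lm()`. It runs on simulated data, so its numbers will not match the table above. With a single predictor, F equals the square of the slope's t value, which is consistent (up to rounding) with the reported t value of 17.67 and F-statistic of 312.1.

```r
# Simulated data standing in for TV and Sales (not the real Advertising values)
set.seed(42)
n <- 200
x <- runif(n, 0, 300)
y <- 7 + 0.05 * x + rnorm(n, sd = 3)

# Closed-form least-squares estimates
x_bar <- mean(x)
y_bar <- mean(y)
Sxx <- sum((x - x_bar)^2)
Sxy <- sum((x - x_bar) * (y - y_bar))
b1 <- Sxy / Sxx
b0 <- y_bar - b1 * x_bar

# Residual standard error uses n - 2 degrees of freedom (two coefficients)
res <- y - (b0 + b1 * x)
rse <- sqrt(sum(res^2) / (n - 2))

# Standard errors and t values of the coefficients
se_b1 <- rse / sqrt(Sxx)
se_b0 <- rse * sqrt(1 / n + x_bar^2 / Sxx)
t_b0 <- b0 / se_b0
t_b1 <- b1 / se_b1

# R-squared, and the F-statistic for a single-predictor model
r2 <- 1 - sum(res^2) / sum((y - y_bar)^2)
f_stat <- t_b1^2

# Compare every hand-computed value with what summary(lm()) reports
fit <- lm(y ~ x)
s <- summary(fit)
stopifnot(
  isTRUE(all.equal(unname(coef(fit)), c(b0, b1))),
  isTRUE(all.equal(unname(s$coefficients[, "Std. Error"]), c(se_b0, se_b1))),
  isTRUE(all.equal(unname(s$coefficients[, "t value"]), c(t_b0, t_b1))),
  isTRUE(all.equal(s$sigma, rse)),
  isTRUE(all.equal(s$r.squared, r2)),
  isTRUE(all.equal(unname(s$fstatistic["value"]), f_stat))
)
```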
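Step 4: Plot the data with the fitted regression line
# Uncomment pdf() and dev.off() to write the plot to a 7 x 5 inch PDF instead of the screen
#pdf("/tmp/sales_tv_reg.pdf", 7, 5)
plot(ads$TV, ads$Sales, col = "red", pch = 20, xlab = "TV", ylab = "Sales")
abline(lm.fit)  # overlay the least-squares line
#dev.off()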
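The commented-out `pdf()` and `dev.off()` calls send the figure to a file instead of the screen. A slightly safer pattern, sketched below, wraps the plotting code in a function that always closes the device via `on.exit()`, even if the plotting code fails. The helper name `save_pdf` and the simulated data are illustrative assumptions.

```r
# Hypothetical helper: run a plotting function with a PDF device open,
# closing the device even if the plotting code raises an error.
save_pdf <- function(file, plot_fn, width = 7, height = 5) {
  pdf(file, width = width, height = height)
  on.exit(dev.off(), add = TRUE)
  plot_fn()
  invisible(file)
}

# Usage with simulated data standing in for TV and Sales
set.seed(3)
tv <- runif(50, 0, 300)
sales <- 7 + 0.05 * tv + rnorm(50, sd = 3)
out <- save_pdf(tempfile(fileext = ".pdf"), function() {
  plot(tv, sales, col = "red", pch = 20, xlab = "TV", ylab = "Sales")
  abline(lm(sales ~ tv))
})
```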
Step 5: Plot the regression diagnostics (residuals vs. fitted values, normal Q-Q, scale-location, and residuals vs. leverage):
plot(lm.fit)
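By default `plot()` on an `lm` object draws its four diagnostic plots one after another. Setting `par(mfrow = c(2, 2))` shows them in a single 2-by-2 figure, and the numbers behind them are available directly from base R's stats package: `rstandard()` for standardized residuals, `hatvalues()` for leverage, and `cooks.distance()` for influence. A sketch on simulated data:

```r
# Simulated data standing in for the Advertising columns
set.seed(7)
x <- runif(100, 0, 300)
y <- 7 + 0.05 * x + rnorm(100, sd = 3)
fit <- lm(y ~ x)

# Draw all four default diagnostic plots in one 2-by-2 figure
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)  # restore the previous layout

# Quantities underlying the plots
std_res <- rstandard(fit)    # standardized residuals
lev <- hatvalues(fit)        # leverage (diagonal of the hat matrix)
cook <- cooks.distance(fit)  # Cook's distance (influence of each point)

# Leverages sum to the number of coefficients (here 2: intercept and slope)
print(sum(lev))
# The three most influential observations
print(head(sort(cook, decreasing = TRUE), 3))
```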
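Now let's look at three separate simple regressions of Sales on each advertising medium in turn: TV, Radio, and Newspaper. For each one we also check that the model's R-squared equals the squared correlation between Sales and the predictor, which holds for any least-squares regression with a single predictor and an intercept.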
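Each of the three blocks that follow repeats the same steps: scatter plot, fit, regression line, and a comparison of R-squared with the squared correlation. A small helper can factor out that pattern. The function name `fit_and_plot` is hypothetical; it builds the model formula with base R's `reformulate()` and is demonstrated on simulated data that uses the same column names as the Advertising file.

```r
# Hypothetical helper: plot the response against one predictor, overlay the
# least-squares line, and return R-squared alongside the squared correlation.
fit_and_plot <- function(data, predictor, response = "Sales") {
  fit <- lm(reformulate(predictor, response = response), data = data)
  plot(data[[predictor]], data[[response]], col = "red", pch = 20,
       xlab = predictor, ylab = response)
  abline(fit)
  c(r_squared = summary(fit)$r.squared,
    cor_squared = cor(data[[response]], data[[predictor]])^2)
}

# Simulated stand-in for the Advertising data (random values, not the real file)
set.seed(10)
n <- 200
sim <- data.frame(TV = runif(n, 0, 300), Radio = runif(n, 0, 50),
                  Newspaper = runif(n, 0, 115))
sim$Sales <- 3 + 0.045 * sim$TV + 0.19 * sim$Radio + rnorm(n, sd = 1.7)

# One row per predictor; the two columns agree up to floating-point error
results <- t(sapply(c("TV", "Radio", "Newspaper"),
                    function(p) fit_and_plot(sim, p)))
print(results)
```

With the real data, `fit_and_plot(ads, "Radio")` would replace one of the hand-written blocks below.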
#pdf("/tmp/sales_tv.pdf", 7, 5)
plot(ads$TV, ads$Sales, col = "red", pch = 20, xlab = "TV", ylab = "Sales")
lm.fit <- lm(Sales ~ TV, data = ads)
abline(lm.fit)
#dev.off()
sprintf('R-squared: %f', summary(lm.fit)$r.squared)
## [1] "R-squared: 0.611875"
sprintf('Correlation^2: %f', cor(ads$Sales, ads$TV)^2)
## [1] "Correlation^2: 0.611875"
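The two numbers agree because, for a simple linear regression with an intercept, R-squared and the squared sample correlation are algebraically identical. Write $S_{xx}=\sum_i (x_i-\bar x)^2$, $S_{yy}=\sum_i (y_i-\bar y)^2$ and $S_{xy}=\sum_i (x_i-\bar x)(y_i-\bar y)$. The least-squares slope is $\hat\beta_1 = S_{xy}/S_{xx}$, and since $\bar y = \hat\beta_0 + \hat\beta_1 \bar x$, the fitted values satisfy $\hat y_i - \bar y = \hat\beta_1 (x_i - \bar x)$. Using $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$, which for least squares with an intercept equals the explained sum of squares over $\mathrm{TSS}$:

$$
R^2 = \frac{\sum_i (\hat y_i - \bar y)^2}{\sum_i (y_i - \bar y)^2}
    = \frac{\hat\beta_1^2 S_{xx}}{S_{yy}}
    = \frac{S_{xy}^2}{S_{xx}\,S_{yy}}
    = r_{xy}^2
$$

With several predictors this identity no longer holds for any single predictor; instead $R^2$ equals the squared correlation between $y$ and the fitted values $\hat y$.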
#pdf("/tmp/sales_radio.pdf", 7, 5)
plot(ads$Radio, ads$Sales, col = "red", pch = 20, xlab = "Radio", ylab = "Sales")
lm.fit <- lm(Sales ~ Radio, data = ads)
abline(lm.fit)
#dev.off()
sprintf('R-squared: %f', summary(lm.fit)$r.squared)
## [1] "R-squared: 0.332032"
sprintf('Correlation^2: %f', cor(ads$Sales, ads$Radio)^2)
## [1] "Correlation^2: 0.332032"
#pdf("/tmp/sales_newspaper.pdf", 7, 5)
plot(ads$Newspaper, ads$Sales, col = "red", pch = 20, xlab = "Newspaper", ylab = "Sales")
lm.fit <- lm(Sales ~ Newspaper, data = ads)
abline(lm.fit)
#dev.off()
sprintf('R-squared: %f', summary(lm.fit)$r.squared)
## [1] "R-squared: 0.052120"
sprintf('Correlation^2: %f', cor(ads$Sales, ads$Newspaper)^2)
## [1] "Correlation^2: 0.052120"
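Because each R-squared above equals a squared correlation, all three can be read off a single correlation matrix. The sketch below assumes numeric columns named TV, Radio, Newspaper and Sales, as in the code above; it runs on simulated data so that it is self-contained.

```r
# Simulated stand-in with the same column names (random values, not the real data)
set.seed(5)
n <- 200
ads_sim <- data.frame(TV = runif(n, 0, 300), Radio = runif(n, 0, 50),
                      Newspaper = runif(n, 0, 115))
ads_sim$Sales <- 3 + 0.045 * ads_sim$TV + 0.19 * ads_sim$Radio + rnorm(n, sd = 1.7)

# Squared correlation of Sales with each advertising medium
cors <- cor(ads_sim[, c("TV", "Radio", "Newspaper", "Sales")])
r2_by_medium <- cors["Sales", c("TV", "Radio", "Newspaper")]^2
print(round(r2_by_medium, 4))

# R-squared of each corresponding simple regression, for comparison
r2_lm <- sapply(c("TV", "Radio", "Newspaper"), function(p) {
  summary(lm(reformulate(p, response = "Sales"), data = ads_sim))$r.squared
})
print(all.equal(r2_by_medium, r2_lm))
```

Applied to the real data, for example `cor(ads[, c("TV", "Radio", "Newspaper", "Sales")])`, the squared entries in the Sales row should reproduce the values 0.611875, 0.332032 and 0.052120 printed above.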