This notebook demonstrates how to download, summarize, and plot a dataset in R.
This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing code chunks below by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).
LaTeX math notation also works: \(Y \approx \beta_0 + \beta_1 \times X\). Or if you want to have an equation on a line all by itself: \[ Y \approx \beta_0 + \beta_1 \times X \]
Step 1: Download the Advertising dataset from “An Introduction to Statistical Learning” (you may need to do this manually on non-Linux operating systems)
cd /tmp
wget http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
## --2017-01-25 13:34:19-- http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
## Resolving www-bcf.usc.edu (www-bcf.usc.edu)... 68.181.201.24
## Connecting to www-bcf.usc.edu (www-bcf.usc.edu)|68.181.201.24|:80... connected.
## HTTP request sent, awaiting response... 200 OK
## Length: 5166 (5.0K) [text/csv]
## Saving to: ‘Advertising.csv.13’
##
## 0K ..... 100% 26.7K=0.2s
##
## 2017-01-25 13:34:20 (26.7 KB/s) - ‘Advertising.csv.13’ saved [5166/5166]
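On systems without wget, the same download can be done from within R. The sketch below assumes base R's download.file() is acceptable for this purpose; the helper name fetch_if_missing is hypothetical and not part of the original notebook. It skips the download when the target file already exists, which avoids the numbered copies (such as Advertising.csv.13 in the log above) that wget creates when a file of the same name is already present.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the bytes unchanged on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (needs network access and assumes the URL from Step 1 is still reachable):
# fetch_if_missing("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv",
#                  "/tmp/Advertising.csv")
```

On Windows, /tmp usually does not exist, so pick another destination and use the same path in read.csv().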
Step 2: Load the dataset from the CSV
ads <- read.csv("/tmp/Advertising.csv")
Step 3: Summarize the data
summary(ads)
##        X                TV             Radio          Newspaper
##  Min.   :  1.00   Min.   :  0.70   Min.   : 0.000   Min.   :  0.30
##  1st Qu.: 50.75   1st Qu.: 74.38   1st Qu.: 9.975   1st Qu.: 12.75
##  Median :100.50   Median :149.75   Median :22.900   Median : 25.75
##  Mean   :100.50   Mean   :147.04   Mean   :23.264   Mean   : 30.55
##  3rd Qu.:150.25   3rd Qu.:218.82   3rd Qu.:36.525   3rd Qu.: 45.10
##  Max.   :200.00   Max.   :296.40   Max.   :49.600   Max.   :114.00
##      Sales
##  Min.   : 1.60
##  1st Qu.:10.38
##  Median :12.90
##  Mean   :14.02
##  3rd Qu.:17.40
##  Max.   :27.00
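In the summary, X runs from 1 to 200 with a mean of 100.50, which suggests it is a row index stored in the CSV rather than a measured variable. That is an assumption based on the summary alone. A minimal sketch of dropping such a column is shown below; it uses a tiny inline CSV with made-up values so that it runs without the downloaded file, and the names csv_text, ads_toy, and ads_clean are illustrative.

```r
# Toy stand-in for the CSV (illustrative values, NOT the real Advertising data).
# The first header field is empty, as in files written with row names included.
csv_text <- '"",TV,Radio,Newspaper,Sales
1,10,1,5,2
2,20,2,6,4
3,30,3,7,6'

# read.csv() turns the empty column name into "X".
ads_toy <- read.csv(text = csv_text)

# Drop the index column so that summaries only cover the measured variables.
ads_clean <- ads_toy[, names(ads_toy) != "X", drop = FALSE]

summary(ads_clean)
```

With the real data, the same subsetting would be applied to ads before calling summary().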
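Step 4: Plot the dataset
# Uncomment pdf() and dev.off() to save the plot as a 7 x 5 inch PDF instead of displaying it
# pdf("/tmp/sales_tv.pdf", width = 7, height = 5)
plot(ads$TV, ads$Sales, col = "red", pch = 20, xlab = "TV", ylab = "Sales")
# dev.off()
Step 5: Alternatively, generate prettier plots with ggplot2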
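# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(ggplot2)
# qplot() is deprecated as of ggplot2 3.4.0; ggplot() + geom_point() draws the same scatter plot
ggplot(ads, aes(x = TV, y = Sales)) + geom_point() + labs(x = "TV", y = "Sales")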
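The linear relationship quoted near the top of the notebook, Y ≈ β0 + β1 × X, can be drawn on top of a scatter plot. The following is a minimal sketch using base R's lm() and abline(); it uses toy data that lies exactly on a known line so that it runs without the CSV, and the names toy, fit, and beta are illustrative rather than taken from the notebook.

```r
# Toy data lying exactly on Sales = 5 + 0.05 * TV (illustrative, NOT the real dataset).
toy <- data.frame(TV = c(0, 50, 100, 150, 200))
toy$Sales <- 5 + 0.05 * toy$TV

# Least-squares fit of Sales on TV; coef() returns the intercept (beta_0) and slope (beta_1).
fit <- lm(Sales ~ TV, data = toy)
beta <- unname(coef(fit))

# Scatter plot in the style of the base R plot above, with the fitted line on top.
plot(toy$TV, toy$Sales, col = "red", pch = 20, xlab = "TV", ylab = "Sales")
abline(fit, col = "blue")
```

With the real data, replacing toy with ads in the lm() and plot() calls would fit and draw Sales against TV.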