This notebook demonstrates how to plot data.

R Markdown

This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing code chunks below by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

LaTeX math notation also works: \(Y \approx \beta_0 + \beta_1 \times X\). To put an equation on a line by itself:

\[ Y \approx \beta_0 + \beta_1 \times X \]

Plotting

Step 1: Download the Advertising dataset from “An Introduction to Statistical Learning” (on non-Linux operating systems you may need to download it manually)

cd /tmp
wget http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
## --2017-01-25 13:34:19--  http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
## Resolving www-bcf.usc.edu (www-bcf.usc.edu)... 68.181.201.24
## Connecting to www-bcf.usc.edu (www-bcf.usc.edu)|68.181.201.24|:80... connected.
## HTTP request sent, awaiting response... 200 OK
## Length: 5166 (5.0K) [text/csv]
## Saving to: ‘Advertising.csv.13’
## 
##      0K .....                                                 100% 26.7K=0.2s
## 
## 2017-01-25 13:34:20 (26.7 KB/s) - ‘Advertising.csv.13’ saved [5166/5166]
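On systems without wget, the download can be done from R itself. Note also that the log above shows wget saving the file as Advertising.csv.13 because earlier copies already existed, so a fixed destination name is easier to reason about. The following is a minimal sketch, not part of the original notebook: the helper name `fetch_if_missing` is hypothetical, and it assumes the URL is still reachable.

```r
# URL taken from the wget command above; it may no longer be reachable.
ads_url <- "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv"

# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" avoids line-ending translation on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (needs network access):
# csv_path <- fetch_if_missing(ads_url, file.path(tempdir(), "Advertising.csv"))
```

If the file is stored somewhere other than /tmp, the path passed to read.csv in the next step has to change accordingly.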

Step 2: Load the dataset from the CSV

ads <- read.csv("/tmp/Advertising.csv")

Step 3: Summarize the data

summary(ads)
##        X                TV             Radio          Newspaper     
##  Min.   :  1.00   Min.   :  0.70   Min.   : 0.000   Min.   :  0.30  
##  1st Qu.: 50.75   1st Qu.: 74.38   1st Qu.: 9.975   1st Qu.: 12.75  
##  Median :100.50   Median :149.75   Median :22.900   Median : 25.75  
##  Mean   :100.50   Mean   :147.04   Mean   :23.264   Mean   : 30.55  
##  3rd Qu.:150.25   3rd Qu.:218.82   3rd Qu.:36.525   3rd Qu.: 45.10  
##  Max.   :200.00   Max.   :296.40   Max.   :49.600   Max.   :114.00  
##      Sales      
##  Min.   : 1.60  
##  1st Qu.:10.38  
##  Median :12.90  
##  Mean   :14.02  
##  3rd Qu.:17.40  
##  Max.   :27.00
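The X column in the summary runs from 1 to 200 with a mean of 100.5, which suggests it is a row index rather than a measurement. A small sketch of how such a column can be dropped before summarizing is shown below. The rows are made-up illustrative values, not the real data, and the empty first header field is an assumption about how the CSV file is laid out.

```r
# Made-up rows for illustration only; the empty first header field is an assumption.
toy_csv <- '"","TV","Radio","Newspaper","Sales"
1,10.0,5.0,2.0,3.5
2,20.0,1.0,8.0,4.0
3,30.0,9.0,0.5,6.5'

toy <- read.csv(text = toy_csv)
names(toy)  # read.csv names the column with the empty header "X"

# Keep every column except the index column
toy_clean <- toy[, setdiff(names(toy), "X")]
summary(toy_clean)
```

The same `setdiff(names(ads), "X")` selection could be applied to `ads` if the index column is not wanted.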

Step 4: Plot the dataset

# Uncomment pdf() and dev.off() to write the plot to /tmp/sales_tv.pdf (7 x 5 inches)
#pdf("/tmp/sales_tv.pdf", 7, 5)
plot(ads$TV, ads$Sales, col = "red", pch = 20, xlab = "TV", ylab = "Sales")

#dev.off()

Step 5: Or generate prettier plots with ggplot2

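# Install ggplot2 only if it is missing
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(ggplot2)
# qplot() is deprecated in recent ggplot2 releases; ggplot() + geom_point() draws the same scatter plot
ggplot(ads, aes(x = TV, y = Sales)) + geom_point() + labs(x = "TV", y = "Sales")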
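
To write a figure to disk instead of showing it in the notebook, the commented-out pdf() and dev.off() calls around the base-graphics plot can be enabled. A minimal sketch of that pattern follows. It uses made-up values and a temporary file rather than `ads` and /tmp/sales_tv.pdf, so those names are assumptions for illustration.

```r
# Made-up values standing in for ads$TV and ads$Sales
toy <- data.frame(TV = c(10, 20, 30), Sales = c(3.5, 4.0, 6.5))

# Write to a temporary file so the sketch works on any operating system
out_file <- file.path(tempdir(), "sales_tv.pdf")

pdf(out_file, width = 7, height = 5)  # open a 7 x 5 inch PDF device
plot(toy$TV, toy$Sales, col = "red", pch = 20, xlab = "TV", ylab = "Sales")
dev.off()                             # close the device so the file is written

file.exists(out_file)
```

Everything drawn between pdf() and dev.off() goes to the file, so forgetting dev.off() typically leaves an incomplete PDF.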