CS780 / CS880: Introduction to Machine Learning

When and Where

Tue & Thu, 12:40 pm - 2:00 pm, Kingsbury N133

See class overview for more information on textbooks, syllabus, assignments, office hours, and grading.

Assignments

Please use Piazza for questions about assignments.

Assignment      Due Date
Assignment 1    2/14/17 at 12:40 pm
Assignment 2    2/21/17 at 12:40 pm
Assignment 3    3/09/17 at 12:40 pm
Assignment 4    4/06/17 at 12:40 pm
Assignment 5    4/20/17 at 12:40 pm

Syllabus

Date   Slides                                   Reading           Notebooks
1/26   Statistical learning                     ISL 1,2           (html) (RMD)
1/31   Linear regression I                      ISL 3.1-2         (html) (RMD)
2/02   No class
2/07   Linear regression II                     ISL 3.3-6
2/09   No class
2/14   Logistic regression                      ISL 4.1-3         (html) (RMD)
2/16   LDA, QDA, Bayes                          ISL 4.4-6
2/21   Cross-validation                         ISL 5
2/23   Model selection                          ISL 6.1-6.2
2/28   Dimensionality                           ISL 6.3-6.4
3/02   PCA ML/MAP                               ISL 10.1-2        ML PCA
3/06   Clustering and EM                        ISL 10.3-5        kmeans
3/09   Midterm Review                           ISL 1-6, 10
3/21   ** Midterm **
3/23   Linear algebra                           LAO 1.1-2,2,3
3/28   LA in ML                                 LAR               linear algebra
3/30   LA in ML                                 LAR               linear algebra
4/04   SVM                                      ISL 9
4/06   Decision trees and boosting              ISL 8
4/11   Nonlinear methods                        ISL 7
4/13   Recommender systems
4/18   Bayes nets                               MLP 10
4/20   Reinforcement learning                   RL
4/25   Final exam review
4/27   Project presentations (Graduate)
5/02   Deep learning and big data               DL
5/04   Project presentations (Undergraduate)

Project

See the project overview for details on the deliverables. Each deliverable is due by the end of the day (midnight) on its due date.

Date   Deliverable                             Page Limit
2/24   Project description and data sources    1
3/07   Evaluation methodology                  1
3/23   Method and literature overview          2
4/06   Preliminary results                     3
4/27   Final report                            7

Exams

See the practice questions for the kinds of questions you should be able to answer before the midterm and final exams.

Date   Exam
3/21   Midterm (take home)

Textbooks

Main reference:

ISL: James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning

More in-depth material:

ESL: Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer Series in Statistics (2nd ed.)

See class overview for more information on the textbook.

Class Content

The goal of this class is to teach you how to use machine learning to understand data and make predictions in practice. The class will cover the fundamental concepts of machine learning and data science, as well as a wide variety of practical algorithms. The main topics we will cover are:

  1. The maximum likelihood principle
  2. Regression: Linear regression
  3. Classification: Logistic regression and linear discriminant analysis
  4. Cross-validation, bootstrap, and over-fitting
  5. Model selection: Regularization, Lasso
  6. Nonlinear models: Decision trees, Support vector machines
  7. Unsupervised: Principal component analysis, k-means
  8. Advanced topics: Bayes nets and deep learning

The graduate version of the class will cover the same topics in greater depth.

Programming Language

The class will involve hands-on data analysis using machine learning methods. The recommended language for programming assignments is R, which is an excellent tool for statistical analysis and machine learning. No prior knowledge of R is needed or expected; the book and lectures provide a gentle introduction to the language. Experienced students may choose an alternative, such as Python or Matlab.

Pre-requisites

Basic programming skills (scripting languages like Python are OK) and some familiarity with statistics and calculus. If in doubt, please email me.