CS750 / CS850: Machine Learning
Please see the main class website for detailed up-to-date information.
When and Where
Mon & Wed: 11:10 am - 12:40 pm in Spaulding Life Sciences Hall G26
Fri: 11:00 am (recitation)
Class Content
The goal of this class is to teach you how to use machine learning to understand data and make predictions in practice. The class will cover the fundamental concepts of machine learning and data science as well as a wide variety of practical algorithms. The main topics we will cover are:
- The maximum likelihood principle
- Regression: Linear regression
- Classification: Logistic regression and linear discriminant analysis
- Cross-validation, bootstrap, and over-fitting
- Model selection: Regularization, Lasso
- Nonlinear models: Decision trees, Support vector machines
- Unsupervised: Principal component analysis, k-means
- Advanced topics: Bayes nets and deep learning
The graduate version of the class will cover the same topics in greater depth.
Programming Language
The class will involve hands-on data analysis using machine learning methods. The recommended language for programming assignments is R, which is an excellent tool for statistical analysis and machine learning. No prior knowledge of R is needed or expected; the book and lectures will provide a gentle introduction to the language. Experienced students may also choose an alternative such as Python or Matlab.
We recommend using the free RStudio for completing programming assignments. R Notebooks are a convenient way to produce reproducible reports, and we encourage you to use them. Jupyter is a similar alternative for Python.
Pre-requisites
Basic programming skills (scripting languages like Python are OK) and some familiarity with statistics and calculus. If in doubt, please email me.
See the class overview for more information on textbooks, syllabus, assignments, office hours, and grading.
Textbooks
Main Reference:
ISL: James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning