
An Empirical Introduction to Statistical Modeling

Course: BBS706 - An Empirical Introduction to Statistical Modeling
Professor: Manuel Garber, manuel.garber@umassmed.edu
Semester Offered: Fall 2021, Fall 2023

Last Taught: Fall 2019

Syllabus: Lecture- and textbook-based course on statistical modeling and machine learning, with exercises analyzing real data.

Course Summary and Objectives: This course covers the most common approaches to modeling high-dimensional data. We begin with a brief introduction to linear algebra and to methods that rely heavily on it: clustering and dimensionality reduction. We then focus on regression models (linear, non-linear, and logistic) as well as non-linear classifiers (support vector machines, neural networks). The goal is twofold: (i) to understand, both conceptually and mathematically, how and why each approach works, and (ii) to be able to apply each technique to a real dataset.

Methodology: Students will take turns leading the discussion of the book chapter scheduled for the week. The course uses an experimental dataset that will be the basis for applying and comparing different modeling and classification methods. After covering the theory behind each method, we will discuss its applicability to the datasets in the book and finally to the course dataset. Because of the emphasis on student-led discussions, audits are not allowed; only students taking the course for credit may participate in class.

Course Topics:

  • Introduction to Statistical Modeling
    • What can it be used for?
    • Examples
  • Mathematical Background
    • Linear Algebra
  • Data Pre-processing
  • Linear Regression Models
    • Linear Regression
    • Penalized Linear Regression
  • Non-linear Regression Models
    • Support Vector Machines
    • Neural Networks
  • Regression Trees
    • Regression Trees
    • Random Forests
  • Discriminant Analysis
    • Logistic Regression
    • Linear Discriminant Analysis
  • Non-linear Classifiers
    • Support Vector Machines
    • K-Nearest Neighbors
  • Classification Trees
  • Filter Methods
  • Consequences of Non-informative Features
  • Feature Reduction Methods

Course Materials

Textbook: Applied Predictive Modeling, Max Kuhn and Kjell Johnson, Springer, 2013

Suggested reading: An Introduction to Statistical Learning: with Applications in R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Springer, 2013

Evaluation

Following each section, students will be expected to apply the method covered to the class dataset. Course evaluation will be based on three criteria:

  1. Their application of each method to the dataset
  2. Their leading of discussions
  3. Their participation in class