Statistical Foundations of Predictive Modeling in Biostatistics
- Course Number: CHL5231H
- Series: 5200 (Biostatistics)
- Format: Lecture
- Course Instructor(s): Rafal Kustra
Course Description
This course introduces the foundational concepts of predictive modeling, with an emphasis on blending interpretability and generalizability, both of which are important in health sciences applications. Classical linear models, such as ordinary least squares and logistic regression, will be recast as supervised learning tools for regression and classification tasks, respectively, and then extended through common techniques of basis expansion and regularization. The second part of the course will focus on the many aspects of estimating the prediction performance of any method. The course will blend a theoretical framework with simulation-based computational approaches to deepen the understanding of issues such as the bias-variance trade-off, overfitting, and generalization performance, preparing students to critically appraise the use of, and effectively deploy, both simple and more complex supervised learning techniques. The course will also serve as preparation for studying more complex methods, such as those covered in CHL5229H, for which this course is intended as a prerequisite.
Course Objectives
By the end of the course, students should understand the steps needed to formulate, execute, and evaluate a supervised learning task using simple, classical statistical methods, and to enhance such methods using basis expansion and regularization techniques. Students will also know how to define and properly assess the generalizability of a supervised learning method, and how to prevent overfitting (or underfitting) from diminishing the predictive performance of a chosen method. Some familiarity with R is required, but the lecture notes will provide ample opportunity to strengthen students' ability to carry out the necessary modeling, simulation, and predictive assessment tasks. If needed, optional tutorial sessions may also be offered to help with more advanced topics in R.
Methods of Assessment
| Assessment | Weight |
|---|---|
| Assignment 1 (roughly 5 questions with a mixture of mathematical/methodological questions and applied/computational questions) | 30% |
| Midterm | 30% |
| Assignment 2 (roughly 5 questions with a mixture of mathematical/methodological questions and applied/computational questions) | 30% |
| Participation | 10% |