The following is a list of electives that students in the Biomedical Informatics program have taken in the past. These courses are not guaranteed to run every year and serve only as examples of suitable elective courses. Students should choose electives in consultation with the Program Director.
Harvard Business School
At the root of the transformation occurring in the health care industry, both in the United States and internationally, is the fundamental challenge of improving clinical outcomes while controlling costs. Addressing this challenge will require dramatic improvements in the processes by which care is delivered to patients. This will, in turn, involve changing the organization of delivery, developing new approaches to performance measurement, and reimagining the ways in which providers are paid. This course will equip students with the tools required to design and implement these improvements.
Harvard Catalyst: The Harvard Clinical and Translational Science Center
*40-week course spanning Fall to Spring
The Harvard Catalyst Postgraduate Education Program in Clinical & Translational Science provides training to clinical investigators through a range of educational offerings. This course is part of the advanced curriculum and is designed for independent researchers.
This course offers a comprehensive introduction to biostatistics in medical research. The course includes a review of the most common techniques in the field, as well as the manner in which these techniques are applied in standard statistical software. At the conclusion of the course, participants will be able to choose an appropriate study design, calculate the sample size needed to complete a study, analyze the collected data, and communicate the results from their experiment.
Harvard Faculty of Arts and Sciences
In-depth study of genomics: models of evolution and population genetics; comparative genomics: analysis and comparison; structural genomics: protein structure, evolution and interactions; functional genomics, gene expression, structure and dynamics of regulatory networks.
Usability and design as keys to successful technology. Covers user observation techniques, needs assessment, low and high fidelity prototyping, usability testing methods, as well as theory of human perception and performance, and design best practices. Focuses on understanding and applying the lessons of human interaction to the design of usable systems; will also look at lessons to be learned from less usable systems. The course includes several small and one large project.
Artificial Intelligence (AI) is an exciting field that has enabled a wide range of cutting-edge technology, from driverless cars to grandmaster-beating Go programs. The goal of this course is to introduce the ideas and techniques underlying the design of intelligent computer systems. Topics covered in this course are broadly divided into 1) planning and search algorithms, 2) probabilistic reasoning and representations, and 3) machine learning (although, as you will see, it is impossible to separate these ideas so neatly). Within each area, the course will also present practical AI algorithms being used in the wild and, in some cases, explore the relationship to state-of-the-art techniques. The class will include lectures connecting the models and algorithms we discuss to applications in robotics, computer vision, and speech processing. Recommended prep: CS 51; Stat 110 (may be taken concurrently).
Special topics course. Focus of course changes year to year.
Surveys biologically-inspired approaches to designing distributed systems. Focus is on biological models, algorithms, and programming paradigms for self-organization. Topics vary year to year, and usually include: (1) swarm intelligence: social insects and animal groups, with applications to networking and robotics, (2) cellular computing: including cellular automata/amorphous computing, and applications like self-assembling robots and programmable materials, (3) evolutionary computation and its application to optimization and design. Recommended Prep: Students should have a familiarity/experience with computer systems (e.g. software, networking) and algorithms/analysis through classes and/or internship experiences. Background in biology not required.
Data Science 1 is the first half of a one-year introduction to data science. The course will focus on the analysis of messy, real-life data to perform predictions using statistical and machine learning methods. Material covered will integrate the five key facets of an investigation using data: (1) data collection - data wrangling, cleaning, and sampling to get a suitable data set; (2) data management - accessing data quickly and reliably; (3) exploratory data analysis, generating hypotheses and building intuition; (4) prediction or statistical learning; and (5) communication, summarizing results through visualization, stories, and interpretable summaries. Recommended: Programming knowledge at the level of CS 50 or above, and statistics knowledge at the level of Stat 100 or above (Stat 110 recommended).
Data Science 2 is the second half of a one-year introduction to data science. Building upon the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, and statistical modeling and prediction. Topics include big data and database management, interactive visualizations, nonlinear statistical models, and deep learning.
Focus on translational medicine: the application of basic genetic discoveries to human disease. Each three-hour class will focus on a specific genetic disorder and the approaches currently used to speed the transfer of knowledge from the laboratory to the clinic. Each class will include a clinical discussion, a patient presentation if appropriate, followed by lectures, a detailed discussion of recent laboratory findings and a student-led journal club. Lecturers will highlight current molecular, technological, bioinformatic and statistical approaches that are being used to advance the study of human disease. There is no exam. Students will present one paper per session in a journal club style. Attendance and active participation for the duration of all class meetings is required. If you are unable to attend class, or cannot be present for the entire session, you are expected to contact the course instructor. Two incomplete or missed sessions will result in a failing grade.
This course is designed to follow CS 181 and go into further depth on the statistical aspects of supervised learning: given what we know about our data and where it came from, how can we choose the machine learning method that will predict best on future data? Topics include the "no free lunch" theorems, linear methods for regression and classification, shrinkage and sparsity, splines and kernel smoothing, model selection and cross-validation, additive models and trees, boosting, bagging and random forests. Recommended prep: Statistics 111 and Computer Science 181.
Harvard Medical School
This course will provide a firm foundation for understanding the relationship between molecular biology, developmental biology, genetics, genomics, bioinformatics, and medicine. The goal is to develop explicit connections between basic research, medical understanding, and the perspective of patients. During the course the principles of human genetics will be reviewed. Students will become familiar with the translation of clinical understanding into analysis at the level of the gene, chromosome and molecule, the concepts and techniques of molecular biology and genomics, and the strategies and methods of genetic analysis, including an introduction to bioinformatics. The course will extend beyond basic principles to current research activity in human genetics. Not Currently Offered
Harvard T.H. Chan School of Public Health
Covers basic statistical techniques that are important for analyzing data arising from epidemiology, environmental health and biomedical and other public health-related research. Major topics include descriptive statistics, elements of probability, introduction to estimation and hypothesis testing, nonparametric methods, techniques for categorical data, regression analysis, analysis of variance, and elements of study design. Applications are stressed. Designed as an alternate to BIO200, for students desiring more emphasis on theoretical developments. Background in algebra and calculus strongly recommended.
Topics include model interpretation, model building, and model assessment for linear regression with continuous outcomes, logistic regression with binary outcomes, and proportional hazards regression with survival time outcomes. Specific topics include regression diagnostics, confounding and effect modification, goodness of fit, data transformations, splines and additive models, ordinal, multinomial, and conditional logistic regression, generalized linear models, overdispersion, Poisson regression for rate outcomes, hazard functions, and missing data. The course will provide students with the skills necessary to perform regression analyses and to critically interpret statistical issues related to regression applications in the public health literature. Prerequisites: ID 201 or BST201 or (BST202 and BST203) or (BST206 and (BST207 or BST208)) or permission of instructor.
Covers research design, sample selection, questionnaire construction, interviewing techniques, the reduction and interpretation of data, and related facets of population survey investigations. Focuses primarily on the application of survey methods to problems of health program planning and evaluation. Treatment of methodology is sufficiently broad to be suitable for students who are concerned with epidemiological, nutritional, or other types of survey research. Formerly BIO212
This course will introduce students involved with clinical research to the practical application of multiple regression analysis. Linear regression, logistic regression and proportional hazards survival models will be covered, as well as general concepts in model selection, goodness-of-fit, and testing procedures. Each lecture will be accompanied by a data analysis using SAS and a classroom discussion of the results. The course will introduce, but will not attempt to develop, the underlying likelihood theory. Background in SAS programming required.
Designed for individuals interested in the scientific, policy, and management aspects of clinical trials. Topics include types of clinical research, study design, treatment allocation, randomization and stratification, quality control, sample size requirements, patient consent, and interpretation of results. Students design a clinical investigation in their own field of interest, write a proposal for it, and critique recently published medical literature. Course Prerequisites: BIO201 or ID200 or ID201 or ID207 or BIO202&203 or BIO206&207 or BIO206&208 or BIO206&209. Formerly BIO214
This course is intended for students who are already very comfortable with fundamental techniques in statistics. The course will cover methods for building and interpreting linear regression models, including statistical assumptions and diagnostics, estimation and testing, and model building techniques. These models will be extended to handle data arising from longitudinal studies employing repeated measurement of subjects over time. Summer/Residential Course Note (Section 1): Lectures will be accompanied by computing exercises using the SAS statistical package. Online Course Note (Section 2): Lectures will be accompanied by computing exercises using the Stata statistical package. Course Prerequisites: EPI522 or BST201 or ID200 or ID201 or ID207 or BST202&203 or BST206&207 or BST206&208. Formerly BIO501
The goal of this course is to enable scientists and public health professionals who already have an introductory background in biostatistics and clinical trials to acquire the competencies in quantitative skills and systems thinking required to understand and participate in drug development and regulatory review processes. The course illustrates how statistical and quantitative methods are used to transform information into evidence demonstrating the safety, efficacy and effectiveness of drugs and devices over the course of the product’s life cycle from a regulatory perspective. Content is delivered using a blended-learning approach involving lectures, web-based media and selected case study examples derived from actual FDA decision-making and regulatory assessments to highlight and describe each phase of the regulatory drug approval process. Case studies will illustrate regulatory science in action and practice and will include content publicly available from the FDA’s website that can be used in conjunction with FDA science-based guidance and decision precedents. Course Prerequisites: ID538 or [(BIO200 or ID200 or BIO201 or BIO202&203 or BIO206&207/8/9) and (EPI200 or EPI201 or EPI208 or EPI505).] Formerly BIO523
This course will provide a basic, yet thorough introduction to the probability theory and mathematical statistics that underlie many of the commonly used techniques in public health research. Topics to be covered include probability distributions (normal, binomial, Poisson), means, variances and expected values, finite sampling distributions, parameter estimation (method of moments, maximum likelihood), confidence intervals, hypothesis testing (likelihood ratio, Wald and score tests). All theoretical material will be motivated with problems from epidemiology, biostatistics, environmental health and other public health areas. This course is aimed towards second year doctoral students in fields other than Biostatistics. Background in algebra and calculus required. Course Prerequisites: BST210 or BST213. Formerly BIO222
Topics will include types of censoring, hazard, survivor, and cumulative hazard functions, Kaplan-Meier and actuarial estimation of the survival distribution, comparison of survival using log rank and other tests, regression models including the Cox proportional hazards model and the accelerated failure time model, adjustment for time-varying covariates, and the use of parametric distributions (exponential, Weibull) in survival analysis. Methods for recurrent survival outcomes and competing risks will also be discussed, as well as design of studies with survival outcomes. Class material will include presentation of statistical methods for estimation and testing along with current software (SAS, Stata) for implementing analyses of survival data. Applications to real data will be emphasized. Course Prerequisite(s): BST210 or BST213 or BST 230, or permission of instructor required. BST 213 may be taken concurrently. Formerly BIO223
This course covers modern methods for the analysis of repeated measures, correlated outcomes and longitudinal data, including the unbalanced and incomplete data sets characteristic of biomedical research. Topics include an introduction to the analysis of correlated data, analysis of response profiles, fitting parametric curves, covariance pattern models, random effects and growth curve models, and generalized linear models for longitudinal data, including generalized estimating equations (GEE) and generalized linear mixed effects models (GLMMs). Course Activities: Homework assignments will focus on data analysis in SAS using PROC GLM, PROC MIXED, PROC GENMOD, and PROC GLIMMIX. Course Note: Lab or section times will be announced at first meeting. Course Prerequisite(s): BIO210 or BIO211 or BIO213 or BIO232. Formerly BIO226
This course introduces students to the diverse statistical methods used throughout the process of statistical genetics, from familial aggregation and segregation studies to linkage scans and association studies. Topics covered include basic principles from population genetics, multipoint and model-free linkage analysis, family-based and population-based association testing, and Genome Wide Association analysis. Instructors use ongoing research into the genetics of respiratory disease, psychiatric disorders and cancer to illustrate basic principles. Weekly homework supplements reading, course lectures, discussion and section. Relevant concepts in genetics and molecular genetics will be reviewed in lectures and labs. The emphasis of the course is fundamental principles and concepts. Course Prerequisites: BST210 (concurrent enrollment allowed). Course Note: There will be a weekly lab section; the time will be scheduled at first meeting. Formerly BIO227
This course is a practical introduction to the Bayesian analysis of biomedical data. It is an intermediate Master’s level course in the philosophy, analytic strategies, implementation, and interpretation of Bayesian data analysis. Specific topics that will be covered include: the Bayesian paradigm; Bayesian analysis of basic models; Bayesian computing: Markov Chain Monte Carlo; STAN R software package for Bayesian data analysis; linear regression; hierarchical regression models; generalized linear models; meta-analysis; models for missing data. Programming and case studies will be used throughout the course to provide hands-on training in these concepts. Prerequisites: BST210 and BST222, or permission of the instructor. Not currently offered.
Axiomatic foundations of probability, independence, conditional probability, joint distributions, transformations, moment generating functions, characteristic functions, moment inequalities, sampling distributions, modes of convergence and their interrelationships, laws of large numbers, central limit theorem, and stochastic processes.
A fundamental course in statistical inference. Discusses general principles of data reduction: exponential families, sufficiency, ancillarity and completeness. Describes general methods of point and interval parameter estimation and the small and large sample properties of estimators: method of moments, maximum likelihood, unbiased estimation, Rao-Blackwell and Lehmann-Scheffe theorems, information inequality, asymptotic relative efficiency of estimators. Describes general methods of hypothesis testing and optimality properties of tests: Neyman-Pearson theory, likelihood ratio tests, score and Wald tests, uniformly and locally most powerful tests, asymptotic relative efficiency of tests. Course Note: Lab or section time to be announced at first meeting; cross-listed: HSPH student must register for HSPH course. Course Prerequisite(s): BIO230 (concurrent enrollment allowed). Formerly BIO231
Introduction to the data structures and computer algorithms that are relevant to statistical computing. The implementation of data structures and algorithms for data management and numerical computations are discussed. Course Prerequisite(s): Instructor’s Permission. Formerly BIO514
An advanced course in linear models, including both classical theory and methods for high dimensional data. Topics include theory of estimation and hypothesis testing, multiple testing problems and false discovery rates, cross validation and model selection, regularization and the LASSO, principal components and dimensional reduction, and classification methods. Background in matrix algebra and linear regression required. Prerequisite: BST 231 and BST 233, or permission of instructor required. Formerly BIO235
A foundational course in measure theoretic probability. Topics include measure theory, Lebesgue integration, product measure and Fubini’s Theorem, Radon-Nikodym derivatives, conditional probability, conditional expectation, limit theorems on sequences of random stochastic processes, and weak convergence. Course Prerequisites: BST231 or permission from the instructor required. Formerly BIO250
Sequel to BIO 231. Considers several advanced topics in statistical inference. Topics include limit theorems, multivariate delta method, properties of maximum likelihood estimators, saddle point approximations, asymptotic relative efficiency, robust and rank-based procedures, resampling methods, and nonparametric curve estimation. Course Note: Cross-listed, HSPH student must register for HSPH course. Course Prerequisites: BIO231 and BIO250, or permission of instructor required. Formerly BIO251
Presents classical and modern approaches to the analysis of multivariate observations, repeated measures, and longitudinal data. Topics include the multivariate normal distribution, Hotelling’s T2, MANOVA, the multivariate linear model, random effects and growth curve models, generalized estimating equations, statistical analysis of multivariate categorical outcomes, and estimation with missing data. Discusses computational issues for both traditional and new methodologies. Course Note: Cross-listed, HSPH student must register for HSPH course. Course Prerequisite: BIO231 and BIO235, or permission of the instructor required. Formerly BIO245
BST247 is a seminar-style course with readings selected from the literature in areas of expertise of the participating faculty. Content may vary from year to year. The specific objectives are (1) to train students to critically read foundational papers and current journal articles in Statistical Genetics, (2) to train students to present sophisticated ideas to an audience of peers, and (3) to prepare students to engage in doctoral level research in the area. After the course, students are expected to have an in-depth and broad understanding of important topics in statistical genetics research. Course Prerequisite(s): BIO227 and (BIO231 or EPI511). BIO231 may be taken concurrently. Formerly BIO257
General principles of the Bayesian approach, prior distributions, hierarchical models and modeling techniques, approximate inference, Markov chain Monte Carlo methods, model assessment and comparison. Bayesian approaches to GLMMs, multiple testing, nonparametrics, clinical trials, survival analysis.
This course is the second course in the foundational sequence of the School’s newly approved Master’s Degree in Health Data Science. The course will build upon our existing course, BST260 Introduction to Data Science, in presenting a set of tools for modeling and understanding complex datasets. Specifically, the course will provide practical regression and tree-based techniques for big data. Specific topics that will be covered include: linear model selection and regularization: LASSO and regularization; principal component regression and partial least squares; tree-based methods: decision trees; bagging, random forests, and boosting; unsupervised learning: principal components analysis, cluster analysis. Programming (Python and R) and case studies will be used throughout the course to provide hands-on training in these concepts. Prerequisites: BST260 or permission of instructor.
Big data is everywhere, from Omics and Health Policy to Environmental Health. Every single aspect of the Health Sciences is being transformed. However, it is hard to navigate and critically assess tools and techniques in such a fast-moving big data panorama. In this course, we are going to give a critical presentation of theoretical approaches and software implementations of tools to collect, store and process data at scale. The goal is not just to learn recipes to manipulate big data but learn how to reason in terms of big data, from software design and tool selection to implementation, optimization and maintenance.
Many systems of scientific and societal interest consist of a large number of interacting components. The structure of these systems can be represented as networks where network nodes represent the components and network edges the interactions between the components. Network analysis can be used to study how pathogens, behaviors and information spread in social networks, having important implications for our understanding of epidemics and the planning of effective interventions. In a biological context, at a molecular level, network analysis can be applied to gene regulation networks, signal transduction networks, protein interaction networks, and more. This introductory course covers some basic network measures, models, and processes that unfold on networks. The covered material applies to a wide range of networks, but we will focus on social and biological networks. To analyze and model networks, we will learn the basics of the Python programming language and its NetworkX module. The course contains a number of hands-on computer lab sessions. There are five homework assignments and four reading assignments that will be discussed in class. In addition, each student will complete a final project that applies network analysis techniques to study a public health problem. Course Prerequisites: BST201 or ID200 or ID201 or ID207 or BST202&203 or BST206&207 or BST206&208. Formerly BIO521
This course is an introduction to modern statistical computing techniques used to characterize and interpret cancer genome sequencing datasets. This Master’s level course will begin with a basic introduction to DNA, genes, and genomes for students with no biology background. It will then introduce cancer as an evolutionary process and review landmarks in the history of cancer genetics, and discuss the basics of sequencing technology and modern Next Generation Sequencing. The course will cover the main steps involved in turning billions of short sequencing reads into a representation of the somatic genetic alterations characterizing an individual patient’s cancer, and will build on this foundation to study topics related to identifying mutations under positive selection from multiple tumors sampled in a population. By the end of the course, students will be able to apply state-of-the-art analysis to cancer genome datasets and to critically evaluate papers employing cancer genome data.
Epigenetics is a fast-growing field, with increasing applicability in environmental and epidemiologic studies, focusing on the alterations in chromatin structure that can stably and heritably influence gene expression. Epigenetic changes can be as profound as those exerted by mutation, but, unlike mutations, are reversible and responsive to environmental influences. The course will focus on epigenetic mechanisms and laboratory methods for DNA methylation, histone modifications, small non-coding RNAs, and epigenomics. Ongoing experimental and epidemiologic studies (cohort, case-control, cross-sectional and repeated measurement studies) will be presented to introduce students to the epigenetic effects in prenatal/early and adult life of environmental factors, including air pollution, metals, pesticides, benzene, PCBs, persistent organic pollutants, and diet. The course will enable students to understand and apply epigenetic methods in multiple areas, including cardiovascular and respiratory disease, aging, reproductive health, inflammation/immunity, and cancer.
EPI201 introduces the principles and methods used in epidemiologic research. The course discusses the conceptual and practical issues encountered in the design and analysis of epidemiologic studies for description and causal inference. EPI201 is the first course in the series of methods courses designed for students majoring in Epidemiology, Biostatistics and related fields, and those interested in a detailed introduction to the design and conduct of epidemiologic studies. Students who take EPI201 are expected to take EPI202 (Methods II). Course Note: Thursday or Friday lab required.
This course will present an introduction to the methods of data mining and predictive modeling, with applications to both genetic and clinical data. Basic concepts and philosophy of supervised and unsupervised data mining as well as appropriate applications will be discussed. Topics covered will include multiple comparisons adjustment, cluster analysis, principal component analysis, and predictive model building through logistic regression, classification and regression trees (CART), multivariate adaptive splines (MARS), neural networks, random forests, and bagging and boosting. Course Activities: Computer labs. Course Note: Students should be familiar with logistic regression.
Like all living things, pathogens have evolved by natural selection. The application of evolutionary principles to infectious disease epidemiology is crucial to such diverse subjects as outbreak analysis, the understanding of how different genomic combinations of virulence and drug resistance determinants emerge, and how selection acts to produce successful pathogens that balance the costs and benefits of virulence and transmission. The goal of this course is to introduce basic evolutionary concepts, highlighting the importance of transmission to fitness, as illustrated by comparisons of the adaptive process among different sorts of pathogenic microorganisms. Students will also learn the basics of phylogenetic sequence analysis for the study of outbreaks and transmission, and the construction of simple mathematical models that probe the adaptive process. Students outside of HSPH must request instructor permission to enroll in this course.
This is an introductory level class on the analysis of mortality, fertility and population change. It is required for all master’s and doctoral students in the department of Global Health and Population. Students are introduced to the core literature in this field through lectures and assigned readings selected from peer-reviewed journals and textbooks. Together, these provide a graduate-level introduction to the principal sources and characteristics of population data and to the essential methods used for the analysis of population problems. The emphasis throughout is on understanding the key processes, models and assumptions used primarily for the analysis of demographic components. Practical training will be given through a required weekly laboratory session, assignments, and a final examination. Examples presented in class and used in assignments are drawn from several countries, combining both developed and developing world realities.
Designed to bring students to an intermediate-level understanding of microeconomic theory. Emphasizes the uses and limitations of the economic approach, with applications to public health.
This course is designed to introduce the student to the methods and growing range of applications of decision analysis and cost-effectiveness analysis in health technology assessment, medical and public health decision making, and health resource allocation. The objectives of the course are: (1) to provide a basic technical understanding of the methods used, (2) to give the student an appreciation of the practical problems in applying these methods to the evaluation of clinical interventions and public health policies, and (3) to give the student an appreciation of the uses and limitations of these methods in decision making at the individual, organizational, and policy level both in developed and developing countries.
Massachusetts Institute of Technology
Introduces representations, methods, and architectures used to build applications and to account for human intelligence from a computational point of view. Covers applications of rule chaining, constraint propagation, constrained search, inheritance, statistical inference, and other problem-solving paradigms. Also addresses applications of identification trees, neural nets, genetic algorithms, support-vector machines, boosting, and other learning paradigms. Considers what separates human intelligence from that of other animals. Students taking graduate version complete additional assignments.
Introduces principles, algorithms, and applications of machine learning from the point of view of modeling and prediction; formulation of learning problems; representation, over-fitting, generalization; clustering, classification, probabilistic modeling; and methods such as support vector machines, hidden Markov models, and neural networks. Students taking graduate version complete additional assignments. Meets with 6.862 when offered concurrently. Recommended prerequisites: 6.006 and 18.06. Enrollment may be limited.
A guide for data scientists, engineers, and clinicians who are interested in performing retrospective research using data from electronic health records. Instruction provided in clinical decision-making and secondary use of clinical data, using the Medical Information Mart for Intensive Care (MIMIC) database and the eICU Collaborative Research Database. Covers steps in parsing a clinical question into a study design and methodology for data analysis and interpretation. Activities include review of case studies using the MIMIC database and the eICU Collaborative Research Database and a team project. Student teams choose a question and clinician to work with for their project. Teams meet weekly with clinicians at the hospitals at an arranged time.
Introduces students to machine learning in healthcare, including the nature of clinical data and the use of machine learning for risk stratification, disease progression modeling, precision medicine, diagnosis, subtype discovery, and improving clinical workflows. Topics include causality, interpretability, algorithmic fairness, time-series analysis, graphical models, deep learning and transfer learning. Guest lectures by clinicians from the Boston area and course projects with real clinical data emphasize subtleties of working with clinical data and translating machine learning into clinical practice.