The following is a list of electives that are approved to be used towards the Program Elective and/or Data Science Selective requirement. Courses are not guaranteed to run every year and serve only as examples of suitable elective courses. Enrolled students will have the opportunity to petition to add courses to this list.

  • Harvard Business School

    HBSMBA 1757 - Launching Technology Ventures (HBS: 3 credits, HMS: 4 credits)

    Fall Semester Course

    The course takes the perspective of founders struggling to achieve product-market fit in their early-stage startups. Our cases focus on founder decisions during this search and discovery phase, both in the experiments that they design and run and in the organizations they try to form.

    LTV has a tactical, implementation bias rather than a strategic one and will largely avoid concepts covered in Entrepreneurial Finance and Founders' Journey. There is a modest overlap with Product Management 101/102, Entrepreneurial Sales, and Entrepreneurial Marketing, but LTV is solely focused on pre-product-market fit and the perspective of the founder. In that regard, LTV is complementary to Scaling Tech Ventures (STV), which focuses more on post-product-market-fit startups.

    LTV helps students learn the playbook for finding product-market fit while learning to design business models for success, answering the question of why some startups are valued at only 2x revenue while others are valued at 20x.

    Field Course: Transforming Health Care Delivery, Course Number 6219 (HBS: 3 credits, HMS: 4 credits)

    Spring Semester Course

    At the root of the transformation occurring in the health care industry, both in the United States and internationally, is the fundamental challenge of improving clinical outcomes while controlling costs. Addressing this challenge will require dramatic improvements in the processes by which care is delivered to patients. This will, in turn, involve changing the organization of delivery, developing new approaches to performance measurement, and reimagining the ways in which providers are paid. This course will equip students with the tools required to design and implement these improvements.


  • Harvard Faculty of Arts and Sciences

    APCOMP 209A - Data Science 1: Introduction to Data Science (FAS: 4 credits, HMS: 4 credits)

    Fall Semester Course

    Data Science 1 is the first half of a one-year introduction to data science. The course will focus on the analysis of messy, real-life data to perform predictions using statistical and machine learning methods. Material covered will integrate the five key facets of an investigation using data: (1) data collection - data wrangling, cleaning, and sampling to get a suitable data set; (2) data management - accessing data quickly and reliably; (3) exploratory data analysis, generating hypotheses and building intuition; (4) prediction or statistical learning; and (5) communication, summarizing results through visualization, stories, and interpretable summaries. Recommended: Programming knowledge at the level of CS 50 or above, and statistics knowledge at the level of Stat 100 or above (Stat 110 recommended).

    Cross-listed as COMPSCI 109A and STAT 121A

    APCOMP 209B - Data Science 2: Advanced Topics in Data Science (FAS: 4 credits, HMS: 4 credits)

    Spring Semester Course

    Data Science 2 is the second half of a one-year introduction to data science. Building upon the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, and statistical modeling and prediction. Topics include big data and database management, interactive visualizations, nonlinear statistical models, and deep learning.

    Cross-listed as COMPSCI 109B


    APCOMP 215 - Advanced Practical Data Science (FAS: 4 credits, HMS: 4 credits)

    Fall Semester Course

    Data Science Selective

    In this course, we explore advanced practical data science practices. The course will be divided into three major topics:
    1) How to scale a model from a prototype (often in Jupyter notebooks) to the cloud. In this module, we cover virtual environments, containers, and virtual machines before learning about microservices and Kubernetes. Along the way, students will be exposed to Dask.
    2) How to use existing models for transfer learning. Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task. It is a popular approach in deep learning where pre-trained models are used as the starting point on computer vision and natural language processing tasks. This can be very important, given the vast compute and time resources required to develop neural network models on these problems and given the huge jumps in skill that these models can provide to related problems. In this part of the course we will examine various pre-existing models and techniques in transfer learning.
    3) In the third part we will be introducing a number of intuitive visualization tools for investigating properties and diagnosing issues of models. We will be demonstrating a number of visualization tools ranging from the well established (like saliency maps) to recent ones that have appeared in

    APCOMP 295 - Deep Learning for NLP (FAS: 4 credits, HMS: 4 credits)

    Also listed as COMPSCI 287R

    Fall Semester Course

    Data Science Selective

    How can computers understand and leverage text data and human language? Natural language processing (NLP) addresses this question, and in this course students study the current, best approaches to it. No prior NLP experience is needed, but it is welcomed. This course provides students with a foundation of advanced concepts and requires students to conduct significant research on an NLP topic of their choosing. The aim is to produce a short paper worthy of submitting to an NLP conference. Assessment also includes pop quizzes, homework assignments, and an exam. The course starts with language representations and modelling, followed by machine translation (converting text from one language to another). Next, students learn about transformers (e.g., BERT and GPT-2), which are incredibly powerful deep learning models that currently yield state-of-the-art results in nearly every NLP task. We end the semester by covering tasks concerning bias and fairness, adversarial approaches, coreference resolution, and commonsense reasoning.

    BIOPHYS 170 - Evolutionary and Quantitative Genomics (FAS: 4 credits, HMS: 4 credits)

    Fall Semester Course

    Data Science Selective

    In-depth study of genomics: models of evolution and population genetics; comparative genomics: analysis and comparison; structural genomics: protein structure, evolution and interactions; functional genomics, gene expression, structure and dynamics of regulatory networks.

    Cross-listed as HST.508. Harvard students should enroll through Biophysics 170. 

    BIOSTAT 234 - Introduction to Data Structures and Algorithms (FAS: 4 credits, HMS: 4 credits)

    Spring Semester Course

    Introduction to the data structures and computer algorithms that are relevant to statistical computing. The implementation of data structures and algorithms for data management and numerical computations are discussed.

    Cross-listed as BST 234. HMS students should enroll through BIOSTAT 234. 

    BMIF 201 - Concepts in Genome Analysis (FAS: 4 credits, HMS: 4 credits)

    Fall Semester Course

    Data Science Selective

    This course focuses on quantitative aspects of genetics and genomics, including computational and statistical methods of genomic analysis. We will introduce basic concepts and discuss recent progress in population and evolutionary genetics and cover principles of statistical genetics of Mendelian and complex traits. We will then introduce current genomic technologies and key algorithms in computational biology and bioinformatics. We will discuss applications of these algorithms to genome annotation and analysis of epigenomics, cancer genomics and metagenomics data. Proficiency in programming and basic knowledge of genetics and statistics will be assumed.

    CELLBIO 312QC - Deep Learning for Biomedical Image Analysis (FAS: 2 credits, HMS: 2 credits)

    Spring Semester Course

    Biomedical image analysis is undergoing a paradigm shift due to artificial intelligence and deep learning. This course will cover basic concepts of deep learning and convolutional neural networks for biomedical image analysis as well as current challenges and opportunities. The lectures will include fundamentals of classification, characterization, detection, segmentation and enhancement in biomedical images. Using a variety of different microscopy and pathology datasets, the course will follow a ‘learning-by-doing’ model where each lecture will be accompanied by hands-on training in using these methods in practice. The course assumes no prior knowledge of deep learning or image analysis. Basic knowledge of Python is recommended but not required.

    COMPSCI 107 - Systems Development for Computational Science (FAS: 4 credits, HMS: 4 credits)

    Fall Semester Course

    Data Science Selective

    This is a project-based course emphasizing designing, building, testing, maintaining and modifying software for scientific computing and data sciences. Students will work in groups on a number of projects, ranging from small data-transformation utilities to large-scale systems. Students will learn to use a variety of tools and languages, as well as various techniques for organizing teams. Most important, students will learn to adapt basic tools and approaches to solve computational problems in academic or industrial environments.

    COMPSCI 179 - Design of Usable Interactive Systems (FAS: 4 credits, HMS: 4 credits)

    Spring Semester Course

    Usability and design as keys to successful technology. Covers user observation techniques, needs assessment, low and high fidelity prototyping, usability testing methods, as well as theory of human perception and performance, and design best practices. Focuses on understanding and applying the lessons of human interaction to the design of usable systems; will also look at lessons to be learned from less usable systems. The course includes several small and one large project.

    COMPSCI 182 - Artificial Intelligence (FAS: 4 credits, HMS: 4 credits)

    Fall Semester Course

    Artificial Intelligence (AI) is an exciting field that has enabled a wide range of cutting-edge technology, from driverless cars to grandmaster-beating Go programs. The goal of this course is to introduce the ideas and techniques underlying the design of intelligent computer systems. Topics covered in this course are broadly divided into 1) planning and search algorithms, 2) probabilistic reasoning and representations, and 3) machine learning (although, as you will see, it is impossible to separate these ideas so neatly). Within each area, the course will also present practical AI algorithms being used in the wild and, in some cases, explore the relationship to state-of-the-art techniques. The class will include lectures connecting the models and algorithms we discuss to applications in robotics, computer vision, and speech processing. Recommended prep: CS 51; Stat 110 (may be taken concurrently).

    COMPSCI 187 - Introduction to Computational Linguistics and Natural-Language Processing

    Fall Semester Course

    Data Science Selective

    Natural-language-processing applications are ubiquitous: Alexa can set a reminder if you ask; Google Translate can make emails readable across languages; Watson outplays world Jeopardy champions; Grover can generate fake news, and recognize it as well. How do such systems work? This course provides an introduction to the field of computational linguistics, the study of human language using the tools and techniques of computer science, with applications to a variety of natural-language-processing problems such as these. You will work with ideas from linguistics, statistical modeling, and machine learning, with emphasis on their application, limitations, and implications. The course is lab- and project-based, primarily in small teams, and culminates in the building and testing of a question-answering system.

    Recommended Prep: Programming ability and computer science knowledge at the level of CS51; knowledge of discrete mathematics, including basic probability, statistics, and logic at the level of CS20; some familiarity with Python programming.

    COMPSCI 205 - Computing Foundations for Computational Science (FAS: 4 credits, HMS: 4 credits)

    Spring Semester Course

    Computational science has become a third partner, together with theory and experimentation, in advancing scientific knowledge and practice, and an essential tool for product and process development and manufacturing in industry. Big data science adds the ‘fourth pillar’ to scientific advancements, providing the methods and algorithms to extract knowledge or insights from data. The course is a journey into the foundations of Parallel Computing at the intersection of large-scale computational science and big data analytics. Many science communities are combining high performance computing and high-end data analysis platforms and methods in workflows that orchestrate large-scale simulations or incorporate them into the stages of large-scale analysis pipelines for data generated by simulations, experiments, or observations. This is an applications course highlighting the use of modern computing platforms in solving computational and data science problems, enabling simulation, modeling and real-time analysis of complex natural and social phenomena at unprecedented scales. The class emphasizes making effective use of the diverse landscape of programming models, platforms, open-source tools, computing architectures and cloud services for high performance computing and high-end data analytics.

    COMPSCI 242 - Computing at Scale (FAS: 4 credits, HMS: 4 credits)

    Spring Semester Course

    Scaling computation over parallel and distributed computing systems is a rapidly advancing area of research receiving high levels of interest from both academia and industry. The objective can be for high-performance computing and energy-efficient computing (“green” data center servers as well as small embedded devices). In this course, students will learn principled methods of mapping prototypical computations used in machine learning, the Internet of Things, and scientific computing onto parallel and distributed compute nodes of various forms. These techniques will lay the foundation for future computational libraries and packages for both high-performance computing and energy-efficient devices. To master the subject, students will need to appreciate the close interactions between computational algorithms, software abstractions, and computer organizations. After having successfully taken this course, students will acquire an integrated understanding of these issues. The class will be organized into the following modules: Big picture: use of parallel and distributed computing to achieve high performance and energy efficiency; End-to-end example 1: mapping nearest neighbor computation onto parallel computing units in the forms of CPU, GPU, ASIC and FPGA; Communication and I/O: latency hiding with prediction, computational intensity, lower bounds; Computer architectures and implications to computing: multi-cores, CPU, GPU, clusters, accelerators, and virtualization; End-to-end example 2: mapping convolutional neural networks onto parallel computing units in the forms of CPU, GPU, ASIC, FPGA and clusters; Great inner loops and parallelization for feature extraction, data clustering and dimension reduction: PCA, random projection, clustering (K-means, GMM-EM), sparse coding (K-SVD), compressive sensing, FFT, etc.; Software abstractions and programming models: MapReduce (PageRank, etc.), GraphX/Apache Spark, OpenCL and TensorFlow; Advanced topics: autotuning and neuromorphic spike-based computing. Students will learn the subject through lectures/quizzes, programming assignments, labs, research paper presentations, and a final project. Students will have latitude in choosing a final project they are passionate about. They will formulate their projects early in the course, so there will be sufficient time for discussion and iterations with the teaching staff, as well as for system design and implementation. Industry partners will support the course by giving guest lectures and providing resources. The course will use server clusters at Harvard as well as external resources in the cloud. In addition, labs will have access to state-of-the-art IoT devices and 3D cameras for data acquisition. Students will use open source tools and libraries and apply them to data analysis, modeling, and visualization problems.

    COMPSCI 282R - Topics in Machine Learning (FAS: 4 credits, HMS: 4 credits)

    Fall Semester Course

    Data Science Selective

    Special topics course. Focus of course changes year to year. 

    COMPSCI 289 - Biologically Inspired Multi-agent Systems (FAS: 4 credits, HMS: 4 credits)

    Fall Semester Course

    Data Science Selective

    Surveys biologically-inspired approaches to designing distributed systems. Focus is on biological models, algorithms, and programming paradigms for self-organization. Topics vary year to year, and usually include: (1) swarm intelligence: social insects and animal groups, with applications to networking and robotics, (2) cellular computing: including cellular automata/amorphous computing, and applications like self-assembling robots and programmable materials, (3) evolutionary computation and its application to optimization and design. Recommended Prep: Students should have a familiarity/experience with computer systems (e.g. software, networking) and algorithms/analysis through classes and/or internship experiences. Background in biology not required.

    Genetics 228 - Genetics in Medicine: From Bench to Bedside (FAS: 4 credits, HMS: 4 credits)

    Spring Semester Course

    Focus on translational medicine: the application of basic genetic discoveries to human disease. Each three-hour class will focus on a specific genetic disorder and the approaches currently used to speed the transfer of knowledge from the laboratory to the clinic. Each class will include a clinical discussion, a patient presentation if appropriate, followed by lectures, a detailed discussion of recent laboratory findings and a student-led journal club. Lecturers will highlight current molecular, technological, bioinformatic and statistical approaches that are being used to advance the study of human disease. There is no exam. Students will present one paper per session in a journal-club style. Attendance and active participation for the duration of all class meetings is required. If you are unable to attend class, or cannot be present for the entire session, you are expected to contact the course instructor. Two incomplete or missed sessions will result in a failing grade.

    MICROBI 302QC - Introduction to Infectious Disease Research: Infectious Diseases Consortium Bootcamp (FAS: 2 credits, HMS: 2 credits)

    January Term Course

    This January boot camp course provides a fun, informative introduction to the breadth of infectious disease research carried out at Harvard and beyond. Students will have the chance to meet faculty, students, and fellows in infectious disease roles across the university. The course will focus on several aspects of infectious diseases:  
    1. Underlying biology of infectious diseases and the diverse pathogens that cause them 
    2. Modern approaches to studying infectious diseases, including experimental biology, epidemiology, bioinformatics, and clinical microbiology 
    3. Modern approaches to developing new interventions, including drugs, vaccines, diagnostics, and public health measures 

    STAT 195 - Statistical Machine Learning (FAS: 4 credits, HMS: 4 credits)

    Spring Semester Course

    This course is designed to follow CS 181 and go into further depth on the statistical aspects of supervised learning: given what we know about our data and where it came from, how can we choose the machine learning method that will predict best on future data? Topics include the "no free lunch" theorems, linear methods for regression and classification, shrinkage and sparsity, splines and kernel smoothing, model selection and cross-validation, additive models and trees, boosting, bagging and random forests. Recommended prep: Statistics 111 and Computer Science 181.

  • Harvard Medical School

    BETH 704 - Neuroethics (HMS: 2 credits)

    J-Term Course

    BMI 720 and BMI 721 - Introduction to Clinical Informatics and Lab (HMS: 4 credits)

    Fall Semester Course

    Data Science Selective

    BMI 741 - Health Information Technology: From Ideation to Implementation (HMS: 4 credits)

    Spring Semester Course

  • Harvard T.H. Chan School of Public Health

    BST 201 - Introduction to Statistical Methods

    Fall Semester Course

    Covers basic statistical techniques that are important for analyzing data arising from epidemiology, environmental health and biomedical and other public health-related research. Major topics include descriptive statistics, elements of probability, introduction to estimation and hypothesis testing, nonparametric methods, techniques for categorical data, regression analysis, analysis of variance, and elements of study design. Applications are stressed. Designed as an alternate to BIO200, for students desiring more emphasis on theoretical developments. Background in algebra and calculus strongly recommended.

    BST 210 - Applied Regression Analysis (HSPH: 5 credits, HMS: 4 credits)

    Fall and Spring Semester Course

    Topics include model interpretation, model building, and model assessment for linear regression with continuous outcomes, logistic regression with binary outcomes, and proportional hazards regression with survival time outcomes. Specific topics include regression diagnostics, confounding and effect modification, goodness of fit, data transformations, splines and additive models, ordinal, multinomial, and conditional logistic regression, generalized linear models, overdispersion, Poisson regression for rate outcomes, hazard functions, and missing data. The course will provide students with the skills necessary to perform regression analyses and to critically interpret statistical issues related to regression applications in the public health literature. Prerequisites: ID 201 or BST201 or (BST202 and BST203) or (BST206 and (BST207 or BST208)) or permission of instructor.

    BST 212 - Survey Research Methods in Community Health (HSPH: 2.5 credits, HMS: 2 credits)

    Spring Semester Course

    Covers research design, sample selection, questionnaire construction, interviewing techniques, the reduction and interpretation of data, and related facets of population survey investigations. Focuses primarily on the application of survey methods to problems of health program planning and evaluation. Treatment of methodology is sufficiently broad to be suitable for students who are concerned with epidemiological, nutritional, or other types of survey research. Formerly BIO212

    BST 213 - Applied Regression for Clinical Research (HSPH: 5 credits, HMS: 4 credits)

    Fall Semester Course

    This course will introduce students involved with clinical research to the practical application of multiple regression analysis. Linear regression, logistic regression and proportional hazards survival models will be covered, as well as general concepts in model selection, goodness-of-fit, and testing procedures. Each lecture will be accompanied by a data analysis using SAS and a classroom discussion of the results. The course will introduce, but will not attempt to develop, the underlying likelihood theory. Background in SAS programming required.

    BST 214 - Principles of Clinical Trials (HSPH: 2.5 credits, HMS: 2 credits)

    Spring Semester Course

    Designed for individuals interested in the scientific, policy, and management aspects of clinical trials. Topics include types of clinical research, study design, treatment allocation, randomization and stratification, quality control, sample size requirements, patient consent, and interpretation of results. Students design a clinical investigation in their own field of interest, write a proposal for it, and critique recently published medical literature. Course Prerequisites: BIO201 or ID200 or ID201 or ID207 or BIO202&203 or BIO206&207 or BIO206&208 or BIO206&209. Formerly BIO214

    BST 215 - Linear and Longitudinal Regression (HSPH: 2.5 credits, HMS: 2 credits)

    Spring Semester Course

    This course is intended for students who are already very comfortable with fundamental techniques in statistics. The course will cover methods for building and interpreting linear regression models, including statistical assumptions and diagnostics, estimation and testing, and model building techniques. These models will be extended to handle data arising from longitudinal studies employing repeated measurement of subjects over time. Summer/Residential Course Note (Section 1): Lectures will be accompanied by computing exercises using the SAS statistical package. Online Course Note (Section 2): Lectures will be accompanied by computing exercises using the Stata statistical package. Course Prerequisites: EPI522 or BST201 or ID200 or ID201 or ID207 or BST202&203 or BST206&207 or BST206&208. Formerly BIO501

    BST 217 - Statistical and Quantitative Methods for Pharmaceutical Regulatory Science (HSPH: 2.5 credits, HMS: 2 credits)

    Spring Semester Course

    The goal of this course is to enable scientists and public health professionals who already have an introductory background in biostatistics and clinical trials to acquire the competencies in quantitative skills and systems thinking required to understand and participate in drug development and regulatory review processes. The course illustrates how statistical and quantitative methods are used to transform information into evidence demonstrating the safety, efficacy and effectiveness of drugs and devices over the course of the product’s life cycle from a regulatory perspective. Content is delivered using a blended-learning approach involving lectures, web-based media and selected case study examples derived from actual FDA decision-making and regulatory assessments to highlight and describe each phase of the regulatory drug approval process. Case studies will illustrate regulatory science in action and practice and will include content publicly available from the FDA’s website that can be used in conjunction with FDA science-based guidance and decision precedents. Course Prerequisites: ID538 or [(BIO200 or ID200 or BIO201 or BIO202&203 or BIO206&207/8/9) and (EPI200 or EPI201 or EPI208 or EPI505).] Formerly BIO523

    BST 222 - Basics of Statistical Inference (HSPH: 5 credits, HMS: 4 credits)

    Fall Semester Course

    This course will provide a basic, yet thorough introduction to the probability theory and mathematical statistics that underlie many of the commonly used techniques in public health research. Topics to be covered include probability distributions (normal, binomial, Poisson), means, variances and expected values, finite sampling distributions, parameter estimation (method of moments, maximum likelihood), confidence intervals, hypothesis testing (likelihood ratio, Wald and score tests). All theoretical material will be motivated with problems from epidemiology, biostatistics, environmental health and other public health areas. This course is aimed towards second year doctoral students in fields other than Biostatistics. Background in algebra and calculus required. Course Prerequisites: BST210 or BST213. Formerly BIO222

    BST 223 - Applied Survival Analysis (HSPH: 5 credits, HMS: 4 credits)

    Spring Semester Course

    Topics will include types of censoring, hazard, survivor, and cumulative hazard functions, Kaplan-Meier and actuarial estimation of the survival distribution, comparison of survival using log rank and other tests, regression models including the Cox proportional hazards model and the accelerated failure time model, adjustment for time-varying covariates, and the use of parametric distributions (exponential, Weibull) in survival analysis. Methods for recurrent survival outcomes and competing risks will also be discussed, as well as design of studies with survival outcomes. Class material will include presentation of statistical methods for estimation and testing along with current software (SAS, Stata) for implementing analyses of survival data. Applications to real data will be emphasized. Course Prerequisite(s): BST210 or BST213 or BST 230, or permission of instructor required. BST 213 may be taken concurrently. Formerly BIO223

    BST 226 - Applied Longitudinal Analysis (HSPH: 5 credits, HMS: 4 credits)

    Spring Semester Course

    This course covers modern methods for the analysis of repeated measures, correlated outcomes and longitudinal data, including the unbalanced and incomplete data sets characteristic of biomedical research. Topics include an introduction to the analysis of correlated data, analysis of response profiles, fitting parametric curves, covariance pattern models, random effects and growth curve models, and generalized linear models for longitudinal data, including generalized estimating equations (GEE) and generalized linear mixed effects models (GLMMs). Course Activities: Homework assignments will focus on data analysis in SAS using PROC GLM, PROC MIXED, PROC GENMOD, and PROC GLIMMIX. Course Note: Lab or section times will be announced at first meeting. Course Prerequisite(s): BIO210 or BIO211 or BIO213 or BIO232. Formerly BIO226

    BST 227 - Introduction to Statistical Genetics (HSPH: 2.5 credits, HMS: 2 credits)

    Fall Semester Course

    This course introduces students to the diverse statistical methods used throughout the process of statistical genetics, from familial aggregation and segregation studies to linkage scans and association studies. Topics covered include basic principles from population genetics, multipoint and model-free linkage analysis, family-based and population-based association testing, and Genome Wide Association analysis. Instructors use ongoing research into the genetics of respiratory disease, psychiatric disorders and cancer to illustrate basic principles. Weekly homework supplements reading, course lectures, discussion and section. Relevant concepts in genetics and molecular genetics will be reviewed in lectures and labs. The emphasis of the course is fundamental principles and concepts. Course Prerequisites: BST210 (concurrent enrollment allowed). Course Note: There will be a weekly lab section; the time will be scheduled at first meeting. Formerly BIO227

    BST 228 - Applied Bayesian Analysis (HSPH: 5 credits, HMS: 4 credits)

    Fall Semester Course

    This course is a practical introduction to the Bayesian analysis of biomedical data. It is an intermediate Master’s level course in the philosophy, analytic strategies, implementation, and interpretation of Bayesian data analysis. Specific topics that will be covered include: the Bayesian paradigm; Bayesian analysis of basic models; Bayesian computing: Markov Chain Monte Carlo; STAN R software package for Bayesian data analysis; linear regression; hierarchical regression models; generalized linear models; meta-analysis; models for missing data. Programming and case studies will be used throughout the course to provide hands-on training in these concepts. Prerequisites: BST210 and BST222, or permission of the instructor. Not currently offered. 

    BST 230 - Probability I (HSPH: 5 credits, HMS: 4 credits)

    Axiomatic foundations of probability, independence, conditional probability, joint distributions, transformations, moment generating functions, characteristic functions, moment inequalities, sampling distributions, modes of convergence and their interrelationships, laws of large numbers, central limit theorem, and stochastic processes.

    BST 231 - Statistical Inference I (HSPH: 5 credits, HMS: 4 credits)

    A fundamental course in statistical inference. Discusses general principles of data reduction: exponential families, sufficiency, ancillarity and completeness. Describes general methods of point and interval parameter estimation and the small and large sample properties of estimators: method of moments, maximum likelihood, unbiased estimation, Rao-Blackwell and Lehmann-Scheffe theorems, information inequality, asymptotic relative efficiency of estimators. Describes general methods of hypothesis testing and optimality properties of tests: Neyman-Pearson theory, likelihood ratio tests, score and Wald tests, uniformly and locally most powerful tests, asymptotic relative efficiency of tests. Course Note: Lab or section time to be announced at first meeting; cross-listed: HSPH student must register for HSPH course. Course Prerequisite(s): BIO230 (concurrent enrollment allowed). Formerly BIO231

    BST 234 - Introduction to Data Structures and Algorithms (HSPH: 5 credits, HMS: 4 credits)

    Introduction to the data structures and computer algorithms that are relevant to statistical computing. The implementation of data structures and algorithms for data management and numerical computations are discussed. Course Prerequisite(s): Instructor’s Permission. Formerly BIO514

    BST 235 - Advanced Regression and Statistical Learning (HSPH: 5 credits, HMS: 4 credits)

    An advanced course in linear models, including both classical theory and methods for high dimensional data. Topics include theory of estimation and hypothesis testing, multiple testing problems and false discovery rates, cross validation and model selection, regularization and the LASSO, principal components and dimensional reduction, and classification methods. Background in matrix algebra and linear regression required. Prerequisite: BST 231 and BST 233, or permission of instructor required. Formerly BIO235

    BST 240 - Probability II (HSPH: 5 credits, HMS: 4 credits)

    A foundational course in measure theoretic probability. Topics include measure theory, Lebesgue integration, product measure and Fubini’s Theorem, Radon-Nikodym derivatives, conditional probability, conditional expectation, limit theorems on sequences of random stochastic processes, and weak convergence. Course Prerequisites: BST231 or permission from the instructor required. Formerly BIO250

    BST 241 - Statistical Inference II (HSPH: 5 credits, HMS: 4 credits)

    Sequel to BIO 231. Considers several advanced topics in statistical inference. Topics include limit theorems, multivariate delta method, properties of maximum likelihood estimators, saddle point approximations, asymptotic relative efficiency, robust and rank-based procedures, resampling methods, and nonparametric curve estimation. Course Note: Cross-listed, HSPH students must register for HSPH course. Course Prerequisites: BIO231 and BIO250, or permission of instructor required. Formerly BIO251

    BST 245 - Analysis of Multivariate and Longitudinal Data (HSPH: 5 credits, HMS: 4 credits)

    Presents classical and modern approaches to the analysis of multivariate observations, repeated measures, and longitudinal data. Topics include the multivariate normal distribution, Hotelling’s T², MANOVA, the multivariate linear model, random effects and growth curve models, generalized estimating equations, statistical analysis of multivariate categorical outcomes, and estimation with missing data. Discusses computational issues for both traditional and new methodologies. Course Note: Cross-listed, HSPH student must register for HSPH course. Course Prerequisites: BIO231 and BIO235, or permission of the instructor required. Formerly BIO245

    BST 247 - Advanced Statistical Genetics (HSPH: 2.5 credits, HMS: 2 credits)

    BST247 is a seminar style course with readings selected from the literature in areas of expertise of the participating faculty. Content may vary from year to year. The specific objectives are (1) To train students to critically read foundational papers and current journal articles in Statistical Genetics, (2) To train students to present sophisticated ideas to an audience of peers, and (3) To prepare students to engage in doctoral level research in the area. After the course, students are expected to have an in-depth and broad understanding of important topics of statistical genetics research. Course Prerequisite(s): BIO227 and (BIO231 or EPI511). BIO231 may be taken concurrently. Formerly BIO257

    BST 249 - Bayesian Methodology in Biostats (HSPH: 5 credits, HMS: 4 credits)

    General principles of the Bayesian approach, prior distributions, hierarchical models and modeling techniques, approximate inference, Markov chain Monte Carlo methods, model assessment and comparison. Bayesian approaches to GLMMs, multiple testing, nonparametrics, clinical trials, survival analysis.

    BST 261 – Data Science II (HSPH: 2.5 credits, HMS: 2 credits)

    This course is the second course in the foundational sequence of the School’s newly approved Master’s Degree in Health Data Science. The course will build upon our existing course, BST260 Introduction to Data Science, in presenting a set of tools for modeling and understanding complex datasets. Specifically, the course will provide practical regression and tree-based techniques for big data. Specific topics that will be covered include: linear model selection and regularization: LASSO and regularization; principal component regression and partial least squares; tree-based methods: decision trees; bagging, random forests, and boosting; unsupervised learning: principal components analysis, cluster analysis. Programming (Python and R) and case studies will be used throughout the course to provide hands-on training in these concepts. Prerequisites: BST260 or permission of instructor.

    BST 262 - Computing for Big Data (HSPH: 2.5 credits, HMS: 2 credits)

    Big data is everywhere, from Omics and Health Policy to Environmental Health. Every single aspect of the Health Sciences is being transformed. However, it is hard to navigate and critically assess tools and techniques in such a fast-moving big data panorama. In this course, we are going to give a critical presentation of theoretical approaches and software implementations of tools to collect, store and process data at scale. The goal is not just to learn recipes to manipulate big data but to learn how to reason in terms of big data, from software design and tool selection to implementation, optimization and maintenance.

    BST 267 - Introduction to Social and Biological Networks (HSPH: 2.5 credits, HMS: 2 credits)

    Many systems of scientific and societal interest consist of a large number of interacting components. The structure of these systems can be represented as networks where network nodes represent the components and network edges the interactions between the components. Network analysis can be used to study how pathogens, behaviors and information spread in social networks, having important implications for our understanding of epidemics and the planning of effective interventions. In a biological context, at a molecular level, network analysis can be applied to gene regulation networks, signal transduction networks, protein interaction networks, and more. This introductory course covers some basic network measures, models, and processes that unfold on networks. The covered material applies to a wide range of networks, but we will focus on social and biological networks. To analyze and model networks, we will learn the basics of the Python programming language and its NetworkX module. The course contains a number of hands-on computer lab sessions. There are five homework assignments and four reading assignments that will be discussed in class. In addition, each student will complete a final project that applies network analysis techniques to study a public health problem. Course Prerequisites: BST201 or ID200 or ID201 or ID207 or BST202&203 or BST206&207 or BST206&208. Formerly BIO521

    BST 272 - Computing Environments for Biology 

    BST 283 - Cancer Genome Analysis (HSPH: 5 credits, HMS: 4 credits)

    Fall Semester Course

    Data Science Selective

    This course is an introduction to modern statistical computing techniques used to characterize and interpret cancer genome sequencing datasets. This Master’s level course will begin with a basic introduction to DNA, genes, and genomes for students with no biology background. It will then introduce cancer as an evolutionary process and review landmarks in the history of cancer genetics, and discuss the basics of sequencing technology and modern Next Generation Sequencing. The course will cover the main steps involved in turning billions of short sequencing reads into a representation of the somatic genetic alterations characterizing an individual patient’s cancer, and will build on this foundation to study topics related to identifying mutations under positive selection from multiple tumors sampled in a population. By the end of the course, students will be able to apply state-of-the-art analysis to cancer genome datasets and to critically evaluate papers employing cancer genome data.

    EH 298 - Environmental Epigenetics (HSPH: 2.5 credits, HMS: 2 credits)

    Epigenetics is a fast-growing field, with increasing applicability in environmental and epidemiology studies, focusing on the alterations in chromatin structure that can stably and heritably influence gene expression. Epigenetic changes can be as profound as those exerted by mutation, but, unlike mutations, are reversible and responsive to environmental influences. The course will focus on epigenetic mechanisms and laboratory methods for DNA methylation, histone modifications, small non-coding RNAs, and epigenomics. Ongoing experimental and epidemiology studies (cohort, case-control, cross-sectional and repeated measurement studies) will be presented to introduce the students to the epigenetic effects in prenatal/early and adult life of environmental factors, including air pollution, metals, pesticides, benzene, PCBs, persistent organic pollutants, and diet. The course will enable students to understand and apply epigenetic methods in multiple areas, including cardiovascular and respiratory disease, aging, reproductive health, inflammation/immunity, and cancer.

    EPI 201 - Introduction to Epidemiology Methods I (HSPH: 2.5 credits, HMS: 2 credits)

    EPI201 introduces the principles and methods used in epidemiologic research. The course discusses the conceptual and practical issues encountered in the design and analysis of epidemiologic studies for description and causal inference. EPI201 is the first course in the series of methods courses designed for students majoring in Epidemiology, Biostatistics and related fields, and those interested in a detailed introduction to the design and conduct of epidemiologic studies. Students who take EPI201 are expected to take EPI202 (Methods II). Course Note: Thursday or Friday lab required.

    EPI 288 – Data Mining and Prediction (HSPH: 2.5 credits, HMS: 2 credits)

    This course will present an introduction to the methods of data mining and predictive modeling, with applications to both genetic and clinical data. Basic concepts and philosophy of supervised and unsupervised data mining as well as appropriate applications will be discussed. Topics covered will include multiple comparisons adjustment, cluster analysis, principal component analysis, and predictive model building through logistic regression, classification and regression trees (CART), multivariate adaptive splines (MARS), neural networks, random forests, and bagging and boosting. Course Activities: Computer labs. Course Note: Students should be familiar with logistic regression.

    EPI 519 - Evolutionary Epidemiology of Infectious Disease (HSPH: 2.5 credits, HMS: 2 credits)

    Like all living things, pathogens have evolved by natural selection. The application of evolutionary principles to infectious disease epidemiology is crucial to such diverse subjects as outbreak analysis, the understanding of how different genomic combinations of virulence and drug resistance determinants emerge, and how selection acts to produce successful pathogens that balance the costs and benefits of virulence and transmission. The goal of this course is to introduce basic evolutionary concepts, highlighting the importance of transmission to fitness as illustrated by comparisons of the adaptive process among different sorts of pathogenic microorganisms. Students will also learn the basics of phylogenetic sequence analysis for the study of outbreaks and transmission, and the construction of simple mathematical models that probe the adaptive process. Students outside of HSPH must request instructor permission to enroll in this course.

    GHP 220 - Intro to Demographic Methods (HSPH: 2.5 credits, HMS: 2 credits)

    This is an introductory level class on the analysis of mortality, fertility and population change. It is required for all master’s and doctoral students in the department of Global Health and Population. Students are introduced to the core literature in this field through lectures, and assigned readings selected from peer-reviewed journals and textbooks. Together, these provide a graduate-level introduction to the principal sources and characteristics of population data and to the essential methods used for the analysis of population problems. The emphasis throughout is on understanding the key processes, models and assumptions used primarily for the analysis of demographic components. Practical training will be given through a required weekly laboratory session, assignments, and a final examination. Examples presented in class and used in assignments are drawn from several countries, combining both developed and developing world realities.

    HPM 206 - Economic Analysis (HSPH: 5 credits, HMS: 4 credits)

    Designed to bring students to an intermediate-level understanding of microeconomic theory. Emphasizes the uses and limitations of the economic approach, with applications to public health.

    HPM 261 - Health Care Information Technology Management

    RDS 280 - Decision Analysis For Health/Medical Practice (HSPH: 2.5 credits, HMS: 2 credits)

    This course is designed to introduce the student to the methods and growing range of applications of decision analysis and cost-effectiveness analysis in health technology assessment, medical and public health decision making, and health resource allocation. The objectives of the course are: (1) to provide a basic technical understanding of the methods used, (2) to give the student an appreciation of the practical problems in applying these methods to the evaluation of clinical interventions and public health policies, and (3) to give the student an appreciation of the uses and limitations of these methods in decision making at the individual, organizational, and policy level both in developed and developing countries.

    SBS 509 - Health Communication in the 21st Century (HSPH: 2.5 credits, HMS: 2 credits)

    This course is designed to provide students in public health and social science with an overview of the theory and research on the role of communication in health in the 21st century. The role of communication in public health will be examined both as a product of everyday interaction with communication platforms including mass media and messages, and its planned use to accomplish particular public health goals. Research examined here looks both at planned and unplanned effects of communication in a variety of health situations representing a range of public health topical concerns.

  • Massachusetts Institute of Technology

    6.439 - Statistics, Computation, and Applications

    Fall Semester Course

    Data Science Selective

    6.844 - Artificial Intelligence

    Fall Semester Course

    Introduces representations, methods, and architectures used to build applications and to account for human intelligence from a computational point of view. Covers applications of rule chaining, constraint propagation, constrained search, inheritance, statistical inference, and other problem-solving paradigms. Also addresses applications of identification trees, neural nets, genetic algorithms, support-vector machines, boosting, and other learning paradigms. Considers what separates human intelligence from that of other animals. Students taking graduate version complete additional assignments. 

    6.862 - Applied Machine Learning

    Fall and Spring Semester Course

    Data Science Selective

    Introduces principles, algorithms, and applications of machine learning from the point of view of modeling and prediction; formulation of learning problems; representation, over-fitting, generalization; classification, regression, reinforcement learning; and methods such as linear classifiers, feed-forward, convolutional, and recurrent networks. Students taking graduate version complete different assignments. Meets with 6.036 when offered concurrently. Recommended prerequisites: 18.06 and 6.006.

    6.864 - Advanced Natural Language Processing

    Fall Semester Course

    Introduces the study of human language from a computational perspective, including syntactic, semantic and discourse processing models. Emphasizes machine learning methods and algorithms. Uses these methods and models in applications such as syntactic parsing, information extraction, statistical machine translation, dialogue systems, and summarization. Students taking graduate version complete additional assignments.

    6.867 - Machine Learning

    Fall Semester Course

    Data Science Selective

    6.884 - Advanced Topics in Artificial Intelligence

    NOT OFFERED 2021 - 2022

    Advanced study of topics in artificial intelligence. Specific focus varies from year to year. Consult department for details.

    6.878 - Advanced Computational Biology: Genomes, Networks, Evolution

    Fall Semester Course

    Data Science Selective

    CSB.100 - Topics in Computational and Systems Biology

    Fall Semester Course

    Seminar based on research literature. Papers covered are selected to illustrate important problems and varied approaches in the field of computational and systems biology, and to provide students a framework from which to evaluate new developments.

    HST 504 - Topics in Computational Molecular Biology

    Also listed as MIT 18.418

    Fall and Spring Semester Course

    Covers current research topics in computational molecular biology. Recent research papers presented from leading conferences such as the International Conference on Computational Molecular Biology (RECOMB) and the Conference on Intelligent Systems for Molecular Biology (ISMB). Topics include original research (both theoretical and experimental) in comparative genomics, sequence and structure analysis, molecular evolution, proteomics, gene expression, transcriptional regulation, biological networks, drug discovery, and privacy. Recent research by course participants also covered. Participants will be expected to present individual projects to the class.

    HST 506 - Computational Systems Biology: Deep Learning in the Life Sciences

    Also listed as MIT 6.874 and 20.390

    Spring Semester Course

    Presents innovative approaches to computational problems in the life sciences, focusing on deep learning-based approaches with comparisons to conventional methods. Topics include protein-DNA interaction, chromatin accessibility, regulatory variant interpretation, medical image understanding, medical record understanding, therapeutic design, and experiment design (the choice and interpretation of interventions). Focuses on machine learning model selection, robustness, and interpretation. Teams complete a multidisciplinary final research project using TensorFlow or another framework. Provides a comprehensive introduction to each life sciences problem, but relies upon students understanding probabilistic problem formulations. Students taking graduate version complete additional assignments.

    HST.953 - Collaborative Data Science in Medicine

    Fall Semester Course

    Data Science Selective

    A guide for data scientists, engineers, and clinicians who are interested in performing retrospective research using data from electronic health records. Instruction provided in clinical decision-making and secondary use of clinical data, using the Medical Information Mart for Intensive Care (MIMIC) database and the eICU Collaborative Research Database. Covers steps in parsing a clinical question into a study design and methodology for data analysis and interpretation. Activities include review of case studies using the MIMIC database and the eICU Collaborative Research Database and a team project. Student teams choose a question and clinician to work with for their project. Teams meet weekly with clinicians at the hospitals at an arranged time.

    HST.956 - Machine Learning for Healthcare

    Spring Semester Course

    Introduces students to machine learning in healthcare, including the nature of clinical data and the use of machine learning for risk stratification, disease progression modeling, precision medicine, diagnosis, subtype discovery, and improving clinical workflows. Topics include causality, interpretability, algorithmic fairness, time-series analysis, graphical models, deep learning and transfer learning. Guest lectures by clinicians from the Boston area and course projects with real clinical data emphasize subtleties of working with clinical data and translating machine learning into clinical practice.

    HST.971 - Strategic Decision Making in the Life Sciences

    Spring Semester Course

    Surveys key strategic decisions faced by managers, investors and scientists at each stage in the value chain of the life science industry. Aims to develop students' ability to understand and effectively assess these strategic challenges. Focuses on the biotech sector, with additional examples from the pharmaceutical and medical device sectors. Includes case studies, analytical models, and detailed quantitative analysis. Intended for students interested in building a life science company or working in the sector as a manager, consultant, analyst, or investor. Provides analytical background to the industry for biological and biomedical scientists, engineers and physicians with an interest in understanding the commercial dynamics of the life sciences or the commercial potential of their research.

    HST.978 - Healthcare Ventures

    Also listed as MIT 15.367

    Spring Semester Course

    Addresses healthcare entrepreneurship with an emphasis on startups bridging care re-design, digital health, medical devices, and high-tech. Includes prominent speakers and experts from key domains across medicine, pharma, med devices, regulatory, insurance, software, design thinking, entrepreneurship, and investing. Provides practical experiences in venture validation/creation through team-based work around themes. Illustrates best practices in identifying and validating health venture opportunities amid challenges of navigating healthcare complexity, team dynamics, and venture capital raising process. Intended for students from engineering, medicine, public health, and MBA programs. Video conference facilities provided to facilitate remote participation by Executive MBA and traveling students.

  • Harvard Kennedy School

    API 222 - Machine Learning and Big Data Analytics 

    Fall Semester Course

    In the last couple of decades, the amount of data available to organizations has significantly increased. Individuals who can use this data together with appropriate analytical techniques can discover new facts and provide new solutions to various existing problems. This course provides an introduction to the theory and applications of some of the most popular machine learning techniques. It is designed for students interested in using machine learning and related analytical techniques to make better decisions in order to solve policy and societal level problems.

    We will cover various recent techniques and their applications from supervised, unsupervised, and reinforcement learning. In addition, students will get the chance to work with some data sets using software and apply their knowledge to a variety of examples from a broad array of industries and policy domains. Some of the intended course topics (time permitting) include: K-Nearest Neighbors, Naive Bayes, Logistic Regression, Linear and Quadratic Discriminant Analysis, Model Selection (Cross Validation, Bootstrapping), Support Vector Machines, Smoothing Splines, Generalized Additive Models, Shrinkage Methods (Lasso, Ridge), Dimension Reduction Methods (Principal Component Regression, Partial Least Squares), Decision Trees, Bagging, Boosting, Random Forest, K-Means Clustering, Hierarchical Clustering, Neural Networks, Deep Learning, and Reinforcement Learning.

    DPI 617 - Law, Order, and Algorithms (HKS: 4 credits; HMS: 4 credits)

    Fall Semester Course

    Data Science Selective

    Data and algorithms are rapidly transforming law enforcement and the criminal legal system, including how police officers are deployed, how discrimination is detected, and how sentencing, probation, and parole terms are set. Modern computational and statistical methods offer the promise of greater efficiency, equity, and transparency, but their use also raises complex legal, social, and ethical questions. In this course, we examine the often subtle relationship between law, public policy, and technology, drawing on recent court decisions, and applying methods from machine learning and game theory. We survey the legal and ethical principles for assessing the equity of algorithms, describe computational techniques for designing fairer systems, and consider how anti-discrimination law and the design of algorithms may need to evolve to account for machine bias. Concepts will be developed in part through guided in-class coding exercises, though prior programming experience is not necessary.