The following is a list of electives that students in the Biomedical Informatics program have taken in the past. These courses are not guaranteed to run every year and serve only as examples of suitable elective courses. Students should choose electives in consultation with the Program Director.
Harvard Business School
Field Course: Transforming Health Care Delivery, Course Number 6219 (HBS: 3 credits, HMS: 2 credits)
NOT OFFERED 2020 - 2021
At the root of the transformation occurring in the health care industry, both in the United States and internationally, is the fundamental challenge of improving clinical outcomes while controlling costs. Addressing this challenge will require dramatic improvements in the processes by which care is delivered to patients. This will, in turn, involve changing the organization of delivery, developing new approaches to performance measurement, and reimagining the ways in which providers are paid. This course will equip students with the tools required to design and implement these improvements.
Harvard Faculty of Arts and Sciences
Fall Semester Course
Data Science 1 is the first half of a one-year introduction to data science. The course will focus on the analysis of messy, real-life data to perform predictions using statistical and machine learning methods. Material covered will integrate the five key facets of an investigation using data: (1) data collection: data wrangling, cleaning, and sampling to get a suitable data set; (2) data management: accessing data quickly and reliably; (3) exploratory data analysis: generating hypotheses and building intuition; (4) prediction or statistical learning; and (5) communication: summarizing results through visualization, stories, and interpretable summaries. Recommended: Programming knowledge at the level of CS 50 or above, and statistics knowledge at the level of Stat 100 or above (Stat 110 recommended).
Cross-listed as COMPSCI 109A and STAT 121A
Spring Semester Course
Data Science 2 is the second half of a one-year introduction to data science. Building upon the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, and statistical modeling and prediction. Topics include big data and database management, interactive visualizations, nonlinear statistical models, and deep learning.
Cross-listed as COMPSCI 109B
Fall Semester Course
Data Science Selective
In-depth study of genomics: models of evolution and population genetics; comparative genomics: analysis and comparison; structural genomics: protein structure, evolution and interactions; functional genomics, gene expression, structure and dynamics of regulatory networks.
Also listed as HST.508. Harvard students should enroll through Biophysics 170.
Fall Semester Course
Data Science Selective
This course focuses on quantitative aspects of genetics and genomics, including computational and statistical methods of genomic analysis. We will introduce basic concepts and discuss recent progress in population and evolutionary genetics and cover principles of statistical genetics of Mendelian and complex traits. We will then introduce current genomic technologies and key algorithms in computational biology and bioinformatics. We will discuss applications of these algorithms to genome annotation and analysis of epigenomics, cancer genomics and metagenomics data. Proficiency in programming and basic knowledge of genetics and statistics will be assumed.
Spring Semester Course
Biomedical image analysis is undergoing a paradigm shift due to artificial intelligence and deep learning. This course will cover basic concepts of deep learning and convolutional neural networks for biomedical image analysis as well as current challenges and opportunities. The lectures will include fundamentals of classification, characterization, detection, segmentation and enhancement in biomedical images. Using a variety of microscopy and pathology datasets, the course will follow a ‘learning-by-doing’ model where each lecture will be accompanied by hands-on training in using these methods in practice. The course assumes no prior knowledge of deep learning or image analysis. Basic knowledge of Python is recommended but not required.
Spring Semester Course
Usability and design as keys to successful technology. Covers user observation techniques, needs assessment, low and high fidelity prototyping, usability testing methods, as well as theory of human perception and performance, and design best practices. Focuses on understanding and applying the lessons of human interaction to the design of usable systems; will also look at lessons to be learned from less usable systems. The course includes several small projects and one large project.
Fall Semester Course
Artificial Intelligence (AI) is an exciting field that has enabled a wide range of cutting-edge technology, from driverless cars to grandmaster-beating Go programs. The goal of this course is to introduce the ideas and techniques underlying the design of intelligent computer systems. Topics covered in this course are broadly divided into 1) planning and search algorithms, 2) probabilistic reasoning and representations, and 3) machine learning (although, as you will see, it is impossible to separate these ideas so neatly). Within each area, the course will also present practical AI algorithms being used in the wild and, in some cases, explore the relationship to state-of-the-art techniques. The class will include lectures connecting the models and algorithms we discuss to applications in robotics, computer vision, and speech processing. Recommended prep: CS 51; Stat 110 (may be taken concurrently).
Spring Semester Course
Computational science has become a third partner, together with theory and experimentation, in advancing scientific knowledge and practice, and an essential tool for product and process development and manufacturing in industry. Big data science adds the ‘fourth pillar’ to scientific advancements, providing the methods and algorithms to extract knowledge or insights from data. The course is a journey into the foundations of Parallel Computing at the intersection of large-scale computational science and big data analytics. Many science communities are combining high performance computing and high-end data analysis platforms and methods in workflows that orchestrate large-scale simulations or incorporate them into the stages of large-scale analysis pipelines for data generated by simulations, experiments, or observations. This is an applications course highlighting the use of modern computing platforms in solving computational and data science problems, enabling simulation, modeling and real-time analysis of complex natural and social phenomena at unprecedented scales. The class emphasizes making effective use of the diverse landscape of programming models, platforms, open-source tools, computing architectures and cloud services for high performance computing and high-end data analytics.
Spring Semester Course
Scaling computation over parallel and distributed computing systems is a rapidly advancing area of research receiving high levels of interest from both academia and industry. The objective can be for high-performance computing and energy-efficient computing (“green” data center servers as well as small embedded devices). In this course, students will learn principled methods of mapping prototypical computations used in machine learning, the Internet of Things, and scientific computing onto parallel and distributed compute nodes of various forms. These techniques will lay the foundation for future computational libraries and packages for both high-performance computing and energy-efficient devices. To master the subject, students will need to appreciate the close interactions between computational algorithms, software abstractions, and computer organizations. After having successfully taken this course, students will acquire an integrated understanding of these issues. The class will be organized into the following modules: Big picture: use of parallel and distributed computing to achieve high performance and energy efficiency; End-to-end example 1: mapping nearest neighbor computation onto parallel computing units in the forms of CPU, GPU, ASIC and FPGA; Communication and I/O: latency hiding with prediction, computational intensity, lower bounds; Computer architectures and implications to computing: multi-cores, CPU, GPU, clusters, accelerators, and virtualization; End-to-end example 2: mapping convolutional neural networks onto parallel computing units in the forms of CPU, GPU, ASIC, FPGA and clusters; Great inner loops and parallelization for feature extraction, data clustering and dimension reduction: PCA, random projection, clustering (K-means, GMM-EM), sparse coding (K-SVD), compressive sensing, FFT, etc.; Software abstractions and programming models: MapReduce (PageRank, etc.), GraphX/Apache Spark, OpenCL and TensorFlow; Advanced topics: autotuning and neuromorphic spike-based computing. Students will learn the subject through lectures/quizzes, programming assignments, labs, research paper presentations, and a final project. Students will have latitude in choosing a final project they are passionate about. They will formulate their projects early in the course, so there will be sufficient time for discussion and iterations with the teaching staff, as well as for system design and implementation. Industry partners will support the course by giving guest lectures and providing resources. The course will use server clusters at Harvard as well as external resources in the cloud. In addition, labs will have access to state-of-the-art IoT devices and 3D cameras for data acquisition. Students will use open source tools and libraries and apply them to data analysis, modeling, and visualization problems.
Spring Semester Course
Special topics course. Focus of course changes year to year.
Not offered 2020 - 2021
Surveys biologically-inspired approaches to designing distributed systems. Focus is on biological models, algorithms, and programming paradigms for self-organization. Topics vary year to year, and usually include: (1) swarm intelligence: social insects and animal groups, with applications to networking and robotics, (2) cellular computing: including cellular automata/amorphous computing, and applications like self-assembling robots and programmable materials, (3) evolutionary computation and its application to optimization and design. Recommended Prep: Students should have a familiarity/experience with computer systems (e.g. software, networking) and algorithms/analysis through classes and/or internship experiences. Background in biology not required.
Spring Semester Course
Focus on translational medicine: the application of basic genetic discoveries to human disease. Each three-hour class will focus on a specific genetic disorder and the approaches currently used to speed the transfer of knowledge from the laboratory to the clinic. Each class will include a clinical discussion, a patient presentation if appropriate, followed by lectures, a detailed discussion of recent laboratory findings and a student-led journal club. Lecturers will highlight current molecular, technological, bioinformatic and statistical approaches that are being used to advance the study of human disease. There is no exam. Students will present one paper per session in a journal club style. Attendance and active participation for the duration of all class meetings is required. If you are unable to attend class or cannot be present for the entire session, you are expected to contact the course instructor. Two incomplete or missed sessions will result in a failing grade.
MICROBI 302QC - Introduction to Infectious Disease Research: Infectious Diseases Consortium Bootcamp (FAS: 2 credits, HMS: 2 credits)
January Term Course
This January boot camp course provides a fun, informative introduction to the breadth of infectious disease research carried out at Harvard and beyond. Students will have the chance to meet faculty, students, and fellows in infectious disease roles across the university. The course will focus on several aspects of infectious diseases:
1. Underlying biology of infectious diseases and the diverse pathogens that cause them
2. Modern approaches to studying infectious diseases, including experimental biology, epidemiology, bioinformatics, and clinical microbiology
3. Modern approaches to developing new interventions, including drugs, vaccines, diagnostics, and public health measures
Fall Semester Course
This course is designed to follow CS 181 and go into further depth on the statistical aspects of supervised learning: given what we know about our data and where it came from, how can we choose the machine learning method that will predict best on future data? Topics include the "no free lunch" theorems, linear methods for regression and classification, shrinkage and sparsity, splines and kernel smoothing, model selection and cross-validation, additive models and trees, boosting, bagging and random forests. Recommended prep: Statistics 111 and Computer Science 181.
Harvard Medical School
BMI 720 - Introduction to Clinical Informatics (HMS: 4 credits)
BMI 741 - Health Information Technology: From Ideation to Implementation (HMS: 4 credits)
Harvard T.H. Chan School of Public Health
Covers basic statistical techniques that are important for analyzing data arising from epidemiology, environmental health and biomedical and other public health-related research. Major topics include descriptive statistics, elements of probability, introduction to estimation and hypothesis testing, nonparametric methods, techniques for categorical data, regression analysis, analysis of variance, and elements of study design. Applications are stressed. Designed as an alternate to BIO200, for students desiring more emphasis on theoretical developments. Background in algebra and calculus strongly recommended.
Topics include model interpretation, model building, and model assessment for linear regression with continuous outcomes, logistic regression with binary outcomes, and proportional hazards regression with survival time outcomes. Specific topics include regression diagnostics, confounding and effect modification, goodness of fit, data transformations, splines and additive models, ordinal, multinomial, and conditional logistic regression, generalized linear models, overdispersion, Poisson regression for rate outcomes, hazard functions, and missing data. The course will provide students with the skills necessary to perform regression analyses and to critically interpret statistical issues related to regression applications in the public health literature. Prerequisites: ID 201 or BST201 or (BST202 and BST203) or (BST206 and (BST207 or BST208)) or permission of instructor.
Covers research design, sample selection, questionnaire construction, interviewing techniques, the reduction and interpretation of data, and related facets of population survey investigations. Focuses primarily on the application of survey methods to problems of health program planning and evaluation. Treatment of methodology is sufficiently broad to be suitable for students who are concerned with epidemiological, nutritional, or other types of survey research. Formerly BIO212
This course will introduce students involved with clinical research to the practical application of multiple regression analysis. Linear regression, logistic regression and proportional hazards survival models will be covered, as well as general concepts in model selection, goodness-of-fit, and testing procedures. Each lecture will be accompanied by a data analysis using SAS and a classroom discussion of the results. The course will introduce, but will not attempt to develop, the underlying likelihood theory. Background in SAS programming required.
Designed for individuals interested in the scientific, policy, and management aspects of clinical trials. Topics include types of clinical research, study design, treatment allocation, randomization and stratification, quality control, sample size requirements, patient consent, and interpretation of results. Students design a clinical investigation in their own field of interest, write a proposal for it, and critique recently published medical literature. Course Prerequisites: BIO201 or ID200 or ID201 or ID207 or BIO202&203 or BIO206&207 or BIO206&208 or BIO206&209. Formerly BIO214
This course is intended for students who are already very comfortable with fundamental techniques in statistics. The course will cover methods for building and interpreting linear regression models, including statistical assumptions and diagnostics, estimation and testing, and model building techniques. These models will be extended to handle data arising from longitudinal studies employing repeated measurement of subjects over time. Summer/Residential Course Note (Section 1): Lectures will be accompanied by computing exercises using the SAS statistical package. Online Course Note (Section 2): Lectures will be accompanied by computing exercises using the Stata statistical package. Course Prerequisites: EPI522 or BST201 or ID200 or ID201 or ID207 or BST202&203 or BST206&207 or BST206&208. Formerly BIO501
The goal of this course is to enable scientists and public health professionals who already have an introductory background in biostatistics and clinical trials to acquire the competencies in quantitative skills and systems thinking required to understand and participate in drug development and regulatory review processes. The course illustrates how statistical and quantitative methods are used to transform information into evidence demonstrating the safety, efficacy and effectiveness of drugs and devices over the course of the product’s life cycle from a regulatory perspective. Content is delivered using a blended-learning approach involving lectures, web-based media and selected case study examples derived from actual FDA decision-making and regulatory assessments to highlight and describe each phase of the regulatory drug approval process. Case studies will illustrate regulatory science in action and practice and will include content publicly available from the FDA’s website that can be used in conjunction with FDA science-based guidance and decision precedents. Course Prerequisites: ID538 or [(BIO200 or ID200 or BIO201 or BIO202&203 or BIO206&207/8/9) and (EPI200 or EPI201 or EPI208 or EPI505)]. Formerly BIO523
This course will provide a basic, yet thorough introduction to the probability theory and mathematical statistics that underlie many of the commonly used techniques in public health research. Topics to be covered include probability distributions (normal, binomial, Poisson), means, variances and expected values, finite sampling distributions, parameter estimation (method of moments, maximum likelihood), confidence intervals, hypothesis testing (likelihood ratio, Wald and score tests). All theoretical material will be motivated with problems from epidemiology, biostatistics, environmental health and other public health areas. This course is aimed towards second year doctoral students in fields other than Biostatistics. Background in algebra and calculus required. Course Prerequisites: BST210 or BST213. Formerly BIO222
Topics will include types of censoring, hazard, survivor, and cumulative hazard functions, Kaplan-Meier and actuarial estimation of the survival distribution, comparison of survival using log rank and other tests, regression models including the Cox proportional hazards model and the accelerated failure time model, adjustment for time-varying covariates, and the use of parametric distributions (exponential, Weibull) in survival analysis. Methods for recurrent survival outcomes and competing risks will also be discussed, as well as design of studies with survival outcomes. Class material will include presentation of statistical methods for estimation and testing along with current software (SAS, Stata) for implementing analyses of survival data. Applications to real data will be emphasized. Course Prerequisite(s): BST210 or BST213 or BST 230, or permission of instructor required. BST 213 may be taken concurrently. Formerly BIO223
This course covers modern methods for the analysis of repeated measures, correlated outcomes and longitudinal data, including the unbalanced and incomplete data sets characteristic of biomedical research. Topics include an introduction to the analysis of correlated data, analysis of response profiles, fitting parametric curves, covariance pattern models, random effects and growth curve models, and generalized linear models for longitudinal data, including generalized estimating equations (GEE) and generalized linear mixed effects models (GLMMs). Course Activities: Homework assignments will focus on data analysis in SAS using PROC GLM, PROC MIXED, PROC GENMOD, and PROC GLIMMIX. Course Note: Lab or section times will be announced at first meeting. Course Prerequisite(s): BIO210 or BIO211 or BIO213 or BIO232. Formerly BIO226
This course introduces students to the diverse statistical methods used throughout the process of statistical genetics, from familial aggregation and segregation studies to linkage scans and association studies. Topics covered include basic principles from population genetics, multipoint and model-free linkage analysis, family-based and population-based association testing, and Genome Wide Association analysis. Instructors use ongoing research into the genetics of respiratory disease, psychiatric disorders and cancer to illustrate basic principles. Weekly homework supplements reading, course lectures, discussion and section. Relevant concepts in genetics and molecular genetics will be reviewed in lectures and labs. The emphasis of the course is fundamental principles and concepts. Course Prerequisites: BST210 (concurrent enrollment allowed). Course Note: There will be a weekly lab section; the time will be scheduled at first meeting. Formerly BIO227
This course is a practical introduction to the Bayesian analysis of biomedical data. It is an intermediate Master’s level course in the philosophy, analytic strategies, implementation, and interpretation of Bayesian data analysis. Specific topics that will be covered include: the Bayesian paradigm; Bayesian analysis of basic models; Bayesian computing: Markov Chain Monte Carlo; STAN R software package for Bayesian data analysis; linear regression; hierarchical regression models; generalized linear models; meta-analysis; models for missing data. Programming and case studies will be used throughout the course to provide hands-on training in these concepts. Prerequisites: BST210 and BST222, or permission of the instructor. Not currently offered.
Axiomatic foundations of probability, independence, conditional probability, joint distributions, transformations, moment generating functions, characteristic functions, moment inequalities, sampling distributions, modes of convergence and their interrelationships, laws of large numbers, central limit theorem, and stochastic processes.
A fundamental course in statistical inference. Discusses general principles of data reduction: exponential families, sufficiency, ancillarity and completeness. Describes general methods of point and interval parameter estimation and the small and large sample properties of estimators: method of moments, maximum likelihood, unbiased estimation, Rao-Blackwell and Lehmann-Scheffe theorems, information inequality, asymptotic relative efficiency of estimators. Describes general methods of hypothesis testing and optimality properties of tests: Neyman-Pearson theory, likelihood ratio tests, score and Wald tests, uniformly and locally most powerful tests, asymptotic relative efficiency of tests. Course Note: Lab or section time to be announced at first meeting; cross-listed: HSPH student must register for HSPH course. Course Prerequisite(s): BIO230 (concurrent enrollment allowed). Formerly BIO231
Introduction to the data structures and computer algorithms that are relevant to statistical computing. The implementation of data structures and algorithms for data management and numerical computations are discussed. Course Prerequisite(s): Instructor’s Permission. Formerly BIO514
An advanced course in linear models, including both classical theory and methods for high dimensional data. Topics include theory of estimation and hypothesis testing, multiple testing problems and false discovery rates, cross validation and model selection, regularization and the LASSO, principal components and dimensional reduction, and classification methods. Background in matrix algebra and linear regression required. Prerequisite: BST 231 and BST 233, or permission of instructor required. Formerly BIO235
A foundational course in measure theoretic probability. Topics include measure theory, Lebesgue integration, product measure and Fubini’s Theorem, Radon-Nikodym derivatives, conditional probability, conditional expectation, limit theorems on sequences of random stochastic processes, and weak convergence. Course Prerequisites: BST231 or permission from the instructor required. Formerly BIO250
Sequel to BIO 231. Considers several advanced topics in statistical inference. Topics include limit theorems, multivariate delta method, properties of maximum likelihood estimators, saddle point approximations, asymptotic relative efficiency, robust and rank-based procedures, resampling methods, and nonparametric curve estimation. Course Note: Cross-listed; HSPH students must register for HSPH course. Course Prerequisites: BIO231 and BIO250, or permission of instructor required. Formerly BIO251
Presents classical and modern approaches to the analysis of multivariate observations, repeated measures, and longitudinal data. Topics include the multivariate normal distribution, Hotelling’s T², MANOVA, the multivariate linear model, random effects and growth curve models, generalized estimating equations, statistical analysis of multivariate categorical outcomes, and estimation with missing data. Discusses computational issues for both traditional and new methodologies. Course Note: Cross-listed; HSPH students must register for HSPH course. Course Prerequisite: BIO231 and BIO235, or permission of the instructor is required. Formerly BIO245
BST247 is a seminar-style course with readings selected from the literature in areas of expertise of the participating faculty. Content may vary from year to year. The specific objectives are (1) to train students to critically read foundational papers and current journal articles in Statistical Genetics, (2) to train students to present sophisticated ideas to an audience of peers, and (3) to prepare students to engage in doctoral level research in the area. After the course, students are expected to have an in-depth and broad understanding of important topics in statistical genetics research. Course Prerequisite(s): BIO227 and (BIO231 or EPI511). BIO231 may be taken concurrently. Formerly BIO257
General principles of the Bayesian approach, prior distributions, hierarchical models and modeling techniques, approximate inference, Markov chain Monte Carlo methods, model assessment and comparison. Bayesian approaches to GLMMs, multiple testing, nonparametrics, clinical trials, survival analysis.
This course is the second course in the foundational sequence of the School’s newly approved Master’s Degree in Health Data Science. The course will build upon our existing course, BST260 Introduction to Data Science, in presenting a set of tools for modeling and understanding complex datasets. Specifically, the course will provide practical regression and tree-based techniques for big data. Specific topics that will be covered include: linear model selection and regularization: LASSO and regularization; principal component regression and partial least squares; tree-based methods: decision trees; bagging, random forests, and boosting; unsupervised learning: principal components analysis, cluster analysis. Programming (Python and R) and case studies will be used throughout the course to provide hands-on training in these concepts. Prerequisites: BST260 or permission of instructor.
Big data is everywhere, from Omics and Health Policy to Environmental Health. Every single aspect of the Health Sciences is being transformed. However, it is hard to navigate and critically assess tools and techniques in such a fast-moving big data panorama. In this course, we are going to give a critical presentation of theoretical approaches and software implementations of tools to collect, store and process data at scale. The goal is not just to learn recipes to manipulate big data but learn how to reason in terms of big data, from software design and tool selection to implementation, optimization and maintenance.
Many systems of scientific and societal interest consist of a large number of interacting components. The structure of these systems can be represented as networks where network nodes represent the components and network edges the interactions between the components. Network analysis can be used to study how pathogens, behaviors and information spread in social networks, having important implications for our understanding of epidemics and the planning of effective interventions. In a biological context, at a molecular level, network analysis can be applied to gene regulation networks, signal transduction networks, protein interaction networks, and more. This introductory course covers some basic network measures, models, and processes that unfold on networks. The covered material applies to a wide range of networks, but we will focus on social and biological networks. To analyze and model networks, we will learn the basics of the Python programming language and its NetworkX module. The course contains a number of hands-on computer lab sessions. There are five homework assignments and four reading assignments that will be discussed in class. In addition, each student will complete a final project that applies network analysis techniques to study a public health problem. Course Prerequisites: BST201 or ID200 or ID201 or ID207 or BST202&203 or BST206&207 or BST206&208. Formerly BIO521
This course is an introduction to modern statistical computing techniques used to characterize and interpret cancer genome sequencing datasets. This Master’s level course will begin with a basic introduction to DNA, genes, and genomes for students with no biology background. It will then introduce cancer as an evolutionary process, review landmarks in the history of cancer genetics, and discuss the basics of sequencing technology and modern Next Generation Sequencing. The course will cover the main steps involved in turning billions of short sequencing reads into a representation of the somatic genetic alterations characterizing an individual patient’s cancer, and will build on this foundation to study topics related to identifying mutations under positive selection from multiple tumors sampled in a population. By the end of the course, students will be able to apply state-of-the-art analysis to cancer genome datasets and to critically evaluate papers employing cancer genome data.
Epigenetics is a fast-growing field, with increasing applicability in environmental and epidemiology studies, focusing on the alterations in chromatin structure that can stably and heritably influence gene expression. Epigenetic changes can be as profound as those exerted by mutation, but, unlike mutations, are reversible and responsive to environmental influences. The course will focus on epigenetic mechanisms and laboratory methods for DNA methylation, histone modifications, small non-coding RNAs, and epigenomics. Ongoing experimental and epidemiology studies (cohort, case-control, cross-sectional and repeated measurement studies) will be presented to introduce the students to the epigenetic effects in prenatal/early and adult life of environmental factors, including air pollution, metals, pesticides, benzene, PCBs, persistent organic pollutants, and diet. The course will enable students to understand and apply epigenetic methods in multiple areas, including cardiovascular and respiratory disease, aging, reproductive health, inflammation/immunity, and cancer.
EPI201 introduces the principles and methods used in epidemiologic research. The course discusses the conceptual and practical issues encountered in the design and analysis of epidemiologic studies for description and causal inference. EPI201 is the first course in the series of methods courses designed for students majoring in Epidemiology, Biostatistics and related fields, and those interested in a detailed introduction to the design and conduct of epidemiologic studies. Students who take EPI201 are expected to take EPI202 (Methods II). Course Note: Thursday or Friday lab required.
This course will present an introduction to the methods of data mining and predictive modeling, with applications to both genetic and clinical data. Basic concepts and philosophy of supervised and unsupervised data mining as well as appropriate applications will be discussed. Topics covered will include multiple comparisons adjustment, cluster analysis, principal component analysis, and predictive model building through logistic regression, classification and regression trees (CART), multivariate adaptive regression splines (MARS), neural networks, random forests, and bagging and boosting. Course Activities: Computer labs. Course Note: Students should be familiar with logistic regression.
Like all living things, pathogens have evolved by natural selection. The application of evolutionary principles to infectious disease epidemiology is crucial to such diverse subjects as outbreak analysis, the understanding of how different genomic combinations of virulence and drug resistance determinants emerge, and how selection acts to produce successful pathogens that balance the costs and benefits of virulence and transmission. The goal of this course is to introduce basic evolutionary concepts, highlighting the importance of transmission to fitness, as illustrated by comparisons of the adaptive process among different sorts of pathogenic microorganisms. Students will also learn the basics of phylogenetic sequence analysis for the study of outbreaks and transmission, and the construction of simple mathematical models that probe the adaptive process. Students outside of HSPH must request instructor permission to enroll in this course.
This is an introductory-level class on the analysis of mortality, fertility, and population change. It is required for all master’s and doctoral students in the Department of Global Health and Population. Students are introduced to the core literature in this field through lectures and assigned readings selected from peer-reviewed journals and textbooks. Together, these provide a graduate-level introduction to the principal sources and characteristics of population data and to the essential methods used for the analysis of population problems. The emphasis throughout is on understanding the key processes, models, and assumptions used primarily for the analysis of demographic components. Practical training will be given through a required weekly laboratory session, assignments, and a final examination. Examples presented in class and used in assignments are drawn from several countries, combining both developed and developing world realities.
Designed to bring students to an intermediate-level understanding of microeconomic theory. Emphasizes the uses and limitations of the economic approach, with applications to public health.
This course is designed to introduce the student to the methods and growing range of applications of decision analysis and cost-effectiveness analysis in health technology assessment, medical and public health decision making, and health resource allocation. The objectives of the course are: (1) to provide a basic technical understanding of the methods used, (2) to give the student an appreciation of the practical problems in applying these methods to the evaluation of clinical interventions and public health policies, and (3) to give the student an appreciation of the uses and limitations of these methods in decision making at the individual, organizational, and policy level both in developed and developing countries.
This course is designed to provide students in public health and social science with an overview of the theory and research on the role of communication in health in the 21st century. The role of communication in public health will be examined both as a product of everyday interaction with communication platforms including mass media and messages, and its planned use to accomplish particular public health goals. Research examined here looks both at planned and unplanned effects of communication in a variety of health situations representing a range of public health topical concerns.
Massachusetts Institute of Technology
Introduces representations, methods, and architectures used to build applications and to account for human intelligence from a computational point of view. Covers applications of rule chaining, constraint propagation, constrained search, inheritance, statistical inference, and other problem-solving paradigms. Also addresses applications of identification trees, neural nets, genetic algorithms, support-vector machines, boosting, and other learning paradigms. Considers what separates human intelligence from that of other animals. Students taking graduate version complete additional assignments.
Introduces the study of human language from a computational perspective, including syntactic, semantic and discourse processing models. Emphasizes machine learning methods and algorithms. Uses these methods and models in applications such as syntactic parsing, information extraction, statistical machine translation, dialogue systems, and summarization. Students taking graduate version complete additional assignments.
Advanced study of topics in artificial intelligence. Specific focus varies from year to year. Consult department for details.
Seminar based on research literature. Papers covered are selected to illustrate important problems and varied approaches in the field of computational and systems biology, and to provide students a framework from which to evaluate new developments.
Also listed as MIT 6.874 and 20.390
Presents innovative approaches to computational problems in the life sciences, focusing on deep learning-based approaches with comparisons to conventional methods. Topics include protein-DNA interaction, chromatin accessibility, regulatory variant interpretation, medical image understanding, medical record understanding, therapeutic design, and experiment design (the choice and interpretation of interventions). Focuses on machine learning model selection, robustness, and interpretation. Teams complete a multidisciplinary final research project using TensorFlow or other framework. Provides a comprehensive introduction to each life sciences problem, but relies upon students understanding probabilistic problem formulations. Students taking graduate version complete additional assignments.
A guide for data scientists, engineers, and clinicians who are interested in performing retrospective research using data from electronic health records. Instruction provided in clinical decision-making and secondary use of clinical data, using the Medical Information Mart for Intensive Care (MIMIC) database and the eICU Collaborative Research Database. Covers steps in parsing a clinical question into a study design and methodology for data analysis and interpretation. Activities include review of case studies using MIMIC and the eICU Collaborative Research Database and a team project. Student teams choose a question and clinician to work with for their project. Teams meet weekly with clinicians at the hospitals at arranged times.
Introduces students to machine learning in healthcare, including the nature of clinical data and the use of machine learning for risk stratification, disease progression modeling, precision medicine, diagnosis, subtype discovery, and improving clinical workflows. Topics include causality, interpretability, algorithmic fairness, time-series analysis, graphical models, deep learning and transfer learning. Guest lectures by clinicians from the Boston area and course projects with real clinical data emphasize subtleties of working with clinical data and translating machine learning into clinical practice.
Spring Semester Course
Surveys key strategic decisions faced by managers, investors and scientists at each stage in the value chain of the life science industry. Aims to develop students' ability to understand and effectively assess these strategic challenges. Focuses on the biotech sector, with additional examples from the pharmaceutical and medical device sectors. Includes case studies, analytical models, and detailed quantitative analysis. Intended for students interested in building a life science company or working in the sector as a manager, consultant, analyst, or investor. Provides analytical background to the industry for biological and biomedical scientists, engineers and physicians with an interest in understanding the commercial dynamics of the life sciences or the commercial potential of their research.
Addresses healthcare entrepreneurship with an emphasis on startups bridging care re-design, digital health, medical devices, and high-tech. Includes prominent speakers and experts from key domains across medicine, pharma, med devices, regulatory, insurance, software, design thinking, entrepreneurship, and investing. Provides practical experiences in venture validation/creation through team-based work around themes. Illustrates best practices in identifying and validating health venture opportunities amid challenges of navigating healthcare complexity, team dynamics, and venture capital raising process. Intended for students from engineering, medicine, public health, and MBA programs. Video conference facilities provided to facilitate remote participation by Executive MBA and traveling students.
Harvard Kennedy School
In the last couple of decades, the amount of data available to organizations has significantly increased. Individuals who can use this data together with appropriate analytical techniques can discover new facts and provide new solutions to various existing problems. This course provides an introduction to the theory and applications of some of the most popular machine learning techniques. It is designed for students interested in using machine learning and related analytical techniques to make better decisions in order to solve policy and societal level problems.
We will cover various recent techniques and their applications from supervised, unsupervised, and reinforcement learning. In addition, students will get the chance to work with some data sets using software and apply their knowledge to a variety of examples from a broad array of industries and policy domains. Some of the intended course topics (time permitting) include: K-Nearest Neighbors, Naive Bayes, Logistic Regression, Linear and Quadratic Discriminant Analysis, Model Selection (Cross Validation, Bootstrapping), Support Vector Machines, Smoothing Splines, Generalized Additive Models, Shrinkage Methods (Lasso, Ridge), Dimension Reduction Methods (Principal Component Regression, Partial Least Squares), Decision Trees, Bagging, Boosting, Random Forest, K-Means Clustering, Hierarchical Clustering, Neural Networks, Deep Learning, and Reinforcement Learning.