Biostatistics – Biomedical Informatics – Big Data (B3D)

Co-organized by the Department of Biostatistics at the Harvard T.H. Chan School of Public Health and the Department of Biomedical Informatics at Harvard Medical School, the Biostatistics - Biomedical Informatics - Big Data (B3D) Seminar is a series of research talks on statistical, computational, and machine learning methods for analyzing large complex data sets, with a focus on applications in biomedical science and public health, including:

  • Genetics and genomics
  • Epidemiological and environmental health science
  • Comparative effectiveness research
  • Electronic medical records
  • Digital health
  • Neuroscience
  • Social networks

The goal of the seminar is to provide a forum for brainstorming and exchanging ideas, and to promote interdisciplinary collaboration among researchers from a variety of disciplines such as biostatistics/statistics, biomedical informatics, computer science, computational biology, biomedicine, public health, social sciences, and other related areas. The seminar will feature local, national, and international speakers who are leaders in their fields.

Selected Mondays

Minot Room / Lahey Room 
5th Floor
HMS Countway Library (scan your ID, or sign in at the desk if you have no ID; take the elevator to the fifth floor)

For complete details, visit

Recordings of talks will be made available on our YouTube channel.

B3D Mailing List – sign up to receive emails on the B3D Seminar Series and other news and events on big data and data science.

#harvardB3data - join the conversation via Twitter


Monday, October 28 | LAHEY ROOM 
Rachel Nethery, PhD
Assistant Professor, Biostatistics, Harvard T.H. Chan School of Public Health

Causal Inference and Machine Learning Approaches for Evaluation of the Health Impacts of Large-Scale Air Quality Regulations
We develop a causal inference approach to estimate the number of adverse health events prevented by large-scale air quality regulations via changes in exposure to multiple pollutants. This approach is motivated by regulations that impact pollution levels in all areas within their purview. We introduce a causal estimand called the Total Events Avoided (TEA) by the regulation, defined as the difference between the expected number of health events under the no-regulation pollution exposures and the observed number of health events under the with-regulation pollution exposures. We propose a matching method and a machine learning method that leverage high-resolution, population-level pollution and health data to estimate the TEA. Our approach improves upon traditional methods for regulation health impact analyses by clarifying the causal identifying assumptions, utilizing population-level data, minimizing parametric assumptions, and considering the impacts of multiple pollutants simultaneously. To reduce model dependence, the TEA estimate captures health impacts only for units in the data whose anticipated no-regulation features are within the support of the observed with-regulation data, thereby providing a conservative but data-driven assessment to complement traditional parametric approaches. We apply these methods to investigate the health impacts of the 1990 Clean Air Act Amendments in the US Medicare population.
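To make the estimand concrete, here is a minimal sketch of a TEA-style calculation using a generic nearest-neighbor matching estimator with a caliper. It is not the speaker's method: the Euclidean distance, the caliper rule for deciding "within the support," the function name `estimate_tea`, and all inputs are illustrative assumptions. Each unit's hypothetical no-regulation feature vector (pollutant levels plus any covariates) is matched to the closest observed with-regulation unit; that unit's event count stands in for the unit's expected no-regulation events. Units with no match inside the caliper are dropped, mirroring the abstract's restriction to units within the support of the observed data.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_tea(
    observed_features: List[Sequence[float]],
    observed_events: List[float],
    no_reg_features: List[Sequence[float]],
    caliper: float,
) -> Tuple[float, int]:
    """Hypothetical sketch: events avoided = imputed no-regulation events - observed events.

    observed_features[i] / observed_events[i]: unit i under the regulation (observed).
    no_reg_features[i]: unit i's anticipated features without the regulation (assumed given).
    Units whose nearest observed match is farther than `caliper` are treated as
    outside the observed support and excluded, so the result covers retained units only.
    Returns (tea_estimate, number_of_units_retained).
    """
    tea = 0.0
    retained = 0
    for i, target in enumerate(no_reg_features):
        # Find the observed unit closest to unit i's no-regulation features.
        best_j: Optional[int] = None
        best_d = math.inf
        for j, candidate in enumerate(observed_features):
            d = _distance(target, candidate)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is None or best_d > caliper:
            continue  # outside the support of the observed data: skip this unit
        # Imputed no-regulation events minus observed with-regulation events.
        tea += observed_events[best_j] - observed_events[i]
        retained += 1
    return tea, retained


if __name__ == "__main__":
    # Made-up toy numbers: features are [pollutant level, covariate].
    observed = [[10.0, 1.0], [20.0, 1.0], [30.0, 1.0]]
    events = [5.0, 9.0, 14.0]
    no_reg = [[20.0, 1.0], [30.0, 1.0], [45.0, 1.0]]
    print(estimate_tea(observed, events, no_reg, caliper=2.0))  # (9.0, 2)
```

In the toy run, the third unit's no-regulation exposure (45.0) has no observed counterpart within the caliper, so it is excluded rather than extrapolated. That makes the estimate conservative, which is the trade-off the abstract describes. A realistic analysis would also need population offsets, a principled distance on confounders, and uncertainty quantification, none of which this sketch attempts.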