Biostatistics – Biomedical Informatics – Big Data (B3D)

Co-organized by the Department of Biostatistics at the Harvard T.H. Chan School of Public Health and the Department of Biomedical Informatics at Harvard Medical School, the Biostatistics - Biomedical Informatics - Big Data (B3D) Seminar is a series of research talks on statistical, computational, and machine learning methods for analyzing large complex data sets, with a focus on applications in biomedical science and public health, including:

  • Genetics and genomics
  • Epidemiological and environmental health science
  • Comparative effectiveness research
  • Electronic medical records
  • Digital health
  • Neuroscience
  • Social networks

The goal of the seminar is to provide a forum for brainstorming and exchanging ideas, and to promote interdisciplinary collaboration among researchers from a variety of disciplines such as biostatistics/statistics, biomedical informatics, computer science, computational biology, biomedicine, public health, social sciences, and other related areas. The seminar will feature local, national, and international speakers who are leaders in their fields.

Selected Mondays

Minot Room
5th Floor, Room 518
HMS Countway Library (scan ID or sign in at desk if no ID, take elevator to fifth floor)

For complete details, visit

Recordings of talks will be made available on our YouTube channel.

B3D Mailing List – sign up to receive emails on the B3D Seminar Series and other news and events on big data and data science.

#harvardB3data - join the conversation via Twitter


Monday, May 6 | LAHEY ROOM 
Luwan Zhang, PhD
Postdoctoral Research Fellow
Harvard T.H. Chan School of Public Health

Automating Consensus Medical Knowledge Extraction using an Ensemble of Healthcare Data from Electronic Health Records, Insurance Claims and Medical Literature

The increasingly widespread adoption of Electronic Health Records (EHR) has enabled the collection of phenotypic information at an unprecedented granularity and scale. Despite the great promise of EHR data for advancing clinical decision making, two major challenges must be solved to fully unleash its power. The first challenge is that a medical concept (e.g., a diagnosis, prescription, or symptom) is often described by various synonyms, which largely hinders data integrability and analysis reproducibility. The second comes from the inherent heterogeneity across different EHR systems, which calls for an efficient way of combining them to obtain a more general and unbiased representation of the underlying network linking different medical concepts. In this talk, I will discuss some recent advances toward solving these two challenges, including a novel spectral clustering method for grouping synonymous codes and a graph learning algorithm for discovering a consensus clinical knowledge graph from multiple up-to-date data sources.