4CE ("fore-see"): Consortium for Clinical Characterization of COVID-19 by EHR 
In collaboration with the i2b2 Foundation, Zak Kohane and DBMI faculty members Weber, Cai, Palmer, Brat, Gehlenborg, and Avillach have convened this international volunteer consortium across 9 countries and more than 300 hospitals to study the clinical course of COVID from early 2020 to the present. The consortium is now also investigating the clinical course of "long COVID" using data from the electronic health record. View data and publications.

4D Nucleome (4DN) Data Coordination and Integration Center (DCIC) Data Portal
Funded by the NIH Common Fund and led by DBMI’s Peter Park, the DCIC Data Portal supports study of the three-dimensional organization of the nucleus in space and time (“the 4th dimension”) by collecting, storing, curating, displaying and analyzing data generated by the 4DN Network.

AI for the Community
DBMI Assistant Professor Pranav Rajpurkar has developed a series of tools and opportunities open to the scientific community for insights into state-of-the-art developments in this field:

  • Medical AI Bootcamp: A program for closely mentored research at the intersection of AI and Medicine.  Open to students at Harvard & Stanford and to medical doctors around the world.
  • Doctor Penguin Newsletter: Catch the latest research in AI + Health with a newsletter, curated weekly from the top scientific venues.
  • AI Health Podcast: Conversations with entrepreneurs, investors and scientists covering the latest in AI+Health in industry and academia.
  • MAIDA (Medical AI Data for All): Coming soon to the research community, this comprehensive data set of patient radiology images is being created by an international partnership initiative. Designed to address limitations of existing data resources, MAIDA will be a carefully curated, deidentified, and diverse library of data sets that includes relevant clinical information and granular, high-quality annotations. To date, 41 hospitals from 17 countries have contributed relevant image sets to this collaborative effort.

Berkowitz Living Laboratory
This collaboration between DBMI and the innovation arm of the largest health maintenance organization in Israel—Clalit—is co-led by Zak Kohane (DBMI) and Ran Balicer (Clalit). We use clinical data from health systems in Israel and the US along with the full armamentarium of AI/ML approaches, including foundation modelling, and genomic medicine to explore medical mysteries of individual patients and to investigate populations for unique findings.

CELEHS (Center for Learning Health Care Systems)
Founded by DBMI Professor Tianxi Cai in 2018, this translational data science core provides analytic and predictive tools that leverage biomedical data to improve efficiency and accuracy of healthcare delivery.

Cellenics
Catalyzed by Peter Kharchenko, PhD, then Omenn Associate Professor at DBMI, and supported by funding from the Blavatnik Clinical Pilots Initiative, Cellenics is an open-source platform aimed at empowering research biologists to analyze their own single-cell RNA sequencing (scRNA-seq) datasets. This user-facing application for project management, data processing, quality control, and visualization was developed as part of a collaborative effort between DBMI, the Harvard Medical School Single Cell Core, and the Harvard T.H. Chan School of Public Health Bioinformatics Core, with input from the HMS Biopolymers Facility and Center for Computational Biomedicine (CCB). Cellenics modules enable in-depth data exploration through differential expression and pathway analyses as well as the generation of fully customizable plots for publication.

Center for Computational Biomedicine (CCB) 
Led by Professor of Practice Dr. Robert Gentleman, the Center for Computational Biomedicine (CCB) provides cutting-edge computational capabilities, data analysis, and data integration technologies to support medical and biological research and has a strong mandate to serve as a hub for computational science education across HMS. We are a multi-disciplinary team of computational and quantitative scientists that develops shared data and analytic resources to serve a broad constituency. Additionally, CCB provides project collaboration and hosts skills workshops and guided learning sessions for HMS graduate students, postdoctoral fellows, research staff and faculty across all departments of the HMS quadrangle. Check out the CCB website or send email to Center Administrator Jaclyn Mallard to learn more.

Computational Genome Analysis Platform (CGAP)
The Computational Genome Analysis Platform (CGAP) was created by DBMI Professors Peter Park and Shamil Sunyaev, Project Managers Drs. Dana Vuzman and Dominik Glodznik, and their team, with support from the Blavatnik Clinical Initiatives Project. CGAP is an intuitive, open-source analysis tool designed to support complex research and clinical genomics workflows and, ideally, transform sequencing data into actionable genetic insights. DBMI’s Undiagnosed Diseases Network (UDN) Coordinating Center has used CGAP’s customized rare disease variant callers to analyze thousands of whole genome samples and facilitate discovery across the UDN cohort. Under the direction of PI Dana Vuzman, PhD, the Coordinating Center is also supporting a project funded by the Multiple Systems Atrophy (MSA) Coalition to conduct cohort analysis of an international patient data repository.

From the Farhat Lab, a user-friendly genome-based predictor for tuberculosis research powered by machine learning. (Also see news article: Got Resistance)

Integrative Multiomics-Histopathology Analysis Tools 
From the Kun-Hsing Yu Lab.

MOMA: Multi-omics Multi-cohort Assessment Platform
From the Yu Lab. (Also see news article: AI Tool Predicts Colon Cancer Survival, Treatment Response)

National NLP Clinical Challenges (N2C2)
Created in 2005 by the NIH-funded i2b2 National Center for Biomedical Computing, N2C2 offers a customized trove of annotated, deidentified “real world” patient data sets, provided by DBMI, for use by the NLP community in developing and testing theoretical and applied algorithms. DBMI also sponsors annual NLP Data Challenges, led by colleague Ozlem Uzuner, Associate Professor and Chair of the Department of Information Life Sciences and Technology at George Mason University, that address questions of current interest to the international community. Newly annotated data from these Challenges are typically available to the community the year following the Challenge.

People-Powered Medicine (PPM)

  • PPM Network of Enigmatic Exceptional Responders (NEER)
    Out of the millions who suffer from cancer each year, a few hundred experience miraculous outcomes, living long lives after widely metastatic lung cancer, brain cancer and other cancers. With their critical help and consent, NEER investigates these enigmatic exceptional responders using a precision medicine approach ranging from germline and somatic genomics to immunological profiling, microbiome assays and lifestyle assessments.
  • PPM Rheumatoid Arthritis Non-responders to Treatments (RANT)
    Despite the success of biologic drugs in treating this autoimmune disease, a significant subset of patients do not respond to this therapeutic approach and must continue to rely on high-dose steroids for therapy. PI Kat Liao, Associate Professor of Biomedical Informatics and Rheumatology (BWH), leads this study, which is actively recruiting “non-responders” for a comprehensive analysis using both phenotypic (EHR) and genomic (WGS) data from this cohort.
  • PPM Heart
    Patients are most successful when they or their families/proxies are aware of best practices in care, particularly primary prevention. In PPM-Heart we use the access patients now have to their own clinical data to drive broader implementation of primary prevention in chronic diseases such as dyslipidemia, hypertension and diabetes mellitus.

Renal Cell Carcinoma Molecular Classification and Prognostic Prediction Pipeline
From the Yu Lab.

Therapeutics Data Commons
Funded by the NSF and led by DBMI Assistant Professor Marinka Zitnik, the TDC is a coordinated initiative to access and evaluate artificial intelligence capability across therapeutic modalities and stages of discovery. The Commons is a resource with AI-solvable tasks, AI-ready datasets, and curated benchmarks, providing an ecosystem of tools, libraries, leaderboards, and community resources, including data functions, strategies for systematic model evaluation, meaningful data splits, data processors, and molecule generation oracles. All resources are integrated via an open Python library.

Undiagnosed Diseases Network (UDN) Coordinating Center
Historically led by PI Zak Kohane, this NIH-supported national initiative has created a network of 12 major US academic centers, coast to coast, dedicated to diagnosing the undiagnosed using genomics, machine learning, transcriptomics, model organisms and expert clinicians. To date, over 2,000 applicants have been evaluated and 627 have received definitive diagnoses. The Coordinating Center has just been renewed and will embrace a distributed model with partners at University of Alabama at Birmingham, University of Utah, Washington University, Stanford Medical Center, and Morehouse School of Medicine. A public-facing not-for-profit partner foundation (UDNF) is now actively engaged in raising awareness and funds for this important initiative.

Visualization Tools
The Gehlenborg Lab at DBMI has generated a comprehensive library of tools designed to simplify analysis and interpretation of fine-grained genomic data. Notable among these is the Public Data Portal of the NIH's Human Biomolecular Atlas Program (HuBMAP), which provides a central resource for discovery, visualization and download of single-cell tissue data generated by the HuBMAP consortium and helps ensure that only high-quality data is released to the community. These tools currently include the following components:

  • Vitessce – a visual integration tool for exploration of spatial single-cell experiments
  • Viv – a library for web-based visualization of highly multiplexed, high bit depth, high resolution imaging data directly from OME-TIFF files and Bio-Formats-compatible Zarr stores.
  • Pro-vis – a wrapper for 4DN provenance visualization