The DBMI Open Insights Seminars are occasional research talks related to the mission of the Department of Biomedical Informatics. This includes themes such as:

  • Provisioning big data to the scientific community
  • Getting the big picture on human health
  • Learning from each patient
  • Advancing basic science with data science
  • Understanding disease beyond heredity: environmental impact
  • Instrumenting the health enterprise for discovery and intervention

This interdisciplinary seminar provides an open forum to engage participants in a discussion about the future of quantitative methods and engineering in biomedicine. The seminar features local, national, and international speakers who are leaders in their field and have an interest in engaging with the larger community while visiting DBMI to meet with colleagues.


Upcoming Seminars

Embedding Research to Accelerate Innovation and Healthcare Change

Tuesday, January 14, 2025
11:30am Free Pizza Lunch
12:00–1:00pm Seminar 
Countway Library, Minot Room (5th Floor)

Douglas Corley, MD, PhD  
Chief Research Officer, The Permanente Medical Group; Kaiser Permanente, Northern California

Abstract
Dr. Doug Corley is Kaiser Permanente Northern California’s inaugural Chief Research Officer, a role that includes developing collaborations between clinicians and traditional researchers within the 700-person Division of Research and a large clinical trials program for accelerating clinical change and developing learning healthcare systems. In this domain, he co-led integration of a novel Delivery Science and Applied Research (DARE) program that includes over 200 active or completed projects in recent years. These efforts identify modifiable research questions likely to influence patient and system outcomes, develop collaborations between clinicians and traditional full-time scientists/epidemiologists, use extremely large data sources for relevant research, and develop mechanisms for rapid implementation (and re-evaluation) of research findings and intervention outcomes. In this discussion, we will review the brief history of innovation, why it succeeds (and why it fails), and new strategies for accelerating the path from discovery to change, with pragmatic, successful examples.

Bio
Dr. Corley is a clinician scientist with expertise in building collaborative, transdisciplinary, diverse teams; developing large clinical and translational research programs; and educational development. Through multi-center, team-based collaborations and over 50 funded awards at Kaiser Permanente, Northern California, including more than 30 NIH, NCI, and other federal grants and contracts, his research spans fundamental risk factor epidemiology in emerging diseases; broad-based disease pathway discovery in genetic and metabolic epidemiology; complex pharmacoepidemiology, including US Food and Drug Administration collaborations that modified drug safety labeling; randomized trials; and multi-level investigations into clinical care pathways. This work has resulted in multiple first- and senior-author publications in the New England Journal of Medicine, JAMA, Annals of Internal Medicine, and other leading journals.

Cumulatively, this work reframed knowledge for esophageal and colorectal cancer disease pathways. For esophageal adenocarcinoma, one of the most rapidly increasing cancers in recent decades, this collaborative work elucidated its carcinogenic and potential prevention pathways, using foundational epidemiology, metabolomics, and genetics (including one of the first large GWAS). For colorectal cancer, it created new understandings of disease progression, natural history, genetics, and screening effects, including gene-environment interactions. These findings changed guidelines for screening quality and tested methods that led to 50% reductions in post-screening colorectal cancer incidence and mortality. They also demonstrated one of the first approaches to largely eradicating demographic differences in cancer incidence and death through equitable application of a population-level program.


Past Seminars

Mosaic: An Architecture for Scalable & Interoperable Data Views

Monday, December 9, 2024
12:30–1:30pm
Countway Library, Lahey Room (5th Floor)
Virtual option

Dominik Moritz, PhD
Assistant Professor, Human-Computer Interaction Institute
Carnegie Mellon University

Abstract: Mosaic is an architecture for greater scalability, extensibility, and interoperability of interactive data views. Mosaic decouples data processing from specification logic: clients publish their data needs as declarative queries that are then managed and automatically optimized by a coordinator that proxies access to a scalable data store. Mosaic generalizes Vega-Lite’s selection abstraction to enable rich integration and linking across visualizations and components such as menus, text search, and tables. In this talk, I will demonstrate Mosaic’s expressiveness, extensibility, and interoperability through examples that compose diverse visualization, interaction, and optimization techniques, many of them constructed using vgplot, a grammar of interactive graphics in which graphical marks act as Mosaic clients. Benchmarks show order-of-magnitude performance improvements over existing web-based visualization systems, enabling flexible, real-time visual exploration of datasets with over a billion records. I’ll conclude by discussing Mosaic’s potential as an open platform that bridges visualization languages, scalable visualization, and interactive data systems more broadly.

Bio: Dominik Moritz is on the faculty at Carnegie Mellon University, where he co-directs the Data Interaction Group at the Human-Computer Interaction Institute. His group’s research develops interactive systems that empower everyone to effectively analyze and communicate data. Dominik also manages the visualization team in Apple’s machine learning organization. His systems (Vega-Lite, Falcon, Draco, Voyager, and others) have won awards at academic venues (e.g., IEEE VIS and CHI) and are widely used in industry and by the Python and JavaScript data science communities. Dominik received his PhD from the Paul G. Allen School at the University of Washington, where he was advised by Jeff Heer and Bill Howe.

Prioritizing Patients: A Geometric Point of View

Eitan Bachmat, PhD
Associate Professor, Computer Science
Ben-Gurion University of the Negev

Monday, September 30, 2024
11:30am Free Pizza Lunch
12:00–1:00pm Seminar 
Countway Library, Lahey Room (5th Floor)
Virtual option (registration required)

Abstract: In certain medical decision support systems, we face the challenge of prioritizing patients based on several different criteria or measures. For example, one can imagine a risk score and a potential treatment effectiveness score (and perhaps further measures, such as age), which may be partially dependent. This setting arises, for example, in Clalit's C-Pi system, where millions of patients are dynamically ranked as their EMR data accumulate. Ultimately, we would like to prioritize patients consistently: if patient A scores higher than patient B on both (or all) measures, such as risk and potential treatment effectiveness, then A should be prioritized over B.

In the talk, I will consider this setting from a geometric point of view, discussing the space of all potential consistent rankings, giving some examples of consistent ranking methods beyond taking a weighted average of scores, and connecting all this to the evolution of the universe and to the design of sub-diffraction-limit focal lenses in hyperbolic metamaterials. The talk will be self-contained and requires no background knowledge, though experience boarding airplanes will be helpful.
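The consistency requirement described in the abstract can be made concrete with a short check. The following is a minimal Python sketch, not the speaker's method: the patient names, the two scores per patient (risk, potential treatment effectiveness), and the helper names `dominates`, `rank_by`, and `is_consistent` are all hypothetical. It treats a ranking as consistent when no patient is placed behind someone they beat on every measure, and compares a weighted average, the minimum score (one simple alternative that is also consistent), and a score difference (which is not).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scores = Tuple[float, ...]  # hypothetical: (risk, potential treatment effectiveness)

def dominates(a: Scores, b: Scores) -> bool:
    """True if patient a scores strictly higher than patient b on every measure."""
    return all(x > y for x, y in zip(a, b))

def rank_by(patients: Dict[str, Scores], score: Callable[[Scores], float]) -> List[str]:
    """Order patients from highest to lowest priority according to a scoring function."""
    return sorted(patients, key=lambda p: score(patients[p]), reverse=True)

def is_consistent(ranking: List[str], patients: Dict[str, Scores]) -> bool:
    """A ranking is consistent if every dominating patient is placed ahead of the dominated one."""
    position = {p: i for i, p in enumerate(ranking)}
    return all(
        position[a] < position[b]
        for a in patients
        for b in patients
        if dominates(patients[a], patients[b])
    )

def weighted_average(weights: Sequence[float]) -> Callable[[Scores], float]:
    """Build a scoring function that sums the measures with the given (positive) weights."""
    return lambda s: sum(w * x for w, x in zip(weights, s))

# Invented example: A dominates both B and C; B and C are incomparable.
patients = {"A": (0.9, 0.8), "B": (0.4, 0.3), "C": (0.7, 0.2)}

by_average = rank_by(patients, weighted_average([0.5, 0.5]))  # consistent
by_minimum = rank_by(patients, min)                           # also consistent
by_difference = rank_by(patients, lambda s: s[0] - s[1])      # not consistent
```

Any scoring function that strictly increases whenever every measure strictly increases yields a consistent ranking; the score difference fails because it penalizes a higher treatment score, which pushes patient A to the bottom.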

Epidemiology of Type 2 Diabetes in South Asians: Lessons Learnt Over 50 Years

V. Mohan, MD
Chairperson, Dr. Mohan's Diabetes Specialities Centre

Thursday, September 26, 2024
12:00–1:00pm Seminar 
Countway Library, Minot Room (5th Floor)

Data Science at Scale in the VA

Tuesday, July 2, 2024
11:30am Free Pizza Lunch
12:00–1:00pm Seminar 
Countway Library, Minot Room (5th Floor)
Virtual option (registration required)

Saiju Pyarajan, PhD
Director, Center for Data and Computational Sciences (C-DACS)
VA Boston Healthcare System, VA Office of Research and Development

Abstract: Since its inception, the Department of Veterans Affairs (VA) has provided clinical care and conducted research. The VA has also been a pioneer in establishing electronic systems for storing healthcare data since the 1990s. With this legacy of record keeping, the VA now holds large volumes of clinical, omics, imaging, and other health data, which provide a great opportunity for “Big Data” science. Dr. Pyarajan will describe the richness and diversity of these data and give an overview of data science efforts at the VA.

Bio: Dr. Saiju Pyarajan is the inaugural Director of the Center for Data and Computational Sciences (C-DACS) at VA Boston. He received his PhD in immunology from UMass Amherst and did his postdoctoral training at the Dana-Farber Cancer Institute and NYU School of Medicine. At the VA, he designed the custom genotyping array for the Million Veteran Program (MVP), and his research interests include understanding human genetic diversity and its effect on disease risk. His team also provides data science support for large VA programs such as MVP, VA SHIELD, CSP, and NPOP.

Revolutionizing Clinical Reasoning With Large Language Models

Tuesday, March 12, 2024
11:30am Free Pizza Lunch
12:00–1:00pm Seminar 
Countway Library, Lahey Room (5th Floor)
Virtual option (registration required)

Adam Rodman, MD, PhD
Assistant Professor of Medicine, HMS; Co-director, iMED Initiative at BIDMC

Abstract: Dr. Rodman will discuss the scientific understanding of physician cognition, cognitive diagnostic errors, and how large language models are poised to revolutionize the field of clinical reasoning.


Collaborations to Advance Health Equity Using Real-World Data

Wednesday, October 11, 2023
11:30am Free Pizza Lunch
12:00–1:00pm Seminar 
Countway Library, Minot Room (5th Floor) with Virtual option

William (Bill) Adams, MD
Professor of Pediatrics, Director of the BU-CTSI Biomedical Informatics Core, and Informatics Director for the Boston HealthNet

Abstract: In this presentation, Dr. Bill Adams will provide an overview of current translational informatics activities at Boston Medical Center and the BU Medical Campus, with a focus on innovative approaches to measuring and advancing health equity. An important goal for the meeting will be to identify new opportunities for collaborative projects and shared training.

Biomedicine in the Age of Generative AI

Tuesday, September 12, 2023
11:30am Free Pizza Lunch
12:00–1:00pm Seminar 
Countway Library, Lahey Room (5th Floor)

James Zou, PhD
Assistant Professor of Biomedical Data Science and, by courtesy, of Computer Science and Electrical Engineering
Stanford University

Abstract: There have been tremendous advances in generative AI, such as ChatGPT and DALL-E. Generative models can potentially expand researchers’ creativity while balancing complex tradeoffs. I will illustrate this through three examples of applying generative AI at different stages of biomedical research. We will first discuss how to use generative AI to design and experimentally validate novel drugs. Then we will apply a similar generative approach to inform the design of clinical trials, making them more efficient and inclusive. Finally, we will demonstrate how to build visual-language models to index complex biomedical data. Throughout, I will highlight some of the key open challenges of generative AI related to bias amplification and behavioral drift.

Bio: James Zou is an assistant professor of Biomedical Data Science, CS, and EE at Stanford University. He is also the faculty director of Stanford AI4Health. He works both on improving the foundations of ML, by making models more trustworthy and reliable, and on in-depth scientific and clinical applications. Many of his innovations are widely used in the tech and biotech industries. He has received a Sloan Fellowship, an NSF CAREER Award, two Chan-Zuckerberg Investigator Awards, a Top Ten Clinical Achievement Award, several best paper awards, and faculty awards from Google, Amazon, Tencent, and Adobe.


29 June 2023 | 1:30–2:30pm
In person: Countway Library Room 403 (4th Floor)
Lixing Yang, PhD
Assistant Professor, Ben May Department of Cancer Research, and Assistant Professor of Human Genetics, University of Chicago

The Causes and Consequences of Somatic Structural Variations in Human Cancer
Genome instability is a hallmark of cancer, and somatic structural variations (SVs) are common in cancer genomes. My lab focuses on the molecular mechanisms that lead to somatic SVs and on their functional consequences in disease. We develop new computational methods to delineate SV signatures that reflect distinct molecular mechanisms. Intriguingly, some SVs are extremely complex and form through catastrophic events. We decompose complex SV signatures based on the patterns of SV events and deconvolute simple SV signatures using non-negative matrix factorization, which has revealed several new signatures. In particular, we find that collisions between transcription and DNA replication are likely to cause large tandem duplications in many tumor types, and that deficiency in repairing these collisions can be targeted therapeutically. Moreover, we study how SVs in non-coding regions can drive cancer. We developed a new computational algorithm to predict candidate oncogenes activated by distal enhancers as a result of somatic SVs and validated their oncogenic functions in in vitro and in vivo experiments.
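For readers unfamiliar with non-negative matrix factorization, the sketch below shows the basic idea of deconvolving a catalog of event counts into signatures and per-tumor exposures. It is a minimal NumPy implementation of the standard Lee and Seung multiplicative updates, not the lab's pipeline; the four SV category names, the two ground-truth signatures, and the simulated tumor counts are invented for illustration.

```python
import numpy as np

def nmf(V, k, n_iter=2000, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (categories x samples) as W @ H.

    Uses Lee and Seung's multiplicative updates for the Frobenius-norm
    objective; W (categories x k) holds signatures and H (k x samples)
    holds each sample's exposure to every signature.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each signature sums to 1 over categories; W @ H is unchanged.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]

# Hypothetical SV categories (rows) and two invented ground-truth signatures (columns).
categories = ["small deletion", "large deletion", "small tandem dup", "large tandem dup"]
true_W = np.array([[0.70, 0.05],
                   [0.20, 0.05],
                   [0.05, 0.30],
                   [0.05, 0.60]])
rng = np.random.default_rng(1)
true_H = rng.integers(0, 50, size=(2, 20))  # simulated exposures for 20 tumors
V = true_W @ true_H                         # simulated SV counts per category and tumor

W, H = nmf(V, k=2)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because NMF is only identifiable up to column order and scaling, the recovered columns of `W` should be compared with `true_W` after matching columns; real SV catalogs would also need a principled choice of `k`.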


17 May 2022 | 10:00–11:00am
In person: Countway Library Lahey Room (5th Floor)
Serena Yeung, PhD
Assistant Professor of Biomedical Data Science and, by courtesy, of Computer Science and of Electrical Engineering at Stanford University

Developing Computer Vision to Augment Clinician Capabilities Across Healthcare Delivery
There is an abundance of diverse visual data in healthcare. Some healthcare procedures, such as radiology imaging and video-based laparoscopic surgeries, already generate large amounts of visual data. Other aspects of healthcare, such as performing certain patient care activities and patient monitoring, offer opportunities to generate new types of image and video data that can assist clinicians in improving quality of care. In this talk, I will discuss our group’s ongoing work in developing computer vision methods to interpret and make use of diverse types of image and video data in healthcare. I will focus in particular on two types of video data: surgical videos, and videos of human behavior in healthcare contexts. First, I will present work on developing computer vision for fine-grained scene understanding in surgical videos, towards applications for surgeon training and real-time assistance. Then, I will present work on 3D human and scene understanding in clinical videos, towards applications of ambient intelligence in hospitals and behavioral studies. I will also discuss methods for learning from limited labeled data to enable adaptation of these complex computer vision tasks to challenging domains.


30 November 2021 | 2:00–3:00pm
In person: Countway Library Ballard Room (5th Floor)
Cristian Tomasetti, PhD
Associate Professor of Oncology and Biostatistics, Johns Hopkins University

Cancer Etiology, Evolution, and Early Detection
A fundamental question in cancer research and cancer prevention is what causes cancer. In this talk, recent findings that have challenged the core of our current understanding of cancer etiology will be presented, together with mathematical models of tumor evolution. In the final part, novel methods for the earlier detection of cancer via a simple blood test will be discussed.


11 February 2020 | Goldenson Building, Room 122 
Emma Pierson, MS
PhD Student, Jure Leskovec Lab 
Computer Science, Stanford

Data Science Methods to Improve Healthcare and Reduce Inequality
I will describe how to use data science methods to understand and reduce inequality in public health and healthcare. First, I will discuss how to use Bayesian modeling to detect racial discrimination in policing, a public health concern. Second, I will describe how to use machine learning to understand racial and socioeconomic inequality in pain.


29 January 2020 (Countway Library, 4th Floor, Room 403)
Irene Chen
PhD Student, Clinical Machine Learning Group 
Computer Science, MIT

Fairness and Robustness in Healthcare Algorithms
As machine learning models become more powerful and ubiquitous, researchers have raised concerns about bias and robustness. In sensitive applications like criminal justice or healthcare, we seek to quantify abstract concepts like fairness or robustness and improve any flawed models. Often, researchers must think beyond the algorithm and consider the data collection process as well. In this talk, I will present two projects aimed at improving healthcare algorithms. First, I describe how we can diagnose sources of unfairness in an algorithm and decompose cost-based metrics of discrimination into bias, variance, and noise. I propose solutions aimed at estimating and reducing each component. Second, I present a health knowledge graph for diagnostic purposes and describe how to check for robust medical knowledge extraction in large datasets.


13 November 2019 (Countway Library, 5th Floor, Lahey Room) 
Estella Geraghty, MD, MS, MPH, GISP
Chief Medical Officer and Health Solutions Director
Esri

Reflections from a Topophiliac
Geography is the science that connects people, places and their interactions. By its nature, geography is a science of storytelling and deeper understanding of the human experience. When applied as a technology, a geographic information system (GIS) can help health organizations make sense of complex interactions among populations and their environments. The resulting location intelligence offers tremendous value in promoting health and well-being on a personal and a population level while also supporting major strategic plans for resource allocation and growth. This presentation will explore the speaker’s career evolution into this space and explain how geography intersects and informs health while also being central to a convergence of precision medicine and precision public health.


7 November 2019 (Countway Library, 4th Floor, Room 403) 
Yonatan Grad, MD, PhD
Melvin J. and Geraldine L. Glimcher Assistant Professor of Immunology and Infectious Diseases
Department of Immunology and Infectious Diseases
Harvard T.H. Chan School of Public Health

Antibiotic Use, Resistance, and the Example of Neisseria gonorrhoeae
To address the challenge of antibiotic resistance, we need to understand the relationship between use and resistance at population levels and to develop new clinical and public health interventions. In this talk, I will discuss my group’s work on characterizing population trends in antibiotic use and resistance and on N. gonorrhoeae as a model system to investigate how genomics can improve surveillance and inform diagnostics.


17 May 2019 (Countway Library, 5th Floor, Lahey Room)
Peter J. Embi, MD, MS, FACP, FACMI
President & CEO, Regenstrief Institute
Leonard Betley Professor of Medicine and Associate Dean for Informatics & Health Services Research, IU School of Medicine
Associate Director of Informatics, Indiana CTSI
Vice-President for Learning Health Systems, IU Health

Leveraging Informatics to Enable Learning Health Systems: Now, it’s Personal

Dr. Embi is an internationally recognized leader in biomedical informatics, specifically clinical and translational research informatics. During this seminar, he will use his recent experiences as a patient as a launching point to discuss the need for a new, informatics-driven research-practice paradigm to create learning health systems. He will describe current approaches, challenges, and opportunities to enable an evidence-generating medicine model that complements evidence-based medicine, with the goal of improving care and enabling discovery through practice. Dr. Embi will also describe how his research, experiences, and roles have informed ongoing efforts to bridge academic and operational informatics to realize a learning health system.


3 May 2019 (Countway Library, 4th Floor, Room 424)
Zhiyong Lu, PhD
Senior Investigator, National Library of Medicine 

AI in Medicine: from PubMed Search to Autonomous Disease Diagnosis

The explosion of biomedical big data and information in the past decade or so has created new opportunities for discoveries to improve the treatment and prevention of human diseases. However, this large body of knowledge, which exists mostly as free text in journal articles written for humans to read, presents a grand new challenge: individual scientists around the world are increasingly overwhelmed by the sheer volume of research literature and are struggling to stay up to date and to make sense of this wealth of textual information. Our research aims to break down this barrier and empower scientists to accelerate knowledge discovery. In this talk, I will present our work on developing large-scale, machine-learning-based tools for NLP and medical imaging research. I will also demonstrate their use in real-world applications such as improving PubMed searches, scaling up human curation for precision medicine, and enabling image-based autonomous disease diagnosis.


21 March 2019 (Countway Library, 4th Floor, Room 403)
Dmitri Pervouchine, PhD
Assistant Professor, Center of Life Sciences 
Skolkovo Institute for Science and Technology 

Integrative Transcriptomic Analysis Suggests Novel Autoregulatory Splicing Events coupled with Nonsense-Mediated mRNA Decay

Nonsense-mediated decay (NMD) is a eukaryotic mRNA surveillance system that selectively degrades transcripts with premature termination codons (PTCs). Many RNA-binding proteins (RBPs) regulate their own expression levels through a negative feedback loop in which the RBP binds its own pre-mRNA and induces alternative splicing that introduces a PTC. We present a bioinformatic analysis integrating three data sources (eCLIP assays for a large RBP panel, shRNA inactivation of the NMD pathway, and shRNA depletion of RBPs followed by RNA-seq) to identify novel autoregulatory feedback loops of this kind. We show that RBPs frequently bind their own pre-mRNAs, that their exons respond prominently to NMD pathway disruption, and that the responding exons are enriched with nearby eCLIP peaks. We confirm previously proposed models of autoregulation in the SRSF7 and U2AF1 genes and present two novel models, in which (1) SFPQ binds its own mRNA and promotes switching to an alternative distal 3'-UTR that is targeted by NMD, and (2) RPS3 binding activates a poison 5' splice site in its pre-mRNA that leads to a frameshift and degradation by NMD. We also suggest specific splicing events that could be implicated in autoregulatory feedback loops in the RBM39, HNRNPM, and U2AF2 genes. Taken together, these findings indicate that autoregulatory negative feedback loops coupling alternative splicing and NMD are a ubiquitous form of post-transcriptional control of gene expression among splicing factors.


7 March 2019, Countway Library, Room 424 (4th Floor)
Cody Dunne, PhD
Assistant Professor, College of Computer and Information Science
Northeastern University 

Temporal Event Sequence Visualization for Type 1 Diabetes Treatment Decision Support

The modern world is awash in complex data that can contain the keys to improving our lives. The scope of this data has rapidly outpaced our capabilities to analyze and comprehend, so we turn to computers to help. However, state-of-the-art technology can only supplement the human element. People assist in each stage of data science, whether it’s data cleaning, understanding algorithm design, exploring computed results, or collaborating and sharing for decision-making. To present complex information to humans, we use visualizations that leverage our extraordinary perceptual system which can detect trends, clusters, gaps, and outliers almost instantly.

My talk will focus on the specific problem of temporal event sequence visualization for treating type 1 diabetes. Type 1 diabetes is a chronic, incurable autoimmune disease affecting millions of Americans. Treatment requires frequent adjustments to insulin protocol, diet, and behavior in collaboration with a clinician. Manual logs and medical device data are collected by patients, but these multiple sources of data are presented in disparate visualization designs to the clinician—making temporal inference difficult. These issues are compounded when there is poor data quality such as missing data, erroneous data, or uncertainty in values or timestamps.

I will discuss a data and task abstraction for this problem using a novel hierarchical task abstraction approach. I will also demonstrate the interactive visualization tool we developed in this design study: IDMVis. IDMVis includes a novel technique for folding and aligning records by dual sentinel events and scaling the intermediate timeline. Using IDMVis, clinicians were able to identify issues of data quality such as missing or conflicting data, reconstruct patient records when data is missing, differentiate between days with different patterns, and promote educational interventions after identifying discrepancies.
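One way to picture folding and aligning records by dual sentinel events is the sketch below. It assumes each folded record is one day with two sentinel events (wake-up and bedtime are used here as hypothetical examples) and linearly maps every event between them onto a shared 0-to-1 axis, so events at the same relative point in different days line up even when the days start and end at different clock times. The function name, event labels, and timestamps are invented; this is not IDMVis's code, and events outside the sentinel window are simply dropped here rather than drawn.

```python
from datetime import datetime
from typing import List, Tuple

Event = Tuple[str, datetime]

def align_between_sentinels(events: List[Event], start_sentinel: datetime,
                            end_sentinel: datetime) -> List[Tuple[str, float]]:
    """Map events that fall between two sentinel times onto a shared 0..1 axis.

    0.0 is the first sentinel and 1.0 the second; the intermediate timeline is
    stretched or compressed linearly. Events outside the window are dropped
    in this simplified sketch.
    """
    span = (end_sentinel - start_sentinel).total_seconds()
    if span <= 0:
        raise ValueError("end sentinel must come after start sentinel")
    return [
        (label, (t - start_sentinel).total_seconds() / span)
        for label, t in events
        if start_sentinel <= t <= end_sentinel
    ]

# Two invented days whose hypothetical sentinels (wake-up, bedtime) fall at different clock times.
day1 = align_between_sentinels(
    [("glucose reading", datetime(2019, 3, 1, 6, 0)), ("lunch", datetime(2019, 3, 1, 12, 0))],
    start_sentinel=datetime(2019, 3, 1, 7, 0),   # wake-up
    end_sentinel=datetime(2019, 3, 1, 23, 0),    # bedtime
)
day2 = align_between_sentinels(
    [("lunch", datetime(2019, 3, 2, 14, 0))],
    start_sentinel=datetime(2019, 3, 2, 9, 0),   # wake-up
    end_sentinel=datetime(2019, 3, 3, 1, 0),     # bedtime after midnight
)
```

Both lunches land at 0.3125 of the waking day even though they occurred at different clock times, which is the kind of alignment that makes day-to-day patterns comparable; the pre-wake glucose reading on the first day falls outside the window and is dropped.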


29 November 2018, Countway Library, Ballard Room (5th Floor)
Can Alkan, PhD
Assistant Professor, Department of Computer Engineering
Bilkent University 

Algorithms to Characterize Genomic Structural Variation Using High-Throughput Sequencing Technologies
Structural variation, in the broadest sense, is defined as genomic changes among individuals that are not single nucleotide variants. Rapid computational methods are needed to comprehensively detect and characterize specific classes of structural variation using next-generation sequencing technology. We have developed a suite of tools, built on a set of aligners and algorithms, focused on characterizing structural variants that have been more difficult to assay, including complex rearrangements. In this talk I will summarize our work in developing combinatorial algorithms to discover structural variation using high-throughput sequencing technologies. The algorithms we have developed will provide a much-needed step toward a highly reliable and comprehensive structural variation discovery framework, which, in turn, will enable genomics researchers to better understand the variation in the genomes of newly sequenced individuals, including patient genomes.


20 September 2018, Countway Library, Room 403 (4th Floor)
Anamaria Crisan, MSc, PMP
PhD Candidate  
University of British Columbia 

Creating Explorable Visualization Design Spaces: An Example from Infectious Disease Genomic Epidemiology
New technologies are enabling the collection of more complex and heterogeneous data that healthcare decision makers can use to inform their routine practices and to develop new policies. To address the data analysis and interpretation challenges that accompany healthcare big data, researchers and decision makers are increasingly turning to data visualization to explore their data and communicate their findings. Yet without a notion of a visualization design space, the resulting visualizations reflect the personal preferences of their individual creators, with no sense of which alternative designs are better or worse. We have developed a method for systematically generating a visualization design space that combines text mining and visual analysis to identify both the context in which a visualization was created and the elements of how it was constructed. Our approach exposes current common practices, can identify absent visualization designs, and provides a way to convey notions of good and bad visualization practice. We demonstrate our approach by applying it to a literature corpus of nearly 18,000 articles drawn from public health infectious disease genomic epidemiology, and we have operationalized our findings into a Genomic Epidemiology Visualization Typology (GEViT) and an accompanying visualization design space gallery available at http://gevit.net. We are the first to take such a systematic approach, and both our method and the results of its application to genomic epidemiology have important implications for bioinformaticians, researchers, and other healthcare decision makers who intend to develop or use data visualization tools.


17 August 2018, Countway Library, Minot Room (5th Floor)
Igor Adameyko, PhD
Associate Professor, Department of Physiology and Pharmacology 
Karolinska Institutet, Sweden

Non-Canonical Functions of Nervous System: Insights from Single Cell Transcriptomics
Recent studies have introduced a radically new concept in developmental biology: defined precursor pools residing in a highly specialized niche use nerves as conduits to migrate and differentiate through temporally and spatially delineated nerve-Schwann cell communication. This concept carries significant weight because it also applies to the development of melanomas from subcutaneous nerve-associated cells, as well as to the genetic coding of pigmentation patterns. Can Schwann cells be genuinely multipotent? If so, this would transform the above discoveries into a global concept in which nerve-associated progenitors could generate various cell types not only during physiological development but also in adulthood. Moreover, any such cell pool could be exploited to regenerate damaged tissues in tandem with restoring sensory nerve function. This notion is plausible because nerves traverse the entire body from early embryonic development onward, and their in-growth coincides with the expansion of cell pools in the organs they target.


11 May 2018, Countway Library, Room 403 (4th Floor)
Peter Krawitz, MD
Professor and Director, Institute for Genomic Statistics and Bioinformatics
University of Bonn

DeepGestalt: Identifying Rare Genetic Syndromes Using Deep Learning
Facial analysis technologies have recently measured up to the capabilities of expert clinicians in syndrome identification. To date, these technologies could only identify phenotypes of a few diseases, limiting their role in clinical settings where hundreds of diagnoses must be considered. We developed DeepGestalt, a facial analysis framework that uses computer vision and deep learning algorithms to quantify similarities to hundreds of genetic syndromes based on unconstrained 2D images. DeepGestalt is currently trained with over 26,000 patient cases from a rapidly growing phenotype-genotype database, consisting of tens of thousands of validated clinical cases, curated through a community-driven platform. DeepGestalt currently achieves 91% top-10 accuracy in identifying over 215 different genetic syndromes and has outperformed clinical experts in three separate experiments. We suggest that this form of artificial intelligence is ready to support medical genetics in clinical and laboratory practices and will play a key role in the future of precision medicine.
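The top-10 accuracy figure counts a case as correct when the true syndrome appears anywhere among the model's ten highest-ranked suggestions. A minimal Python sketch of that metric follows; the function name and the toy syndrome labels are hypothetical, and `k=2` is used only to keep the example short.

```python
from typing import Sequence

def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   true_labels: Sequence[str], k: int = 10) -> float:
    """Fraction of cases whose true label is among the first k ranked predictions."""
    if len(ranked_predictions) != len(true_labels) or not true_labels:
        raise ValueError("need one non-empty ranking per true label")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_predictions, true_labels))
    return hits / len(true_labels)

# Toy example with invented labels: each inner list is one patient's ranked differential.
rankings = [
    ["syndrome_a", "syndrome_b", "syndrome_c"],
    ["syndrome_b", "syndrome_a", "syndrome_c"],
    ["syndrome_c", "syndrome_b", "syndrome_a"],
]
truth = ["syndrome_b", "syndrome_c", "syndrome_c"]
accuracy = top_k_accuracy(rankings, truth, k=2)  # 2 of 3 cases hit within the top 2
```

Raising `k` can only keep or increase the score, which is why top-10 accuracy is a more forgiving measure than top-1 accuracy for a differential-diagnosis aid.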


15 March 2018, Countway Library, Lahey Room (5th Floor)
Nicholas A. Christakis, MD, PhD, MPH
Sol Goldman Family Professor of Social and Natural Science, Yale University
Co-Director of Yale Institute for Network Science 

Social Network Interventions: Large-Scale Experiments from Global Health to Online AI-Bots

Human beings choose their friends, and often their neighbors and co-workers, and we inherit our relatives; each of the people to whom we are connected does the same, such that, in the end, we humans assemble ourselves into face-to-face social networks. Why do we do this? And how might a deep understanding of human social network structure and function be used to intervene in the world to make it better? Here, I review recent research from our lab describing three classes of interventions involving both offline and online networks that can help make the world better: (1) interventions that rewire the connections between people; (2) interventions that manipulate social contagion, facilitating the flow of desirable properties within groups; and (3) interventions that manipulate the position of groups of people within network structures. I will illustrate what can be done using a variety of experiments, ranging from fostering cooperation in networked groups online, to fostering health behavior change in developing-world villages, to facilitating the diffusion of innovation or coordination in groups. I will also focus on recent experiments with “hybrid systems” composed of both humans and “dumb bots,” involving simple artificial intelligence (AI) agents interacting in small groups. By taking account of people's structural embeddedness in social networks, and by understanding social influence, it is possible to intervene in social systems to enhance desirable population-level properties as diverse as health, wealth, cooperation, coordination, and learning.


8 March 2018, Countway Library, Lahey Room (5th Floor)
Shilpa Kobren
Ph.D. Candidate
Princeton University 
Department of Computer Science and the Lewis-Sigler Institute for Integrative Genomics  

Data-Driven Approaches for Uncovering Functional Variation in Protein Interactions

Proteins carry out a dazzling multitude of functions by interacting with DNA, other proteins and various other molecules within our cells. Together these interactions comprise complex networks that differ naturally across cells within an organism, across individuals in a population, and across species. Although such variation is critical for normal organismal functioning, mutations affecting protein interactions are also known to underlie a wide range of human diseases. In my talk, I will present novel computational approaches that explore the extent to which specific protein interactions vary across species, across healthy individuals, and across individuals with cancer. To start, I will focus on interaction variation across species. We developed and applied a comparative genomics framework to systematically quantify changes in protein-DNA interactions across closely related species. This work demonstrates that contrary to popular convention, functional gene regulatory divergence can stem from changes in non-duplicated DNA-binding proteins; such changes were previously believed to be largely detrimental. Next, I will turn my attention to interaction variation across individuals. First, to comprehensively identify interaction sites in human proteins, we combine large-scale sequence, domain and structure information to provide a biologically relevant assessment of per-position binding potential across protein sequences. This enables us to pinpoint sites involved in binding DNA, RNA, peptides, ions, metabolites, or other small molecules in 60% of human genes, representing the largest resource of this type to date. We show that whereas inferred interaction sites are significantly depleted of natural variants across ~60,000 healthy individuals, these same sites are significantly enriched for cancer mutations across ~11,000 tumor samples. 
In the last part of my talk, I will show how we can exploit these opposing trends to uncover genes whose interaction interfaces are significantly altered in tumors. To this end, we developed a novel analytical framework that integrates our domain binding potentials with additional sources of data. Our method recapitulates known cancer driver genes with high precision and also uncovers perturbed molecular mechanisms in relatively rarely mutated genes, thereby providing insights that may help guide personalized cancer treatments.


1 March 2018, Countway Library, Room 403 (4th Floor)
Marc Streit, PhD, Johannes Kepler University
Wolfgang Aigner, PhD, MSc, St. Pölten University of Applied Sciences
Dominic Girardi, MSc, RISC Software GmbH

Injecting Life into Visualizations for Biomedical Research 

Visualization is an important data analysis method that allows scientists to explore a dataset without preconceived questions, and it is thus crucial for hypothesis generation. Visualization is also essential in communicating research findings. Current visualization tools, however, have a crucial shortcoming: the interactive visual exploration process is not captured, which means that the analysis steps cannot be shared. Being able to reproduce visual analysis sessions and enabling third parties to understand, modify, and extend them can have a significant impact on the transparency, reproducibility, and innovation of analysis processes. In the first part of the talk, we will introduce our efforts towards making this vision a reality, demonstrated with examples from drug discovery. In the second part, we will summarize our work on developing novel ways to visually analyze large and heterogeneous biomedical datasets. In particular, we will introduce solutions for exploring health-related sensor data and electronic health records, for comparing large tabular data, and for ontology-guided clinical data analysis.


15 February 2018, Countway Library, Room 403 (4th Floor)
Kees van Bochove
Founder and CEO 
The Hyve 

GO-FAIR on biomedical data with the Personal Health Train
In this talk, the Personal Health Train concept will be introduced, which enables running personalized medicine workflows as trains visiting data stations (e.g., hospital records, primary care records, clinical studies and registries, and patient-held data such as readings from wearable sensors). The Personal Health Train is a very powerful concept; however, it depends on source medical data being coded with appropriate metadata on consent, license, scope, etc., and on the data itself being encoded using biomedical data standards, an ever-growing field in biomedical informatics. To realize the Personal Health Train, biomedical data will need to be FAIR, i.e., to adopt the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles. This talk will cover the emerging international GO-FAIR movement and provide examples of how several European health data networks are currently adopting stacks based on open standards, enabling routine health care data to become accessible for research.


1 February 2018, Countway Library, Minot Room (5th Floor)
David Benjamin, PhD & Samuel Lee, PhD 
Computational Biologists  
Broad Institute Data Science Platform

The New and Improved GATK 4
GATK 4 is an expanded, improved, scalable, and fully open-source version of the popular genomics software. In addition to the germline variant discovery tool HaplotypeCaller, it now contains tools for copy-number, structural, and somatic variants. A rewritten engine makes developing new GATK tools easier than ever. Furthermore, the WDL workflow language enables GATK tools to be combined into robust and maintainable pipelines that run equally well on a laptop or in the cloud. After a brief survey of GATK 4, we will discuss its new pipelines for germline and somatic copy-number variants and its updated version of Mutect2 for somatic SNVs and indels, highlighting their algorithms and upcoming developments.


14 December 2017, Countway Library, Room 403 (4th Floor)
Mauricio Santillana, Ph.D.
Assistant Professor, Harvard Medical School
Faculty Member, Boston Children's Hospital Computational Health Informatics Program
Associate, Harvard Institute for Applied Computational Science  

Machine Learning Approaches for Early Detection of Events in Healthcare: Epidemiological and Clinical Applications

I will describe machine learning methodologies that leverage Internet-based information from search engines, Twitter microblogs, crowd-sourced disease surveillance systems, and electronic medical records to monitor and forecast disease outbreaks in multiple locations around the globe in near real time. I will also present machine learning methodologies that leverage continuous-in-time information from bedside monitors in Intensive Care Units (ICUs) to help improve patients' health outcomes and reduce hospital costs. I will describe how these methodologies can be used to determine whether a patient in the ICU is ready to be extubated, or to estimate the length of stay upon a patient's admission. If time allows, I will discuss other machine learning methodologies capable of estimating, ahead of time, the volume of daily emergency visits to Boston Children's Hospital or the likelihood that a patient will not show up to an appointment in an ambulatory clinic.


7 December 2017, Countway Library, Ballard Room (5th Floor)
Nick Loman, PhD
Professor of Microbial Genomics and Bioinformatics 
University of Birmingham  

Pore! What is it Good For? or The Sequencing Singularity? 

Sequencing may be the ultimate clinical assay, providing rich information for diagnosis, genotyping, and surveillance of pathogens in a single test. In this talk I will detail how our work with portable, in-field nanopore sequencing led to new insights into Ebola evolution that were fed in real time into outbreak response efforts. Further work on Zika revealed huge gaps in our knowledge of pathogens circulating in human populations, but also highlighted the technical difficulty of recovering whole genomes directly from clinical samples with an untargeted approach. Ultimately, though, metagenomic approaches should be viable for diagnosis and for recovering whole pathogen genomes from clinical samples. I will also discuss the effect that ultra-long-read, single-molecule sequencing may have on diagnostic sensitivity, as well as the new opportunities offered by direct RNA sequencing, which may also allow us to monitor the host response to infection. Taken together, recent technological advances make a ‘sequencing singularity’ a tantalising prospect.


30 November 2017, Countway Library, Room 403 (4th Floor) 
Georg K. Gerber, MD, PhD, MPH
Assistant Professor, Harvard Medical School
Co-Director, Massachusetts Host-Microbiome Center

The Dynamic Microbiome

The microbiome, the community of microbial organisms living in and on us, plays important roles in human health and disease. There is increasing interest in harnessing the microbiome for therapeutic purposes, yet analyzing these complex and inherently dynamic host-microbial ecosystems presents numerous challenges. In this talk, I will discuss novel Bayesian machine learning methods that my lab has developed to address some of these challenges, including methods for discovering microbial temporal signatures associated with perturbations or disease, and for optimizing the design of bacteriotherapies and predicting their dynamic behaviors in the host. If time permits, I will also outline some experimental synthetic biology approaches for high-throughput discovery of in vivo microbial functions and for engineering bacterial consortia in the mammalian gut.


5 October 2017, Countway Library, Room 403 (4th Floor)
Jessica Polka, PhD
Director, ASAPbio
Visiting Scholar, Whitehead Institute

Preprints in the Life Sciences

Our traditional publication system keeps new research hidden from public view long after it is ready to be evaluated by our peers. This has adverse consequences not only for individual careers, but also for the overall speed of scientific communication and discovery.

Preprints, manuscripts posted online before the completion of journal-organized peer review, offer a solution to this problem. In this interactive discussion, we will address the benefits of preprinting, the concerns and challenges surrounding the use of preprints, and new developments, including rapidly changing funder and journal policies.


5 May 2017, Countway Library, Room 403 (4th Floor) 
Naomi Penfold, PhD
Innovation Officer, eLife Sciences

Accelerating Discovery at eLife with Open-Source Technology

Backed by the Howard Hughes Medical Institute, the Max Planck Society, and the Wellcome Trust, eLife aims to help scientists accelerate discovery by operating a platform for research communication that encourages and recognises the most responsible behaviours in science. The online-only open-access eLife journal for outstanding advances in life sciences and biomedical research was just the first step in this mission. Now, we also actively champion the development of open-source tools, technologies and processes aimed at improving the discovery, sharing, consumption and evaluation of scientific research.

Naomi Penfold will discuss how eLife is working to accelerate discovery, increase transparency and improve incentives in the life sciences through the use and development of cutting-edge technologies. From preprints to reproducible analyses to artificial intelligence, she will present challenges and opportunities for the next innovators. We invite the community to contribute feedback and ideas for future innovations, and we welcome the opportunity to form new collaborations with the best emerging talent at the interface of research and technology.


28 March 2017, Countway Library, Lahey Room (5th Floor) 
Patricia Brennan, RN, PhD
Director, National Library of Medicine (NLM)
National Institutes of Health

DataScience@NIH: Strategies for Sustainability in an Era of Data-Driven Discovery


16 March 2017, Countway Library, Lahey Room (5th Floor) 
Martha Gray, PhD
J. W. Kieckhefer Professor of Health Sciences and Technology
Massachusetts Institute of Technology

From Research to Impact: Increasing the Odds and Accelerating the Pace

Many of us who pursue research careers in the biomedical arena seek to have a positive impact on human health. However, the road to ‘real world’ impact is longer and more tortuous than most imagine. In this seminar, Professor Gray will describe several ongoing initiatives designed to increase the potential for work to reach ‘real world’ impact more efficiently and effectively. These initiatives are rooted in a new paradigm for academic research, one that addresses the cultural, conceptual, and methodologic gaps that impede the road to impact. In addition to showing outcomes from the ongoing initiatives, we will discuss short-term and long-term approaches through which individual researchers and research programs can adopt this paradigm.


7 March 2017, Countway Library, Lahey Room (5th Floor) 
Maria Nattestad, PhD
Bioinformatics Graduate Student
PhD, Computational Biology
Cold Spring Harbor Laboratory

Computational Methods and Analysis Tools for Studying Complex Variation in Cancer Genomics

Advances in single-molecule sequencing have produced a resurgence of reference-quality genome assemblies, but the promise of these technologies for biomedical applications has yet to be realized. We initiated a pilot project between CSHL and OICR to perform long-read sequencing of a breast cancer cell line using PacBio SMRT technology, and we have since been developing a wide range of computational methods, from alignment and variant calling to genome connectivity analysis algorithms and visualization tools. This seminar will focus on the specialized visualization and interactive analysis tools I developed to further our understanding of complex variation in cancer. One of our open-source online tools, SplitThreader, provides a genome-wide view of long-range variants in a cancer genome and uses a purely client-side algorithm to search across the landscape of rearrangements for genomic evidence of gene fusions. The SplitThreader visualizer represents my answer to the question of how to show long-range variants intuitively, so that one can visually determine how they match copy number changes. This visualization, combining copy number and long-range variant breakpoints across any pair of chromosomes, progressively evolved from drawings on paper, to cumbersome static plots of graphical primitives in R, and finally to an interactive D3 visualization tool. I will also highlight my most recent work on Ribbon, an alignment visualizer that specializes in long reads and complex variants. Ribbon was created to address a challenge I saw in our long-read cancer sequencing analysis: the true power of long reads is hidden when the tools we use are built for short reads. Unique among read alignment visualization tools, Ribbon shows alignments not only from the reference perspective but also along the read, using dot plots and a new, intuitive visualization inspired by cartoons from the literature.
Future work includes expanding the use of visualization and intuitive web applications to additional fields of biomedical science and, someday, using them to help empower physicians and patients with the best that biomedical informatics has to offer.


28 February 2017, Countway Library, Lahey Room (5th Floor) 
Po-Ru Loh, PhD
Postdoctoral Research Associate
Department of Epidemiology
Harvard T.H. Chan School of Public Health

Fast Statistical Methods for Big Data Genetics

We are currently in a golden age of quantitative genetics. Rapid technological advances in genotyping and sequencing have exponentially increased data sizes over the past decade. National biobank efforts in the US, UK, and China are generating richly phenotyped genetic data sets with sample sizes on the order of millions. Now the challenge is analysis. A wealth of analytical approaches exists, but time has shown that whenever genetic data sizes increase by an order of magnitude (in other words, every 3–4 years), algorithms need to be re-engineered and inferential limits need to be redefined. I will describe three ongoing research thrusts in the fields of mixed model analysis, haplotype phasing and imputation, and mosaic aberration detection, each taking place at the intersection of statistics, computer science, and biology.


23 February 2017, Countway Library, Ballard Room (5th Floor) 
Andrew Beam, PhD
Research Fellow
Department of Biomedical Informatics
Harvard Medical School 

Deep Learning 101

Deep learning has revolutionized many fields of computer science and machine learning in the past 5 years. Though neural networks were once relegated to near obscurity, recent methodological and engineering advances have resulted in rapid progress and parity with humans on an impressively broad set of tasks. These breakthroughs are now woven into our everyday lives through devices and services such as smartphones, messaging apps, and personal assistants. The field continues to advance at a dizzying pace, and keeping current with an ever-advancing set of best practices can be daunting. In this talk I will give an overview and historical perspective of deep learning that attempts to contextualize recent progress and the current state of the field. Special attention will be given to key papers that introduced methods or ideas that significantly advanced the state of the art. Illustrative examples of deep multilayer perceptrons and convolutional neural networks will be given, with code highlighting key implementation details that have been crucial to the recent success of deep learning.


21 February 2017, Countway Library, Room 403 (4th Floor) 
Artem Sokolov, PhD
Director of Informatics
Harvard Program in Therapeutic Science (HiTS)
Harvard University

Characterizing Stemness Properties of 33 Tumor Types 

Some of the more aggressive and treatment-resistant forms of cancer arise when cells undergo de-differentiation and acquire "stem-like" properties. Characterizing these properties can help advance our understanding of tumor recurrence and proliferation, as well as guide future therapeutic solutions that effectively address tumor initiation and the self-renewal of cancer stem cells.
Using mRNA expression and DNA methylation data, we developed machine learning predictors capable of quantifying the stemness state of cells. We applied these predictors to the TCGA PanCanAtlas dataset, which spans ~10,000 patients across 33 different tumor types. The emergent stemness profiles were then compared against a wide array of molecular and clinical features, leading to key observations on the tumor microenvironment, epithelial-mesenchymal transition (EMT), immunosuppression, mutation status, and patient survival as they relate to cancer de-differentiation states.


7 February 2017, Modell Immunology Center, Modell 100A Fred S. Rosen Lecture Hall
Sandeep Robert Datta, MD, PhD
Assistant Professor of Neurobiology
Department of Neurobiology
Harvard Medical School

Using Ethology to Model Disease

The Datta lab studies how information from the outside world is detected, encoded in the brain, and transformed into meaningful behavioral outputs. Here we describe a new approach we have recently developed, which combines 3D machine vision with unsupervised machine learning, to characterize the underlying structure of mouse behavior. Using this approach, we have discovered that mouse behavior can be segmented into a fundamental set of components that we call “behavioral syllables.” Each behavioral syllable is a brief and well-defined motif of 3D behavior that the brain places into specific sequences via definable transition statistics (or behavioral “grammar”) to flexibly create complex patterns of action. By characterizing mouse behavior in terms of its component parts, we can use our behavioral characterization technique to identify subtle differences in the pattern of motor output under different experimental conditions with an unprecedented level of sensitivity, suggesting that this technology will be useful for drug development. By combining this method with in vivo imaging of corticostriatal circuits in behaving animals, we can also identify neural correlates for the sub-second structure of behavior identified by our algorithms, suggesting that this behavioral analysis technique will provide direct insights into the relationship between neural circuit activity and patterns of action. Thus our method will afford insight into the mechanisms that allow animals to flexibly navigate the outside world, enable better characterization of mouse models of disease, and serve as a quantitative prism through which the function of genes and neural circuits can be understood.


2 February 2017, Countway Library, Minot Room (5th Floor) 
Yilong Li, PhD 
Principal Scientist, Seven Bridges Genomics  

Patterns of Somatic Rearrangements in 2,600 Cancer Genomes 

Chromosomal rearrangements are widespread in cancer genomes. High throughput sequencing has led to the discovery of many somatic rearrangements that initiate or promote cancer development. However, so far their complexity has impeded the comprehensive characterisation of their patterns and distributions in cancer. As a consequence, the mechanistic and selective forces shaping cancer genomes are still incompletely understood.

Here I will talk about the first large-scale analysis of somatic rearrangements as part of the International Cancer Genome Consortium PanCancer Analysis of Whole Genomes project. In order to systematically describe somatic rearrangements, I developed statistical algorithms to group rearrangement junctions into distinct events, and designed a nomenclature system to unambiguously identify patterns of any given combination of copy number changes and rearrangement junctions. I implemented a general purpose object-oriented framework to computationally model these concepts, allowing me to compute quantitative metrics for studying the mechanistic nature of different rearrangement patterns.

Applying these methods to 2,600 cancer genomes revealed that while simple deletions and tandem duplications are the most common rearrangement types, a significant proportion of the total rearrangement junction load is caused by complex rearrangement events, reaching up to 80% in certain cancer types. In addition, I identified several related rearrangement patterns consisting of multiple inversion-type rearrangements and, through statistical analysis, showed that such patterns likely arise from a series of polymerase template switches rather than from multiple independent rearrangement events. These results illustrate how cancer genomes are affected by different types of DNA damage, which may reflect clinically relevant defects in DNA repair or checkpoint pathways.


27 January 2017, Countway Library, Minot Room (5th Floor) 
Iain Buchan, MD 
Professor of Public Health Informatics, University of Manchester 
Director, Health e-Research Centre 
Co-Director, Farr Institute of Health Informatics Research 

Civic Informatics of Health 

Professor Buchan will argue that health(care) systems cannot be optimised (or ‘learn’) independently of the civic systems in which they operate. He will explore the need for informatics to enable fuller understanding of the links between health and place, not only to provide actionable analytics for better care but also to advance discovery science. Beyond better clinical epidemiological resolution of person and place, Professor Buchan will emphasise the triad of time-place-person, with its challenges and opportunities of combining frequent, patient/citizen-derived information with infrequent clinical observations, tapping into the rhythms of disease and daily life for better care. He will use practical examples from the UK NHS to show how ‘natural’ health systems, covering regional populations of 2–7 million, with deeply integrated health data and interoperable analytics, might borrow strength from each other for better predictive modelling and surveillance. He will challenge the over-simple notion of precision medicine with a future scenario in which a patient’s ‘health avatar’ might ‘refuse’ to integrate with a care provider’s care pathway.