The DBMI Open Insights Seminars are occasional research talks related to the mission of the Department of Biomedical Informatics. This includes themes such as:

  • Provisioning big data to the scientific community
  • Getting the big picture on human health
  • Learning from each patient
  • Advancing basic science with data science
  • Understanding disease beyond heredity: environmental impact
  • Instrumenting the health enterprise for discovery and intervention

This interdisciplinary seminar provides an open forum to engage participants in a discussion about the future of quantitative methods and engineering in biomedicine. The seminar features local, national, and international speakers who are leaders in their field and have an interest in engaging with the larger community while visiting DBMI to meet with colleagues.

Selected talks are available on our YouTube channel

Upcoming

Friday, May 17
11am-12pm
Countway Library, 5th Floor, Lahey Room

Peter J. Embi, MD, MS, FACP, FACMI
President & CEO, Regenstrief Institute
Leonard Betley Professor of Medicine and Associate Dean for Informatics & Health Services Research, IU School of Medicine
Associate Director of Informatics, Indiana CTSI
Vice-President for Learning Health Systems, IU Health

Leveraging Informatics to Enable Learning Health Systems: Now, it’s Personal

Dr. Embi is an internationally recognized leader in the field of biomedical informatics, and specifically clinical and translational research informatics. During this seminar, he will use his recent experiences as a patient as a launching point to discuss the need for a new research-practice paradigm, driven by informatics, to create a learning health system. He will describe current approaches, challenges and opportunities to enable an evidence-generating-medicine model that complements evidence-based-medicine, with the goal of improving care and enabling discovery through practice. Dr. Embi will also describe how his research, experiences and roles have informed ongoing efforts to bridge academic and operational informatics to realize a learning health system.
 

Past Talks 

Friday, May 3 (Countway Library, 4th FL, Room 424) 

Zhiyong Lu, PhD
Senior Investigator, National Library of Medicine 

AI in Medicine: from PubMed Search to Autonomous Disease Diagnosis

The explosion of biomedical big data and information in the past decade or so has created new opportunities for discoveries to improve the treatment and prevention of human diseases. But this large body of knowledge, which mostly exists as free text in journal articles for humans to read, presents a grand new challenge: individual scientists around the world are increasingly finding themselves overwhelmed by the sheer volume of research literature and are struggling to keep up to date and to make sense of this wealth of textual information. Our research aims to break down this barrier and to empower scientists towards accelerated knowledge discovery. In this talk, I will present our work on developing large-scale, machine-learning based tools in NLP and medical imaging research. Moreover, I will demonstrate their uses in some real-world applications such as improving PubMed searches, scaling up human curation for precision medicine, and enabling image-based autonomous disease diagnosis.

_________

Thursday, March 21 (Countway Library, 4th FL, Room 403) 
Dmitri Pervouchine, PhD
Assistant Professor, Center of Life Sciences 
Skolkovo Institute for Science and Technology 

Integrative Transcriptomic Analysis Suggests Novel Autoregulatory Splicing Events coupled with Nonsense-Mediated mRNA Decay

Nonsense-mediated decay (NMD) is a eukaryotic mRNA surveillance system that selectively degrades transcripts with premature termination codons (PTC). Many RNA-binding proteins (RBP) regulate their expression levels by a negative feedback loop, in which RBP binds its own pre-mRNA and causes alternative splicing to introduce a PTC. We present a bioinformatic analysis integrating three data sources, eCLIP assays for a large RBP panel, shRNA inactivation of NMD pathway, and shRNA-depletion of RBPs followed by RNA-seq, to identify novel such autoregulatory feedback loops. We show that RBPs frequently bind their own pre-mRNAs, their exons respond prominently to NMD pathway disruption, and that the responding exons are enriched with nearby eCLIP peaks. We confirm previously proposed models of autoregulation in SRSF7 and U2AF1 genes and present two novel models, in which (1) SFPQ binds its mRNA and promotes switching to an alternative distal 3'-UTR that is targeted by NMD, and (2) RPS3 binding activates a poison 5’-splice site in its pre-mRNA that leads to a frame shift and degradation by NMD. We also suggest specific splicing events that could be implicated in autoregulatory feedback loops in RBM39, HNRNPM, and U2AF2 genes. Taken together, these findings indicate that autoregulatory negative feedback loop of alternative splicing and NMD is a ubiquitous form of post-transcriptional control of gene expression among splicing factors.

_________

7 March 2019, Countway Library, Room 424 (4th Floor)

Cody Dunne, PhD
Assistant Professor, College of Computer and Information Science
Northeastern University 

Temporal Event Sequence Visualization for Type 1 Diabetes Treatment Decision Support

The modern world is awash in complex data that can contain the keys to improving our lives. The scope of this data has rapidly outpaced our capabilities to analyze and comprehend, so we turn to computers to help. However, state-of-the-art technology can only supplement the human element. People assist in each stage of data science, whether it’s data cleaning, understanding algorithm design, exploring computed results, or collaborating and sharing for decision-making. To present complex information to humans, we use visualizations that leverage our extraordinary perceptual system which can detect trends, clusters, gaps, and outliers almost instantly.

My talk will focus on the specific problem of temporal event sequence visualization for treating type 1 diabetes. Type 1 diabetes is a chronic, incurable autoimmune disease affecting millions of Americans. Treatment requires frequent adjustments to insulin protocol, diet, and behavior in collaboration with a clinician. Manual logs and medical device data are collected by patients, but these multiple sources of data are presented in disparate visualization designs to the clinician—making temporal inference difficult. These issues are compounded when there is poor data quality such as missing data, erroneous data, or uncertainty in values or timestamps.

I will discuss a data and task abstraction for this problem using a novel hierarchical task abstraction approach. I will also demonstrate the interactive visualization tool we developed in this design study: IDMVis. IDMVis includes a novel technique for folding and aligning records by dual sentinel events and scaling the intermediate timeline. Using IDMVis, clinicians were able to identify issues of data quality such as missing or conflicting data, reconstruct patient records when data is missing, differentiate between days with different patterns, and promote educational interventions after identifying discrepancies.

_________

29 November 2018, Countway Library, Ballard Room (5th Floor)

Can Alkan, PhD
Assistant Professor, Department of Computer Engineering
Bilkent University 

Algorithms to Characterize Genomic Structural Variation using High Throughput Sequencing Technologies

Structural variation, in the broadest sense, is defined as the genomic changes among individuals that are not single nucleotide variants. Rapid computational methods are needed to comprehensively detect and characterize specific classes of structural variation using next-gen sequencing technology. We have developed a suite of tools using a set of aligners and algorithms focused on the characterization of structural variants that have been more difficult to assay, including complex rearrangements. In this talk I will summarize our work in developing combinatorial algorithms to discover structural variation using high throughput sequencing technologies. The algorithms we have developed will provide a much needed step towards a highly reliable and comprehensive structural variation discovery framework, which, in turn, will enable genomics researchers to better understand the variations in the genomes of newly sequenced human individuals, including patient genomes.

_________

20 September 2018, Countway Library, Room 403 (4th Floor)

Anamaria Crisan, MSc, PMP
PhD Candidate  
University of British Columbia 

Creating Explorable Visualization Design Spaces: An Example from Infectious Disease Genomic Epidemiology
New technologies are enabling the collection of more complex and heterogeneous data that healthcare decision makers can use to inform their routine practices and to develop new policies. To address the data analysis and interpretation challenges that attend health care big data, researchers and decision makers are increasingly turning to data visualization to help them explore their data and communicate their findings. Yet, without a notion of a visualization design space, the resulting data visualizations are generated based upon the personal preferences of the individual creator and without a notion of good or bad alternative visualization designs. We have developed a method for systematically generating a visualization design space that uses a combination of text mining and visual analysis both to identify the context for which the visualization was created and to break down elements of how the visualization was constructed. Our approach exposes current common practices, can identify absent visualization designs, and also provides a way for conveying notions of good and bad visualization practice. We demonstrate our approach in action by applying it to a literature corpus of nearly 18,000 articles drawn from public health infectious disease genomic epidemiology and have operationalized our findings into a Genomic Epidemiology Visualization Typology (GEViT) and an accompanying visualization design space gallery available at http://gevit.net. We are the first to take such a systematic approach and both our method and the results of its application to genomic epidemiology have important implications for bioinformaticians, researchers, and other healthcare decision makers intending to develop or use data visualization tools.

_________

17 August 2018, Countway Library, Minot Room (5th Floor)

Igor Adameyko, PhD
Associate Professor, Department of Physiology and Pharmacology 
Karolinska Institutet, Sweden

Non-Canonical Functions of Nervous System: Insights from Single Cell Transcriptomics
Recent studies introduced a radically new concept for developmental biology in that defined precursor pools existing in a highly specialized niche use nerves as conduits to migrate and differentiate through temporally and spatially delineated nerve-Schwann cell communication. This concept carries significant weight since it also applies to the development of melanomas from subcutaneous nerve-associated cells, as well as for the genetic coding of pigmentation patterns. Can Schwann cells be genuinely multipotent? If so, this would transform the above discoveries into a global concept in which nerve-associated progenitors could generate various cell types during not only physiological development but also adulthood. Moreover, any such cell pool could be exploited to regenerate damaged tissues in tandem with regaining sensory nerve functions. This notion is plausible since nerves traverse the entire body from early embryonic development on and their in-growth coincides with the expansion of cell pools in the organs they target.

_________

11 May 2018, Countway Library, Room 403 (4th Floor)

Peter Krawitz, MD
Professor and Director, Institute for Genomic Statistics and Bioinformatics
University of Bonn

DeepGestalt - Identifying Rare Genetic Syndromes Using Deep Learning
Facial analysis technologies have recently measured up to the capabilities of expert clinicians in syndrome identification. To date, these technologies could only identify phenotypes of a few diseases, limiting their role in clinical settings where hundreds of diagnoses must be considered. We developed a facial analysis framework, DeepGestalt, using computer vision and deep learning algorithms, that quantifies similarities to hundreds of genetic syndromes based on unconstrained 2D images. DeepGestalt is currently trained with over 26,000 patient cases from a rapidly growing phenotype-genotype database, consisting of tens of thousands of validated clinical cases, curated through a community-driven platform. DeepGestalt currently achieves 91% top-10-accuracy in identifying over 215 different genetic syndromes and has outperformed clinical experts in three separate experiments. We suggest that this form of artificial intelligence is ready to support medical genetics in clinical and laboratory practices and will play a key role in the future of precision medicine.

_________

15 March 2018, Countway Library, Lahey Room (5th Floor)

Nicholas A. Christakis, MD, PhD, MPH
Sol Goldman Family Professor of Social and Natural Science, Yale University
Co-Director of Yale Institute for Network Science 

Social Network Interventions: Large-Scale Experiments from Global Health to Online AI-Bots

Human beings choose their friends, and often their neighbors, and co-workers, and we inherit our relatives; and each of the people to whom we are connected also does the same, such that, in the end, we humans assemble ourselves into face-to-face social networks. Why do we do this? And how might a deep understanding of human social network structure and function be used to intervene in the world to make it better? Here, I review recent research from our lab describing three classes of interventions involving both offline and online networks that can help make the world better: (1) interventions that rewire the connections between people; (2) interventions that manipulate social contagion, facilitating the flow of desirable properties within groups; and (3) interventions that manipulate the position of groups of people within network structures. I will illustrate what can be done using a variety of experiments in settings as diverse as fostering cooperation in networked groups online, to fostering health behavior change in developing world villages, to facilitating the diffusion of innovation or coordination in groups. I will also focus on recent experiments with “hybrid systems” comprised of both humans and "dumb bots," involving simple artificial intelligence (AI) agents interacting in small groups. By taking account of people's structural embeddedness in social networks, and by understanding social influence, it is possible to intervene in social systems to enhance desirable population-level properties as diverse as health, wealth, cooperation, coordination, and learning.

_________

8 March 2018, Countway Library, Lahey Room (5th Floor)

Shilpa Kobren
Ph.D. Candidate
Princeton University 
Department of Computer Science and the Lewis-Sigler Institute for Integrative Genomics  

Data-Driven Approaches for Uncovering Functional Variation in Protein Interactions

Proteins carry out a dazzling multitude of functions by interacting with DNA, other proteins and various other molecules within our cells. Together these interactions comprise complex networks that differ naturally across cells within an organism, across individuals in a population, and across species. Although such variation is critical for normal organismal functioning, mutations affecting protein interactions are also known to underlie a wide range of human diseases. In my talk, I will present novel computational approaches that explore the extent to which specific protein interactions vary across species, across healthy individuals, and across individuals with cancer. To start, I will focus on interaction variation across species. We developed and applied a comparative genomics framework to systematically quantify changes in protein-DNA interactions across closely related species. This work demonstrates that contrary to popular convention, functional gene regulatory divergence can stem from changes in non-duplicated DNA-binding proteins; such changes were previously believed to be largely detrimental. Next, I will turn my attention to interaction variation across individuals. First, to comprehensively identify interaction sites in human proteins, we combine large-scale sequence, domain and structure information to provide a biologically relevant assessment of per-position binding potential across protein sequences. This enables us to pinpoint sites involved in binding DNA, RNA, peptides, ions, metabolites, or other small molecules in 60% of human genes, representing the largest resource of this type to date. We show that whereas inferred interaction sites are significantly depleted of natural variants across ~60,000 healthy individuals, these same sites are significantly enriched for cancer mutations across ~11,000 tumor samples. 
In the last part of my talk, I show how we can exploit these opposing trends to uncover genes whose interaction interfaces are significantly altered in tumors. To this end, we develop a novel analytical framework that integrates our domain binding potentials with additional sources of data. Our method recapitulates known cancer driver genes with high precision as well as discovers perturbed molecular mechanisms in relatively rarely-mutated genes, thereby enabling valuable insights that may help guide personalized cancer treatments.

_________

1 March 2018, Countway Library, Room 403 (4th Floor)

Marc Streit, PhD, Johannes Kepler University
Wolfgang Aigner, PhD, MSc, St. Pölten University of Applied Sciences
Dominic Girardi, MSc, RISC Software GmbH

Injecting Life into Visualizations for Biomedical Research 

Visualization is an important data analysis method that allows scientists to explore a dataset without preconceived questions, and is thus crucial for hypothesis generation. Visualization is also essential in communicating research findings. Current visualization tools, however, have a crucial shortcoming: the interactive visual exploration process is not captured, which means that the analysis steps cannot be shared. Being able to reproduce visual analysis sessions and enabling third parties to understand, modify, and extend analysis sessions can have a significant impact on transparency, reproducibility, and innovation of analysis processes. In the first part of the talk we introduce our efforts towards making this vision a reality, demonstrated by means of examples from drug discovery. In the second part of the talk we will summarize our works on developing novel ways to visually analyze large and heterogeneous biomedical datasets. In particular, we will introduce solutions for exploring health-related sensor data and electronic health records, for comparing large tabular data, and for ontology-guided clinical data analysis.

_________

15 February 2018, Countway Library, Room 403 (4th Floor)

Kees van Bochove
Founder and CEO 
The Hyve 

GO-FAIR on biomedical data with the Personal Health Train
In this talk, the Personal Health Train concept will be introduced, which enables running personalized medicine workflows as trains visiting data stations (e.g. hospital records, primary care records, clinical studies and registries, and patient-held data from e.g. wearable sensors). The Personal Health Train is a very powerful concept, but it depends on the source medical data being coded with appropriate metadata on consent, license, scope, etc., and on the data itself being encoded using biomedical data standards, an ever-growing field in biomedical informatics. In order to realize the Personal Health Train, biomedical data will need to be FAIR, i.e. adopt the FAIR Guiding Principles. This talk will cover the emerging GO-FAIR international movement and provide examples of how several European health data networks are currently adopting open-standards-based stacks to enable routine health care data to become accessible for research.

_________

1 February 2018, Countway Library, Minot Room (5th Floor)

David Benjamin, PhD & Samuel Lee, PhD 
Computational Biologists  
Broad Institute Data Science Platform

The New and Improved GATK 4
GATK 4 is an expanded, improved, scalable, and fully open-source version of the popular genomics software. In addition to the germline variant discovery tool HaplotypeCaller, it now contains tools for copy-number, structural, and somatic variants. A rewritten engine makes developing new GATK tools easier than ever. Furthermore, the WDL workflow language enables GATK tools to be combined in robust and maintainable pipelines that run equally well on a laptop or on the cloud. After a brief survey of GATK 4, we will discuss its new pipelines for germline and somatic copy-number variants and its updated version of Mutect2 for somatic SNVs and indels, highlighting their algorithms and upcoming developments. 

_________

14 December 2017, Countway Library, Room 403 (4th Floor)

Mauricio Santillana, Ph.D.
Assistant Professor, Harvard Medical School
Faculty Member, Boston Children's Hospital Computational Health Informatics Program
Associate, Harvard Institute for Applied Computational Science  

Machine Learning Approaches for Early Detection of Events in Healthcare. Epidemiological and Clinical Applications

I will describe machine learning methodologies that leverage Internet-based information from search engines, Twitter microblogs, crowd-sourced disease surveillance systems, and electronic medical records to successfully monitor and forecast disease outbreaks in multiple locations around the globe in near real-time. I will also present machine learning methodologies that leverage continuous-in-time information coming from bedside monitors in Intensive Care Units (ICU) to help improve patients' health outcomes and reduce hospital costs. I will describe how these methodologies can be used to determine whether a patient in the ICU is ready to be extubated or not, or to estimate the length of stay upon a patient's admission. If time allows, I will discuss some other machine learning methodologies capable of estimating, ahead of time, the volume of daily emergency visits to Boston Children's Hospital or the likelihood that a patient may not show up to an appointment in an ambulatory clinic.

_________

7 December 2017, Countway Library, Ballard Room (5th Floor)

Nick Loman, PhD
Professor of Microbial Genomics and Bioinformatics 
University of Birmingham  

Pore! What is it Good For? or The Sequencing Singularity? 

Sequencing may be the ultimate clinical assay, providing rich information for diagnosis, genotyping and surveillance of pathogens in a single test. In this talk I will detail how our work with portable in-field nanopore sequencing led to new insights into Ebola evolution that were fed in real-time into outbreak response efforts. Further work on Zika demonstrated huge gaps in our knowledge of circulating pathogens in human populations, but reinforced technical difficulties in recovering whole genomes directly from clinical samples with an untargeted approach. Ultimately though, metagenomics approaches should be viable for the diagnosis and recovery of whole pathogen genomes from clinical samples. I will also discuss the effect ultra-long read single molecule sequencing may have on diagnostic sensitivity, as well as the new opportunities offered by direct RNA sequencing, which may also allow us to monitor host response to infection. Taken together, recent technological advances make a ‘sequencing singularity’ a tantalising prospect.

_________

30 November 2017, Countway Library, Room 403 (4th Floor) 

Georg K. Gerber, MD, PhD, MPH
Assistant Professor, Harvard Medical School
Co-Director, Massachusetts Host-Microbiome Center

The Dynamic Microbiome

The microbiome, or microbial organisms living in and on us, plays important roles in human health and disease. There is increasing interest in harnessing the microbiome for therapeutic purposes, yet analysis of these complex and inherently dynamic host-microbial ecosystems presents numerous challenges. In this talk, I will discuss novel Bayesian machine learning methods that my lab has developed to address some of these challenges, including methods for discovering microbial temporal signatures associated with perturbations or disease, and for optimizing the design of bacteriotherapies and predicting their dynamic behaviors in the host. If time permits, I will also outline some experimental synthetic biology approaches for high-throughput discovery of in vivo microbial functions and for engineering bacterial consortia in the mammalian gut.

_________

5 October 2017, Countway Library, Room 403 (4th Floor)

Jessica Polka, PhD
Director, ASAPbio
Visiting Scholar, Whitehead Institute

Preprints in the Life Sciences

Our traditional publication system keeps new research hidden from public view long after it is ready to be evaluated by our peers. This has adverse consequences not only for individual careers, but also for the overall speed of scientific communication and discovery.

Preprints, or manuscripts posted online before the completion of journal-organized peer review, offer a solution to this problem. In this interactive discussion, we will address the benefits of preprinting, concerns and challenges surrounding their use, and new developments - including rapidly changing funder and journal policies.

_________

5 May 2017, Countway Library, Room 403 (4th Floor) 

Naomi Penfold, PhD
Innovation Officer, eLife Sciences

Accelerating Discovery at eLife with Open-Source Technology

Backed by the Howard Hughes Medical Institute, the Max Planck Society, and the Wellcome Trust, eLife aims to help scientists accelerate discovery by operating a platform for research communication that encourages and recognises the most responsible behaviours in science. The online-only open-access eLife journal for outstanding advances in life sciences and biomedical research was just the first step in this mission. Now, we also actively champion the development of open-source tools, technologies and processes aimed at improving the discovery, sharing, consumption and evaluation of scientific research.

Naomi Penfold will discuss how eLife is working to accelerate discovery, increase transparency and improve incentives in the life sciences through the use and development of cutting-edge technologies. From preprints to reproducible analyses to artificial intelligence, she will present challenges and opportunities for the next innovators. We invite the community to contribute feedback and ideas for future innovations, and we welcome the opportunity to form new collaborations with the best emerging talent at the interface of research and technology.

_________

28 March 2017, Countway Library, Lahey Room (5th Floor) 

Patricia Brennan, RN, PhD
Director, National Library of Medicine (NLM)
National Institutes of Health

DataScience@NIH: Strategies for Sustainability in an Era of Data-Driven Discovery

_________

16 March 2017, Countway Library, Lahey Room (5th Floor) 

Martha Gray, PhD
J. W. Kieckhefer Professor of Health Sciences and Technology
Massachusetts Institute of Technology

From Research to Impact: Increasing the Odds and Accelerating the Pace

Many of us who pursue research careers in the biomedical arena seek to have a positive impact on human health. However, the road to ‘real world’ impact is longer and more tortuous than most imagine. In this seminar, Professor Gray will describe several ongoing initiatives designed to increase the potential for work to reach ‘real world’ impact more efficiently and effectively. These initiatives are rooted in a new paradigm for academic research, one that addresses the cultural, conceptual, and methodologic gaps that impede the road to impact. In addition to showing outcomes from the ongoing initiatives, we will discuss short-term and long-term approaches through which individual researchers and research programs can adopt this paradigm.

_________

7 March 2017, Countway Library, Lahey Room (5th Floor) 

Maria Nattestad, PhD
Bioinformatics Graduate Student
PhD, Computational Biology
Cold Spring Harbor Laboratory

Computational Methods and Analysis Tools for Studying Complex Variation in Cancer Genomics

Advances in single-molecule sequencing have produced a resurgence of reference quality genome assemblies, but the promise of these technologies for biomedical applications has yet to be realized. We initiated a pilot project between CSHL and OICR to perform long-read sequencing of a breast cancer cell line using PacBio SMRT technology, and we have since been developing a wide range of computational methods from alignment and variant-calling to genome connectivity analysis algorithms and visualization tools. This seminar will focus on the specialized visualization and interactive analysis tools I developed to further our understanding of complex variation in cancer. One of our open-source online tools, SplitThreader, enables a genome-wide view of long-range variants in a cancer genome and uses a purely client-side algorithm to search across the landscape of rearrangements for genomic evidence of gene fusions. The SplitThreader visualizer represents my answer to the question of how to intuitively show long-range variants to visually determine how they match copy number changes. This visualization combining copy number and the long-range variant breakpoints across any pair of chromosomes was an idea that progressively evolved from drawings on paper, to cumbersome static plots of graphical primitives in R, and finally to an interactive D3 visualization tool. I will also highlight my most recent work on Ribbon, an alignment visualizer that specializes in long reads and complex variants. Ribbon was created to address the challenge that I saw with our long-read cancer sequencing analysis: the true power of long reads is hidden when the tools we use are built for short reads. Unique among read alignment visualization tools, Ribbon not only shows alignments on the reference perspective but also along the read, with dot plots and a new intuitive visualization inspired by cartoons from the literature.
Future work includes expanding the use of visualization and intuitive web applications to additional fields of biomedical science and someday to use them to help empower physicians and patients with the best that biomedical informatics has to offer.

_________

28 February 2017, Countway Library, Lahey Room (5th Floor) 

Po-Ru Loh, PhD
Postdoctoral Research Associate
Department of Epidemiology
Harvard T.H. Chan School of Public Health

Fast Statistical Methods for Big Data Genetics

We are currently in a golden age of quantitative genetics. Rapid technological advances in genotyping and sequencing have exponentially increased data sizes over the past decade. National biobank efforts in the US, UK, and China are generating richly phenotyped genetic data sets with sample sizes on the order of millions. Now the challenge is analysis. A wealth of analytical approaches exist, but time has shown that whenever genetic data sizes increase by an order of magnitude – in other words, every 3–4 years – algorithms need to be re-engineered and inferential limits need to be redefined. I will describe three ongoing research thrusts in the fields of mixed model analysis, haplotype phasing and imputation, and mosaic aberration detection, each taking place at the intersection of statistics, computer science, and biology.

_________

23 February 2017, Countway Library, Ballard Room (5th Floor) 

Andrew Beam, PhD
Research Fellow
Department of Biomedical Informatics
Harvard Medical School 

Deep Learning 101

Deep learning has revolutionized many fields of computer science and machine learning in the past 5 years. Though neural networks were once relegated to near obscurity, recent methodological and engineering advancements have resulted in rapid progress and parity with humans on an impressively broad set of tasks. These breakthroughs are now woven into our everyday lives in devices such as smart phones, messaging apps, and personal assistants. The field continues to advance at a dizzying pace, and remaining current with an ever-advancing set of best practices can be daunting. In this talk I will give an overview and historical perspective of deep learning that attempts to contextualize recent progress and the current state of the field. Special attention will be given to key papers that introduced methods or ideas that significantly advanced the state of the art. Illustrative examples of deep multilayer perceptrons and convolutional neural networks will be given with code to highlight key implementation details that have been crucial for the recent success of deep learning.

_________

21 February 2017, Countway Library, Room 403 (4th Floor) 

Artem Sokolov, PhD
Director of Informatics
Harvard Program in Therapeutic Science (HiTS)
Harvard University

Characterizing Stemness Properties of 33 Tumor Types 

Some of the more aggressive and treatment-resistant forms of cancers arise when cells undergo de-differentiation and acquire "stem-like" properties. Characterization of these properties can help advance our understanding of tumor recurrence and proliferation as well as guide future therapeutic solutions to effectively deal with tumor initiation and self-renewal of cancer stem cells.
Using mRNA expression and DNA methylation data, we developed machine learning predictors capable of quantifying the stemness state of cells. We applied these predictors to the TCGA PanCanAtlas dataset that spans ~10,000 patients across 33 different tumor types. The emergent stemness profiles were then compared against a wide array of molecular and clinical features, leading to key observations on tumor microenvironment, EMT transition, immuno-suppression, mutation status and patient survival as they relate to cancer de-differentiation states.

_________

7 February 2017, Modell Immunology Center, Modell 100A Fred S. Rosen Lecture Hall

Sandeep Robert Datta, MD, PhD
Assistant Professor of Neurobiology
Department of Neurobiology
Harvard Medical School

Using Ethology to Model Disease

The Datta lab studies how information from the outside world is detected, encoded in the brain, and transformed into meaningful behavioral outputs. Here we describe a new approach we have recently developed, which combines 3D machine vision with unsupervised machine learning, to characterize the underlying structure of mouse behavior. Using this approach we have discovered that mouse behavior can be segmented into a fundamental set of components that we call “behavioral syllables.” Each behavioral syllable is a brief and well-defined motif of 3D behavior that the brain places into specific sequences via definable transition statistics (or behavioral “grammar”) to flexibly create complex patterns of action. By characterizing mouse behavior in terms of its component parts, we can use our behavioral characterization technique to identify subtle differences in the pattern of motor output under different experimental conditions with an unprecedented level of sensitivity, suggesting that this technology will be useful for drug development. By combining this method with in vivo imaging of corticostriatal circuits in behaving animals, we can also identify neural correlates for the sub-second structure of behavior identified by our algorithms, suggesting this behavioral analysis technique will provide direct insights into the relationship between neural circuit activity and patterns of action. Thus our method will afford insight into mechanisms that allow animals to flexibly navigate the outside world, enable better characterization of mouse models of disease, and serve as a quantitative prism through which the function of genes and neural circuits can be understood.

_________

2 February 2017, Countway Library, Minot Room (5th Floor) 

Yilong Li, PhD 
Principal Scientist, Seven Bridges Genomics  

Patterns of Somatic Rearrangements in 2,600 Cancer Genomes 

Chromosomal rearrangements are widespread in cancer genomes. High throughput sequencing has led to the discovery of many somatic rearrangements that initiate or promote cancer development. However, so far their complexity has impeded the comprehensive characterisation of their patterns and distributions in cancer. As a consequence, the mechanistic and selective forces shaping cancer genomes are still incompletely understood.

Here I will talk about the first large-scale analysis of somatic rearrangements as part of the International Cancer Genome Consortium PanCancer Analysis of Whole Genomes project. In order to systematically describe somatic rearrangements, I developed statistical algorithms to group rearrangement junctions into distinct events, and designed a nomenclature system to unambiguously identify patterns of any given combination of copy number changes and rearrangement junctions. I implemented a general purpose object-oriented framework to computationally model these concepts, allowing me to compute quantitative metrics for studying the mechanistic nature of different rearrangement patterns.

Applying these methods to 2,600 cancer genomes revealed that while simple deletions and tandem duplications are the most common rearrangement types, a significant proportion of the total rearrangement junction load is caused by complex rearrangement events, reaching up to 80% in certain cancer types. In addition, I identified several related rearrangement patterns consisting of multiple inversion-type rearrangements and, through statistical analysis, showed that such patterns likely arise from a series of polymerase template switches rather than from multiple independent rearrangement events. These results illustrate how cancer genomes are affected by different types of DNA damage, which may be reflective of clinically relevant defects in DNA repair or checkpoint pathways.

_________

27 January 2017, Countway Library, Minot Room (5th Floor) 

Iain Buchan, MD 
Professor of Public Health Informatics, University of Manchester 
Director, Health e-Research Centre 
Co-Director, Farr Institute of Health Informatics Research 

Civic Informatics of Health 

Professor Buchan will argue that health(care) systems cannot be optimised (or ‘learn’) independently of the civic systems in which they operate. He will explore the need for informatics to enable fuller understanding of the links between health and place, not only to provide actionable analytics for better care but also to advance discovery science. Beyond better clinical epidemiological resolution of person and place, Professor Buchan will emphasise the triad of time-place-person, with its challenges and opportunities of combining frequent, patient/citizen-derived information with infrequent clinical observations – tapping into the rhythms of disease and daily life for better care. He will use practical examples from the UK NHS to show how ‘natural’ health systems, covering 2-7m regional populations, with deeply integrated health data, and interoperable analytics, might borrow strength from each other for better predictive modelling and surveillance. He will challenge the over-simple notion of precision medicine with a future scenario whereby a patient’s ‘health avatar’ might ‘refuse’ to integrate with a care provider’s care pathway.