Datathon/Hackathon Sep 14–15

Biomedical Data Visualization and User Interfaces using the PIC-SURE API as a backend service
[Image: collage of NHANES JSON data from the PIC-SURE demo, an UpSet visualization sample, and the DBMI GitHub repository]
Brought to you by DFCI, HMS DBMI, and BCH CHIP

Thursday–Friday, September 14–15, 2017
HMS Department of Biomedical Informatics (DBMI)
3rd, 4th and 5th floors of Countway Library
10 Shattuck Street

Kickoff: 9:00 am Thursday, Sep 14
Lahey Room, Countway 5th floor

Contestants in this combined datathon/hackathon are challenged to build a user-interface data exploration tool fed by PIC-SURE queries. Any data manipulation must be performed either inside the web browser or through the new PIC-SURE Scripted Query functionality (JavaScript, and possibly Python or R). Each team should include at least one scientist (postdoc, PhD student, etc.) and at least two developers.

Prize categories
  • Best Insight: which visualization tool delivered the most compelling discoveries? Teams will present how they made their discoveries using the tool. (judged by experts)
  • Best Engagement: which tool is the most engaging and best motivates users to spend time on exploration and discovery? (judged by the "audience", i.e., all participants, judges, and organizers)
  • Best Design: which tool has the best visual design and is the most appealing? (judged by the "audience", i.e., all participants, judges, and organizers)
  • Best Student-Led Team in one of the three categories above.

Names of awardees will be added to a new Datathon/Hackathon Hall of Fame wall panel in the Department of Biomedical Informatics and to the DBMI website.


This contest focuses on the new Scripted Query beta feature of the PIC-SURE API, which lets users submit code, as part of their query, that performs arbitrary transformations on the data retrieved through PIC-SURE. Contestants will not need to modify the PIC-SURE API itself.

Potential Ideas
  • Visualization of available genomic and phenotypic data (patient counts, number of variables, etc.)
  • Select demographic and clinical variables to run a query and get back patient counts
  • Integration of UpSet visualization approach to visualize intersecting sets of patients
  • Create a dashboard to visualize quantitative data (lab values, expression levels, etc.) in histograms, scatter plots, or other basic plots and stratified by categorical variables (genotype, race, etc.)
  • List not exhaustive, be creative!
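For the UpSet idea, the core computation is counting patients by their exact combination of memberships, so that each patient falls into exactly one bar. Here is a minimal TypeScript sketch of that step, done in the browser. It assumes each cohort, for example the result of one PIC-SURE query, has already been reduced to a set of patient IDs. The input shape and the name `exclusiveIntersections` are hypothetical and are not part of PIC-SURE.

```typescript
// Hypothetical input shape: each named cohort (e.g. the result of one
// PIC-SURE query) already reduced in the browser to a set of patient IDs.
type PatientSets = Record<string, Set<string>>;

// Count patients by their exact combination of set memberships. Each patient
// lands in exactly one combination, matching the bars of an UpSet plot.
function exclusiveIntersections(sets: PatientSets): Map<string, number> {
  // patient ID -> names of the sets that contain it
  const membership = new Map<string, string[]>();
  // Visiting set names in sorted order keeps every patient's list sorted,
  // so each combination gets a single canonical key.
  for (const name of Object.keys(sets).sort()) {
    sets[name].forEach((id) => {
      const names = membership.get(id) ?? [];
      names.push(name);
      membership.set(id, names);
    });
  }
  // combination key (e.g. "diabetes & hypertension") -> number of patients
  const counts = new Map<string, number>();
  membership.forEach((names) => {
    const key = names.join(" & ");
    counts.set(key, (counts.get(key) ?? 0) + 1);
  });
  return counts;
}

// Toy example with made-up patient IDs.
const counts = exclusiveIntersections({
  hypertension: new Set(["p2", "p3", "p4"]),
  diabetes: new Set(["p1", "p2", "p3"]),
});
// counts: "diabetes" -> 1, "diabetes & hypertension" -> 2, "hypertension" -> 1
console.log(counts);
```

To get the total size of any single set, sum the combinations that include it. This sketch assumes set names do not themselves contain " & ".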

DFCI is currently using the PIC-SURE API to develop a replacement front end for i2b2. Additional work has been done on accessing genotype and phenotype data via the PIC-SURE API from Jupyter Notebooks using R and Python. For example, HMS DBMI has developed an R library (R-Cupcake) that facilitates access to the API from R kernels in Jupyter Notebooks. We hope this datathon/hackathon will produce next-level functionality that uses the PIC-SURE API for data access.

HMS DBMI is developing an open-source infrastructure that will foster the incorporation of multiple heterogeneous patient-level clinical, omics, and environmental datasets. The system embraces decentralized datasets of varying types, along with the protocols used to access them, while still providing a simple communication layer that can handle querying, joining, and computing. The BD2K PIC-SURE RESTful API implementation is called the Inter Resource Communication Tool (IRCT).

See our example use case for retrieving NHANES PCB-153 levels for two age groups (20–39 and 40–59) at

Another API example using NHANES is here:

Learn more at  

Resources Available for the Contest

Contestants (teams) will be given access to the contest datasets through the PIC-SURE API. Each team will also receive an m4.large Docker Engine host running a PIC-SURE API instance, plus a choice of static file servers for deploying UI components. All necessary proxy and other configuration will already be in place. Contestants will only need to develop their UI features, PIC-SURE API queries, and response transformation scripts.

The following reference datasets will be made available to the contesting teams through the PIC-SURE API:
  • the National Health and Nutrition Examination Survey (NHANES) dataset from the US Centers for Disease Control and Prevention (CDC), with 41K patients;
  • the Simons Simplex Collection (SSC) dataset from the Simons Foundation Autism Research Initiative (SFARI), with 9K fully annotated exomes linked to 6K clinical variables;
  • the Exome Aggregation Consortium (ExAC) dataset from the Broad Institute.

Docker configuration and deployment code will be provided for deploying static content to your web server of choice. The server will be exposed on port 443, with proxy rules already directing the paths “/nhanes/rest” and “/ssc/rest” to the PIC-SURE API containers. A brief tutorial at the start of the contest will demonstrate how to deploy to this infrastructure, authenticate with the API, and run queries from JavaScript in a web browser. Each team will receive OAuth2 access tokens to share among its members. However, every participant must separately accept the data use agreements (and any other required terms) before the contest.
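From the description above, a browser query needs two things: one of the proxied paths (“/nhanes/rest” or “/ssc/rest”) and the team's OAuth2 access token. The following is a rough TypeScript sketch of assembling such a request. It assumes the token is sent as a standard OAuth2 bearer token in the `Authorization` header and that request bodies are JSON. The helper name `buildPicSureRequest` and its `resourcePath` argument are hypothetical placeholders, because the actual PIC-SURE endpoints will be covered in the tutorial.

```typescript
// Only the "/nhanes/rest" and "/ssc/rest" prefixes come from the contest
// setup; everything else here is an illustrative assumption.
type Dataset = "nhanes" | "ssc";

// Plain description of a request; its fields are valid `fetch` init options.
interface PicSureRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Hypothetical helper: builds a request to the PIC-SURE API behind the proxy.
function buildPicSureRequest(
  origin: string,        // e.g. window.location.origin in the browser
  dataset: Dataset,      // selects which proxied API container to call
  resourcePath: string,  // placeholder for an endpoint shown in the tutorial
  accessToken: string,   // the team's shared OAuth2 access token
  payload?: unknown,     // optional JSON body; its presence switches to POST
): PicSureRequest {
  // Normalize slashes so "https://host/" + "/x" and "https://host" + "x" agree.
  const base = origin.replace(/\/+$/, "");
  const resource = resourcePath.replace(/^\/+/, "");
  const url = `${base}/${dataset}/rest/${resource}`;
  // Standard OAuth2 bearer-token header (RFC 6750).
  const headers: Record<string, string> = { Authorization: `Bearer ${accessToken}` };
  if (payload === undefined) {
    return { url, method: "GET", headers };
  }
  headers["Content-Type"] = "application/json";
  return { url, method: "POST", headers, body: JSON.stringify(payload) };
}
```

In the browser, pass `window.location.origin` as `origin` and send the request with `fetch(req.url, req)`. The call then goes through the same-origin proxy on port 443, so no cross-origin configuration is needed.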

Questions: contact

Register at to get a free T-shirt!