The majority of these Clinical Natural Language Processing (NLP) data sets were originally created at a former NIH-funded National Center for Biomedical Computing (NCBC) known as i2b2: Informatics for Integrating Biology and the Bedside.
Based at Partners HealthCare System in Boston from 2004 to 2014, under the leadership of Principal Investigator Isaac Kohane, MD, PhD, and Executive Director Susanne Churchill, PhD, the i2b2 Center was a strong advocate for using existing clinical records to yield insights that directly improve healthcare. Recognizing the value locked in unstructured text, i2b2 provided sets of fully deidentified notes from the Research Patient Data Repository at Partners for a series of NLP Shared Task challenges and workshops, which were designed and led by Co-Investigator Özlem Uzuner, MEng, PhD, originally at MIT CSAIL and subsequently at SUNY Albany. Those notes were then made available for general research purposes and have since supported hundreds of journal and conference articles by the research community.
These data sets are now under the stewardship of the Healthcare Data Science Program at the Department of Biomedical Informatics, where Drs. Kohane and Churchill are Chair and Executive Director, respectively. A new user registration, data access request process, and data use agreement are coming soon.
The NLP Shared Task challenges and workshops continue to be directed by Dr. Uzuner, now Associate Professor of Information Sciences and Technology in the Volgenau School of Engineering at George Mason University. The software development component of the former i2b2 Center is now under the direction of the i2b2 tranSMART Foundation, a member-driven non-profit foundation developing an open-source / open-data community around the i2b2, tranSMART and OpenBEL translational research platforms.