NLP & Big Data Symposium in San Francisco

June 29, 2015

Life sciences and healthcare professionals gathered at the UCSF Mission Bay campus for the West Coast Natural Language Processing (NLP) & Big Data Symposium on June 18th. The symposium, co-hosted by UCSF, featured presenters from UCSF, Merck, City of Hope, Copyright Clearance Center, and Linguamatics, along with delegates from a diverse range of organizations.

The central theme of this year’s symposium was “From bench-to-bedside, unlocking key insights from your data.” Healthcare delegates were keen to find new ways to address meaningful use and accountable care by applying NLP text mining to electronic health records. Life sciences delegates were keen to increase the efficiency and effectiveness of their business operations by mining real world data. There was also strong interest in forging partnerships between pharma/biotech and hospitals/cancer centers.

Sorena Nadaf, CIO and Director of Translational Informatics at the UCSF Helen Diller Family Comprehensive Cancer Center, delivered the welcome address and highlighted the foundation of clinical NLP and its common uses for extracting and transforming narrative information in EMRs to support and accelerate clinical research.

Sorena Nadaf at the NLP & Big Data Symposium in San Francisco.

Wendy Cornell, retired from Merck, described Merck’s development of a natural language processing (NLP) workflow that extracts conclusions and interpretations from the company’s large corpus of internal reports using the Linguamatics I2E software, then integrates and analyzes the extracted data using the ANZO platform from Cambridge Semantics. The primary use case discussed was the automated extraction of conclusions and interpretations from internal preclinical safety reports using I2E, which generated a lot of interest and discussion.

Joyce Niland, Chief Research Information Officer, and Rebecca Ottesen, Biostatistician, from City of Hope (COH) presented a recent project with Linguamatics in which they created a disease registry using Iterative Interactive Enrichment (IIE) of NLP queries shared across institutions. The I2E queries, initially written by the Huntsman Cancer Institute (HCI) and Linguamatics, identify immunohistochemistry (IHC) marker results from unstructured pathology dictations on malignant Non-Hodgkin’s Lymphoma patients. The queries were shared with COH to assess their exportability from one institution to another. Linguamatics, COH, and HCI applied an IIE process through several phases to improve the IHC queries while sharing the improvements between institutions. Precision and recall were measured for each phase to assess the completeness and accuracy of information extraction, and to identify the most critical NLP features that impact these results. Final F-scores for COH and HCI were 0.91 and 0.94, respectively. This impressive level of precision and recall across two institutions validates the Linguamatics approach of sharing its wealth of existing healthcare queries with I2E customers to help accelerate research and improve patient outcomes.
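For readers less familiar with these metrics, here is a minimal sketch of how precision, recall, and an F-score relate to one another. It assumes the balanced F1 form (the harmonic mean of precision and recall), which the presentation summary does not specify. The function name and the counts are hypothetical and chosen only for illustration; they are not figures from the COH or HCI evaluation.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from raw extraction counts.

    Assumption: balanced F1 (harmonic mean of precision and recall).
    """
    predicted = true_positives + false_positives  # everything the query extracted
    actual = true_positives + false_negatives     # everything it should have extracted
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one evaluation phase (not data from the study).
precision, recall, f1 = precision_recall_f1(
    true_positives=90, false_positives=10, false_negatives=10
)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Recomputing these three numbers after each round of query refinement is one way to see whether a change improved accuracy (precision), completeness (recall), or both.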

Chris Hilbert from Copyright Clearance Center presented CCC’s new RightFind XML for Mining service, its integration with Linguamatics’ I2E, and how the combined solution improves the results of text and data mining queries and mitigates infringement risk. Chris demonstrated how customers can obtain and index full-text XML articles from multiple scientific publishers in I2E and avoid many of the data format and licensing issues associated with working with PDFs. As existing licensed literature does not have to be repurchased, delegates saw this service as a highly effective way of leveraging existing full-text investments and extracting more value via I2E text mining.

To complement our customer and partner presentations, Linguamatics led presentations including an introduction to NLP text mining; healthcare NLP strategies to improve patient care, reduce costs and enhance population health; and Real World Data and text analytics.

It was wonderful to catch up with many of our customers, meet some new ones and help foster introductions and discussions between the various delegates. Keep an eye out for upcoming opportunities to meet with Linguamatics on our events page, including our Princeton seminar on July 16 and the Text Mining Summit and I2E Healthcare Hackathon in October.