Big Data in the San Francisco Bay Area

Natural Language Processing (NLP), big data and precision medicine are three of the hottest topics in healthcare at the moment, so it is no surprise that the first NLP & Big Data Symposium, focussed on precision medicine, attracted a large audience.

The event took place on August 27th at the new UCSF site at Mission Bay in San Francisco and was sponsored by Linguamatics and the UCSF Helen Diller Family Comprehensive Cancer Center.

Over 75 delegates came to hear the latest insights and projects from some of the West Coast’s leading institutions including Kaiser Permanente, Oracle, Huntsman Cancer Institute and UCSF.

The event was held in the midst of an explosion of new building development to house the latest in medical research and informatics, work in which big data will play a central role.

Linguamatics and UCSF recognized the need for a meeting on NLP in the west and put together an exciting program that clearly caught the imagination of many groups in the area.


Over 75 delegates attended the Symposium

Key presentations included:

  • The keynote presentation from Frank McCormick, Director of UCSF Helen Diller Family Comprehensive Cancer Center, was a tour de force of the latest insights into cancer research and future prospects. With the advances in genetic sequencing and associated understanding of cancer biology, we are much closer to major breakthroughs in tackling the genetic chaos of cancer
  • Kaiser Permanente presented a predictive model of pneumonia assessment based on Linguamatics I2E that has been trained and tested using over 200,000 patient records. This paper has just been published and can be found here. In addition, Kaiser presented plans for a new project on re-hospitalization that takes into account social factors in addition to standard demographic and diagnosis data
  • Huntsman Cancer Institute showed how pathology reports are being mined using I2E to extract data for use in a research data warehouse to support cohort selection for internal studies and clinical trials
  • Oracle presented their approach to enterprise data warehousing and translational medicine data management, highlighting why a sustainable single source of truth for data is key and how NLP can be used to support this environment
  • UCSF also provided an overview of the current approaches to the use of NLP in medical and research informatics, emphasizing the need for such approaches to deliver the raw data for advanced research
  • Linguamatics’ CTO, David Milward, presented a position piece on why it is essential for EHRs and NLP to be more closely integrated, and illustrated some of the challenges involved and the approaches that can be used with I2E to overcome them
  • Linguamatics’ Tracy Gregory also showed how NLP can be used in the early stages of biomarker research to assess potential research directions and support knowledge discovery


Panel of speakers, from left to right – Tony Sheaffer, Vinnie Liu, Brady Davis, David Milward, Samir Courdy, Tracy Gregory and Gabriel Escobar

With unstructured data accounting for 75-80% of data in EHRs, the use of NLP in healthcare analytics and translational research is essential to achieve the required improvements in treatment and disease understanding.

This event provided a great forum for understanding the current state of the art in this field and allowed participants to engage in many useful discussions.

West coast residents interested in the field can look forward to another opportunity to get together in 2014. Those who can get over to the east coast can attend the Linguamatics Text Mining Summit, which will take place on October 7-9, 2013 in Newport, RI.