Mining big data for key insights into healthcare, life sciences and social media at the Linguamatics San Francisco Seminar

September 3 2014

Natural Language Processing (NLP) and text analytics experts from pharmaceutical and biotech companies, healthcare providers and payers gathered on August 26th to discuss the latest industry trends and to hear product news and case studies from Linguamatics.

The keynote presentation from Dr Gabriel Escobar was the highlight of the event, covering a rehospitalization prediction project that the Kaiser Permanente Department of Research have been working on in collaboration with Linguamatics.

The predictive model has been developed using a cohort of approximately 400,000 patients and combines scores from structured clinical data with factors derived from unstructured data using I2E.

Key factors that could affect a patient’s likelihood of rehospitalization are trapped in text; these include ambulatory status, social support network and functional status. I2E queries enabled KP to extract these factors and use them to indicate the accuracy of the structured data’s predictive score.
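As a rough illustration of how text-derived factors might sit alongside a structured score, the sketch below adjusts a structured risk probability on the log-odds scale when a factor extracted from clinical notes is unfavourable. It is not the Kaiser Permanente model: the `TextFactors` fields mirror the three factors named above, but the function name, the weights and the adjustment method are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TextFactors:
    """Factors assumed to have been extracted from clinical notes."""
    ambulatory: bool
    has_social_support: bool
    functionally_independent: bool


# Hypothetical log-odds penalties; real weights would be fitted to cohort data.
PENALTIES = {
    "ambulatory": 0.5,
    "has_social_support": 0.4,
    "functionally_independent": 0.6,
}


def combined_risk(structured_score: float, factors: TextFactors) -> float:
    """Adjust a structured risk probability (0 < p < 1) using text factors."""
    if not 0.0 < structured_score < 1.0:
        raise ValueError("structured_score must be strictly between 0 and 1")
    # Work on the log-odds scale so the result remains a valid probability.
    logit = math.log(structured_score / (1.0 - structured_score))
    for name, penalty in PENALTIES.items():
        if not getattr(factors, name):
            logit += penalty  # an unfavourable factor raises the estimated risk
    return 1.0 / (1.0 + math.exp(-logit))


favourable = TextFactors(True, True, True)
isolated = TextFactors(ambulatory=True, has_social_support=False,
                       functionally_independent=True)
print(round(combined_risk(0.2, favourable), 3))
print(round(combined_risk(0.2, isolated), 3))
```

When every factor is favourable the structured score passes through unchanged; each unfavourable factor pushes the estimate upwards.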

Linguamatics presented examples of I2E in use in healthcare: cancer centers working together to develop queries for pathology reports, mining of medical literature, and prediction of pneumonia from radiology reports. They also demonstrated a prototype application to match patients to clinical trials and a cohort selection tool using semantic tagging of patient narratives in the Apache Solr search engine.

Semantic enrichment was discussed in the context of life sciences using SharePoint as the search engine. This drew great interest from the many life science companies in the audience, who understand the need to improve searching of internal scientific data. This discussion highlighted the challenges of getting a consistent view of internal and external data.

The latest version of I2E will address this challenge with a new federated capability that provides merged result sets from internal and external searches. These new I2E capabilities have strong potential to improve insight, and they also incorporate a model that allows content providers to support text mining more actively.
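To make the idea of a merged result set concrete, here is a minimal sketch of combining hits from an internal and an external search. It does not represent the I2E API: the `Hit` type, the `merge_results` function and the scoring are hypothetical, and the sketch assumes that scores from the two sources are already on a comparable scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float
    source: str  # e.g. "internal" or "external"


def merge_results(*result_sets):
    """Merge several result sets into one ranked list.

    A document returned by more than one source is kept once,
    using its highest-scoring hit.
    """
    best = {}
    for hits in result_sets:
        for hit in hits:
            current = best.get(hit.doc_id)
            if current is None or hit.score > current.score:
                best[hit.doc_id] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


internal = [Hit("a", 0.9, "internal"), Hit("b", 0.4, "internal")]
external = [Hit("b", 0.7, "external"), Hit("c", 0.5, "external")]
for hit in merge_results(internal, external):
    print(hit.doc_id, hit.score, hit.source)
```

In practice the scores from different sources would probably need normalising before they could be ranked together; the sketch leaves that step out.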

Another hot topic was mining social media and news outlets for competitive intelligence and insights into company and product sentiment.

The mass of information now available from social media requires a careful strategy of multiple levels of filtering to extract the relevant data from the millions of tweets and posts that occur daily. Once the relevant posts have been identified, they can be text mined, but users need to factor in support for abbreviations and shorter linguistic syntax.
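A multi-level filter of this kind could look something like the sketch below, which applies a keyword filter, drops retweets and duplicates, and expands a few abbreviations before the text is mined. The function names, the abbreviation table and the example posts are invented for illustration and do not describe a Linguamatics product.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"dr": "doctor", "meds": "medication", "rx": "prescription"}


def mentions_keyword(post, keywords):
    """Level 1: keep only posts that mention a keyword of interest."""
    text = post.lower()
    return any(keyword.lower() in text for keyword in keywords)


def is_retweet(post):
    """Level 2: treat posts starting with 'RT ' as retweets."""
    return post.lstrip().lower().startswith("rt ")


def expand_abbreviations(post):
    """Level 3: replace known abbreviations with their full forms."""
    def replace(match):
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", replace, post)


def filter_posts(posts, keywords):
    """Apply each filtering level in turn and drop case-insensitive duplicates."""
    seen = set()
    kept = []
    for post in posts:
        if not mentions_keyword(post, keywords) or is_retweet(post):
            continue
        normalised = expand_abbreviations(post)
        key = normalised.lower()
        if key not in seen:
            seen.add(key)
            kept.append(normalised)
    return kept


posts = [
    "My dr changed my meds to DrugX",
    "RT my dr changed my meds to DrugX",
    "Lovely weather today",
    "my dr changed my meds to drugx",
]
print(filter_posts(posts, ["drugx"]))
```

Running the cheap keyword and retweet checks first means the more expensive normalisation only touches posts that are likely to matter.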

Mining social media and news outlets is an area that will continue to grow and require active support.

Linguamatics were grateful for such an engaged and interactive audience and look forward to future discussions on these exciting trends. Keep an eye out for information about our upcoming Text Mining Summit.