Enhancing problem list reconciliation with Natural Language Processing (NLP): Improve patient care quality with population health text mining analytics

The shift from volume-based to value-based compensation is driving provider demand for better insights into the health of patient populations. Providers recognize that access to more complete patient data can enhance their ability to deliver cost-effective care and high-quality outcomes. This is especially true for patients with multiple chronic conditions, who typically have more complicated care needs and higher hospital utilization rates.

Figure 1: High-risk patients frequently suffer from complex comorbidities

Typically, physicians refer to problem lists when assessing a patient’s health and evaluating treatment alternatives. Problem lists rely on coded disease states and offer a concise view of a patient’s medical issues. Unfortunately, these lists are often incomplete or out of date. Consider, for example, a patient who is referred to an orthopedic surgeon for a broken wrist. If the problem list only includes details of the wrist injury, the physician may not be immediately aware of underlying chronic conditions, such as diabetes, that could impact the best course of treatment and outcomes.


Last week I attended the Cambridge Rare Disease Network (CRDN) 2016 Summit. CRDN is a newly established charity working to build a community of people in Cambridge to address the unmet needs in rare disease research and treatment. As at last year's event, there was a great set of speakers from patient groups, academia, pharma/biotech and vendors.

There has been a step-change in awareness within the pharma industry in the last decade, with increasing interest and investment in tackling rare diseases. I blogged last year about an interview with Patrick Vallance on this same topic. He gave three reasons why GSK are interested in rare disease, and all three were echoed by the CRDN speakers.

The foremost of these reasons was expressed most clearly by speakers from patient groups, such as Daniel Lewi from the Cure & Action for Tay-Sachs (CATS) Foundation, Karen Harrison from ALD Life, and Emily Kramer-Golinkoff from Emily’s Entourage. They spoke of the huge impact that rare disease has on individuals and families, and the urgent need for research into new or repurposed treatments for the 1 in 17 people affected by a rare disease.


Linguamatics are delighted once more to sponsor the Findacure Student Voice Essay Competition. Findacure is a UK charity that is building the rare disease community to drive research and develop treatments.   

The winning essay will be published in the Orphanet Journal of Rare Diseases, and the essay topics are:

  1. The impact of a rare disease is much more widespread than its direct symptoms. Discuss how, with particular reference to the patient experience.
  2. How can rare diseases lead the way in medical research and clinical innovation?
  3. How can clinicians and researchers, including students, help to deliver the UK Strategy for Rare Diseases?

One of the big challenges in developing treatments for rare disease is the need for a thorough understanding of the natural history of each of the 7,000 currently known rare diseases. It’s critical to have detailed, systematic information on both the genotypic aspect (the genes and mutations) and the phenotypic aspect (pathways involved or disrupted, symptom severities, etc.).


BOSTON, MA and NEW YORK, NY--(Marketwired - October 18, 2016) - Linguamatics, a world leader in NLP Text Mining, and Sinequa, a leader in Cognitive Search and Analytics, today announced a partnership based on a tight integration between I2E and Sinequa ES. This integration will provide life sciences and healthcare organizations with deeper insights from their ever-increasing volumes of unstructured text across the entire enterprise.

Linguamatics' I2E text mining platform enhances the Sinequa Cognitive Search & Analytics platform with its advanced text mining capabilities, providing an unparalleled foundation to build upon in life sciences. The combined strength of both platforms helps users get more precise, actionable and contextual information in their field. They can ask questions such as "what treatments are used for breast cancer?" or "what diseases are treated by drug X?"


I recently attended a talk by Linguamatics CTO David Milward on Structured Queries for Unstructured Data, delivered to the Data Insights Cambridge Meetup group.

The data science community wants to know:

  • How can we deliver insights from big data?

  • What are the optimal approaches to ‘handle’ (store, capture) and analyze (query, structure, repurpose) big data?

The amount of data we can generate and store is many times what we could capture or store just 10 years ago. SQL database technology handles structured data well and has not changed significantly since the 1980s. For basic queries, it’s easier to deliver insights from structured data than from unstructured data in free text sources.
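To make the contrast concrete, here is a minimal sketch in Python. It is an illustration only, not taken from David's talk and not how I2E works: the table, the notes, the function names and the regular expressions are all hypothetical. The same question ("which patients have diabetes?") is a one-line SQL query over a structured table, but over free text it already needs rules for word variants and negation, and even then a toy pattern like this one would miss many real-world phrasings.

```python
import re
import sqlite3

# Hypothetical sample data, invented for this illustration.
STRUCTURED_ROWS = [(1, "diabetes"), (2, "wrist fracture"), (3, "asthma")]
NOTES = {
    1: "Type 2 diabetes, well controlled.",
    2: "Wrist fracture after a fall. Patient is diabetic and takes metformin.",
    3: "Asthma. No history of diabetes.",
}


def structured_diabetes_patients(rows):
    """Structured case: a basic SQL query over coded conditions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE problems (patient_id INTEGER, condition TEXT)")
    conn.executemany("INSERT INTO problems VALUES (?, ?)", rows)
    cur = conn.execute(
        "SELECT DISTINCT patient_id FROM problems "
        "WHERE condition = ? ORDER BY patient_id",
        ("diabetes",),
    )
    result = [row[0] for row in cur.fetchall()]
    conn.close()
    return result


# Toy patterns (assumptions): real clinical NLP handles far more variation.
MENTION = re.compile(r"\bdiabet(es|ic)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(no|denies|without)\b[^.]*\bdiabet", re.IGNORECASE)


def unstructured_diabetes_patients(notes):
    """Unstructured case: scan each sentence, skipping negated mentions."""
    found = []
    for patient_id, text in sorted(notes.items()):
        for sentence in text.split("."):
            if MENTION.search(sentence) and not NEGATION.search(sentence):
                found.append(patient_id)
                break
    return found


print(structured_diabetes_patients(STRUCTURED_ROWS))  # [1]
print(unstructured_diabetes_patients(NOTES))          # [1, 2]
```

In this invented data, patient 2's diabetes appears only in the free-text note, so the structured query alone does not find it.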

Unstructured data is the new frontier for data science

What drew so many people to David’s talk was the promise of the ‘data insights’ that are locked away in unstructured data. The audience spanned various industries, from those dealing with astronomical data to those working with financial data, along with many people concerned with unstructured data in health and life sciences. Many industries rely heavily on data to inform their day-to-day business decisions. For healthcare and life sciences, where Linguamatics is the text mining leader, transforming how we understand and improve population health and patient outcomes will primarily entail extracting insights from unstructured data sources.