Mining Unstructured Patient Data for Successful Population Health Management

The shift toward payment models based on quality and value is fueling demand for insights into the health of populations, and meeting that demand requires analyzing vast amounts of patient data. For example, before healthcare organizations can implement pre-emptive care programs, they must first identify the relative risk of their patient population. That risk depends on a variety of clinical, financial, and lifestyle factors, including:

  • Patient problem lists, especially chronic conditions
  • Procedures, medications, and other hospital data
  • Claims information
  • Risk factors such as tobacco, alcohol, and drug use
  • Availability and accessibility of health services and social support

As illustrated in Figure 1, the highest-risk patients typically make up a relatively small percentage of a healthcare population, yet these least healthy patients usually account for the largest share of overall healthcare costs.

Figure 1: Level of patient risk associated with population segments and their cost implications; a relatively small segment of the population accounts for a disproportionate percentage of healthcare costs

Analyzing population health is difficult because patient-related data is heterogeneous. Historically, healthcare groups have relied heavily on electronic health records (EHRs) and claims data to make sense of the health of their patient populations; more recently, providers and payers have adopted data warehousing solutions that offer a more comprehensive population view. However, because an estimated 80% of patient data is unstructured, integrating this critical health information into population management remains cumbersome.

Unstructured text contains a wealth of clinical information drawn from physician narratives, patient-reported information, and pathology, radiology, and discharge reports. To thoroughly assess population risk, organizations must extract insights from data stored in both structured and unstructured formats. Consider, for example, an accountable care organization (ACO) that wants to assess the risk of Type 2 diabetes in its patient population. An analysis of structured data can reveal risk factors associated with weight, race, and age, but might miss risk factors that are typically noted only in the physician narrative, such as number of pack years of smoking, limited access to healthy foods, barriers to physical activity, high stress levels, and social isolation.
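To make the diabetes example concrete, here is a minimal Python sketch of how a few narrative-only risk factors might be pulled out of a free-text note. It uses plain regular expressions, not I2E or any Linguamatics API; the patterns, the function name `extract_risk_factors`, and the sample note are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern: a number followed by "pack-year(s)" or "pack year(s)".
PACK_YEARS = re.compile(r"(\d+(?:\.\d+)?)\s*-?\s*pack[- ]?years?", re.IGNORECASE)

# Hypothetical phrase lists for two social risk factors; a real system
# would need far broader vocabulary than these illustrative terms.
SOCIAL_RISK_PATTERNS = {
    "social_isolation": re.compile(
        r"\b(lives alone|socially isolated|social isolation)\b", re.IGNORECASE
    ),
    "limited_food_access": re.compile(
        r"\b(food insecurity|limited access to healthy foods?)\b", re.IGNORECASE
    ),
}


def extract_risk_factors(note: str) -> dict:
    """Return the risk factors that the illustrative patterns find in a note."""
    found = {}
    match = PACK_YEARS.search(note)
    if match:
        found["pack_years"] = float(match.group(1))
    for label, pattern in SOCIAL_RISK_PATTERNS.items():
        if pattern.search(note):
            found[label] = True
    return found


# Made-up note text, for demonstration only.
sample_note = (
    "Pt is a 58 y/o male with a 30 pack-year smoking history. "
    "Lives alone; reports food insecurity."
)
print(extract_risk_factors(sample_note))
```

A keyword approach like this ignores negation ("denies food insecurity"), misspellings, and context, which is a large part of why dedicated NLP tooling is used for clinical text in practice.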

The need to unlock insights from unstructured data is driving demand for Natural Language Processing (NLP) technologies such as Linguamatics’ I2E text mining platform. I2E allows users to mine clinically relevant details from unstructured text, making it a powerful tool for organizations that participate in value-based payment models and need to monitor and manage their patient populations on an ongoing basis.

Linguamatics is committed to helping healthcare organizations leverage all available data in support of their population health management initiatives. As payers and providers implement wellness initiatives and target at-risk individuals, they must efficiently and effectively stratify their patient populations based on all clinically relevant data. Advanced NLP technologies like I2E give organizations the tools they need to understand the nuances that affect patient health. This understanding is critical to implementing a successful population health strategy.

Read more about how NLP can impact population health:

Download the whitepaper