Population Health Management and Analysis

Population health management today is the culmination of several initiatives over the past thirty years, which started with evidence-based medicine in the 1990s. Patient-centered care and value-based medicine built on this evidence-based approach by adding the quality and value of medical intervention from a patient’s perspective. Recently, precision medicine has shifted the focus still further by connecting genetic details with the environmental and lifestyle factors that affect the health of individuals.

Healthcare payment models reflect this emphasis on quality and value, which fuels demand for comprehensive insights into population health. The provider and payer markets within healthcare are now in the midst of a huge transformation, with 60% of commercial plans now linking payments to value.

Healthcare organizations have relied on structured data in Electronic Health Records (EHRs) and insurance claims to analyze the health of patient populations or make clinical decisions. Structured data is valuable, but an estimated 70% of the clinical data stored in EHRs is in "unstructured" form and therefore difficult to analyze. Yet this unstructured data contains a wealth of clinical information, including:

  • Encounter-based: clinician narratives, nurse notes
  • Procedure/situation reports: pathology, radiology and discharge reports
  • Patient narratives: patient-reported information (PRI) and patient-reported outcomes (PRO).

To enhance population health analysis and identify the care needs of individuals, providers and payers must extract insights from the data stored in this unstructured text. Social factors, lifestyle choices and living conditions all play a major part in the clinical risk for populations and individuals, and yet they are trapped in text-based form.

In addition, ensuring the patient’s problem list is consistent with their notes is vital when managing complex disease comorbidities. The demand for population-level insights and real-time patient surveillance is increasing. To unlock value from unstructured text, organizations will look to advanced Natural Language Processing (NLP) technologies such as the Linguamatics NLP platform.

The Evolution of Healthcare Payment Models


Healthcare has traditionally relied upon fee-for-service compensation models to pay providers. Today, the government and private payers are shifting to alternative pay-for-value models that offer those same providers financial incentives for proactively monitoring the health of their patients, achieving quality clinical outcomes and controlling the cost of care.

To meet their quality and performance objectives, providers must now analyze vast amounts of population health data. With more access to the data in Electronic Health Records (EHR), payers are also offering care coordination services to improve the overall health of their members. As with providers, payers are targeting high-risk populations and extending additional opportunities for education and support.

Identifying Population Risk Factors

Before healthcare organizations can implement pre-emptive care programs, they must first identify the relative risk of patient populations based on a variety of clinical, socio-economic and lifestyle factors.

Healthcare populations often include a small percentage of highest-risk patients. This small group often accounts for the largest share of healthcare costs.

Once healthcare organizations have stratified populations based on their relative health, they can start evidence-based care plans to improve outcomes for the at-risk individuals and put preventive programs in place for the healthier patients.
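As a rough illustration of the stratification step, the sketch below ranks patients by a risk score and assigns each one a tier by rank. The function name `stratify`, the tier labels and the 5% and 15% cut-offs are assumptions made for this example; a real program would derive its scores and thresholds from its own models.

```python
def stratify(risk_scores, high_share=0.05, rising_share=0.15):
    """Assign each patient to a tier based on rank by risk score.

    risk_scores maps a patient identifier to a numeric score
    (higher means higher risk). The shares are illustrative assumptions.
    """
    # Rank patient identifiers from highest to lowest score.
    ranked = sorted(risk_scores, key=risk_scores.get, reverse=True)
    n = len(ranked)
    n_high = max(1, round(n * high_share)) if n else 0
    n_rising = round(n * rising_share)

    tiers = {}
    for position, patient_id in enumerate(ranked):
        if position < n_high:
            tiers[patient_id] = "high"
        elif position < n_high + n_rising:
            tiers[patient_id] = "rising"
        else:
            tiers[patient_id] = "low"
    return tiers
```

With 20 patients and the default shares, one patient lands in the "high" tier, three in "rising" and the rest in "low".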

By following best-practice protocols, payers and providers can help patients avoid costly complications and hospitalization. Analysis of population health data also helps organizations to assess how a particular value-based care plan will impact their bottom line.

Challenges to Population Health Analysis

For both providers and payers, population health analysis is often difficult because of the heterogeneous nature of patient-related data. It's difficult to access or analyze unstructured text without advanced technologies such as NLP.

The ability to automatically extract precise data from unstructured text is invaluable for organizations participating in value-based payment models. By leveraging NLP, providers can look at both the structured and unstructured data for a complete patient population outlook. They can then identify and extract specific details to assess risk or improve population health.

Similarly, they can assess critical details on individual patients related to lifestyle choices such as smoking and alcohol consumption, and gain insights into a patient’s living arrangements, access to care and mobility status.

Healthcare is moving away from a reimbursement model that rewards procedures to one that rewards quality and outcomes. "No longer will health care be about how many patients you can see, how many tests and procedures you can order, or how much you can charge for these things. Instead, it will be about costs and patient outcomes: quicker recoveries, fewer readmissions, lower infection rates, and fewer medical errors, to name a few. In other words, it will be about value." (Toby Cosgrove, Harvard Business Review)

Population Health Analysis in the USA - a Review of Common Conditions

Diabetes

According to the Centers for Disease Control and Prevention (CDC), over 29 million Americans are currently living with diabetes. Another 84 million are pre-diabetic, and even more may be undiagnosed and untreated. The condition also accounts for more than 20 percent of healthcare spending.

Diabetes risk is closely tied to social and economic circumstances. It is more common among non-white populations, with black, Hispanic, and Native American populations experiencing the disease at much higher rates than whites.

Many early symptoms of diabetes, or complications of existing diabetes, are found in unstructured data, e.g. mentions of excessive thirst or hunger, frequent urination, fatigue, and blurred vision. Mentions of laboratory values such as hemoglobin A1c and blood glucose levels are also markers that are easily missed when managing a diabetic population.
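To make the idea of a lab value "mentioned" in a note concrete, here is a minimal sketch that pulls hemoglobin A1c values out of free text with a regular expression. The pattern and the function name `extract_a1c` are assumptions for illustration only; a production NLP engine would also need to handle negation, dates, units and many more phrasings.

```python
import re

# Illustrative pattern (an assumption, not a vendor rule): an A1c keyword
# followed within a few characters by a one- or two-digit value.
A1C_PATTERN = re.compile(
    r"\b(?:hb\s*a1c|hemoglobin\s+a1c|a1c)\b\D{0,15}?(\d{1,2}(?:\.\d)?)",
    re.IGNORECASE,
)


def extract_a1c(note: str) -> list[float]:
    """Return every A1c value found in a free-text note, in order."""
    return [float(match.group(1)) for match in A1C_PATTERN.finditer(note)]
```

Values extracted this way could then be stored alongside structured lab results so that patients with missing structured A1c data are not overlooked.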


Hypertension

One in every three American adults has hypertension, the CDC reports. This condition is highly correlated with other cardiovascular conditions, including heart disease and stroke, two of the leading causes of death in the USA.

Hypertension is commonly seen in non-Hispanic black males, and black individuals are twice as likely to die from the condition as whites.

Controlling high blood pressure is important not only for patients’ health but also for organizations responsible for managing and accounting for a hypertensive population. Hospitals and payers are also accountable for reporting outcome measures as part of their Centers for Medicare & Medicaid Services (CMS) responsibilities. Blood pressure measurements recorded in clinical notes are another example of where NLP can be beneficial.
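A minimal sketch of how blood pressure readings might be captured from notes is shown below. The regular expression, the function names and the default limits of 140/90 mmHg are assumptions for this example; each organization would substitute the thresholds defined by the measure it reports on.

```python
import re

# Illustrative pattern (an assumption): "BP" or "blood pressure" followed
# shortly by a systolic/diastolic pair such as 152/94.
BP_PATTERN = re.compile(
    r"\b(?:bp|blood\s+pressure)\b\D{0,20}?(\d{2,3})\s*/\s*(\d{2,3})",
    re.IGNORECASE,
)


def extract_blood_pressure(note):
    """Return (systolic, diastolic) pairs in the order they appear."""
    return [(int(m.group(1)), int(m.group(2))) for m in BP_PATTERN.finditer(note)]


def is_controlled(systolic, diastolic, sys_limit=140, dia_limit=90):
    """True when both values are below the (assumed) limits."""
    return systolic < sys_limit and diastolic < dia_limit
```

Applied to every note for a hypertensive population, the most recent reading per patient could feed an outcome measure without manual chart review.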

Heart Failure

Latest statistics from the CDC show that approximately 5.7 million adults in the USA have heart failure. (Heart failure does not mean that the heart has stopped pumping, but rather that it can no longer supply enough oxygen to the body's organs.) It is estimated that half of all people with this condition will die within 5 years of diagnosis.

The USA spends about $30.7 billion annually on this condition. This cost is estimated from services, medications and lost working days.

Monitoring heart failure symptoms and vital tests such as ejection fraction (EF) is key when managing a heart failure patient population. This monitoring is also essential when evaluating a lifesaving cardiac medical device. See our blog for the ground-breaking work Mercy Health has done with the help of Linguamatics NLP.


Opioid Addiction

Opioid addiction is one of America's biggest health crises. Providers can act as the first line of defense against opioid abuse. Organizations such as the FDA have considered addressing the epidemic with mandatory opioid education for all healthcare professionals.

The fight against addiction is a difficult one. Organizations need AI tools that augment human intervention. Clues to addiction are often present in clinical notes long before an addiction is noted in the problem list. It is not uncommon for a major event, such as a visit to the emergency department or an inpatient stay, to be what triggers the addition of a drug addiction to a patient’s problem list.

NLP can help identify patterns that may not be easily identified from a few encounters. To learn more, please refer to our interview in HiTech Answers.


COPD and Asthma

The CDC reports that nearly 15.7 million Americans have received a chronic obstructive pulmonary disease (COPD) diagnosis, while asthma affects about 25 million individuals in the US.

These chronic conditions cost the healthcare industry billions each year. They are also associated with individuals’ environmental circumstances and are often exacerbated by exposure to air pollutants in the home and workplace.

Pulmonary Function Tests (PFTs) such as spirometry are necessary for detecting the severity and progression of airway issues. Values that are essential for population management, such as Forced Expiratory Volume in One Second (FEV1) and Forced Vital Capacity (FVC), are often reported in notes but not present in structured form.
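As a hedged sketch of what capturing PFT values could look like, the example below reads FEV1 and FVC (reported in liters) from a note and computes the FEV1/FVC ratio. The helper names and the expected phrasing ("FEV1 1.8 L") are assumptions; real reports vary widely in layout and units.

```python
import re


def _first_liters(label, note):
    """Return the first value in liters that follows the given label, or None."""
    match = re.search(
        rf"\b{label}\b\D{{0,10}}?(\d+(?:\.\d+)?)\s*L\b", note, re.IGNORECASE
    )
    return float(match.group(1)) if match else None


def fev1_fvc_ratio(note):
    """Compute FEV1/FVC from a note, or None if either value is missing."""
    fev1 = _first_liters("FEV1", note)
    fvc = _first_liters("FVC", note)
    if fev1 is None or not fvc:
        return None
    return round(fev1 / fvc, 2)
```

Once the ratio is available as a discrete field, it can be tracked over time or compared against whatever cut-off the organization's clinical guidelines specify.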

Depression and Other Mood Disorders

Between 2009 and 2012, depression affected 7.6 percent of Americans aged 12 and older. The mood disorder is more prevalent among minority and lower-income populations, and is also associated with higher rates of chronic disease, as well as increased healthcare utilization. 

A mental health diagnosis must be made by a trained specialist and can take anywhere from several sessions to years. Many individuals suffering with mental health issues never see a specialist but rather stay in primary care. Sometimes the first time a mental health condition enters a patient chart in structured form is after a major event, such as a trip to the emergency department for a suicide attempt or an inpatient psychiatric stay.

From patient notes to social media, patterns of warning signs and symptoms are often documented well in advance. NLP can be a useful tool to help detect an issue before a disaster happens.

The Challenges of Real-Time and Big Data for Population Health

Traditionally, healthcare organizations have relied on Electronic Health Records (EHR) and claims data to analyze patient populations and the health of individuals. Claims data and EHRs have been an adequate source of data for population health analytics in the past. But now the demand for detailed, actionable information has escalated.

Patient Engagement Portals

Today, patient engagement portals such as Accenture’s Intelligent Patient, Allscripts FollowMyHealth, Epic’s MyChart and Athenahealth offer a host of web-based tools allowing patients to play an active role in their own healthcare. The data available to patients may include their lab results, physician notes, discharge summaries, immunizations and overall health history.

Research shows that when patients are able to see their own health data, they can take ownership of their health and are better prepared to interact with their providers about their care. In addition, as patient engagement and interaction with their own medical records grows, there will be more electronic communication about progress, lifestyle changes, medication adherence and adverse events.

New sources of patient insights are growing, but often patient-reported information is in an unstructured format.

The Role of Natural Language Processing (NLP)

Providers can use NLP to review this new patient-reported data, and gain insights on everything from mental state and fall risk to firearms access. Payers can leverage NLP to analyze member-supplied data, including from sources such as online chats between patients and nurses. NLP can even be used to review social media posts and provide relevant insights about exercise routines, diet and social behavior.

At the same time, the volume of available data is increasing and so is the need to analyze unstructured patient data in real-time. By deploying sophisticated, predictive clinical models, providers can identify which patients are at higher risk for medication non-compliance or 30-day hospital re-admission. Relevant data, such as details on a patient’s social support network, ambulatory status and living conditions can be mined from discharge summaries and analyzed alongside relevant lab and diagnostic details.

Established technology for mining healthcare information is focused on structured data, so NLP and its results must integrate with these systems to deliver actionable insights. Source documents typically reside in an EHR, file system, data warehouse or Hadoop data lake, and these systems are often the destination for NLP results as well. A data warehouse is often the single source of truth for population stratification algorithms running in R, Python or SAS. There is also expanding use of Cloudera Hive as an analytics environment as Big Data tools continue to be applied in this area.

Source documents need to be loaded into an NLP engine for analysis, either at population scale or in real time as new patient-related documents are generated. This means that NLP systems must support both large-scale batch processing and real-time analysis. Web service APIs are key to this type of integration and should ideally fit a Service-Oriented Architecture (SOA) to provide flexibility, failover and recovery capabilities in production environments.

The output from the NLP engine is usually loaded into a data warehouse via standard ETL processes and aligned with existing structured data via medical record number (MRN) or other patient/member identifiers. Once the structured and unstructured sources are combined, models can be developed and run against all the relevant features.
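The alignment step can be pictured as a left join on the patient identifier. The sketch below uses plain Python dictionaries to stay self-contained; the field names (`mrn`, `concept`, `value`) and the function `merge_by_mrn` are hypothetical, and in practice this join would normally be done in the warehouse itself.

```python
def merge_by_mrn(structured_rows, nlp_rows):
    """Left-join NLP-derived features onto structured rows by MRN.

    structured_rows: list of dicts, each with an "mrn" key.
    nlp_rows: list of dicts with "mrn", "concept" and "value" keys
              (one row per extracted concept).
    """
    # Pivot the NLP output into one feature dict per patient.
    features = {}
    for row in nlp_rows:
        features.setdefault(row["mrn"], {})[row["concept"]] = row["value"]

    # Attach the features to each structured row; patients without
    # NLP output are kept unchanged.
    return [{**row, **features.get(row["mrn"], {})} for row in structured_rows]
```

The combined rows then carry both claims or EHR fields and NLP-derived features, ready for a stratification or predictive model.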

Population Health Management and Natural Language Processing

NLP technologies are used to extract structured information from unstructured patient-related documentation. For example, providers can leverage NLP to extract discrete values of left ventricular ejection fraction from an echocardiogram, or a patient’s cancer stage from a pathology report.
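To show what a "discrete value" extracted from a report might look like, here is a small sketch that finds ejection fraction mentions, including ranges such as "35-40%". The pattern and the function name `extract_ef` are assumptions for this example and are far simpler than the linguistic rules a dedicated NLP platform would apply.

```python
import re

# Illustrative pattern (an assumption): an EF keyword followed shortly by a
# percentage, optionally written as a range ("35-40%" or "35 to 40%").
EF_PATTERN = re.compile(
    r"\b(?:lvef|ef|ejection\s+fraction)\b\D{0,25}?"
    r"(\d{1,2})(?:\s*(?:-|to)\s*(\d{1,2}))?\s*%",
    re.IGNORECASE,
)


def extract_ef(report):
    """Return (low, high) percentage pairs; a single value yields low == high."""
    results = []
    for match in EF_PATTERN.finditer(report):
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else low
        results.append((low, high))
    return results
```

Keeping the range as a pair rather than collapsing it to one number leaves the choice of how to interpret it to the downstream analysis.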

While Electronic Health Records (EHRs) have fields for such clinically relevant details, records may include major gaps because data is not consistently entered, which impacts clinical care and outcomes analysis.

Applying NLP enables the capture of information from unstructured patient data in a timely manner and facilitates its use for analytical purposes. Unlike earlier systems, the latest NLP tools such as Linguamatics NLP enable open and flexible development of queries, and are not as reliant on expensive data sets manually annotated by clinicians. Interest in this field is expanding as noted in the recent KLAS report: Natural language processing: Glimpses into the future of unstructured data mining (April 2016).

Broadening the Scope of Population Health Analysis

Population health is about people and the ways in which they are both unique and the same. To get a full picture of an individual's health, you need more data than is required to analyze a patient’s current clinical status.

Only 20% of an individual’s health status is associated with their clinical care; other major factors that contribute to their status include health behaviors, social and economic factors and physical environment. Lifestyle choices such as tobacco, alcohol and drug use can all be extracted from unstructured text using NLP, as can sexual activity, diet and exercise.

360 degree view of patients from structured and unstructured information

Unlocking Insights within Electronic Health Records

Third-party organizations can provide reporting on the social and economic factors impacting population health but providers and payers are also using NLP to mine these details from their own internal data.

NLP can also surface other relevant information, such as a patient’s environmental, housing and mobility status. Because it unlocks critical details from unstructured text, NLP is a powerful tool for organizations as they manage the health of their patient populations.

Case Study: Identifying Drug and Lifestyle Conflicts

NLP's ability to analyze unstructured data enables both payers and providers to build a more complete picture of each patient.

For example, using traditional (structured) claims data, an individual patient might be categorized using only the following information:

  • Age: 74
  • Gender: Male
  • Suffered a heart attack and pacemaker fitted
  • Hospitalized with DVT
  • Plavix

Information in the claims data shows that the patient was hospitalized with deep vein thrombosis (DVT) and has been prescribed Plavix, a blood-thinning agent. But it does not list any aspects of the patient's health or lifestyle that may conflict with this prescription.

Using NLP it's possible to analyze unstructured text in the patient's clinical notes and patient reported information. This data yields three items of information:

  • the patient's use of fish oil supplements
  • red wine consumption
  • wife recently deceased

The first two items are relevant in the prescription of a blood-thinning drug like Plavix. The last is of major concern and would flag the person as potentially needing more support. These insights and others shown below are extracted using NLP and provide a deeper understanding of the person to improve their care.

From his Clinical Notes:

  • Ejection fraction: 50
  • BMI: 22
  • A1C: 6
  • No shortness of breath
  • Takes fish oil supplements

Social and Lifestyle Data:

  • Non-smoker
  • Red wine drinker
  • Wife recently deceased
  • Lives with sister-in-law
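The review described in this case study can be sketched as a simple rule lookup over NLP-extracted findings. Everything in the example is hypothetical: the rule tables only restate the three items from this case study, the labels are invented for illustration, and nothing here is clinical guidance.

```python
# Hypothetical lookup tables that restate this case study only;
# they are not a clinical knowledge base.
RELEVANT_FACTORS = {
    "plavix": {"fish oil supplements", "red wine consumption"},
}
SUPPORT_TRIGGERS = {"wife recently deceased"}


def review_patient(medications, nlp_findings):
    """Return alerts for findings that deserve a second look."""
    findings = {finding.lower() for finding in nlp_findings}
    alerts = []
    for medication in medications:
        relevant = RELEVANT_FACTORS.get(medication.lower(), set())
        for factor in sorted(relevant & findings):
            alerts.append(f"{medication}: review '{factor}'")
    for finding in sorted(SUPPORT_TRIGGERS & findings):
        alerts.append(f"support: '{finding}'")
    return alerts
```

For the patient above, calling `review_patient(["Plavix"], [...])` with the extracted findings would raise two medication alerts and one support alert, while a finding such as "non-smoker" would pass through silently.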

Case Study: Analyzing the Risk of Type 2 Diabetes in a Patient Population

Consider an Accountable Care Organization (ACO) that wants to assess the risk of type 2 diabetes in its patient population. An analysis of structured data can reveal risk factors associated with weight, race and age, but might miss risk factors that are noted in physicians’ narratives.

Using NLP, the ACO could identify the prevalence of other known risk factors, such as limited access to healthy foods, barriers to physical activity, high stress levels and social isolation.

Case Study: Population-level Cohort Selection

How can NLP help population-level cohort selection? A good example is the CMS code for lung cancer screening, which targets 55–77 year-olds who are current or past smokers, have no lung cancer diagnosis and have a smoking history of at least 30 pack-years.

While some of these details may be captured in modern EHRs, certain critical risk factors such as smoking pack-years are typically most accurate in unstructured text. NLP can extract these factors to derive a much deeper understanding of clinical risk. People who meet the criteria are invited for CT screening to identify early signs of lung cancer.
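Once the smoking history has been extracted, cohort selection reduces to a filter over a few fields. The sketch below assumes a hypothetical `Patient` record and uses the usual definition of pack-years (packs smoked per day multiplied by years smoked). The age range and 30 pack-year threshold come from the criteria described above; treating the threshold as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record combining structured and NLP-derived fields.
    mrn: str
    age: int
    ever_smoker: bool
    packs_per_day: float
    years_smoked: float
    lung_cancer_dx: bool


def pack_years(patient):
    """Pack-years = packs smoked per day x years smoked."""
    return patient.packs_per_day * patient.years_smoked


def is_eligible(patient, min_age=55, max_age=77, min_pack_years=30):
    """Apply the screening criteria to one patient."""
    return (
        min_age <= patient.age <= max_age
        and patient.ever_smoker
        and not patient.lung_cancer_dx
        and pack_years(patient) >= min_pack_years
    )


def select_cohort(patients):
    """Return the patients who meet every criterion."""
    return [patient for patient in patients if is_eligible(patient)]
```

The thresholds are exposed as parameters so the same filter can be rerun if the screening criteria change.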

Case Study: Supporting Value-Based Care at Atrius Health

ACOs need access to clinical data to meet reporting requirements and facilitate quality care initiatives. Critical patient information is often stored in narrative form in Electronic Health Records (EHR). Like many healthcare organizations, Atrius Health had difficulty obtaining certain information for quality metric reporting, accurate clinical documentation, and safety-net initiatives.

Using Linguamatics NLP, Atrius Health created queries to extract clinical data from free-text fields within clinician progress notes and clinical reports. For example, Atrius Health now queries unstructured echo reports to analyze cardiac function and identify high-risk heart failure patients.


Advantages of Linguamatics NLP in Population Health

Data Discovery and Exploration across Populations

Linguamatics NLP can translate unstructured text into discrete data fields by identifying the key concepts and their relationships in healthcare documentation. It can identify disease severity concepts such as TNM cancer stage, patient ambulatory status, and ejection fraction. I2E then provides this data as structured fields.

Linguamatics NLP can analyze millions of patients together and characterize how concepts are represented in patient documentation. This reduces reliance on manual chart review and allows algorithms to be tailored to new data sets.


Feature Extraction for Risk Stratification and Predictive Models

Risk stratification can be biased toward structured data because that data is easier to access. Interest in long-term patient/member wellness is increasing. Harnessing the insights trapped in unstructured data will become the differentiator in a changing and competitive market.

The providers and payers who are able to characterize patient/member groups at a more detailed level will have the advantage of population insight over those who struggle to do so.


Linguamatics NLP: a Highly Configurable NLP Solution

The NLP solution offers a spectrum of capabilities that customers can apply to extract insights from patient- and member-related data. I2E users can:

  • Develop new algorithms
  • Change existing algorithms
  • Create and add ontologies
  • Incorporate new data sources.


Linguamatics NLP unlocks data to improve patient safety, quality, and reporting

Despite the U.S. health system having made progress in recent years, patient safety remains a challenge that healthcare organizations (HCOs) must prioritize. Additionally, adverse medication events cause more and more injuries and deaths each year, with a substantial cost for HCOs.

Under pressure to find a solution and improve quality measures, HCOs are now turning to Augmented Intelligence (AI) technologies, and especially NLP, to make more sense of data and use its full potential. NLP workflows can help reduce the likelihood of human error and improve patient safety. Findings are then transformed into structured data to simplify chart review and speed the identification of high-risk patients.