Posts from April 2018

You may have heard that big data in healthcare is being used to cure diseases, improve quality of life, predict epidemics and so on. But how much of an impact is this having on society today?

The complexity of human health means that radiologists and disease specialists capture a great deal of information best in the patient narrative and other clinical documentation. Up to 80% of patient information is made up of unstructured data. Naturally, many clinicians want to concentrate on their job, telling the story of the patient and how to treat them most effectively, rather than spending 50% of their time entering structured information in check boxes and drop-downs. There is therefore a desire to use Natural Language Processing (NLP) systematically so that clinicians can put more work into patient care and less into clinical documentation. Here at Linguamatics, we help healthcare organizations look at how this mass of unstructured data can help identify high-risk patients and reduce the time spent on documentation.

For example, healthcare providers are looking at the population level for individuals known to have food insecurity or social isolation issues. These social determinants of health matter: if a patient isn’t eating properly or can’t get to an appointment, their likelihood of having a good outcome is severely reduced.


When people think about real-world evidence, they generally think about using these data to address questions around drug effectiveness or population-level safety effects. But “real-world data” can address many other applications. If you think of real-world data as any type of information gathered about drugs in non-trial settings, a whole world of possibilities opens up:

  • Social media data can be used to understand how well packaging and formulations are working.
  • Customer call feeds can be analyzed for trends in drug switching, off-label use, or contraindicated medications among concomitant drugs.
  • Full text literature can be mined for information about epidemiology, disease prevalence, and more.

Text Mining transforms Real-World Data to Real-World Evidence

Many of these real-world sources have unstructured text fields, and this is where text analytics, and natural language processing (NLP), can fit in. At Linguamatics, we have customers who are using text analytics to get actionable insight from real-world data – and finding valuable intelligence that can inform commercial business strategies.

In this blog, we will be looking at two different Linguamatics customer use cases, where text mining has been used to transform real-world data to real-world evidence.


AbbVie, Bayer, Merck KGaA, Mundipharma, and Novo Nordisk to share text mining insights at Cambridge, UK meeting

Cambridge, England and Boston, USA – April 17, 2018 – Linguamatics, the leading natural language processing (NLP) text analytics provider, today announced that its Spring Text Mining Conference 2018 will feature presentations from several top-tier biomedical organizations. The conference, taking place April 23 to April 25 in Cambridge, England, will highlight the wide range of ways that organizations are leveraging I2E, Linguamatics’ powerful NLP-based AI technology, to extract actionable insights from the huge amount of unstructured data available in healthcare and the life sciences.

In addition to presentations from AbbVie, Bayer, Merck KGaA, Mundipharma, and Novo Nordisk, the conference will offer hands-on training for users, opportunities for exchanging ideas and networking, sessions on industry trends and best practices, and demonstrations of the latest Linguamatics technology updates.

“We are seeing ever broader use of NLP for research, intellectual property and real-world evidence,” said David Milward, chief technology officer for Linguamatics. “We are looking forward to learning more about the innovative ways our customers are taking advantage of our text mining technology, and to sharing details on the latest enhancements to our technology stack, including NLP and machine learning updates.”


Ensuring patient safety is the highest priority for drug companies and prescribers – and obviously for patients themselves – so any steps that can give scientists and clinicians more accurate, well-rounded descriptions of safety data should be welcomed by all parties. AstraZeneca (AZ) wanted to test the hypothesis that adverse reaction (AR) information from patients could effectively supplement information from clinical trials, and a key challenge was assembling comparable data sets. AZ studied the commonly reported adverse reaction “nausea”: it is associated with many drugs, and there is a wealth of documented information, albeit in a variety of formats. It is also often debilitating, so anything that reduces its occurrence would be of value to patients.

Patient-reported Real-World Evidence

AZ worked with the patient-generated health data in the PatientsLikeMe system and looked for records reporting nausea as an adverse reaction. Because the PatientsLikeMe system is very well structured, it was relatively simple to extract a clean nausea AR data set that was amenable to comparison. 

Clinical Trial Events

Adverse reactions observed in clinical trials are included on drug labels, and the data is then listed in the online DailyMed repository maintained by the National Library of Medicine. The FDA offers only guidance on how to submit the data, so the content and formats are highly variable, which complicated the creation of a well-structured data set to compare with the PatientsLikeMe real-world data.


A valuable part of a clinician’s training includes the effective identification and careful documentation of all the elements impacting a patient's well-being. Thorough documentation is essential to ensure accurate and timely clinical care. Although electronic health records (EHRs) hold many great opportunities to capture essential details in electronic form, patients could be at risk if all elements of their medical records are not compiled and analyzed. With 990.8 million reported visits to physician offices in 2015 [1], odds are that precious information could slip off the radar of even the most dutiful clinical staff. 

The importance of Nature AND Nurture in healthcare

For this reason, providers must adopt a successful population management strategy that considers all elements of a patient’s record, including the estimated 70% of the record that exists as unstructured notes. Structured data is excellent for documenting the patient information that keeps a hospital running effectively, but it is less efficient at capturing important clinical concerns during a patient’s 20-minute encounter with a physician.

Although the concept of nature vs. nurture has been well-documented for centuries, providers are just now realizing the critical importance of social determinants:

  • Does a patient live alone?
  • Do they utilize a walking cane?
  • Are they on a fixed income?
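
Answers to questions like these usually sit in free-text notes rather than in structured fields. As a rough illustration only, the Python sketch below flags such mentions with simple keyword patterns. It is not how I2E works: the pattern names, the regular expressions, and the sample note are all hypothetical, and plain keyword matching ignores negation, synonyms, and context that a real NLP system has to handle.

```python
import re

# Hypothetical patterns for illustration only. Real clinical NLP must handle
# negation ("does not live alone"), synonyms, and context that keyword
# matching misses.
SDOH_PATTERNS = {
    "lives_alone": re.compile(r"\blives? alone\b", re.IGNORECASE),
    "walking_aid": re.compile(r"\b(?:walking )?cane\b|\bwalker\b", re.IGNORECASE),
    "fixed_income": re.compile(r"\bfixed[- ]income\b", re.IGNORECASE),
}


def flag_social_determinants(note: str) -> list[str]:
    """Return the names of the patterns that match anywhere in the note."""
    return [name for name, pattern in SDOH_PATTERNS.items() if pattern.search(note)]


# Made-up sample note, not real patient data.
note = "Pt lives alone, ambulates with a cane. Reports being on a fixed income."
print(flag_social_determinants(note))
```

A flag from a sketch like this would only be a prompt for a clinician or care coordinator to review the note, not a finding in itself.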

Identifying before protecting: Using I2E to help vulnerable populations

Undoubtedly, EHRs contain a wealth of information to identify patients requiring special attention, such as those with: