Pharma and healthcare companies are data-rich organizations; however, a large proportion of this data isn't as accessible as desired because the information is locked in unstructured text. Linguamatics I2E is used by our customers across drug discovery, development and delivery of therapeutics to tackle this problem, and our biannual user conferences are always a great way to hear updates on applications of text mining and best practice to help solve these data access challenges.

Last month, we had speakers presenting on where they are getting value from I2E, from bench to bedside. Attendees from the pharmaceutical industry, biotech, healthcare, academia, and partner vendor companies came to our Spring Text Mining Conference for training sessions, networking, discussions and, of course, excellent presentations.

Structure Activity Relationships from Patents, for Medicinal Chemistry

Starting in early discovery, Ortrud Steinfuehr (Information Manager, Bayer AG) presented on “An Approach to provide SAR Data from Patents”. Structure Activity Relationships (SAR) provide information about how the 3D structure of a chemical compound impacts biological activity, such as effective dose or inhibitory concentration values for specific targets. SAR is very valuable for medicinal chemistry research around optimisation and modification of lead compounds. These data are often found in patents, which are typically written in a way that makes automated extraction of SAR very difficult.
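To give a feel for the kind of activity data involved, here is a minimal Python sketch that pulls activity values such as IC50 out of free text with a regular expression. It is an illustration only, not the approach presented by Bayer or the way I2E works; the pattern, the function name extract_activities and the sample sentence are all hypothetical. Real patent text is far messier (values in tables, ranges, compounds referenced by example number), which is exactly why simple patterns like this fall short.

```python
import re

# Hypothetical pattern (an assumption for illustration): an activity type,
# an optional separator, a number, and a concentration unit.
ACTIVITY_RE = re.compile(
    r"(?P<kind>IC50|EC50|Ki)\s*(?:=|of|:)?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>nM|uM|µM|mM)"
)


def extract_activities(text):
    """Return (kind, value, unit) tuples for each activity mention in text."""
    return [
        (m.group("kind"), float(m.group("value")), m.group("unit"))
        for m in ACTIVITY_RE.finditer(text)
    ]


# Made-up sentence in the style of a patent example section.
sample = (
    "Example 12 inhibited the kinase with an IC50 of 35 nM; "
    "Example 13 showed Ki = 1.2 uM."
)
print(extract_activities(sample))
```

A pattern like this only finds values stated inline next to their activity type; linking each value to the right compound structure and target is the harder part of the problem.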

You may have heard that big data in healthcare is being used to cure diseases, improve quality of life, predict epidemics and so on. But how much of an impact is this having on society today?

The complexity of human health means that there is a lot of information that radiologists and disease specialists inherently best capture in the patient narrative and other clinical documentation. Up to 80% of patient information is made up of unstructured data. Naturally, many clinicians want to concentrate on their job: telling the story of the patient and how to treat them most effectively, rather than spending 50% of their time entering structured information in check boxes and drop-downs. Therefore, there's a desire to start using Natural Language Processing (NLP) systematically so that clinicians can put more effort into patient care and less into clinical documentation. Here at Linguamatics we help healthcare organizations look at how this mass of unstructured data can help identify high-risk patients and reduce the time spent on documentation.

One example is healthcare providers looking across the population for individuals known to have food insecurity or social isolation issues. These social determinants of health matter: if a patient isn’t eating properly or can't get to an appointment, their likelihood of having a good outcome is severely reduced.

When people think about real-world evidence, they generally think about using these data to address questions around drug effectiveness or population-level safety effects. But there are many other applications that “real-world data” can address. If you think of real-world data as any type of information gathered about drugs in non-trial settings, a whole world of possibilities opens up:

  • Social media data can be used to understand how well packaging and formulations are working.
  • Customer call feeds can be analyzed for trends in drug switching, off-label use, or contra-indicated medications among concomitant drugs.
  • Full text literature can be mined for information about epidemiology, disease prevalence, and more.

Text Mining transforms Real-World Data to Real-World Evidence

Many of these real-world sources have unstructured text fields, and this is where text analytics, and natural language processing (NLP), can fit in. At Linguamatics, we have customers who are using text analytics to get actionable insight from real-world data – and finding valuable intelligence that can inform commercial business strategies.

In this blog, we will be looking at two different Linguamatics customer use cases, where text mining has been used to transform real-world data to real-world evidence.

AbbVie, Bayer, Merck KGaA, Mundipharma, and Novo Nordisk to share text mining insights at Cambridge, UK meeting

Cambridge, England and Boston, USA – April 17, 2018 – Linguamatics, the leading natural language processing (NLP) text analytics provider, today announced its Spring Text Mining Conference 2018 will feature presentations from several top-tier biomedical organizations. The conference, taking place April 23 to April 25 in Cambridge, England, will highlight the wide range of ways that organizations are leveraging I2E, Linguamatics’ powerful NLP-based AI technology, to extract actionable insights from the huge amount of unstructured data available in healthcare and the life sciences.

In addition to presentations from AbbVie, Bayer, Merck KGaA, Mundipharma, and Novo Nordisk, the conference will offer hands-on training for users, opportunities for exchanging ideas and networking, sessions on industry trends and best practices, and demonstrations of the latest Linguamatics technology updates.

“We are seeing ever broader use of NLP for research, intellectual property and real-world evidence,” said David Milward, chief technology officer for Linguamatics. “We are looking forward to learning more about the innovative ways our customers are taking advantage of our text mining technology, and to share details on the latest enhancements to our technology stack, including NLP and machine learning updates.”

Ensuring patient safety is the highest priority for drug companies and prescribers – and obviously for patients themselves – so any steps that can give scientists and clinicians more accurate, well-rounded descriptions of safety data should be welcomed by all parties. AstraZeneca (AZ) wanted to test the hypothesis that adverse reaction (AR) information from patients could effectively supplement information from clinical trials, and a key challenge was assembling comparable data sets. AZ studied the commonly reported adverse reaction “nausea”: it is associated with many drugs, and there is a wealth of documented information, albeit in a variety of formats. It is also often debilitating, so anything to reduce its occurrence would be of value to patients.

Patient-reported Real-World Evidence

AZ worked with the patient-generated health data in the PatientsLikeMe system and looked for records reporting nausea as an adverse reaction. Because the PatientsLikeMe system is very well structured, it was relatively simple to extract a clean nausea AR data set that was amenable to comparison. 

Clinical Trial Events

Adverse reactions observed in clinical trials are included on drug labels, and the data is then listed in the online DailyMed repository maintained by the National Library of Medicine. The FDA offers only guidance on how to submit the data, so the content and formats are highly variable, which complicated the creation of a well-structured data set to compare with the PatientsLikeMe real-world data.