When people think about real-world evidence, they generally think about using these data to address questions around drug effectiveness or population-level safety effects. But there are many other questions that “real-world data” can address. If you think of real-world data as any type of information gathered about drugs in non-trial settings, a whole world of possibilities opens up:

  • Social media data can be used to understand how well packaging and formulations are working.
  • Customer call feeds can be analyzed for trends in drug switching, off-label use, or contra-indicated medications among concomitant drugs.
  • Full text literature can be mined for information about epidemiology, disease prevalence, and more.

Text Mining transforms Real-World Data to Real-World Evidence

Many of these real-world sources have unstructured text fields, and this is where text analytics and natural language processing (NLP) fit in. At Linguamatics, we have customers who are using text analytics to get actionable insight from real-world data – and finding valuable intelligence that can inform commercial business strategies.

In this blog, we will be looking at two different Linguamatics customer use cases, where text mining has been used to transform real-world data to real-world evidence.

Ensuring patient safety is the highest priority for drug companies and prescribers – and obviously for patients themselves – so any steps that can give scientists and clinicians more accurate, well-rounded descriptions of safety data should be welcomed by all parties. AstraZeneca (AZ) wanted to test the hypothesis that adverse reaction (AR) information from patients could effectively supplement information from clinical trials, and a key challenge was assembling comparable data sets. AZ studied the commonly reported adverse reaction “nausea”: it is associated with many drugs, and there is a wealth of documented information, albeit in a variety of formats. It is also often debilitating, so anything that reduces its occurrence would be of value to patients.

Patient-reported Real-World Evidence

AZ worked with the patient-generated health data in the PatientsLikeMe system and looked for records reporting nausea as an adverse reaction. Because the PatientsLikeMe system is very well structured, it was relatively simple to extract a clean nausea AR data set that was amenable to comparison. 

Clinical Trial Events

Adverse reactions observed in clinical trials are included on drug labels, and the data are then listed in the online DailyMed repository maintained by the National Library of Medicine. The FDA offers only guidance on how to submit the data, so the content and formats are highly variable, and this complicated creating a well-structured data set to compare with the PatientsLikeMe real-world data.

Tracking and reporting adverse events

In recent years, regulatory authorities such as the FDA and EMA have placed an increased emphasis on the safety of marketed drugs, particularly the tracking and reporting of adverse events. Pharmaceutical companies are expected to screen the worldwide scientific literature for potential adverse drug reactions at least every two weeks. Text mining and other tools that streamline the literature review process for pharmacovigilance are more crucial than ever for ensuring patient safety without overloading drug safety teams.

Manual review of adverse events is time-consuming

Eric Lewis (Safety Development Leader at GlaxoSmithKline) talked at the Linguamatics Text Mining Summit about the challenges of reviewing medical literature for safety signals. For example, he searched the literature for a sample of just 20 marketed products across a 300-day period. Eric found that there were on average 60 new references per day (with a total of over 11,000 documents), and that manual review took 1.2 to 1.6 minutes per abstract. Extrapolating to a typical pharma company portfolio of 200 marketed products, he showed that this volume of literature would take over 2,200 hours to review – hugely time-consuming.
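The extrapolation can be reproduced with a few lines of arithmetic. The sketch below is only an illustration of how the numbers fit together: it assumes that literature volume scales linearly with the number of products, and it uses the stated total of 11,000 documents as a lower bound. The variable and function names are our own, not from the talk.

```python
# Figures quoted in the summary of the talk above.
SAMPLE_PRODUCTS = 20
SAMPLE_DOCUMENTS = 11_000            # "over 11,000", so treated as a lower bound
PORTFOLIO_PRODUCTS = 200
MINUTES_PER_ABSTRACT = (1.2, 1.6)    # reported range of manual review time


def review_hours(documents, minutes_per_abstract):
    """Convert a document count and a per-abstract review time into hours."""
    return documents * minutes_per_abstract / 60


# Assumption: document volume grows in proportion to the number of products.
portfolio_documents = SAMPLE_DOCUMENTS * PORTFOLIO_PRODUCTS // SAMPLE_PRODUCTS

low_hours, high_hours = (
    review_hours(portfolio_documents, minutes) for minutes in MINUTES_PER_ABSTRACT
)

print(f"Documents to review: {portfolio_documents:,}")
print(f"Review effort: {low_hours:,.0f} to {high_hours:,.0f} hours")
```

Under these assumptions, the lower end of the review-time range gives the 2,200 hours quoted above, and the upper end gives roughly 2,900 hours.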

Understanding drug-drug interactions can improve drug safety

A considerable proportion of adverse drug events are caused by interactions between drugs. With an ageing population, and the growing number of age-related illnesses that comes with it, the potential for drug-drug interactions (DDIs) is rising. One way of alleviating some DDIs is to ensure that potentially interacting drugs are taken a suitable time interval apart. But what is the best interval to recommend?

In a recent seminar, Keith Burkhardt of the FDA described a project using text mining to survey the landscape of information on DDIs from FDA Drug Labels. In particular, the FDA review division wanted to find labelling for drugs where the time separation was stated, in order to prevent potential drug safety events.

Mining Data from FDA Drug Labels: dosing regimens and time separation

The drug classes of interest included bile acid sequestrants and exchange resins (such as cholestyramine, colestipol, colesevelam, all LDL cholesterol lowering drugs), phosphate binders (e.g. sevelamer; used for patients with chronic kidney failure), and chelators (used to treat excessively high levels of lead, iron or copper in the blood; e.g. deferasirox, deferiprone). These drug classes can all alter the bioavailability of other drugs, particularly for those with a narrow therapeutic range such as warfarin or antiepileptic drugs.
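To make the idea of finding stated time separations concrete, here is a minimal sketch using a simple regular expression. It is not the method used in the FDA project or in Linguamatics I2E, which the text above does not describe in detail; the pattern, the function name, and the example sentences are all invented for illustration, and the sentences are not quotations from any real drug label.

```python
import re

# Hypothetical pattern: optional qualifier, a number, a time unit, a direction.
TIME_SEPARATION = re.compile(
    r"(?P<qualifier>at least|within)?\s*"
    r"(?P<amount>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>hours?|minutes?)\s+"
    r"(?P<direction>before|after|apart)",
    re.IGNORECASE,
)


def find_time_separations(text):
    """Return (qualifier, amount, unit, direction) tuples found in the text."""
    results = []
    for match in TIME_SEPARATION.finditer(text):
        results.append((
            (match.group("qualifier") or "").lower(),
            float(match.group("amount")),
            match.group("unit").lower().rstrip("s"),  # normalise to singular
            match.group("direction").lower(),
        ))
    return results


# Invented example sentences, NOT taken from real labels.
SAMPLE_SENTENCES = [
    "Take other drugs at least 4 hours before drug X.",
    "Doses should be given 2 hours apart.",
    "No dose adjustment is required.",
]

for sentence in SAMPLE_SENTENCES:
    print(sentence, "->", find_time_separations(sentence))
```

A pattern like this only catches the simplest phrasings. Real label text varies far more (numbers written as words, ranges, statements split across sentences), which is part of the motivation for using linguistic text mining rather than hand-written patterns.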

In this world of ever-increasing volume and variety of textual data, there is a growing variety of tools and technologies to handle and get value from these data. We hear about a potentially bewildering barrage of AI technologies including natural language processing (NLP), machine learning, and other textual data science applications. A recent blog I read highlighted this, with a Venn diagram covering over a dozen different disciplines (see figure below). These techniques all bring benefits, but often we just need straightforward, simple access to our unstructured text data.

Empower a wide variety of users to find relevant data with high recall and precision

Linguamatics I2E brings a combination of powerful text mining tools to many pharma, biotech and healthcare users. We recognize that users’ demands vary, and so we have created I2E Web Portals. I2E Web Portals aim to engage users who want rapid, easy access to scientific knowledge from both public-domain knowledge bases (e.g. MEDLINE, ClinicalTrials.gov) and internal data silos, including regulatory dossiers, preclinical safety data, patient/customer call transcripts, and many more.