Real world evidence (RWE) provides significant insight into how a drug or drug class performs or is used in real world medical settings. RWE and real world data (RWD) can inform all phases of pharmaceutical drug development, commercialization, and drug use in healthcare settings.

The ability to quickly transform real world data sources (e.g. EHRs, or patient-reported outcome data from forums and social media) into evidence can improve health outcomes for patients by helping pharmaceutical companies be more efficient in drug development and smarter in commercialization.

Voice of the customer call feeds: a valuable source of real world data 

One source of patient-reported outcomes available to pharma companies is the feeds that come into the 1-800 call centers: calls from patients, carers, healthcare professionals or pharmacists, asking questions that cover many different issues.


Learning more about how drugs are understood in the market

How can pharma product managers efficiently learn how their drugs are faring with patients in the market?

Product managers and teams in pharmaceutical companies need to know what patients and healthcare professionals are reporting and asking about their drugs as they are used in the market, in order to discern trends and patterns and respond appropriately. Real world data (RWD) on drug usage and patient behaviours is available in multiple formats from myriad sources, but mining these disparate structured and unstructured sources with traditional manual search and curation is time-consuming and inefficient.

Novo Nordisk wanted to accelerate, automate and scale this process to provide enhanced access to the extracted information for superior and actionable insights.

Natural Language Processing-based Text Mining at Novo Nordisk

Novo Nordisk was already using the Linguamatics NLP platform in-house on multiple individual text mining projects with good success (e.g. reducing a publication gap analysis from three to four people working for six weeks to a few hours). They wanted to capitalise on this success for real world data about their diabetes therapeutic products, from the medical affairs team, healthcare professionals, and patients.


Linguamatics NLP platform enables rapid adverse event understanding from clinical trials

Identifying serious adverse events (SAEs) during clinical trials is a critical part of patient monitoring, and Agios wanted to enable a more rapid response to SAEs. SAE report forms can be in image or PDF format, and manual extraction of the key patient data is slow and error-prone. Agios developed a workflow to process these forms, using the Linguamatics NLP platform to extract all relevant patient data. The workflow steps included:

  • OCR of the image SAE reports to render the data accessible
  • Indexing all documents with ontologies such as MeSH, MedDRA, WHO Drugs to normalize and code the data attributes
  • Using Linguamatics NLP platform queries to extract study drug, concomitant medications, adverse events, date of onset, lab test results and other key patient attributes 
  • Loading the data into a clinical safety database for rapid access
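
The steps above can be illustrated with a minimal Python sketch. It is not the Agios workflow or the Linguamatics platform: it assumes the OCR step has already produced plain text, uses a tiny made-up synonym table in place of a coding dictionary such as MedDRA, relies on hypothetical field labels ("Study drug:", "Adverse event:", "Date of onset:"), and loads into an in-memory SQLite table as a stand-in for a clinical safety database. The sample form text is invented for illustration.

```python
import re
import sqlite3

# Illustrative synonym table standing in for a coding dictionary such as MedDRA.
# The entries are made up for this sketch and are not real dictionary content.
PREFERRED_TERMS = {
    "pyrexia": "Pyrexia",
    "fever": "Pyrexia",
    "high temperature": "Pyrexia",
}

# Hypothetical field labels; real SAE forms will be laid out differently.
FIELD_PATTERNS = {
    "study_drug": re.compile(r"^Study drug:\s*(.+)$", re.MULTILINE),
    "adverse_event": re.compile(r"^Adverse event:\s*(.+)$", re.MULTILINE),
    "onset_date": re.compile(r"^Date of onset:\s*(\d{4}-\d{2}-\d{2})$", re.MULTILINE),
}


def extract_fields(ocr_text):
    """Pull labelled fields out of OCR'd form text; missing fields become None."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        record[name] = match.group(1).strip() if match else None
    return record


def normalize_event(raw_event):
    """Map a free-text event to a preferred term, keeping the raw text if unknown."""
    if raw_event is None:
        return None
    return PREFERRED_TERMS.get(raw_event.lower(), raw_event)


def load_record(conn, record):
    """Insert one extracted record into a simple stand-in safety table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sae "
        "(study_drug TEXT, adverse_event TEXT, onset_date TEXT)"
    )
    conn.execute(
        "INSERT INTO sae VALUES (?, ?, ?)",
        (record["study_drug"], record["adverse_event"], record["onset_date"]),
    )
    conn.commit()


# Invented sample of what OCR output might look like.
SAMPLE_FORM = """Study drug: AG120
Adverse event: Fever
Date of onset: 2016-01-15
"""

record = extract_fields(SAMPLE_FORM)
record["adverse_event"] = normalize_event(record["adverse_event"])
conn = sqlite3.connect(":memory:")
load_record(conn, record)
rows = conn.execute("SELECT * FROM sae").fetchall()
```

Mapping free-text terms to a preferred term before loading is what makes the stored records comparable across forms. Pattern matching on fixed labels would be far too brittle for real scanned forms, which is why an NLP approach is used in practice.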

Identification of at-risk patients with network visualisations

A specific clinical example explored the risk of a rare, potentially life-threatening adverse event, Differentiation Syndrome (DS), in patients on a clinical trial of Agios’s IDH1 inhibitor AG120. DS is a complication of first-line chemotherapy in some acute promyelocytic leukemia (APL) patients, and it can be fatal if not recognized in time and treated aggressively.


Four million people die from diabetes annually. Novo Nordisk, a global healthcare company, has a mission to change that. Although it has a presence in 170 countries, is already helping 28 million patients, and supplies half of the world’s insulin, the company still faces an enormous challenge: novel drug approaches are needed, and drug development is a long, expensive process. The GLIA (Global Information & Analysis) team at Novo Nordisk aim to help by providing the best information possible to researchers and product teams.  

Using natural language processing (NLP) to extract information from real world data sources

The answers Novo Nordisk need are buried within a myriad of sources of unstructured real world data. These data sources include research papers, news reports, market information, patient use information, and more.

“Finding accurate information in an ever-growing ocean of information is becoming more important than ever,” explains Novo Nordisk senior information scientist Solmaz Gabery Adams.

Extensive research informs every step on the long path to delivering healthcare, from identifying needs and undertaking drug discovery to clinical trials and regulatory review before bringing new treatments to market. At every stage, Novo Nordisk researchers and managers must make crucial decisions, including which projects to advance and which projects to leave behind.


Drug safety is, of course, one of the central concerns of any drug development project. Right from the start, project teams want to know whether the target they are interested in has any links to adverse events. When they get to a lead series or lead compound, they want to know whether similar compounds or compound classes have been shown to have side effects. If unexpected adverse events occur in clinical trials, project teams again turn to the literature and other sources to see if they can unearth a reason, a mechanism, or other evidence for the effect. And of course, post-market, pharmaceutical companies must regularly screen the worldwide scientific literature for potential adverse drug reactions, at least every two weeks.

Finding useful information in public sources can be daunting. There are so many different names for any particular gene target, compound, disease process, adverse event or side effect that a comprehensive search means using long strings of keywords. And what is really needed is evidence that a compound is causing an effect, not treating the disease, so keyword search alone doesn’t work well. On top of that, there are many different data sources to search.
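
A small Python sketch can make the two problems concrete: synonym coverage, and telling "causes" apart from "treats". Everything in it is an assumption for illustration. The compound names ("DrugX", "DX-101") are hypothetical, the sentences are invented, and the verb pattern is a deliberately crude stand-in for real linguistic analysis.

```python
import re

# Hypothetical compound with a code name, and spelling variants of one event.
COMPOUND_NAMES = ["drugx", "dx-101"]
EVENT_NAMES = ["hypoglycemia", "hypoglycaemia", "low blood sugar"]
CAUSAL_VERBS = ["causes", "caused", "induces", "induced", "led to"]


def alternation(terms):
    """Build a regex alternation, longest terms first, with special characters escaped."""
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


# Matches "<compound> <causal verb> <event>" with nothing in between.
CAUSAL_PATTERN = re.compile(
    rf"\b(?:{alternation(COMPOUND_NAMES)})\b\s+"
    rf"(?:{alternation(CAUSAL_VERBS)})\s+"
    rf"(?:{alternation(EVENT_NAMES)})\b",
    re.IGNORECASE,
)


def mentions_both(sentence):
    """Keyword co-occurrence: any compound name and any event name appear."""
    text = sentence.lower()
    return any(c in text for c in COMPOUND_NAMES) and any(e in text for e in EVENT_NAMES)


def states_causation(sentence):
    """True only when the sentence says the compound caused the event."""
    return CAUSAL_PATTERN.search(sentence) is not None


# Invented example sentences.
SENTENCES = [
    "DX-101 induced hypoglycaemia in two patients.",
    "DrugX was given to treat low blood sugar.",
]

results = [(mentions_both(s), states_causation(s)) for s in SENTENCES]
```

Both sentences pass the keyword check, but only the first states that the compound caused the event. Even this toy version needs a synonym list per concept plus a relationship pattern, which hints at why hand-built keyword strings do not scale across many concepts and data sources.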