Real world evidence (RWE) provides significant insight into how a drug or drug class performs or is used in real world medical settings. RWE and real world data (RWD) can inform all phases of pharmaceutical drug development, commercialization, and drug use in healthcare settings. The ability to quickly transform real world data sources (e.g. electronic health records (EHRs), or patient-reported outcome data from forums and social media) into evidence can improve health outcomes for patients by helping pharmaceutical companies be more efficient in drug development and smarter in commercialization.

Voice of the customer call feeds: a valuable source of real world data 

One source of patient-reported outcomes available to pharma companies is the feeds that come into 1-800 call centers – calls from patients, carers, healthcare professionals or pharmacists, asking questions covering many different issues, such as:


There surely can’t be anyone in the pharma industry who hasn’t heard the story of thalidomide. The disaster that followed the release onto the market of thalidomide in 1959 triggered a wave of regulatory changes to ensure reliable evidence of drug safety, efficacy and chemical purity, before a new drug is released onto the market. 

While failure of clinical efficacy is the major cause of drug attrition, a poor safety profile is also a major factor in failure of drugs in development, at all stages from initial lead candidate through preclinical and clinical development to post-marketing surveillance. In order to ensure the safety of drugs on the market, rigorous testing is carried out throughout the pipeline, and can be categorised into preclinical safety/toxicology in animal models, clinical safety in human subjects, and then post-market pharmacovigilance, to look for safety signals across a wide patient population (see schematic below).

At every stage, critical data is being both generated and sought from unstructured text – from internal safety reports, scientific literature, individual case safety reports, clinical investigator brochures, patient forums, social media, and conference abstracts. Intelligent search across these hundreds of thousands of pages can provide the information needed for key decision support. Many of our customers are using the power of Linguamatics I2E’s Natural Language Processing (NLP) solution to transform the unstructured text into actionable structured data that can be rapidly visualized and analyzed, at every stage through the safety lifecycle of a drug.


Our latest version of I2E includes improvements that make it easier to integrate the tool into your organization and process your internal documents, as well as the usual usability enhancements and under-the-hood modifications.

I2E 5.3.1 supports Single Sign-On (SSO) by connecting with federated authentication systems such as ADFS and Shibboleth. This means that you can be authenticated in one system and then seamlessly log into the I2E client without being prompted for your credentials. If you are not already logged in via another system, I2E will initiate the login process via a redirect to a special web page.

We’ve improved the hit highlighting in our Excel results format (figure 1): terms from your search use the same colors for each column in your results and the colors are consistent across the I2E Query Editor, HTML results and highlighted cache documents.

Figure 1. Excel results show color-coded terms in the Hit column

A good example of recent continual improvement in I2E is the Class Chooser. In recent releases, we have increased search speed and added as-you-type class suggestions and, in I2E 5.3.1, we’ve added information for each class match to show which ontology the term is from (figure 2). This helps you quickly review your results and find the correct match(es), particularly when you’re using a term that could occur in different ontologies.


As medicinal chemists strive to fill the pipeline with the best possible novel compounds, they require efficient access to the ever-expanding mass of existing information and knowledge about compounds, targets, and diseases and how they are related. Much of this information is buried in published journal articles, patents, reports, and internal document repositories. Posing chemical compound-, target-, and disease-centered questions to extract and organize the data in order to explore these relationships is laborious, time-consuming, and potentially error-prone. Locating chemical structural information is especially challenging because chemicals in the literature are described by many different names: technical, trivial, proprietary, nonproprietary, generic, or trade names.

Roche pRED decided to address this problem and equip their medicinal chemists with a chemically-aware text mining tool (Artemis) that would remove the need for manual searches and data-wrangling, and present the data in a user- and analytics-friendly environment for further exploration. Daniel Stoffler and Raul Rodriguez-Esteban of Roche presented this work in their talk "ARTEMIS - A Text Mining Tool for Chemists" at the Linguamatics Spring Text Mining Conference in 2017.


Linguamatics is pleased to congratulate US healthcare system Mercy on their recent award win at the 12th Gateway to Innovation conference. Mercy won the Innovative IT Project of the Year Award for using the Linguamatics I2E Natural Language Processing (NLP) solution to extract clinical analytics insights from their Electronic Health Record (EHR) notes for cardiac patients.

Mercy Technical Services provides contract research services for medical device and pharmaceutical clients to support use of real world evidence (RWE) in Food and Drug Administration submissions. This award recognizes a project that demonstrates value or impact to the organization by solving a business problem or by addressing a specific strategic objective for the company.

NLP Used to Extract Real World Evidence from EHRs

As a large health system with a mature and consolidated Epic EHR system, Mercy has a significant data set of patient treatments and outcomes. A multitude of information is documented in the EHR, such as lists of specific symptoms, diagnoses derived from echocardiogram reports, and certain benchmarking classifications. Since typically 80% of this information is unstructured text, many valuable clinical insights are unavailable in discrete fields, and vital patient information can therefore be out of reach when clinical decisions are made.

NLP text mining platforms like Linguamatics I2E extract information from unstructured text-based EHRs and transform it into actionable insights that can be placed into a dataset and analyzed.
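
To make the idea of turning free text into a structured field concrete, here is a minimal Python sketch. It is not how I2E works: it uses a single regular expression as a stand-in for a real NLP pipeline, and the pattern, the function name extract_ejection_fraction, and the sample notes are all hypothetical. It assumes ejection fraction is written as a percentage (or a percentage range) close to the term itself, which real echocardiogram reports will not always do.

```python
import re
from typing import Optional

# Hypothetical pattern: matches phrases such as "EF: 40%", "LVEF of 35 %",
# or "ejection fraction is 55-60%". Real clinical notes vary far more widely.
EF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|ejection fraction)\b"      # the concept being sought
    r"(?:\s+(?:is|of|was|estimated at))?"     # optional linking words
    r"\s*[:=]?\s*"                            # optional separator
    r"(\d{1,2})(?:\s*-\s*(\d{1,2}))?\s*%",    # single value or low-high range
    re.IGNORECASE,
)


def extract_ejection_fraction(note: str) -> Optional[dict]:
    """Return the first ejection fraction mention in a note as a structured record."""
    match = EF_PATTERN.search(note)
    if match is None:
        return None
    low = int(match.group(1))
    # A single value is stored as a range whose two ends are equal.
    high = int(match.group(2)) if match.group(2) else low
    return {"ef_low": low, "ef_high": high, "evidence": match.group(0)}


if __name__ == "__main__":
    notes = [
        "Echo today. LVEF of 35 %. Mild MR.",
        "Normal study, ejection fraction is 55-60%.",
        "Patient denies chest pain.",
    ]
    for note in notes:
        print(extract_ejection_fraction(note))
```

Each record keeps the matched text as evidence so that a reviewer can check the extracted value against the original note, and a note with no match yields None rather than a guessed value.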