Real World Data Analytics

Many sources of real-world data (RWD) contain large amounts of unstructured text. The Linguamatics NLP platform extracts the key facts from structured and unstructured data, transforming RWD into insights for decision making.

  1. Watch the video
  2. Background
  3. Traditional research settings vs Real World
  4. Real-World Data Challenge
  5. Natural Language Processing-based Text Mining Solution
  6. Use Cases for Transforming RWD into Insights

Transform Real World Data into Insights

Real world evidence (RWE) and Real World Data (RWD) can inform all phases of pharmaceutical drug development, commercialization, and drug use in healthcare settings. Many sources of real-world data contain large amounts of unstructured text (e.g. EHRs; patient-reported outcomes such as forums, social media).

Linguamatics Natural Language Processing (NLP) extracts the key facts, using relevant ontologies and focused queries, transforming real world data into actionable intelligence for decision making.

Watch the video




Real world data (RWD) includes information about patient health, patient reported outcomes, and the impact of drugs and therapeutics outside clinical trials. RWD is derived from multiple sources (see Figure 1), including electronic health records (EHR), insurance claims and billing data, adverse event reports, field-based medical affairs notes, and data gathered through wearable devices or health applications. Voice of the customer (VoC) data further expands on these sources with insights from patient forums, social media, customer call transcripts, and survey answers.

See how Novo Nordisk uses NLP to generate actionable insights from RWD:



Although sometimes used interchangeably, real world data (RWD) and real world evidence (RWE) are not the same. Data is factual information, such as numbers and statistics, while evidence is data that has been analyzed and found to be relevant, and so furnishes proof that supports a conclusion. Hence, RWD is the information that underpins RWE, which in turn is used to support future decisions in healthcare and life sciences.

See how RealHealthData uses NLP to understand treatment outcomes from RWD:


The application of real world data (outside of clinical trials) is not a new practice in healthcare. However, in recent years, the use of computers, mobile devices, wearables and other biosensors to gather and store health-related data, both structured and unstructured, has grown rapidly. Combined with analytical capabilities such as NLP, this data has the potential to improve the way patients are treated, clinical trials are conducted, drugs are developed, and, in doing so, provide answers to questions previously thought beyond reach.

See how Bristol-Myers Squibb uses NLP to mine EMRs for patient stratification of heart failure risk:


Traditional research settings vs Real World

Clinical trials are designed to provide an essential element of the premarket evaluation of a medical product. Trials are traditionally conducted with specific populations and in specialized environments which differ from the realities of clinical or home settings. These specialized environments help control variability and ensure the quality of the data. Measures such as eligibility criteria and the use of specialized research personnel to ensure patient adherence to the study support the internal validity of the results, but often at the cost of uncertainty about how well those results generalize.

See how AstraZeneca compares RWE to clinical trial events:


Once any particular drug or therapy is approved and in use across broader patient groups, real world evidence (RWE) can fill the knowledge gaps about effects on, and responses by, a much wider patient population and can be leveraged in many ways:

  • to support clinical trial design
  • to monitor postmarket safety and adverse events
  • to enable comparative effectiveness research
  • to investigate how factors such as clinical setting and provider and health-system characteristics influence treatment effects and outcomes.

Importantly, the use of such evidence has the potential to allow researchers, healthcare practitioners and regulatory agencies to answer these questions more efficiently, saving time and money.

See how Kaiser Permanente uses NLP to reduce hospital readmission rate:


Acknowledgment of the importance of RWE by regulatory agencies is exemplified by the 21st Century Cures Act, signed into United States law in 2016. The act was designed to help accelerate medical product development using real world evidence, and to bring new innovations and advances faster and more efficiently to the patients who need them. The law builds on FDA's ongoing work to incorporate the perspectives of patients into the development of drugs, biological products, and devices in FDA's decision-making process. Such work includes the Sentinel System, which officially launched in February 2016 and has developed the largest multisite distributed database in the world dedicated to medical product safety, and the National Evaluation System for health Technology, which will generate evidence across the total product lifecycle of medical devices.

The Real-World Data Challenge

Interest in real world data (RWD) and its potential continues to grow, but there are many challenges in creating value from it. These include:

  • Data structure (e.g. complex grammar in tweets or customer calls, non-scientific vocabularies)
  • Data extraction and standardization to aid analysis and interpretation; mapping to standards (e.g. medical codes, vocabularies, formats)
  • Data quality (e.g. missing data, coding errors, noisy data)
  • Integration of structured and unstructured fields to get the full picture from your data
  • Balancing the use of RWD to advance research with protecting the privacy of the patients whose data is collected

Addressing these issues has proven challenging, and it is estimated that a big pharmaceutical company spends nearly USD 20 million annually on generating RWE-based insights (ReportLinker, 2018: “Pharmaceutical and Life Sciences Real World Evidence: Market Landscape and Competitive Insights, 2018-2030”).

See how NLP enhances commercial engagement and sales productivity in Pharma:


The Natural Language Processing-based Text Mining Solution

The Linguamatics NLP text mining platform addresses these challenges, extracting key facts from unstructured data sources to get value out of RWD. The platform uses relevant ontologies and focused queries to transform real world data into real world evidence, giving you actionable intelligence for decision making.

Organizations can use our platform to extract information on:

  • Treatment patterns e.g. drug switching, adherence, discontinuation;
  • Numbers such as lab values, dosage information;
  • Patient information such as history of disease, problem list, demographics, social factors and lifestyle.

The agile iterative nature of NLP query development means that business rules can be encoded to suit the particular data set, whether you’re looking at sentiments from tweets or treatment pattern choices and resulting outcomes from EHRs.
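As a rough illustration of what extracting dosage information and lab values from free text involves, the sketch below uses plain regular expressions in Python. It is not the Linguamatics platform or its query language; the patterns, the `extract_facts` function, and the sample note are all hypothetical, and real clinical text would need ontologies and handling of negation, context, and unit variants.

```python
import re

# Hypothetical patterns for illustration only (assumptions, not a real
# product's rules). They cover a drug name followed by an amount and unit,
# and a small fixed list of lab tests followed by a numeric value.
DOSE_PATTERN = re.compile(
    r"(?P<drug>[A-Za-z]+)\s+(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g|ml|units)\b",
    re.IGNORECASE,
)
LAB_PATTERN = re.compile(
    r"(?P<test>HbA1c|LDL|eGFR)\s*(?:of|was|is|:|=)?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>%|mg/dL|mL/min)?",
    re.IGNORECASE,
)


def extract_facts(note: str) -> dict:
    """Return dosage mentions and lab values found in a free-text note."""
    doses = [
        {
            "drug": m["drug"].lower(),
            "amount": float(m["amount"]),
            "unit": m["unit"].lower(),
        }
        for m in DOSE_PATTERN.finditer(note)
    ]
    labs = [
        # "unit" is None when the note gives a value without a unit.
        {"test": m["test"], "value": float(m["value"]), "unit": m["unit"]}
        for m in LAB_PATTERN.finditer(note)
    ]
    return {"doses": doses, "labs": labs}


# Made-up sample note, not taken from any real record.
note = "Patient switched to metformin 500 mg twice daily. HbA1c was 7.2% at last visit."
facts = extract_facts(note)
print(facts)
```

Encoding the rules as small, named patterns makes it easy to adjust them for a particular data set, which is the kind of iteration the paragraph above describes.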

Learn more about Real World Data for commercial pharmaceutical product insights in our blog, or download our application note at the bottom of this page.

Use Cases for Transforming RWD into Insights

Using NLP at Novo Nordisk to generate actionable insights from RWD

Novo Nordisk wanted to identify healthcare market trends and detect patterns in clinical trial protocol deviations, patient sentiment, compliance, routines, behaviors, and treatment satisfaction and outcomes, from disparate RWD sources such as voice of the customer (call center) feeds and information from medical liaisons and healthcare providers. Novo Nordisk built on previous successes in individual Linguamatics NLP text mining projects to create an automated Linguamatics NLP workflow for real world data. With the new system, they have reduced manual work by FTEs, cut out external vendor manual work and spend, automated the process of generating insights, and significantly broadened access to these insights across a global team.




Patient Insights from Social Media at Roche

Mathias Leddin, Senior Data Scientist, pRED Informatics at Roche, gave a talk on the use of Linguamatics NLP to address patient-centered drug development (a relatively new FDA initiative). The focus was to discover whether patient blogs and forums (such as PatientsLikeMe) can provide a good substrate for developing clinical endpoints that are relevant to patients. Being able to understand what matters most to patients, and to find unexpected insights into patients’ problems, could (and should) influence clinical trials, e.g. design and outcome measures. Using “highly trusted” social media sources (i.e. patient-focused communities rather than more diverse Twitter or Facebook posts) gave a more robust substrate to analyse. It is still a noisy process; from 24k verbatims downloaded, they gleaned valuable data from ~450 posts. With privacy issues addressed, they were able to categorize the comments into symptom or impact categories (e.g. voice change: "…voice was getting softer."; " …slight raspy sensation…"; "…crack in her voice …"). Mathias described finding symptoms that confirmed the clinical trial endpoints, but also new ones, and these specific recommendations have been taken forward.
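To make the idea of sorting verbatims into symptom categories concrete, here is a minimal Python sketch that uses simple keyword matching. It is an assumption-laden illustration, not the method used at Roche or by the Linguamatics platform: the `CATEGORY_KEYWORDS` lexicon and the `categorize` function are hypothetical, and only the three example quotes come from the text above.

```python
from collections import defaultdict

# Hypothetical keyword lexicon (an assumption for this sketch). A real
# workflow would rely on curated ontologies and linguistic patterns
# rather than plain substring matching.
CATEGORY_KEYWORDS = {
    "voice change": ["voice", "raspy", "hoarse"],
    "fatigue": ["tired", "exhausted", "fatigue"],
}


def categorize(verbatims):
    """Group each verbatim under every category whose keywords it mentions."""
    grouped = defaultdict(list)
    for text in verbatims:
        lowered = text.lower()
        for category, keywords in CATEGORY_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                grouped[category].append(text)
    return dict(grouped)


# The example quotes cited in the paragraph above.
posts = [
    "…voice was getting softer.",
    "…slight raspy sensation…",
    "…crack in her voice …",
]
result = categorize(posts)
print(result)
```

Substring matching of this kind is noisy, which is consistent with the low yield of useful posts noted above; it is shown here only because it is easy to read.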

Read a related article where NLP was used to clarify information found on social media:


BMS uses NLP to mine EMRs for patient stratification of heart failure risk

Bristol-Myers Squibb wanted to understand more about patient stratification for heart failure risk. BMS researchers believed that if they could acquire a deeper understanding of the clinical characteristics of these patients, they could potentially understand how best to treat different patients or populations.


Comparing Real World Evidence to clinical trial events at AstraZeneca

Text mining allows organizations to extract information from unstructured data to inform key decisions and speed up the drug discovery pipeline. In this use case, AstraZeneca show how, through their collaboration with PatientsLikeMe (PLM), they were able to examine differences in nausea adverse reaction (AR) frequencies between patients' online self-reported data and data from FDA Drug Product Labels extracted using the Linguamatics NLP platform.
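Once frequencies have been extracted from both sources, the comparison itself is simple arithmetic. The Python sketch below shows one way it could look. The function name, the drug names, and all numbers are hypothetical placeholders, not figures from the AstraZeneca study, and the sketch assumes that label frequencies are already available as percentages.

```python
def compare_nausea_frequency(self_reported_counts, label_frequency_pct):
    """Compare patient-reported nausea frequency with the label frequency.

    self_reported_counts: {drug: (patients_reporting_nausea, total_patients)}
    label_frequency_pct:  {drug: frequency in percent, as stated on the label}
    """
    rows = []
    for drug, (with_nausea, total) in self_reported_counts.items():
        # Skip drugs missing from the label data or with no patient reports.
        if drug not in label_frequency_pct or total == 0:
            continue
        patient_pct = 100.0 * with_nausea / total
        rows.append(
            {
                "drug": drug,
                "patient_pct": round(patient_pct, 1),
                "label_pct": label_frequency_pct[drug],
                # Positive values mean patients report nausea more often
                # than the label states.
                "difference_pts": round(patient_pct - label_frequency_pct[drug], 1),
            }
        )
    return rows


# Placeholder inputs for illustration only; not real study data.
self_reported = {"drug_a": (30, 200), "drug_b": (5, 50)}
label = {"drug_a": 10.0, "drug_b": 12.0}
rows = compare_nausea_frequency(self_reported, label)
for row in rows:
    print(row)
```

A real analysis would also need to account for reporting bias and differences in how each source defines and counts an adverse reaction, which a difference in percentage points cannot capture.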




Kaiser Permanente uses NLP to reduce hospital readmission rate

Kaiser Permanente and Linguamatics are developing a new model to tackle excessive hospital readmission rates and the financial penalties they incur. The model employs data from a comprehensive electronic medical record (EMR) and could be instantiated in real time.


Text Mining for Competitive Intelligence at Novo Nordisk

Novo Nordisk have built a successful data and technology ecosystem to uncover competitive insights using text mining and natural language processing of news data using Linguamatics technology and Dow Jones DNA.



Understanding Treatment Outcomes using NLP on Unstructured Physician Narratives

In this 40-minute webinar, RealHealthData and Linguamatics discuss the challenges of working with real world data (in this case, medical transcripts), and show the power of the Linguamatics NLP platform to extract relevant data to answer critical questions around treatment outcomes for prostate cancer.


Transform your Voice of the Customer Data Using NLP

RWE provides significant insight into how a drug or drug class performs or is used in real world medical settings. These data sources can inform all phases of pharmaceutical drug development, commercialization, and drug use in healthcare settings.


NLP enhances commercial engagement and sales productivity in Pharma

Do you think Artificial Intelligence (AI) can enhance commercial engagement and sales productivity in Pharma? After discussions with colleagues, Jane Reed, Director Life Science at Linguamatics, realised that our customers are increasingly using the power of NLP to unlock outcomes from real world data.

