IDMP (Identification of Medicinal Products) compliance will require Marketing Authorization Holders to submit to the EMA, and maintain, a broad range of data elements about their medicinal products, 70% of which currently exist as unstructured text spread across multiple document formats, styles, and languages. These data play a key role in a pharmaceutical company's core operations and are reused for multiple purposes across the business.

The challenge for such companies is to find a quick, accurate, and affordable way to search, extract, standardize, and structure the 300–2,000 data elements required per product for IDMP compliance.

Linguamatics NLP extracts IDMP data elements

Mundipharma Research Limited implemented a pilot project using I2E, Linguamatics' natural language processing (NLP)-based text mining solution, to find, highlight, and extract the data elements for Iteration 1 from unstructured documents such as EMA Summary of Product Characteristics (SmPC) documents.

I2E queries were developed to extract the individual data elements using standard and customized ontologies, as well as linguistic features of SmPCs. Accuracy was evaluated against a ‘gold standard’ data set that had been manually extracted by an independent expert.
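Evaluating extraction against a gold standard is commonly done by counting matching data elements and reporting precision, recall, and F1. The sketch below illustrates that idea only; it is not the method used in the Mundipharma pilot. It assumes each document's output is a set of hypothetical `(element_name, value)` pairs, and that values count as matching after simple case and whitespace normalization.

```python
from dataclasses import dataclass


def normalize(value: str) -> str:
    """Assumed normalization: case-fold and collapse whitespace before comparing values."""
    return " ".join(value.lower().split())


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def score_extraction(predicted: set[tuple[str, str]], gold: set[tuple[str, str]]) -> Scores:
    """Compare extracted (element_name, value) pairs with a manually curated gold standard."""
    pred = {(name, normalize(v)) for name, v in predicted}
    ref = {(name, normalize(v)) for name, v in gold}
    true_pos = len(pred & ref)  # pairs found by both the extractor and the expert
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical element names and values, for illustration only.
    gold = {
        ("active_substance", "paracetamol"),
        ("pharmaceutical_form", "film-coated tablet"),
        ("route_of_administration", "oral"),
    }
    predicted = {
        ("active_substance", "Paracetamol"),
        ("pharmaceutical_form", "tablet"),
        ("route_of_administration", "oral"),
    }
    print(score_extraction(predicted, gold))
```

Exact matching after normalization is a deliberately strict choice. A real evaluation might also accept partial or ontology-equivalent matches.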

Jon Sanford, Head of Regulatory Information Management and Operations at Mundipharma Research: “We were really impressed when we saw the accuracy with which I2E had been able to extract data elements from the documents”.


In healthcare, the collection and reporting of quality measurement data is far from perfect. Patients move between organizations, and transferring their health records from one organization to another creates a mass of documents containing a large amount of unstructured data, around 80% of the medical record. Ignoring these unstructured notes often leads to reduced performance on quality measures.

HEDIS® (Healthcare Effectiveness Data and Information Set) was “born” in 1991, making it a millennial in demographic terms. It was created to enable the health of patient populations to be assessed consistently, and has matured into a reliable means of comparing health plans and providers. Several HEDIS® measures combine structured and unstructured data. Linguamatics Health, a natural language processing (NLP) platform powered by I2E, can help gather insights from unstructured text and move your HEDIS® scores “out of the basement,” as every millennial’s parent aspires to do.

Read the full 'Linguamatics Health for quality measures' application note to find out more about how powerful NLP solutions such as Linguamatics Health enable quality measures to be extracted automatically from clinical documentation, streamlining the collection of data.



Interest in artificial intelligence (AI), particularly natural language processing (NLP) and machine learning, has grown significantly within the healthcare community in recent years, as vendors, researchers, and providers look for ways to transform medical research and care through technology.

How do these techniques work? Machine learning can help to solve complex issues by analyzing existing data from sources such as electronic health records (EHRs), but often the data contained within EHRs is “trapped” in unstructured medical notes. NLP can interpret structure and meaning in this unstructured text, and make critical information accessible to machine learning applications.
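As a rough illustration of how NLP can make unstructured notes usable by machine learning, the sketch below maps free-text notes to binary features that a model could consume. It is a simplified stand-in, not how I2E or Linguamatics Health works. The mini-ontology, its concept IDs, and the negation cues are hypothetical. The negation check is a crude rule inspired by NegEx-style approaches: it treats a term as negated if a cue appears earlier in the same sentence.

```python
import re

# Hypothetical mini-ontology: each concept ID maps to synonyms that may appear in notes.
ONTOLOGY = {
    "DIABETES": ["diabetes", "diabetes mellitus", "t2dm"],
    "HYPERTENSION": ["hypertension", "high blood pressure", "htn"],
    "SMOKER": ["smoker", "smokes", "tobacco use"],
}

# Assumed negation cues; a cue earlier in the same sentence negates a later mention.
NEGATION_CUES = ("no", "denies", "without", "negative for")


def extract_features(note: str) -> dict[str, int]:
    """Map a free-text note to binary features: 1 = affirmed mention, 0 = absent or negated."""
    features = {concept: 0 for concept in ONTOLOGY}
    # Split into rough sentences so a negation cue only affects its own sentence.
    for sentence in re.split(r"[.;\n]+", note.lower()):
        for concept, synonyms in ONTOLOGY.items():
            for synonym in synonyms:
                match = re.search(rf"\b{re.escape(synonym)}\b", sentence)
                if not match:
                    continue
                preceding = sentence[: match.start()]
                negated = any(
                    re.search(rf"\b{re.escape(cue)}\b", preceding) for cue in NEGATION_CUES
                )
                if not negated:
                    features[concept] = 1
    return features


def to_feature_vector(features: dict[str, int]) -> list[int]:
    """Order features by concept ID so every note yields a vector with the same layout."""
    return [features[concept] for concept in sorted(ONTOLOGY)]


if __name__ == "__main__":
    note = "Pt with T2DM on metformin. Denies tobacco use. History of HTN."
    print(to_feature_vector(extract_features(note)))
```

Keeping the vector layout fixed (sorted concept IDs) is what lets rows from many notes be stacked into a training matrix.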

Read the full Health IT Outcomes article to find out more about how the combination of NLP and machine learning can deliver a powerful solution for advancements in the understanding and delivery of care.


About Simon Beaulah:

Simon Beaulah is Linguamatics’ senior director of healthcare and is responsible for the company’s healthcare products and solutions, including applications for clinical risk models, population health, and medical research.


NLP text mining transforms medical transcript data into insights for better patient care

Santa Cruz, CA, Cambridge, UK & Boston, USA – August 3rd, 2017 – Natural Language Processing (NLP) text analytics provider Linguamatics, and RealHealthData, a narrative medical records database provider, today announced their strategic partnership to combine Linguamatics’ advanced NLP technology with RealHealthData’s extensive database of detailed provider narratives, to improve the understanding of drug use, adverse events, and product switching in real-world settings.

Understanding the real-world impact of therapies on patients (i.e., their impact outside clinical trials) is critical for pharmaceutical and biotech companies. Medical records are one of the key sources of real-world data, providing evidence that can inform all phases of drug development as well as post-marketing research. RealHealthData provides access to patient narratives from all 50 US states and every medical specialty. Linguamatics I2E can be used to extract the key facts from these narratives using relevant ontologies and queries, transforming real-world data into actionable intelligence for better decision making.

“Deploying Linguamatics I2E Advanced NLP engine to the RealHealthData database of detailed provider narratives is a natural fit,” said Manuel Prado, CEO of RealHealthData. “Current and future customers can now access the unique and valuable insights in the database using a first-in-class, healthcare-specific Natural Language Processing platform.”


I2E makes natural language processing-based text mining intuitive and interactive

SANTA CLARA, Calif. — July 20, 2017 — Based on its recent analysis of the Big Data text analytics market for the healthcare industry, Frost & Sullivan recognizes Linguamatics with the 2017 Global Frost & Sullivan Award for Enabling Technology Leadership. Linguamatics stands out in the natural language processing (NLP) market for its technology expertise and commitment to delivering exceptional value to clients in the US healthcare industry. The highly flexible and scalable Linguamatics Health platform, powered by I2E, is helping healthcare providers and payers to transition to value-based care.

Within the last year, Linguamatics introduced the fifth version of I2E, which includes cutting-edge capabilities such as the normalization of concepts and relationships for quick and comprehensive data retrieval regardless of format; advanced range research; and an extraction and search query language (EASL). EASL queries can be generated outside the platform, supporting custom interfaces, human-readable queries, and superior workflow automation.