Posts from August 2017

The 2017 Text Mining Summit (TMS), held in New Castle, New Hampshire, on October 2-4, will be your first opportunity to take part in our new I2E Certificate Program. The Level 1 Query User Certificate will be open both to those who have just taken the “Introduction to I2E” hands-on workshops at the TMS and to more established users who took the “Introduction to I2E” training on a previous occasion. See the TMS Workshop Selection Guide for more details. Taking part is free as part of your TMS registration.

Completing the different levels of the Certificate Program will allow you to validate, extend and improve your I2E skills. The Query User Certificate will focus on using and editing basic queries and Resource queries to:

  • Create simple queries with different constraints, morphological variants, preferred terms and alternative lists

  • Use classes to improve recall and precision of queries with linguistic classes, ontologies, and pattern ontologies

  • Work with results by using limits, output formats and displays

  • Use Resource queries to answer common questions

Those taking the Query User Certificate at the TMS will have access to:

  • In-class instruction

  • Practical, hands-on experience with I2E

  • Open question sessions with I2E Experts

  • A set of learning objectives

  • Learning materials, including

    • Tutorial booklets


IDMP compliance will require Market Authorization Holders to submit a broad range of data elements about medicinal products to the EMA, and to keep them up to date. Of these data elements, 70% currently exist in unstructured text, hidden in multiple document formats, styles, and languages. These data play a key role in the core operation of a pharmaceutical company, and are re-used for multiple purposes across the business.

The challenge for such companies is to find a quick, accurate, and affordable way to search, extract, standardize, and structure the 300–2,000 data elements required per product for IDMP compliance.

Linguamatics NLP extracts IDMP data elements

Mundipharma Research Limited implemented a pilot project using I2E, Linguamatics’ natural language processing-based text mining solution, to find, highlight and extract data elements for Iteration 1 from unstructured documents such as the EMA Summary of Product Characteristics (SmPC).

I2E queries were developed to extract the individual data elements using standard and customized ontologies, as well as linguistic features of SmPCs. Accuracy was evaluated against a ‘gold standard’ data set that had been manually extracted by an independent expert. Find out more about I2E's Extract Transform Load (ETL) solutions here.
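
As a rough illustration of the evaluation step described above, the sketch below compares extracted values with a manually curated gold standard and reports precision and recall. It is not the method used in the pilot: the field names, the values, and the exact-match scoring rule are all hypothetical, and the output of I2E is represented here simply as a Python dictionary.

```python
def normalize(value):
    """Lower-case and trim a value so trivial formatting differences are ignored."""
    return value.strip().lower()


def score_extraction(extracted, gold):
    """Return (precision, recall) for extracted field values against a gold standard.

    Both arguments map a data element name to its value. A field counts as
    correct only if its normalized value matches the gold value exactly
    (an assumed scoring rule, chosen for simplicity).
    """
    extracted_pairs = {(field, normalize(value)) for field, value in extracted.items()}
    gold_pairs = {(field, normalize(value)) for field, value in gold.items()}
    correct = len(extracted_pairs & gold_pairs)
    precision = correct / len(extracted_pairs) if extracted_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


# Hypothetical gold-standard values curated by an expert for one product.
gold = {
    "pharmaceutical_form": "Film-coated tablet",
    "route_of_administration": "Oral use",
    "strength": "10 mg",
    "shelf_life": "3 years",
}

# Hypothetical automatically extracted values: one wrong, one missing.
extracted = {
    "pharmaceutical_form": "film-coated tablet",
    "route_of_administration": "oral use",
    "strength": "20 mg",
}

precision, recall = score_extraction(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here precision is the share of extracted values that match the gold standard, and recall is the share of gold-standard values that were found. A real evaluation would likely need a more forgiving comparison, for example to handle synonyms or differing units.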

Jon Sanford, Head of Regulatory Information Management and Operations at Mundipharma Research: “We were really impressed when we saw the accuracy with which I2E had been able to extract data elements from the documents”.


In the world of healthcare, quality measurement data collection and reporting is far from perfect. Patients move around, and transferring their health records from one organization to another creates a mass of documents that includes a large amount of unstructured data—around 80% of the medical record. Ignoring these unstructured notes often leads to reduced performance on quality measures.

HEDIS® (Healthcare Effectiveness Data and Information Set) was “born” in 1991, making it a millennial in demographic terms. It was created to enable the health of patient populations to be assessed consistently, and has matured into a reliable means of comparing health plans and providers. Several HEDIS® measures combine structured and unstructured data, and Linguamatics Health, a natural language processing (NLP) platform powered by I2E, can help gather insights from unstructured text and move your HEDIS® scores “out of the basement,” as every millennial’s parent aspires to do.

Read the full 'Linguamatics Health for quality measures' application note to find out more about how powerful NLP solutions such as Linguamatics Health enable quality measures to be extracted automatically from clinical documentation, streamlining the collection of data.



Interest in artificial intelligence (AI), particularly natural language processing (NLP) and machine learning, has grown significantly within the healthcare community in recent years, as vendors, researchers, and providers look for ways to transform medical research and care through technology.

How do these techniques work? Machine learning can help to solve complex issues by analyzing existing data from sources such as electronic health records (EHRs), but often the data contained within EHRs is “trapped” in unstructured medical notes. NLP can interpret structure and meaning in this unstructured text, and make critical information accessible to machine learning applications.

Read the full Health IT Outcomes article to find out more about how the combination of NLP and machine learning can deliver a powerful solution for advancements in the understanding and delivery of care.


About Simon Beaulah:

Simon Beaulah is Linguamatics’ senior director of healthcare and is responsible for the company’s healthcare products and solutions, including applications for clinical risk models, population health, and medical research.


NLP text mining transforms medical transcripts data into insights for better patient care

Santa Cruz, CA, Cambridge, UK & Boston, USA – August 3rd, 2017 – Natural Language Processing (NLP) text analytics provider Linguamatics and RealHealthData, a narrative medical records database provider, today announced a strategic partnership that combines Linguamatics’ advanced NLP technology with RealHealthData’s extensive database of detailed provider narratives to improve the understanding of drug use, adverse events, and product switching in real-world settings.

Understanding the real-world impact of therapies on patients (i.e., outside of clinical trials) is critical for pharmaceutical and biotech companies. Medical records are one of the key sources of real-world data, and provide evidence that can inform all phases of drug development as well as post-marketing research. RealHealthData provides access to patient narratives from all 50 US states and every medical specialty. Linguamatics I2E can be used to extract the key facts from these narratives using relevant ontologies and queries, transforming real-world data into actionable intelligence for better decision making.

“Deploying Linguamatics I2E Advanced NLP engine to the RealHealthData database of detailed provider narratives is a natural fit,” said Manuel Prado, CEO of RealHealthData. “Current and future customers can now access the unique and valuable insights in the database using a first-in-class, healthcare-specific Natural Language Processing platform.”