Linguamatics I2E recognized for its unique and dynamic approach to meeting specific user needs across the industry

SANTA CLARA, Calif. — Nov 14, 2018 — Based on its recent analysis of the global natural language processing (NLP) in life sciences Artificial Intelligence (AI) market, Frost & Sullivan recognizes Linguamatics with the 2018 Global Product Leadership Award for its industry-leading I2E NLP text mining platform. This intelligent solution generates insights from a wide range of unstructured and semi-structured data, empowering clients to efficiently integrate AI into their operations.

"I2E's data-driven query development approach ensures reliable results even where there is no annotated training data available. It can obtain actionable results from unstructured data up to 1,000 times faster than the traditional keyword-based research," said Kamaljit Behera, industry analyst for Frost & Sullivan. "Linguamatics’ user-focused approach in product development differentiates its solutions from competitor products. For example, Linguamatics has also created a separate interface for less experienced users, allowing them to find and use optimized queries that have been previously created and published, thus making it easy for new users to access the platform’s power for their application areas. Furthermore, the platform employs intuitive reporting to present extracted information in a structured form."


We don’t know what we don’t know. 

“The three great essentials to achieve anything worthwhile are, first, hard work; second, stick-to-itiveness; third, common sense.”

– Thomas A. Edison

Recently I had the honor of spending some time with some amazing people who are committed to quality in healthcare at the 2018 HQI Conference in Huntington Beach (yes, the sacrifices I must make; what an amazing venue!). There, I learned that most of the attendees have been doing their quality reporting through long hours of manual effort. I look at this in two ways…

1) I applaud such valiant efforts to achieve something worthwhile; and

2) are they unaware that technology can decrease their burden? These hard-working professionals certainly cover the hard work and stick-to-itiveness. As for the third essential, this isn’t a case of a lack of common sense; it’s simply a case of “we don’t know what we don’t know”.

How do you ensure healthcare quality when the priority is quantity?

Data is everywhere. When it comes to data, one person’s garbage is another person’s treasure. It simply depends on the question you need to ask. And when it comes to quality and reporting metrics, it’s best to listen to the experts. The National Committee for Quality Assurance (NCQA) exists to improve the quality of health care; they even say so on their web page.


Drug safety is, of course, one of the central concerns of any drug development project. Right from the start, project teams want to know whether the target they are interested in has any links to adverse events. Or, when they get to a lead series or lead compound, is there any evidence that similar compounds or compound classes have been shown to have side effects? If unexpected adverse events occur in clinical trials, project teams again turn to the literature and other sources to see if they can unearth a reason, a mechanism, or other evidence for the effect. And of course, post-market, pharmaceutical companies must regularly screen the worldwide scientific literature for potential adverse drug reactions, at least every two weeks.

Finding useful information from public sources can be daunting. There are so many different names for any particular gene target, compound, disease process, adverse event, or side effect. Comprehensive search means using strings and strings of key words. And of course, what is really needed is evidence that a compound is causing an effect, not treating the disease, so key word search alone doesn’t work well. And then, there are so many different data sources to search.


This month, over 100 life science and healthcare informatics professionals met at the Linguamatics Text Mining Summit 2018 in Portsmouth, NH.

Attendees from multiple pharma companies presented valuable new use cases on how they are using Linguamatics I2E’s Natural Language Processing (NLP)-based AI technology to solve big data challenges from bench to bedside – mining unstructured real world data for rapid reporting of patient trends; discovery of new therapeutic indications of drug targets; developing novel biologics; and supporting risk management and drug safety.

Presenters from healthcare shared how they unlock insights by mining Electronic Health Records (EHRs) in some of the most innovative areas in healthcare today - including real world evidence for clinical outcomes; streamlining prior authorization and medical review workflows; and identifying clinical care gaps.


A key requirement in drug development – and increasingly in precision/personalized medicine and pharmacogenomics – is a comprehensive understanding of the genetic associations for the disease of interest. For a multiple sclerosis (MS) biomarker discovery project, Sanofi wanted to annotate the association of human leukocyte antigen (HLA) alleles and haplotypes with diseases and drug hypersensitivity, as the HLA genotype is responsible for some 30% of the risk of MS and participates in almost every aspect of the disease.

HLA alleles have been associated with multiple autoimmune diseases, various types of cancer, infectious disease, and drug adverse events, but there are no known resources that systematically annotate these associations.

Developing a Comprehensive Catalog of Disease Annotations using Natural Language Processing (NLP)-based Text Analytics

Sanofi identified more than 400 HLA alleles through a whole exome sequencing-based HLA typing and analysis workflow. These potential candidate biomarkers were not annotated in any database. Sanofi then used the Linguamatics I2E NLP solution to analyse and search the literature to annotate the association of the identified HLA alleles with diseases and drug hypersensitivity.

Sanofi linguistically processed and indexed a literature corpus of 25 million PubMed abstracts and 4 million full text journal articles with I2E text analytics, using an internally developed HLA gene ontology, alongside Linguamatics I2E’s dictionary of relationship verbs (e.g. causes, leads to, results in) and Diseases ontology. This identified HLA alleles and haplotypes and their relationships with diseases and drug sensitivity.
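To make the idea concrete, here is a minimal Python sketch of dictionary-driven relationship extraction. It is not I2E's API or Sanofi's workflow; the regular expression, the tiny term lists, the function name `extract_relations` and the sample sentence are all assumptions made up for illustration. It simply looks for an HLA allele, followed by a relationship verb, followed by a disease term in the same sentence.

```python
import re

# Assumption: a simplified pattern for HLA allele names such as HLA-DRB1*15:01.
HLA_PATTERN = re.compile(r"\bHLA-[A-Z]+[0-9]*\*\d{2}(?::\d{2,3})*\b")

# Relationship verbs quoted in the text above; a real dictionary is far larger.
RELATION_VERBS = ["causes", "leads to", "results in"]

# Assumption: a toy stand-in for a diseases ontology.
DISEASES = ["multiple sclerosis", "drug hypersensitivity"]


def extract_relations(sentence):
    """Return (allele, verb, disease) triples found in one sentence.

    A triple is reported only when the verb appears after the allele
    and the disease term appears after the verb.
    """
    lowered = sentence.lower()
    triples = []
    for allele_match in HLA_PATTERN.finditer(sentence):
        for verb in RELATION_VERBS:
            verb_pos = lowered.find(verb, allele_match.end())
            if verb_pos == -1:
                continue  # this verb does not follow the allele
            for disease in DISEASES:
                if lowered.find(disease, verb_pos + len(verb)) != -1:
                    triples.append((allele_match.group(), verb, disease))
    return triples


# Invented sentence for illustration only, not a quote from the literature.
example = "HLA-DRB1*15:01 leads to increased risk of multiple sclerosis."
relations = extract_relations(example)
print(relations)
```

A plain key word search would return any sentence mentioning both terms; requiring the allele, verb and disease to appear in order is a crude approximation of searching for the relationship itself. Production text mining would also need synonym handling, negation detection and proper linguistic parsing, none of which this sketch attempts.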