Innovative ETL (Extract, Transform, Load) technology frees 80% of unstructured data trapped in Data Lakes, enabling high-value knowledge discovery and decision support

Cambridge, UK & Boston, USA – 30th November, 2016 – Text analytics provider Linguamatics today released the latest version of their award-winning natural language processing (NLP) text mining platform, I2E 5.0.

Game-changing capabilities in I2E 5.0 include normalization of concepts (e.g. dates, measurements, gene mutations) within unstructured text, advanced range search, and a new query language, EASL. These capabilities tackle the variety in big data and accelerate insights from unstructured, semi-structured and structured data sources.

Normalization and range search help users find key information (e.g. a particular temperature or a range of temperatures) in unstructured text sources regardless of how the information is expressed, and boost ETL operations by identifying, extracting and standardizing data. Given that around 80-90% of big data is unstructured, these new text mining capabilities allow huge amounts of data to be processed that previously had to be read manually.
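To make the idea of normalization plus range search concrete, here is a minimal Python sketch. It is not I2E's implementation and does not use EASL; the regular expression, the function names (`to_celsius`, `extract_temperatures`, `find_in_range`) and the sample sentences are all assumptions made for illustration. The sketch finds temperature mentions written in a few different styles, converts each to degrees Celsius, and then filters documents by a numeric range.

```python
import re

# Hypothetical pattern: a number followed by a temperature unit in a few
# common spellings. Real text would need far broader coverage.
TEMP_PATTERN = re.compile(
    r"(?P<value>-?\d+(?:\.\d+)?)\s*"
    r"(?P<unit>°\s*[CF]|degrees?\s+(?:celsius|centigrade|fahrenheit)|deg\s*[CF])\b",
    re.IGNORECASE,
)


def to_celsius(value, unit):
    """Normalize a value to degrees Celsius, rounded to one decimal place."""
    if unit.strip().lower().endswith(("f", "fahrenheit")):
        value = (value - 32.0) * 5.0 / 9.0
    return round(value, 1)


def extract_temperatures(text):
    """Return every temperature mention in `text`, normalized to Celsius."""
    return [
        to_celsius(float(m.group("value")), m.group("unit"))
        for m in TEMP_PATTERN.finditer(text)
    ]


def find_in_range(docs, low, high):
    """Range search: (doc_id, celsius) pairs with low <= celsius <= high."""
    hits = []
    for doc_id, text in docs.items():
        for celsius in extract_temperatures(text):
            if low <= celsius <= high:
                hits.append((doc_id, celsius))
    return hits


# Made-up example sentences, each expressing a temperature differently.
docs = {
    "a": "Patient febrile at 101.3 °F overnight.",
    "b": "Temperature was 38.5 degrees Celsius on admission.",
    "c": "Sample stored at 4 deg C.",
}

fever_hits = find_in_range(docs, 38.0, 39.0)
```

Because every mention is standardized to a single unit before comparison, one range query matches both the Fahrenheit and the Celsius sentence, which is the kind of result a plain keyword search would miss.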


Uncovering new toxicities from chronic non-rodent studies

Preclinical toxicology studies are an essential part of the drug discovery-development pipeline, to support the safe conduct of clinical trials. And drug safety is, of course, one of the most critical aspects to ensure during drug development.

We were pleased to see the recent publication by Merck on a text-mining approach to assess the value of chronic non-rodent toxicology studies. 

Preclinical safety assessment groups employ a variety of animal models and assays to satisfy regulatory agency requirements to identify and characterize drug toxicities, describe drug exposures, and provide qualitative and quantitative risk assessments for human exposure. These studies require considerable resource investment; however, the results are often “locked away” in internal reports, which makes re-use of these valuable data difficult and costly.

This is a common situation within the pharmaceutical industry – where critical information is locked away in textual reports, such as the informed scientific conclusions of pathologists, histologists and safety experts. Natural language processing can overcome these barriers by extracting structured facts from unstructured documents, and Merck’s paper describes an evaluation of a text mining workflow to access these important data.


Press Release: Natural Language Processing (NLP) to Optimize Clinical Trials: I2E Hackathon at the Linguamatics Text Mining Summit - Using text mining to address healthcare information challenges


Enhancing problem list reconciliation with Natural Language Processing (NLP): Improve patient care quality with population health text mining analytics

The shift from volume to value-based compensation is driving provider demand for better insights into the health of patient populations. Providers recognize that access to more complete patient data can enhance their ability to deliver cost-effective care and high quality outcomes. This is especially true for patients with multiple chronic conditions, who typically have more complicated care needs and higher hospital utilization rates.

Figure 1: High-risk patients frequently suffer from complex comorbidities

Typically, physicians refer to problem lists when assessing a patient’s health and evaluating treatment alternatives. Problem lists rely on coded disease states and offer a concise view of a patient’s medical issues. Unfortunately, these lists are often incomplete or out of date. Consider, for example, a patient who is referred to an orthopedic surgeon for a broken wrist. If the problem list only includes details of the wrist injury, the physician may not be immediately aware of underlying chronic conditions, such as diabetes, that could impact the best course of treatment and outcomes.


Last week I attended the Cambridge Rare Disease Network (CRDN) 2016 Summit. CRDN is a newly established charity working to build a community of people in Cambridge to address the unmet needs in rare disease research and treatment. As last year, there was a great set of speakers, from patient groups, academia, pharma/biotech and vendors.

There has been a step-change in awareness within the pharma industry in the last decade, with increasing interest and investment in tackling rare diseases. I blogged last year about an interview with Patrick Vallance on this same topic. He gave three reasons why GSK are interested in rare disease, and all three were echoed by the CRDN speakers.

The foremost of these reasons was given most clearly by speakers from patient groups, such as Daniel Lewi from the Cure & Action for Tay-Sachs (CATS) Foundation, Karen Harrison from ALD Life, and Emily Kramer-Golinkoff from Emily’s Entourage. They spoke of the huge impact that rare disease has on individuals and families, and the urgent need for research into new or repurposed treatments for the 1 in 17 people affected by a rare disease.