Faced with the challenges of Accountable Care, the triple aim and Meaningful Use, NLP will help CMIOs to improve insights into patient and population health.

CAMBRIDGE, UK and BOSTON, USA - November 10th, 2015 – Chief Medical Informatics Officers (CMIOs) at US healthcare providers see that Accountable Care, the triple aim and Meaningful Use are all creating an unprecedented demand for more insights from patient data. Since much of the key information is locked away in unstructured data, the overwhelming majority of CMIOs believe that rapidly increasing the use of Natural Language Processing (NLP) will be a critical step in accessing this data and thus improving the delivery of patient care. These are the key findings of a recent study prepared by Linguamatics, with support from the American Medical Informatics Association (AMIA).

Linguamatics commissioned the report, “Assessing the Role of Clinical NLP in the Delivery of Patient Care”, to discover how CMIOs envisage using NLP in new applications that both enhance patient care and improve hospital efficiency. The report identified four areas where CMIOs foresee key developments:

• The CMIOs surveyed considered the potential improvement in the quality of care and patient safety, and the resulting reduction in costs that could be generated with predictive models, to be the most important application involving NLP. For example, CMIOs expressed interest in applications such as predicting hospital readmissions or outmigration/patient leakage.


Earlier this year, Linguamatics announced our new Connected Data Technology for federated search, and in our newest version, I2E 4.4, we build on this to take another step along the path of better data interoperability. I2E 4.4 introduces a more powerful way to customize your text analytics results using enhanced linkouts in the HTML output, enabling you, for example, to connect your text-mined data to structured content.

Linkouts enable you to link out to, or pull in, additional information relating to the preferred terms (PTs) or concept identifiers (NodeIDs) in your query results. They can be hyperlinks, images or customized output. For example, you can configure linkouts to see information from an external website by clicking on the concept in the text-mined query results. Alternatively, it is possible to enable the interface to display an image in the query results, such as a chemical structure, instead of the preferred term.

With this new functionality, linkouts give you direct access to related information that adds context or metadata to your query results. For example, a search for chemicals from ChEBI could link directly from the preferred term in your results to the webpage for that concept on the EBI website (e.g. Cyclosporine), whilst a gene name in the same result links to EntrezGene (e.g. ICAM1).
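To make the idea concrete, here is a minimal Python sketch of how a linkout could turn a concept identifier into a hyperlink on the preferred term. It is not I2E configuration syntax: the `LINKOUT_TEMPLATES` table, the `build_linkout` function and the URL patterns are assumptions made for illustration, and the identifiers in the usage lines should be checked against each database before use.

```python
import html
from urllib.parse import quote

# Hypothetical URL templates keyed by ontology name (assumed patterns,
# not taken from I2E or from the original post).
LINKOUT_TEMPLATES = {
    "ChEBI": "https://www.ebi.ac.uk/chebi/searchId.do?chebiId={node_id}",
    "EntrezGene": "https://www.ncbi.nlm.nih.gov/gene/{node_id}",
}


def build_linkout(ontology, node_id, preferred_term):
    """Return an HTML anchor for a concept, or the plain term if no template exists."""
    template = LINKOUT_TEMPLATES.get(ontology)
    if template is None:
        # No linkout configured for this ontology: show the term unlinked.
        return html.escape(preferred_term)
    # Percent-encode the identifier so characters such as ":" are URL-safe.
    url = template.format(node_id=quote(str(node_id), safe=""))
    return '<a href="{}">{}</a>'.format(
        html.escape(url, quote=True), html.escape(preferred_term)
    )


# Illustrative identifiers only; verify them against ChEBI and Entrez Gene.
print(build_linkout("ChEBI", "CHEBI:4031", "Cyclosporine"))
print(build_linkout("EntrezGene", "3383", "ICAM1"))
```

Keeping the templates in one table means a new ontology can be linked by adding a single entry, while terms from ontologies without a template simply appear as plain text.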


At the October Text Mining Summit, speakers from pharma, biotech and academia presented a wide range of applications of text analytics that provide value within the drug discovery and development pipeline. Over a day and a half we heard from a dozen external speakers from healthcare and pharma, all sharing their enthusiasm for what text analytics can bring to the drug discovery, development and delivery environments.

Work presented by UNCC researchers using I2E to understand the potential health effects of plant phytochemicals: network map of text-mined associations linking plants to phytochemicals, phytochemicals to human genes, human genes to biological pathways, and pathways to human health phenotypes.

The life science applications ranged from safety, target discovery and alerting, genotype-phenotype annotations and clinical trial analytics to phytochemicals as potential nutraceuticals and patent landscaping for antibody-drug conjugates.

Back by popular demand, Wendy Cornell (ex-Merck) presented on gaining value from internal preclinical safety reports using I2E, which we’ve discussed in blog posts here before.


This year, the annual Linguamatics Text Mining Summit, which took place Oct 12-14 in Newport, RI, introduced the first-ever I2E Healthcare Hackathon. With a rapidly expanding healthcare user base and continued interest from our pharmaceutical clients in real world data, we realized the value of running a session specifically targeting the mining of electronic health records. Our aim was to have an in-depth exercise that closely reflected the types of data and issues that our customers and our own team encounter. Linguamatics has been involved in a wide variety of complex and valuable customer projects, allowing us to develop best practice querying strategies and data processing techniques. The Hackathon was a great opportunity for us to share these best practices with the wider Linguamatics and text mining community. The session was set up as a competition in a similar style to the i2b2 NLP challenges, with teams formed from the 23 attendees, a mixture of experienced and new users from pharma and healthcare groups.


It was great to see our paper on last year's i2b2 NLP challenge published recently. The challenge looked at extraction of Coronary Artery Disease (CAD) risk factors from unstructured patient data provided by the Research Patient Data Repository of Partners Healthcare. We had worked through previous i2b2 challenges, such as smoking cessation, only after those competitions had closed, so we wanted to participate actively in the 2014 NLP challenge and see how we compared against the other NLP groups in the competition. Linguamatics work with many academic medical centers and cancer centers and view collaboration as a key component of our customer relationships. As such, we wanted to share our success or failure with our peers and show how a commercial system can tackle these areas.

The i2b2 training set consisted of 790 annotated documents relating to 178 patients, which we decided to divide into training (70%) and development (30%) sets. The test set contained 514 documents from 118 patients. Contestants were set the task of extracting CAD risk factors such as specific diseases (e.g. diabetes), medications, family history of CAD and lab results, while also taking into account when tests were carried out and whether a disease diagnosis was past or current.
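As an illustration of a 70/30 split like the one described above, the following Python sketch divides documents into training and development sets. It assumes the split is made at the patient level, so that all documents for one patient land in the same set; the post does not say how the split was actually done, and the function name, the `doc_to_patient` mapping and the fixed seed are hypothetical.

```python
import random


def split_by_patient(doc_to_patient, train_fraction=0.7, seed=0):
    """Split document IDs into (train, dev) lists without sharing patients.

    doc_to_patient maps each document ID to its patient ID (assumed input shape).
    """
    # Sort first so the shuffle is reproducible for a given seed.
    patients = sorted(set(doc_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)

    # The fraction applies to patients, not documents, so the document
    # ratio will only be approximately 70/30.
    n_train = round(len(patients) * train_fraction)
    train_patients = set(patients[:n_train])

    train = [d for d, p in doc_to_patient.items() if p in train_patients]
    dev = [d for d, p in doc_to_patient.items() if p not in train_patients]
    return train, dev


# Toy data: 40 documents spread over 10 made-up patients.
docs = {"doc{}".format(i): "patient{}".format(i % 10) for i in range(40)}
train_docs, dev_docs = split_by_patient(docs)
print(len(train_docs), len(dev_docs))
```

Splitting by patient rather than by document avoids the same patient's records appearing in both sets, which would make development scores look better than they should. With 178 patients, this approach would put roughly 125 patients in training and 53 in development.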

Our team’s results were excellent and, with a micro F-score of 91.7%, were competitive with the best system in this challenge. I2E, being a rule-based system, was well suited to the challenge compared to machine learning systems because: