Scientific papers are mainly written in English, so it is not surprising that most scientific text mining has concentrated on just one language. However, as the use of text mining has become broader, moving from early research through to clinical and post-marketing, there is an increasing need to deal with other languages. In the pharmaceutical sector, this is seen in projects ranging across voice of the customer, analysis of sales reports, adverse event monitoring, patent analysis, and checking the quality of regulatory submission documents. In healthcare, hospitals often have a multinational presence and need to collect information from records written in several languages.

Multilingual processing not only allows text mining in other languages (for example, a French medic analysing French electronic medical records), but also allows easier mining of foreign language documents, or across different languages. A couple of examples:

  • An English researcher can mine Chinese text by searching for concepts via their English synonyms, extract the relationships of interest, and then use something like Google Translate to show the evidence within the original text.
  • A French medic can automatically link their medical records with relevant clinical trials in English.

Linguamatics recognized this growing need and, in I2E 4.4, has provided a platform that can deal with multiple languages. It can even deal with cases such as patent documents where a single document contains text written in multiple languages, ensuring that an English synonym for adverse events such as “die” does not hit the German determiner “die”.
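The “die” example can be illustrated with a minimal sketch of language-aware synonym matching. This is not how I2E is implemented; the `Segment` type, the `SYNONYMS` table and the `find_hits` function are hypothetical names invented for illustration, and the sketch assumes each stretch of text has already been tagged with its language.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A stretch of text in a single language (hypothetical structure)."""
    lang: str  # language code such as "en" or "de", assumed to be known already
    text: str


# Hypothetical synonym table: one concept, with synonyms kept per language.
SYNONYMS = {
    "death": {
        "en": {"die", "died", "death"},
        "de": {"sterben", "tod"},
    }
}


def find_hits(segments, concept):
    """Return (language, token) pairs where a synonym of `concept` occurs.

    A synonym is only matched against segments in its own language, so the
    English verb "die" never matches the German determiner "die".
    """
    hits = []
    for seg in segments:
        terms = SYNONYMS[concept].get(seg.lang, set())
        for token in re.findall(r"\w+", seg.text.lower()):
            if token in terms:
                hits.append((seg.lang, token))
    return hits


# A single document containing text in two languages.
document = [
    Segment("en", "Two patients die within a week."),
    Segment("de", "Die Verbindung ist stabil."),
]
print(find_hits(document, "death"))
```

Only the English segment produces a hit; the German “Die” is skipped because no German synonym for the concept matches it.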


At the recent Text Mining Summit, one piece of feedback that we received was that video tutorials were a good source of helpful information for users. We considered this good timing as we had just started work on one!

KNIME and Pipeline Pilot are both popular workflow tools that I2E customers use to enhance the power of text mining. However, whereas the Pipeline Pilot components provided by Linguamatics are installed on the server, the KNIME nodes that we have produced are often deployed by individuals in their desktop KNIME application. To get those users up and running quickly, we've put together a 15-minute YouTube video explaining the steps needed to:

  • Download and install the nodes
  • Create a new KNIME workflow and add the Linguamatics I2E nodes
  • Configure the nodes and run the workflow

We would love your feedback on this video (too long or too short? too quick or too slow?). Please also let us know what other topics you would like to see covered by a video tutorial.


Important Applications of Clinical NLP

The advent of accountable care, meaningful use, and the triple aim is creating an unprecedented demand for insightful patient data. Though structured data reveals valuable information, some 80% of EHR data resides in an unstructured narrative format. Furthermore, across the 1.2 billion clinical documents produced in the US each year, 60% of the valuable information sits in unstructured narrative that is largely inaccessible for data mining and quality measurement.

To gain better insight into patient data, providers might be inclined to expand their use of templates to capture discrete observations. Unfortunately, when purely coded templates take the place of free-text narratives, the resulting documentation often fails to capture the subtle circumstances of a patient’s story. Frequently the patient narrative is the most effective means of communicating detailed information between healthcare professionals.

What alternatives do providers have for preserving the patient narrative while at the same time gaining additional insights from a patient’s complete medical record? One option is to tap into the power of Natural Language Processing (NLP) technology.


As CMIOs face the challenges of Accountable Care, the triple aim and Meaningful Use, NLP will help them improve insights into patient and population health.

CAMBRIDGE, UK and BOSTON, USA - November 10th, 2015 – Chief Medical Informatics Officers (CMIOs) at US healthcare providers see that Accountable Care, the triple aim and Meaningful Use are all creating an unprecedented demand for more insights from patient data. Since much of the key information is locked away in unstructured data, the overwhelming majority of CMIOs believe that rapidly increasing the use of Natural Language Processing (NLP) will be a critical step in accessing this data and thus improving the delivery of patient care. These are the key findings of a recent study prepared by Linguamatics, with support from the American Medical Informatics Association (AMIA).

Linguamatics commissioned the report “Assessing the Role of Clinical NLP in the Delivery of Patient Care” to discover how CMIOs envisage using NLP in new applications that both enhance patient care and improve hospital efficiency. The report uncovered four key areas where CMIOs foresee important developments:

• The CMIOs surveyed considered the potential improvement in the quality of care and patient safety, and the resulting reduction in costs that could be generated with predictive models, to be the most important application involving NLP. For example, CMIOs expressed interest in applications such as predicting hospital readmissions or outmigration/patient leakage.


Earlier this year, Linguamatics announced our new Connected Data Technology for federated search, and in our newest version, I2E 4.4, we build on this to take another step along the path of better data interoperability. I2E 4.4 introduces a more powerful way to customize your text analytics results using enhanced linkouts in the HTML output, enabling you, for example, to connect your text-mined data to structured content.

Linkouts enable you to link out to, or pull in, additional information relating to the preferred terms (PTs) or concept identifiers (NodeIDs) in your query results. They can be hyperlinks, images or customized output. For example, you can configure linkouts to see information from an external website by clicking on the concept in the text-mined query results. Alternatively, it is possible to enable the interface to display an image in the query results, such as a chemical structure, instead of the preferred term.

This new functionality means you can use linkouts to enrich query results with related information that provides more context or metadata for your search. For example, a search for chemicals from ChEBI could link directly from the preferred term in your results to the webpage for that concept on the EBI website (e.g. Cyclosporine), whilst a gene name in the same result links to EntrezGene (e.g. ICAM1).
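The idea behind a hyperlink linkout can be sketched as a lookup from a concept's source and identifier to a URL, which is then wrapped around the preferred term in the HTML output. This is a rough illustration only and does not reflect I2E's actual linkout configuration format; `LINKOUT_TEMPLATES` and `render_linkout` are hypothetical names, and the URL patterns and the identifiers used in the example are assumptions made for illustration.

```python
from html import escape
from urllib.parse import quote

# Hypothetical URL templates, keyed by the source of the concept.
# The patterns are assumptions for illustration, not taken from I2E.
LINKOUT_TEMPLATES = {
    "chebi": "https://www.ebi.ac.uk/chebi/searchId.do?chebiId={id}",
    "entrezgene": "https://www.ncbi.nlm.nih.gov/gene/{id}",
}


def render_linkout(source, concept_id, preferred_term):
    """Return an HTML fragment for one concept in the query results.

    If a template exists for the source, the preferred term becomes a
    hyperlink to the external page; otherwise it is shown as plain text.
    """
    template = LINKOUT_TEMPLATES.get(source)
    if template is None:
        return escape(preferred_term)
    url = template.format(id=quote(concept_id, safe=":"))
    return f'<a href="{escape(url, quote=True)}">{escape(preferred_term)}</a>'


# Illustrative identifiers (assumed, not taken from the original text).
print(render_linkout("chebi", "CHEBI:4031", "Cyclosporine"))
print(render_linkout("entrezgene", "3383", "ICAM1"))
```

An image linkout, such as a chemical structure shown in place of the preferred term, could follow the same pattern by returning an `<img>` element instead of an `<a>` element.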