Animal models are crucial to understanding disease, its underlying pathways, and the gene targets that play a role. One tool that has shown great value is the knockout (KO) mouse model.

The number of KO mouse models has increased massively since the first one in 1989, and mouse models have been used successfully to increase our understanding of diseases and conditions as varied as different cancers, diabetes, obesity, blindness, Huntington's disease, aggressive behaviour, and even drug addiction.

Understanding the landscape of KO mouse models for any particular disease area is important, and curated databases (e.g. IMPC or MGI) provide valuable data, but keeping track of new KO mouse models published in the scientific literature is challenging.

Peng Zhang, Senior Staff Scientist at Regeneron Pharmaceuticals, uses Linguamatics I2E to tackle this challenge, and he presented on “Text Mining for Knockout Mice and Phenotypes” earlier this year.

Diagram showing the set of KO genes involved in autoimmune phenotypes. All hits from both I2E and MGI were manually curated, and only 479 unique KO genes were considered “true positives”. Of these true positives, 61% came only from the I2E query and were not covered by MGI.



The transition to new value-based payment models is spurring provider demand for technologies that enhance patient care and minimize safety risks, and in turn reduce costs. Of particular interest are tools to help providers predict the likelihood of potentially avoidable outcomes, such as a hospital readmission, pulmonary nodules turning cancerous or the contraction of sepsis.

According to a recent Linguamatics survey, most hospital CMIOs support the use of predictive models to improve the quality of care. In addition, CMIOs believe that these models can be enhanced with the use of Natural Language Processing (NLP) to access insightful data from unstructured chart notes.


Scientific papers are mainly written in English, so it is not surprising that most scientific text mining has concentrated on just one language. However, as the use of text mining has become broader, moving from early research through to clinical and post-marketing, there is increasing need to be able to deal with other languages. In the pharmaceutical sector, this is seen in projects ranging across voice of the customer, analysis of sales reports, adverse event monitoring, patent analysis, and checking the quality of regulatory submission documents. In healthcare, hospitals often have a multinational presence, and a need to collect information from records written in several languages.

Multilingual processing not only allows text mining in other languages (for example, a French medic analysing French electronic medical records), but also allows easier mining of foreign language documents, or across different languages. A couple of examples:

  • An English researcher can mine Chinese text using concepts they have found via the English synonyms, extract the relationships of interest, and then use something like Google Translate to show the evidence within the original text.
  • A French medic can automatically link their medical records with relevant clinical trials in English.

Linguamatics recognized this growing need and, in I2E 4.4, has provided a platform that can deal with multiple languages. It can even deal with cases such as patent documents where a single document contains text written in multiple languages, ensuring that an English synonym for adverse events such as “die” does not hit the German determiner “die”.
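The “die” example can be illustrated with a minimal sketch. This is not I2E's API or implementation; it assumes the document has already been split into segments with a known language tag, and the `Segment` class, the `find_hits` function, and the synonym lists are all hypothetical names invented for illustration. The idea is simply that a synonym is only matched against text in its own language.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    lang: str  # language code of this segment (assumed to be known in advance)
    text: str


# Hypothetical per-language synonym lists for a single "death" concept.
ADVERSE_EVENT_TERMS = {
    "en": ["die", "died", "death"],
    "de": ["sterben", "gestorben", "Tod"],
}


def find_hits(segments, terms_by_lang):
    """Return (segment index, language, matched text) for each term hit.

    A term is only searched for in segments written in the term's language.
    """
    hits = []
    for index, seg in enumerate(segments):
        for term in terms_by_lang.get(seg.lang, []):
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, seg.text, flags=re.IGNORECASE):
                hits.append((index, seg.lang, match.group(0)))
    return hits


# A toy mixed-language document: one English and one German segment.
segments = [
    Segment("en", "Two patients die within 30 days."),
    Segment("de", "Die Patienten wurden 30 Tage beobachtet."),
]

hits = find_hits(segments, ADVERSE_EVENT_TERMS)
print(hits)
```

In this toy input, the English verb “die” is found in the English segment, while the German determiner “Die” is never compared against the English synonym list, so it produces no hit.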


At the recent Text Mining Summit, one piece of feedback that we received was that video tutorials were a good source of helpful information for users. We considered this good timing as we had just started work on one!

KNIME and Pipeline Pilot are both popular workflow tools that I2E customers use to enhance the power of text mining. However, whereas the Pipeline Pilot components provided by Linguamatics are installed on the server, the KNIME nodes that we have produced are often deployed by individuals in their desktop KNIME application. To get those users up and running quickly, we've put together a 15-minute YouTube video explaining the steps needed to:

  • Download and install the nodes
  • Create a new KNIME workflow and add the Linguamatics I2E nodes
  • Configure the nodes and run the workflow

We would love your feedback on this video (too long or too short? too quick or too slow?). Please also let us know what other topics you would like to see covered in a video tutorial.


The advent of accountable care, meaningful use, and the triple aim is creating an unprecedented demand for insightful patient data. Though structured data reveals valuable information, some 80% of EHR data resides in an unstructured narrative format. Furthermore, of the 1.2 billion clinical documents produced in the US each year, 60% of the valuable information exists in unstructured narrative documents that are largely inaccessible for data mining and quality measurement.

To gain better insight into patient data, providers might be inclined to expand their use of templates to capture discrete observations. Unfortunately, when purely coded templates take the place of free-text narratives, the resulting documentation often fails to capture the subtle circumstances of a patient’s story. Frequently, the patient narrative is the most effective means of communicating detailed information between healthcare professionals.

What alternatives do providers have for preserving the patient narrative while at the same time gaining additional insights from a patient’s complete medical record? One option is to tap into the power of Natural Language Processing (NLP) technology.