On July 16, delegates across the life sciences, biotech, healthcare and other knowledge-driven industries gathered in Princeton for Linguamatics’ one-day seminar: “From bench to bedside, unlocking key insights in your data”.  

We heard from Regeneron Pharmaceuticals, Johnson & Johnson, Copyright Clearance Center (CCC) and Linguamatics on how NLP technology is moving into new application areas to improve patient outcomes and unlock key insights across the drug discovery, development and delivery continuum. Delegates were very engaged, and many stayed long after the talks had finished to continue the day’s discussions.

Jim Dixon, Senior Application Specialist, introduced I2E NLP text mining, the new features in the latest I2E release, and the industry’s first federated text mining platform. Whatever the content, I2E can mine and extract with precision and at scale. You can use Linguamatics I2E to draw valuable intelligence from text, getting to the answers faster so you can make smarter, better-informed decisions.

Dr. Peng Zhang’s presentation showed us a real-life use case of I2E at Regeneron. Eliminating or modifying a single gene in the mouse genome can provide insight into the role that gene plays in normal physiology and disease pathogenesis, but keeping up to date with novel information is time-consuming. Dr. Zhang uses I2E to systematically mine the scientific literature for any reported gene knockout in mice and any associated autoimmune phenotype.


Life sciences and healthcare professionals gathered at the UCSF Mission Bay campus for the West Coast Natural Language Processing (NLP) & Big Data Symposium on June 18th. The symposium, co-hosted by UCSF, featured presenters from UCSF, Merck, City of Hope, Copyright Clearance Center and Linguamatics, and drew delegates from a diverse range of organizations.

The central theme of this year’s symposium was “From bench-to-bedside, unlocking key insights from your data”. Healthcare delegates were keen to find new ways to address meaningful use and accountable care by applying NLP text mining to electronic health records. Life sciences delegates were keen to increase the efficiency and effectiveness of their business operations by mining real-world data. There was also strong interest in forging partnerships between pharma/biotech and hospitals/cancer centers.

Sorena Nadaf, the CIO and Director of Translational Informatics at UCSF Helen Diller Family Comprehensive Cancer Center, delivered the welcome address and highlighted the foundation of clinical NLP and its common uses for extracting and transforming narrative information in EMRs to support and accelerate clinical research.

Sorena Nadaf at the NLP & Big Data Symposium in San Francisco.


Over the past few months, several publications have used Linguamatics I2E to extract key information in a variety of different projects. We are constantly amazed by the inventiveness of our users in applying text analytics across the bench-to-bedside continuum, and these publications are no exception. Using the natural language processing power of I2E, researchers can answer their questions rapidly and extract the results they need with high precision and good recall. A more standard keyword search, by contrast, returns a document set that they then need to read.

Let’s start with Hornbeck et al., “PhosphoSitePlus, 2014: mutations, PTMs and recalibrations”. PhosphoSitePlus is an online systems biology resource for the study of protein post-translational modifications (PTMs) including phosphorylation, ubiquitination, acetylation and methylation. It’s provided by Cell Signaling Technology, who have been users of I2E for several years. In the paper, they describe the value of integrating data on protein modifications from high-throughput mass spectrometry studies with high-quality data from manual curation of published low-throughput (LTP) scientific literature.


Linguamatics I2E: the first text mining platform to integrate with Copyright Clearance Center's RightFind XML for Mining, to allow access to full-text journal articles

(Cambridge, UK and Boston, USA - 24 June 2015) - Linguamatics is expanding its natural language processing (NLP)-based text mining platform I2E to include easier access to full-text articles, with the integration of Copyright Clearance Center's (CCC) new text mining solution, RightFind™ XML for Mining.

Commercial life science researchers can now create sets of full-text XML articles from more than 4,000 peer-reviewed journals produced by over 25 scientific, technical, and medical (STM) publishers, and automatically make them available for text mining in I2E.

The solution enables researchers to make discoveries and connections that can only be found in full-text articles. All of the content is stored securely by CCC and is pre-authorized by publishers for commercial text mining. Users access the content using Linguamatics’ unique federated text mining architecture, which allows researchers to find the key information to support business-critical decisions. The integrated solution is available now, and enables users to save time, reduce costs and mitigate their organization’s copyright infringement risk.


Better access to the high-value information in legacy safety reports has been, for many folk in pharma safety assessment, a “holy grail”. Locked away in these historical data are answers to questions such as: Has this particular organ toxicity been seen before? In what species, and with what chemistry? Could new biomarker or imaging studies predict the toxicity earlier? What compounds could be leveraged to help build capabilities?


I2E enables extraction and integration of historical preclinical safety information, crucial to optimizing investment in R&D, alleviating concerns where preclinical observations may not be human-relevant, and reducing late-stage failures.

Coming as I do from a decade of working in data informatics for safety/tox prediction, I was excited by one of the talks at the recent Linguamatics Spring User conference. Wendy Cornell (ex-Merck) presented on an ambitious project to use Linguamatics’ text mining platform, I2E, in a workflow to extract high-value information from safety assessment reports stored in Documentum.

Access to historic safety data is a potential advantage, and one that will be helped by the use of standards in electronic data submission for regulatory studies (e.g. CDISC’s SEND, the standard for exchange of non-clinical data).