via GenomeWeb: Linguamatics Unveils New Version of I2E, Begins Early-Access Testing for Communal Forum

February 6 2017

NEW YORK (GenomeWeb) – Recently, Linguamatics released a new version of I2E, its natural language processing text-mining platform, that includes several new features designed to make it easier for healthcare and life sciences customers to search for and incorporate information from text more accurately and efficiently.

The company has also begun testing a new community site that will provide a forum for people within the pharma and life sciences community to exchange ideas, best practices, and text mining strategies. In October, the company launched an early-access program for the site at its annual user meeting.

Version 5.0 of the software features tools for normalizing concepts such as gene mutations, along with improved search features designed to help users find key information in unstructured texts and to allow large quantities of data to be processed in a more automated fashion. The release also includes a new query language called the Extraction and Search Language (EASL), previously available in beta, that allows text mining queries to be described and written in a human-readable text format. EASL queries can be generated outside the I2E platform and support custom interfaces and enhanced workflow automation.

Senior I2E product manager Guy Singh told GenomeWeb that this is the company's largest release to date. He highlighted the new query language as one of the key new features that customers will find attractive. It lets customers move beyond basic keyword searches to craft advanced text mining queries and also gives them the ability to develop those queries outside of the application. Previously, customers who wanted to build text mining capabilities into their workflows and interfaces had to choose from a pre-defined set of queries and could only craft queries within the Linguamatics platform. "Now they can on the fly create these queries," Singh said. "It just allows the development of much richer applications and also the integration of text mining into more sophisticated workflows."

Furthermore, the normalization tools make it possible to automatically capture information, such as gene mutations, that may be expressed in different ways within text. For example, the V158M mutation could be written that way or as Val158Met. In order to get the software to recognize that these refer to the same thing, "We've come up with a standardized representation for all of them so we can say these two things are equivalent," Singh said.
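To make the idea of a standardized representation concrete, the following is a minimal Python sketch, not Linguamatics' implementation. It assumes that only simple amino acid substitutions are of interest and maps one-letter and three-letter spellings to a single canonical form. The function name `normalize_mutation` and the choice of the one-letter form as canonical are assumptions made for illustration.

```python
import re

# Standard three-letter to one-letter amino acid abbreviations.
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "gln": "Q", "glu": "E", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}
ONE_LETTER = set(THREE_TO_ONE.values())

# residue, position, residue; an optional "p." prefix is tolerated.
_SUBSTITUTION = re.compile(r"^(?:p\.)?([A-Za-z]{1,3})(\d+)([A-Za-z]{1,3})$")


def _to_one_letter(token):
    """Return the one-letter code for a residue token, or None if unknown."""
    if len(token) == 1 and token.upper() in ONE_LETTER:
        return token.upper()
    return THREE_TO_ONE.get(token.lower())


def normalize_mutation(mention):
    """Hypothetical helper: map a substitution mention to a canonical form.

    Returns a string such as "V158M", or None if the mention is not a
    recognizable substitution.
    """
    match = _SUBSTITUTION.match(mention.strip())
    if match is None:
        return None
    ref = _to_one_letter(match.group(1))
    alt = _to_one_letter(match.group(3))
    if ref is None or alt is None:
        return None
    return f"{ref}{match.group(2)}{alt}"
```

Under this sketch, `normalize_mutation("V158M")` and `normalize_mutation("Val158Met")` return the same string, so a search for one spelling can be expanded to match the other. A production system would need to handle far more notations than this pattern covers.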

What that means for the end user is that "when they go and look for a particular mutation, that the software will look for all the ways that it can be expressed in the text so they don't have to explicitly list out all the ways," he added. Also, when the results are returned, equivalent mentions are clustered together, making clear the relationship between them. This way, "you can review them much more quickly … rather than having to trawl through all the results and figure out which ones are equivalent by hand," Singh said. The normalization tool also helps with range searches; for example, it makes it possible to search for a range of drug doses or patients of a particular age within text documents, he added.
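A range search of the kind described here depends on first normalizing numeric mentions to a common unit. The Python sketch below illustrates that step for drug doses. It is an illustration only, not how I2E works: the function names, the small set of units, and the choice of milligrams as the common unit are all assumptions.

```python
import re

# Matches a number followed by a mass unit, e.g. "25 mg" or "0.5 g".
_DOSE = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|g|mcg)\b", re.IGNORECASE)

# Conversion factors to milligrams (assumed common unit for this sketch).
_TO_MG = {"mcg": 0.001, "mg": 1.0, "g": 1000.0}


def find_doses_mg(text):
    """Hypothetical helper: return every dose mentioned in text, in mg."""
    return [float(value) * _TO_MG[unit.lower()]
            for value, unit in _DOSE.findall(text)]


def docs_with_dose_in_range(docs, low_mg, high_mg):
    """Return the documents mentioning at least one dose within the range."""
    return [doc for doc in docs
            if any(low_mg <= dose <= high_mg for dose in find_doses_mg(doc))]
```

Because each dose is converted before comparison, a query for 400 to 600 mg would match a document that mentions "0.5 g" even though the text never uses milligrams. The same approach could in principle apply to patient ages or drug concentrations.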

Also, last week, Linguamatics introduced the I2E Asynchronous Messaging Pipeline (AMP), which is designed to help healthcare professionals use NLP technologies to find important content in unstructured texts contained in electronic health records. "I2E can deal with tens of millions of documents …. but then if you want to go to even greater levels and you want to do that in more efficient speeds, then you can use AMP to set up a workflow to distribute all the documents and jobs that have been set up across multiple I2E servers," Singh explained.

In the next release of the platform, Linguamatics plans to develop more reusable queries. "We understand that our customers apply our software to particular applications and based on that, we can develop queries that can be almost used as a library of things that they can apply to a specific area," Singh said. Within pharma and biotech, "we are looking at a number of areas where something like that would be applicable." The company also plans to offer query functionality for searching patent and clinical trial documents as well as drug label information. "We know from experience that [customers] are doing all these things and so we'll try and provide as much help to them by giving them building blocks to do that."

The company also plans to explore opportunities for I2E within the machine learning space, although it is not disclosing details about those plans at this point, Linguamatics VP of Marketing Mary-Ann Moore told GenomeWeb. The future release will also include simpler user interfaces that will make the solution accessible to more casual business users or scientists.

Furthermore, next year the company plans to beef up its developer community. "There's going to be more and more applications that are going to be using I2E [so] there'll be a combination of people who are building applications and people who are embedding the technology [in I2E]," Singh said. In addition, Linguamatics intends to launch the community site, which will provide a forum for people within the pharma and life sciences community to exchange ideas, best practices, and text mining strategies. Early responses to the platform have been very positive, according to the company. That's particularly true for the company's healthcare customers, Singh noted. "They are much keener to be able to share some of the best practices that they have developed."

The company also continues to grow and expand its business organically, Moore said. It currently has about 100 employees on staff and claims that its platform is used in 17 of the top 20 global pharmaceutical companies as well as in government agencies like the National Institutes of Health and the US Food and Drug Administration. It also has customers in the biotech and health insurance markets.

One existing customer is Agios Pharmaceuticals, which plans to upgrade to the newest version of I2E as soon as it completes current activities around the launch of its first indication for acute myeloid leukemia, Stuart Murray, associate director at Agios, told GenomeWeb. The small molecule is currently in late-stage clinical trials and the company is preparing to file for FDA approval to begin marketing the drug. As part of those efforts, it's doing a lot of data mining for pharmacovigilance purposes. "Because I'm doing a lot of that, I can't afford any downtime so I'm waiting until this work is finished before I go to version 5," he said.

Murray was familiar with the software from having used it when he worked at Wyeth Pharmaceuticals, which Pfizer bought in 2009, prior to joining Agios. At the time, Agios needed an internal informatics group and ended up hiring a number of former Wyeth employees who were also familiar with the solution. It made more sense for them to use tools and capabilities that they already knew rather than try to develop something in house, he explained. They have also explored alternative solutions from other companies "but right now we are sticking with I2E because it's the best one," he said.

Murray has already tested at least one of the new features included in I2E. Agios was part of a beta to test EASL prior to its full release in this version of the software. "One of the big challenges we had in the past was query tracking," he said. "It's virtually impossible to share queries unless you have a significant amount of annotation. EASL makes things a lot easier to use and reuse." He also highlighted the value of the new normalization feature. "In the cancer space, what we are looking for is a genetic lesion which predisposes the tumor to sensitivity to that target … so we do a lot of mutation searching [and] I2E is a great tool for doing that."

The normalization feature would also be useful for running chemical genetic screens, which involve testing small molecules on cancer cell line panels and then analyzing the cell lines to find the genetic mutations that are responsible for the sensitivity. For these analyses, they use things like gene expression, protein analysis, mutation data, and chemical data. The normalization and range search tools available in I2E are "immensely valuable" for searching for things like dosing ranges or effective drug concentrations, Murray said.

Other I2E customers include the Huntsman Cancer Institute, where the I2E platform is being used to extract information from electronic health records to help researchers at the center better understand cancer and improve treatments. In 2012, Linguamatics was tapped as a commercial partner in the Multilingual Annotation of Named Entities and Terminological Resource Acquisition, or MANTRA, project, a two-year research effort aimed at developing and providing community resources for improving the accessibility of biomedical information from documents in various European languages.

In 2014, Linguamatics announced that it had exceeded $10 million in sales in 2013. Moore declined to provide specifics about the privately held company's current sales numbers but said that its revenues have continued to grow year over year.