Posts from March 2015

(Cambridge, UK and Boston, USA – March 31, 2015) Linguamatics announces the latest release of its award-winning natural language processing (NLP)-based text mining and analytics platform I2E. I2E 4.3’s Connected Data Technology uses an innovative federated text mining architecture, allowing information extraction from multiple data sources at once.

Textual data is heterogeneous in format and often exists in silos spread across multiple locations. Instead of having to run many text mining queries separately across disparate data sources, I2E’s Connected Data Technology allows users to run a single query simultaneously over multiple data sources, whether they are located locally, on Linguamatics’ cloud-based I2E OnDemand platform, or on third-party servers elsewhere in the cloud. Results are merged into a single result set that is sorted and clustered, ready for analysis. The technology allows users to get more comprehensive answers to their business-critical questions by extracting, synthesizing and connecting all the relevant knowledge from each data source for faster analysis, leading to better decision support and increased speed to actionable insight.
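For readers who want a concrete picture of the federated pattern described above, the following is a minimal Python sketch. It is not I2E code and does not use any Linguamatics API: the `Hit` record, `make_source`, `federated_search` and `cluster_by_concept` are hypothetical names invented for illustration, and the toy in-memory sources stand in for local, cloud and third-party indexes. It only shows the general idea of sending one query to several sources at once, merging the hits into a single result set, then sorting and grouping them.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hit:
    """One match returned by a source (hypothetical record, for illustration)."""
    source: str
    doc_id: str
    concept: str
    score: float


# A source is any callable that takes the query terms and returns hits.
Source = Callable[[List[str]], List[Hit]]


def make_source(name: str, docs: Dict[str, str]) -> Source:
    """Build a toy in-memory source; a real one might wrap a remote service."""
    def search(terms: List[str]) -> List[Hit]:
        hits = []
        for doc_id, text in docs.items():
            lowered = text.lower()
            for term in terms:
                count = lowered.count(term.lower())
                if count:
                    # Score is simply the number of occurrences of the term.
                    hits.append(Hit(name, doc_id, term.lower(), float(count)))
        return hits
    return search


def federated_search(terms: List[str], sources: List[Source]) -> List[Hit]:
    """Run the same query against every source concurrently and merge the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        per_source = list(pool.map(lambda src: src(terms), sources))
    merged = [hit for hits in per_source for hit in hits]
    # Highest score first; ties broken by source name, then document id.
    merged.sort(key=lambda h: (-h.score, h.source, h.doc_id))
    return merged


def cluster_by_concept(hits: List[Hit]) -> Dict[str, List[Hit]]:
    """Group the merged, sorted hits by the concept they matched."""
    clusters: Dict[str, List[Hit]] = {}
    for hit in hits:
        clusters.setdefault(hit.concept, []).append(hit)
    return clusters
```

In this sketch, each source keeps control of its own content and only returns hits, which loosely mirrors the point that data providers need not hand over their documents; how I2E actually achieves this is not described in the release.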

Jason Stamper, Analyst at 451 Research, comments, “The federated architecture I2E 4.3 uses is game-changing for knowledge-driven industries like life sciences and healthcare. Access to data is a key part of their requirements for comprehensive results and this approach opens the door for organizations to get better results, faster and also allows third-party providers of data to keep control of their content. I look forward to seeing how Linguamatics moves forward in the future and how this impacts their business.”


Interest in big data in healthcare is expanding rapidly with the explosion in genomic data and adoption of electronic health records (EHR) resulting from the Affordable Care Act.

This data holds the promise of improved insights into patient outcomes, treatment effectiveness, patient satisfaction and population risk, which is why it is receiving so much attention.

Considerable focus is on how to integrate structured data within your organization, for example, to gain insights from lab data and disease coding, but this is just the first step.

A large proportion of healthcare data is still unstructured: documents, reports and images hold detailed patient data that is not captured, or is poorly captured, in structured data. This unstructured text from pathology, radiology and patient narratives captures the entire patient journey and is critical to understanding patient populations, assessing clinical risk and providing a better understanding of disease.

However, the format of the data poses significant challenges to its application and often results in laborious manual extraction to turn it into structured disease codes or specific data sets such as cancer registries. These manual processes do not scale to the level of discrete data required for analytics and outcomes analysis. How can this be addressed?


Patent literature is a hugely valuable source of novel information for life science research and business intelligence.

The wealth of knowledge disclosed in patents may not be found in other information sources, such as MEDLINE or full text journal articles.

Patent landscape reports (also known as patent mapping or IP landscaping) provide a snap-shot of the patent situation of a specific technology, and can be used to understand freedom to operate issues, to identify in- and out-licensing opportunities, to examine competitor strengths and weaknesses, or as part of a more comprehensive market analysis.

These are valuable searches, but they demand advanced search and data visualization techniques, as any particular landscape report requires examination of many hundreds or thousands of patent documents.

Innovative use of I2E resulted in a 99% efficiency gain in delivering relevant information

Patent text is unstructured; the information needed is often embedded within the body of the patent and may be scattered throughout the lengthy descriptions; and the language is often complex and designed to obfuscate.

Bristol Myers Squibb: Text analytics workflow uncovers kinase assay trends

A recent paper by a team at Bristol Myers Squibb describes a novel workflow to discover trends in kinase assay technology.


(Cambridge, UK & Boston, USA - 03 March 2015)

Linguamatics has been named in KMWorld’s list of “100 Companies That Matter in Knowledge Management” for the second year running.

Now in its 15th year, the KMWorld 100 Companies That Matter list is compiled by KM practitioners, theorists, analysts, vendors and their customers and colleagues.

Linguamatics’ agile NLP text mining software, I2E, provides rapid knowledge discovery from unstructured text. Organizations face the challenge of filtering ever-increasing volumes of text information to gain actionable insights for key decision-making.

Using I2E, knowledge can be extracted from a wide range of content sources such as scientific literature, patents, clinical trials data, electronic health records (EHRs), news feeds and proprietary content. This knowledge can then be used to inform high-value business decisions in real time.

“The criteria for inclusion on the list vary, but each of those listed have things in common. Each has either helped to create a market, redefine it, enhance or extend it,” says Hugh McKellar, KMWorld Editor-in-Chief. “Linguamatics continues its role as a forward-thinking pioneer in the natural language processing (NLP) text mining market, combining its text mining platform together with customer support to offer a deep, hands-on understanding of the pharmaceutical and healthcare industries.”

“We are honored to be recognized once again by KMWorld as one of the 100 companies that matter in knowledge management,” comments John M. Brimacombe, Executive Chairman, Linguamatics.