New release broadens ETL and enterprise workflow applications in healthcare and life science

Cambridge, England and Boston, USA — May 15, 2018 — Linguamatics, the leading natural language processing (NLP) text analytics provider, today announced the latest release of its I2E AMP platform to automate the discovery of critical insights from text using NLP.

The I2E Asynchronous Messaging Pipeline (AMP) platform delivers high-throughput, fault-tolerant workflow management for real-time document and record processing. It addresses the NLP text-mining and ETL (extract, transform, load) requirements of healthcare and life science organizations of all sizes by allowing users to plug I2E into enterprise workflows and rapidly process streams of data at scale.

I2E AMP 2.0 includes enhanced functionality to speed overall throughput and performance, sophisticated pre- and post-processing capabilities, a Web GUI to simplify the set-up of initial workflows, and new AMP Agents for smarter load balancing, easier deployment, and optimized I2E management.

“I2E’s flexible NLP platform goes far beyond traditional entity mark-up, providing semantically enriched data that normalizes concepts and relationships based on the relevant context,” said David Milward, chief technology officer for Linguamatics. “With AMP, clients now have an enterprise-class, high-throughput solution that provides secure, fault-tolerant, scalable, and real-time ETL from unstructured text to structured data.”

At the recent Text Mining Summit, one piece of feedback that we received was that video tutorials were a good source of helpful information for users. We considered this good timing as we had just started work on one!

KNIME and Pipeline Pilot are both popular workflow tools that I2E customers use to enhance the power of text mining. Whereas the Pipeline Pilot components provided by Linguamatics are installed on the server, the KNIME nodes that we have produced are often deployed by individuals in their desktop KNIME application. To get those users up and running quickly, we've put together a 15-minute YouTube video explaining the steps needed to:

  • Download and install the nodes
  • Create a new KNIME workflow and add the Linguamatics I2E nodes
  • Configure the nodes and run the workflow

We would love your feedback on this video (too long or too short? too quick or too slow?). Please also let us know what other topics you would like to see covered in a video tutorial.

There are various ways of running searches using I2E but, for most purposes, the modes can be simplified to:

  • Search using the I2E Java Client, and
  • Everything else

This distinction is important for users, administrators and developers because access to querying is licensed along the same lines. Today’s post will explain the differences between the two modes as well as how to make sure that you’re using your existing capabilities in the most efficient way, with reference to license pools, capabilities and user groups.

Querying using the I2E Java Client

If you’re running a search via the I2E Java client, you will have an interactive license pool that has a “Pro Query” capability (for simplicity, I’m ignoring “Express Query” and “Smart Query” capabilities; the description mostly applies to these as well).

In addition to allowing you to run a search, the “Pro Query” capability also gives you uncontested access to the server (until you log out or your session times out) and the ability to load, create and save queries.

Tasks available for I2E Java client

If two people want to run the I2E Java client at the same time, they both need to belong to a license pool with a “Pro Query” capability, and those license pools must provide at least two seats in total (e.g. two named-user license pools, or one concurrent-user license pool with two seats).
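The seat arithmetic can be sketched in a few lines of Python. This is only an illustration of the counting rule described above: the LicensePool class, its field names and the helper functions are hypothetical and are not part of I2E, and the sketch ignores which user belongs to which pool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LicensePool:
    """Hypothetical model of a license pool (not an I2E data structure)."""
    name: str
    capability: str  # e.g. "Pro Query"
    seats: int       # 1 for a named-user pool, N for a concurrent-user pool


def pro_query_seats(pools: List[LicensePool]) -> int:
    """Total number of seats across pools that carry the Pro Query capability."""
    return sum(p.seats for p in pools if p.capability == "Pro Query")


def can_run_concurrently(pools: List[LicensePool], users: int) -> bool:
    """True if there are at least as many Pro Query seats as simultaneous users."""
    return pro_query_seats(pools) >= users


# Two named-user pools, or one concurrent pool with two seats, both cover two users.
named = [LicensePool("alice", "Pro Query", 1), LicensePool("bob", "Pro Query", 1)]
concurrent = [LicensePool("shared", "Pro Query", 2)]
```

With these sample pools, can_run_concurrently(named, 2) and can_run_concurrently(concurrent, 2) are both true, whereas a single named-user pool would not be enough for two simultaneous users.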

Part of the I2E Enterprise installation is the Sample Web GUI — a Smart Query interface written as a web application that allows users to run smart queries using only their browser.

The Smart Query interface

A neat trick that it performs is on-the-fly class matching: start typing in a word and the server starts to suggest terms in your dictionary that would match. So a search for “psor” will suggest Psoriasis, Psoriatic Arthritis, etc.
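To make the idea concrete, here is a minimal Python sketch of prefix-based suggestion. It is not how the I2E server implements class matching; the sample dictionary, the suggest_classes function and the simple "starts with" rule are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical dictionary: class label -> terms that belong to that class.
SAMPLE_CLASSES: Dict[str, List[str]] = {
    "Psoriasis": ["psoriasis"],
    "Psoriatic Arthritis": ["psoriatic arthritis"],
    "Asthma": ["asthma"],
}


def suggest_classes(typed: str,
                    classes: Dict[str, List[str]] = SAMPLE_CLASSES,
                    limit: int = 10) -> List[str]:
    """Return class labels that have at least one term starting with the typed text."""
    prefix = typed.strip().lower()
    if not prefix:
        return []  # nothing typed yet, so nothing to suggest
    matches = [
        label
        for label, terms in classes.items()
        if any(term.lower().startswith(prefix) for term in terms)
    ]
    return sorted(matches)[:limit]


print(suggest_classes("psor"))
```

With the sample data above, typing "psor" yields the Psoriasis and Psoriatic Arthritis classes, which mirrors the behaviour described for the Smart Query interface.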

Accepting the suggestion will then populate the search with that class rather than the word. The autosuggestion, dropdowns and tooltips are very nice from the user experience perspective, but today’s post will concentrate on the class match itself – how can a search for “psoriasis” retrieve a class match?

There is a two-part answer to that question – the first part is quite easy to answer and the second part is (only slightly) more complicated. So, let’s start with the first part.

Using the query parameters “search”, “pt” or “synonym”

Class matching is a synchronous operation in I2E that uses a query parameter to specify the input and returns the matches as a list/array of classes. Because of this, it’s something that you can try very simply with your web browser. The general form of the URL is (omitting the protocol, servername and port information for brevity):


Although I2E Queries and Multi Queries are binary objects, the I2E Web Services API provides an interface to a subset of the properties of those items, including some that can be modified when running a query programmatically.

Query properties that are read-only and that can be retrieved using the API include title, creator, comments and column headers. Query properties that can be modified before query submission include number of hits, time limit and smart query parameters.

I2E has two related query resources: Saved Queries (which represent the binary files on disk, stored in the Repository) and Published Queries (which represent the Published location of the Saved Queries). To ensure that Users have permissions to see Query Properties, it is recommended that you only expose access to Published Queries.

Retrieving (by GET) a Published Query provides a “handle” to the Saved Query:

HTTP Header = X-Version: *, Accept: application/json
GET;type=published_query/QueryTree/Query1.i2q
Success 200

The response should look something like:

"shared": true,
"valid": true,
"handle": "/api;type=saved_query/4.1/Query1.i2q",
"error": null,
"editable": true
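
A client might build that request and pull the handle out of the response along the following lines. This is a hedged sketch using only the Python standard library: the function names are invented for illustration, the URL passed in is a placeholder that you would replace with your own server address and query path, and the sample body assumes that the fields shown above arrive wrapped in a single JSON object.

```python
import json
import urllib.request


def build_published_query_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request with the two headers shown above."""
    return urllib.request.Request(
        url,
        headers={"X-Version": "*", "Accept": "application/json"},
        method="GET",
    )


def extract_handle(body: str) -> str:
    """Return the Saved Query handle from a Published Query JSON response."""
    data = json.loads(body)
    if data.get("error") is not None:
        raise ValueError(f"server reported an error: {data['error']}")
    if not data.get("valid", False):
        raise ValueError("published query is not valid")
    return data["handle"]


# Sample body copied from the response fields above, wrapped in braces
# (assumption: the server returns a single JSON object).
SAMPLE_BODY = """
{
  "shared": true,
  "valid": true,
  "handle": "/api;type=saved_query/4.1/Query1.i2q",
  "error": null,
  "editable": true
}
"""

print(extract_handle(SAMPLE_BODY))
```

Sending the request would be a matter of passing the Request object to urllib.request.urlopen and decoding the body before calling extract_handle; that step is left out here because it needs a live server.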


If you then retrieve that handle, you will receive an error (406, Not Acceptable) because the server is being asked to represent the binary query itself as JSON:

HTTP Header = X-Version: *, Accept: application/json
GET;type=saved_query/4.1/Query1.i2q
Error 406