Linguamatics I2E 5.1 focuses on further increasing the power and scale of querying, while optimizing the users’ experiences of query building.

I2E 5.1 enriches and expands on the capabilities introduced in I2E 5.0, which made a big splash in NLP text mining technology.  

I2E 5.1 addresses the increasing variety of representations of the same concept in big data by finding more matches for terms in a document: variations in accented characters, spelling errors, and OCR artefacts are taken into consideration when matching. This ‘fuzzy matching’ returns more search results, increasing recall and accuracy.
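To make the idea concrete, here is a minimal Python sketch of this kind of fuzzy matching. It is not how I2E implements it: the function names, the use of the standard library's `difflib`, and the 0.85 similarity threshold are all assumptions chosen for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def strip_accents(text: str) -> str:
    """Drop combining marks so accented and unaccented spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fuzzy_match(term: str, token: str, threshold: float = 0.85) -> bool:
    """Return True if token is similar enough to term.

    The threshold is an illustrative assumption, not a value taken from I2E.
    """
    a = strip_accents(term).casefold()
    b = strip_accents(token).casefold()
    return SequenceMatcher(None, a, b).ratio() >= threshold


if __name__ == "__main__":
    # Accent variation, a spelling error, an OCR-style substitution, and a non-match.
    for term, token in [
        ("Sjögren", "Sjogren"),
        ("diabetes", "diabetis"),
        ("aspirin", "asplrin"),
        ("aspirin", "insulin"),
    ]:
        print(term, token, fuzzy_match(term, token))
```

Lowering the threshold finds more variants but also admits more false hits, which is the usual trade-off between recall and precision in this style of matching.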

One customer commented: ‘I am really looking forward to I2E 5.1’s spelling correction…you don’t realize how much you can miss in your search results because of typos and spelling mistakes.’

Data normalization in I2E, a key feature for tackling big data’s increasing variety, is now easier to use. Regardless of how the original document is written, you can define your numeric ranges in a different unit; for example, you can filter in pounds (as an upper or lower threshold or as a range) and display the results in kilograms.
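The pounds-and-kilograms example can be sketched in a few lines of Python. This is an illustration of the general idea only, assuming a hypothetical `Measurement` type and a small conversion table; it does not reflect I2E's query syntax or internals.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Conversion factors to kilograms (the pound is defined as exactly 0.45359237 kg).
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


@dataclass(frozen=True)
class Measurement:
    """A value as written in the source document, with its original unit."""
    value: float
    unit: str

    def to(self, unit: str) -> float:
        """Convert the value into the requested unit."""
        return self.value * TO_KG[self.unit] / TO_KG[unit]


def filter_by_range(
    measurements: Iterable[Measurement],
    unit: str,
    low: Optional[float] = None,
    high: Optional[float] = None,
) -> List[Measurement]:
    """Keep measurements inside [low, high] expressed in `unit`.

    Either bound may be omitted to get a pure lower or upper threshold.
    """
    kept = []
    for m in measurements:
        v = m.to(unit)
        if (low is None or v >= low) and (high is None or v <= high):
            kept.append(m)
    return kept


if __name__ == "__main__":
    found = [Measurement(70, "kg"), Measurement(200, "lb"), Measurement(95000, "g")]
    # Filter in pounds, display in kilograms.
    for m in filter_by_range(found, "lb", low=150, high=205):
        print(round(m.to("kg"), 1), "kg")
```

The filter unit and the display unit are independent of the unit used in the source text, which is the point of normalizing first.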

I2E 5.1 introduces an integrated view of your query and a way of dragging queries around the editor, making it easier to design, tune and maintain your searches.

I2E Asynchronous Messaging Pipeline (AMP) Extract, Transform, Load (ETL) technology automates the NLP text mining of real-time documents at scale

Cambridge, UK & Boston, USA – December 6th, 2016 – Text analytics provider Linguamatics today introduced the I2E Asynchronous Messaging Pipeline (AMP) platform to help healthcare professionals find critical clinical insights faster using Natural Language Processing (NLP).

The addition of I2E AMP to Linguamatics’ award-winning NLP text mining solution, I2E, makes the management of background healthcare workflows more efficient, and provides scalability as NLP text mining requirements grow. By automating the text mining of real-time documents, I2E AMP can provide healthcare professionals with rapid insights and help them make timely – and potentially critical – clinical decisions.

Scientific papers are mainly written in English, so it is not surprising that most scientific text mining has concentrated on just one language. However, as the use of text mining has become broader, moving from early research through to clinical and post-marketing, there is an increasing need to be able to deal with other languages. In the pharmaceutical sector, this is seen in projects ranging across voice of the customer, analysis of sales reports, adverse event monitoring, patent analysis, and checking the quality of regulatory submission documents. In healthcare, hospitals often have a multinational presence, and a need to collect information from records written in several languages.

Multilingual processing not only allows text mining in other languages (for example, a French medic analysing French electronic medical records), but also allows easier mining of foreign language documents, or across different languages. A couple of examples:

  • An English researcher can mine Chinese text using concepts they have found using the English synonyms, extract the relationships of interest, and then use something like Google Translate to show the evidence within the original text.
  • A French medic can automatically link their medical records with relevant clinical trials in English.

Linguamatics recognized this growing need and, in I2E 4.4, has provided a platform that can deal with multiple languages. It can even deal with cases such as patent documents where a single document contains text written in multiple languages, ensuring that an English synonym for adverse events such as “die” does not hit the German determiner “die”.
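The “die” example can be illustrated with a short Python sketch: if each stretch of text carries a language tag, a synonym is only matched against text in its own language. The `Segment` type, the synonym lists, and the assumption that language tags are already available are hypothetical and say nothing about how I2E does this internally.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Segment(NamedTuple):
    lang: str  # language code, assumed to be known for each segment
    text: str


# Hypothetical per-language synonym lists for a single "death" concept.
SYNONYMS: Dict[str, Set[str]] = {
    "en": {"die", "died", "death"},
    "de": {"sterben", "gestorben", "tod"},
}


def find_hits(
    segments: Iterable[Segment],
    synonyms: Dict[str, Set[str]] = SYNONYMS,
) -> List[Tuple[str, str]]:
    """Return (language, word) pairs, matching each synonym only in its own language."""
    hits = []
    for seg in segments:
        terms = synonyms.get(seg.lang, set())
        for token in seg.text.lower().split():
            word = token.strip('.,;:!?"()')
            if word in terms:
                hits.append((seg.lang, word))
    return hits


if __name__ == "__main__":
    document = [
        Segment("en", "Two patients die within a week."),
        Segment("de", "Die Patienten erhielten die Behandlung."),
    ]
    print(find_hits(document))
```

Because the German segment is only compared with the German synonym list, neither occurrence of the determiner “die” is reported as an adverse event.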

At the recent Text Mining Summit, one piece of feedback that we received was that video tutorials were a good source of helpful information for users. We considered this good timing as we had just started work on one!

KNIME and Pipeline Pilot are both popular workflow tools that I2E customers use to enhance the power of text mining. However, whereas the Pipeline Pilot components provided by Linguamatics are installed on the server, the KNIME nodes that we have produced are often deployed by individuals on their desktop KNIME application. To get those users up and running quickly, we've put together a 15-minute YouTube video explaining the steps needed to:

  • Download and install the nodes
  • Create a new KNIME workflow and add the Linguamatics I2E nodes
  • Configure the nodes and run the workflow



We would love your feedback on this video (too long or too short? too quick or too slow?). Please also let us know which other topics you would like to see covered by a video tutorial.

There are various ways of running searches using I2E, but for most purposes the modes can be simplified to:

  • Search using the I2E Java Client, and
  • Everything else

This distinction is important for users, administrators and developers because access to querying is licensed along the same lines. Today’s post will explain the differences between the two modes, as well as how to make sure that you’re using your existing capabilities in the most efficient way, with reference to license pools, capabilities and user groups.

Querying using the I2E Java Client

If you’re running a search via the I2E Java Client, you will belong to an interactive license pool that has the “Pro Query” capability (for simplicity, I’m ignoring the “Express Query” and “Smart Query” capabilities; the description mostly applies to these as well).

In addition to allowing you to run a search, the “Pro Query” capability also provides you with uncontested access to the server (until you log out or your session times out) and the ability to load, create and save queries.

Tasks available for the I2E Java Client

If two people want to run the I2E Java Client at the same time, they will both need to belong to a license pool with the “Pro Query” capability, and the total number of seats across those license pools must be at least two (e.g. two named user license pools, or one concurrent user license pool with two seats).
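The seat arithmetic can be written down as a small Python sketch. It is a deliberate simplification: the `LicensePool` type and its fields are hypothetical, a named user pool is modelled as a pool with one seat, and the check ignores which user belongs to which pool.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class LicensePool:
    """Hypothetical model of a license pool; not an I2E data structure."""
    name: str
    seats: int
    capabilities: Set[str] = field(default_factory=set)


def seats_with(pools: Iterable[LicensePool], capability: str) -> int:
    """Total seats across all pools that carry the given capability."""
    return sum(p.seats for p in pools if capability in p.capabilities)


def can_run_concurrently(
    pools: Iterable[LicensePool], users: int, capability: str = "Pro Query"
) -> bool:
    """True if there are at least as many seats with the capability as users."""
    return seats_with(pools, capability) >= users


if __name__ == "__main__":
    two_named = [
        LicensePool("named-user-a", 1, {"Pro Query"}),
        LicensePool("named-user-b", 1, {"Pro Query"}),
    ]
    one_concurrent = [LicensePool("concurrent", 2, {"Pro Query"})]
    print(can_run_concurrently(two_named, users=2))
    print(can_run_concurrently(one_concurrent, users=2))
```

Both configurations from the example provide two “Pro Query” seats, so either one lets two people use the client at once.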