What's New in I2E?

Introducing I2E 5.5.1

I2E 5.5.1 is a minor update available for I2E OnDemand and Enterprise customers. The main change is a new option to change the way that blank items are presented in results for queries containing optional smart query parameters. It also contains bug fixes and performance enhancements.

Introducing I2E 5.5

I2E 5.5 introduces updates and new features to most parts of the product.

Better display of Tabular Data

In I2E 5.5, results from tabular files (Excel spreadsheets or comma-, tab- or pipe-delimited files) are displayed in an easier-to-read cache document format (figure 1).

Figure 1 New format for tabular data cache documents

New and Updated Ontologies

I2E 5.5 introduces a new pattern ontology to find, highlight and extract mentions of Energy values in your documents. Values are standardized to kJ, and the ontology also finds expressions in other joule, calorie and electronvolt units. In addition, there are derived units for Energy/Area, Energy/Mass and Energy/Volume (figure 2).
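As a rough illustration of what standardizing to kJ involves, the sketch below converts a few energy units to kilojoules. The conversion table and the function name to_kilojoules are assumptions made for this example (the thermochemical calorie is assumed); they are not taken from the I2E ontology itself.

```python
# Conversion factors to kilojoules. These are assumptions for illustration,
# not the values used inside the I2E Energy ontology.
TO_KJ = {
    "J": 1e-3,                 # joule
    "kJ": 1.0,                 # kilojoule
    "cal": 4.184e-3,           # thermochemical calorie (assumed)
    "kcal": 4.184,             # kilocalorie
    "eV": 1.602176634e-22,     # electronvolt (1 eV = 1.602176634e-19 J)
}


def to_kilojoules(value, unit):
    """Return the given energy value expressed in kJ (hypothetical helper)."""
    return value * TO_KJ[unit]
```

Under these assumptions, to_kilojoules(250, "kcal") and to_kilojoules(1046, "kJ") end up on the same scale and can be compared directly.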

Figure 2 New Energy classes in the Measurements pattern ontology

The OnDemand ontologies have been updated with more recent source data and the settings for these have been reviewed. The Entrez Gene ontology matches more terms (particularly multi-word phrases) in the documents by leveraging a new indexing setting that permits finer-grained term matching for longer strings containing prepositions. For HPO and Orphanet, restrictions on term matches have been relaxed to increase the recall for these ontologies.

Improvements to Searching with Regular Expressions

The improved regular expressions in I2E 5.5 can now include spaces in their definition: for example, you may wish to extract US state and zip code from an address with “[A-Z]{2} [0-9]{5}”. Or you may want to find terms where there may or may not be a space, like “i(odine)?[\- ]?131”.
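The two patterns above can be tried outside I2E. The following sketch uses Python's re module as a stand-in; I2E's own regular expression dialect may differ in detail, and the helper names are hypothetical.

```python
import re

# The two patterns quoted above, used unchanged.
STATE_ZIP = re.compile(r"[A-Z]{2} [0-9]{5}")
IODINE_131 = re.compile(r"i(odine)?[\- ]?131")


def find_state_zip(text):
    """Return the first 'ST 12345' style match in text, or None."""
    match = STATE_ZIP.search(text)
    return match.group(0) if match else None


def is_iodine_131(term):
    """True if the whole term is a spelling variant of iodine-131."""
    return IODINE_131.fullmatch(term) is not None
```

In this sketch the second pattern accepts "iodine-131", "iodine 131", "i 131" and "iodine131", because both the "odine" group and the separator are optional.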

The new raw regular expressions in I2E 5.5 work more like regular expressions in other systems: for example, they permit the use of “.” to match spaces: “the.*gene” will hit “the gene”, “the pathogenesis”, “the general”, “the expression of genes”, etc.
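The behaviour described for “the.*gene” matches what most regular expression engines do. A minimal check with Python's re module, used here only as an example of one of those other systems (the function name hits is hypothetical):

```python
import re

# "." matches any character except a newline, including a space.
RAW_PATTERN = re.compile(r"the.*gene")


def hits(text):
    """True if the raw pattern matches anywhere in text."""
    return RAW_PATTERN.search(text) is not None
```

All four phrases listed above produce a hit in this sketch, while a phrase without "the", such as "a gene", does not.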

Spaces in these expressions are preserved in EASL, so that queries can easily be generated from your regular expressions outside of I2E.

Extensions to our Multilingual Ontologies

The out-of-the-box pattern ontologies for Measurements, Units, Numerics and Duration & Frequency (dosage) have been extended to include French, German, Spanish and Dutch terms. Italian terms have also been added to the Numerics pattern ontology. Matches and normalization are language-specific, so a mention of “1 billion” would be interpreted as 1,000,000,000,000 in a German document and 1,000,000,000 in an English document.
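The language-specific normalization can be pictured as a per-language lookup of scale words. The sketch below is an assumption about how such a lookup might look, not a description of the I2E Numerics ontology; the table SCALE_WORDS and the function normalize are hypothetical and cover only English and German.

```python
# Scale words per language. English uses the short scale, German the long
# scale, so the same spelling can stand for different magnitudes.
# Illustrative table only; not taken from I2E.
SCALE_WORDS = {
    "en": {"million": 10**6, "billion": 10**9, "trillion": 10**12},
    "de": {"million": 10**6, "milliarde": 10**9, "billion": 10**12},
}


def normalize(count, scale_word, lang):
    """Return count multiplied by the language-specific value of scale_word."""
    return count * SCALE_WORDS[lang][scale_word.lower()]
```

With this table, normalize(1, "Billion", "de") and normalize(1, "billion", "en") differ by a factor of one thousand, which is the difference described above.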

Compound words are rare in English but common in certain other languages, for example German and Dutch. To improve recall when using ontologies, additional processing happens in I2E 5.5 so that, for example, the concepts “Sugar” and “Blood” will match “Blutzuckerwerte” in a document.
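The release notes do not describe how this additional processing works. One common approach is dictionary-based decompounding, sketched below purely as an illustration. The three-entry lexicon, the concept labels (including "Values") and the function names are all hypothetical.

```python
# Tiny illustrative lexicon mapping German word parts to concept labels.
# A real system would use a far larger vocabulary; this is an assumption.
LEXICON = {"blut": "Blood", "zucker": "Sugar", "werte": "Values"}


def split_compound(word, lexicon=LEXICON):
    """Split word into known parts, longest prefix first.

    Returns a list of parts, or None if the word cannot be fully covered.
    """
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = split_compound(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


def concepts_in(word, lexicon=LEXICON):
    """Return the set of concept labels found inside a compound word."""
    parts = split_compound(word, lexicon)
    return {lexicon[part] for part in parts} if parts else set()
```

In this sketch, "Blutzuckerwerte" splits into "blut", "zucker" and "werte", so both "Blood" and "Sugar" are among the concepts found.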

Faster Access to Source-Specific Information

The I2E OnDemand FDA Drug Labels index now includes a specific region to identify the Strength of Active Ingredient. In addition, the I2E OnDemand ClinicalTrials.gov index now includes additional regions related to Pending Results and Individual Participant Data. The changes to both indexes make it easier to extract these values from the relevant documents.

Easier Server Administration

In I2E 5.5, it is easier to configure your I2E server: you can set environment variables that override the equivalent settings in the main server configuration file (server.conf). The environment variable names are derived from the names in server.conf. For example, the setting that manages the deletion of completed tasks is called deleteCompletedTask in server.conf, and its environment variable is I2E_DELETECOMPLETEDTASK.
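The naming rule can be sketched as follows. The rule used here (prefix I2E_ plus the upper-cased setting name) is inferred from the single example above and may not hold for every setting; the functions env_var_name and effective_setting are hypothetical and only model the override behaviour, they are not part of I2E.

```python
import os


def env_var_name(setting):
    """Derive the environment variable name for a server.conf setting.

    Assumed rule: 'I2E_' followed by the setting name in upper case.
    """
    return "I2E_" + setting.upper()


def effective_setting(setting, conf_values, environ=os.environ):
    """Return the environment override if present, else the server.conf value."""
    return environ.get(env_var_name(setting), conf_values.get(setting))
```

For instance, with deleteCompletedTask set to "false" in the configuration values and I2E_DELETECOMPLETEDTASK set to "true" in the environment, effective_setting returns "true" in this model.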