I2E 5.4

Introducing I2E 5.4

Development of I2E 5.4 encompassed all areas of the product, with improvements to ontologies, indexing and query evaluation. These improvements deliver better results for I2E users with Enterprise and OnDemand deployments, and they also benefit users of iScite, I2E AMP and I2E Web Portals.

Ontology Improvements

In every release, our Ready-to-Use Ontologies (which are used in our I2E OnDemand indexes) are rebuilt from the latest sources, often with other improvements as well. I2E 5.4 includes several such additional changes, including Cache Document Tooltips and Ontology Aliases.

Cache Document Tooltips now provide extra information when Ready-to-Use Ontology terms are highlighted in cached documents: click on the term to see additional information for that item (figure 1).

Figure 1 Rich Tooltips now include color-coded information about the Ontology matched, the preferred term (PT) and any other available information, for example a chemical structure in Chemistry-enabled indexes.

Ontology Aliases provide a mechanism to inform you that an out-of-date concept has been replaced by a new concept in an ontology. As a result, the I2E server knows how to handle concepts in older queries (developed against older indexes) that have been superseded in current indexes. This new feature is used extensively in I2E 5.4 to accelerate changes to our NCI Enhanced and Organizations by Sector ontologies.
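The idea behind aliases can be illustrated with a small lookup that follows a chain of replacements from a retired concept to its current one. This is only a sketch of the general technique, not the I2E implementation: the concept identifiers, the `ALIASES` table and the `resolve_concept` function are all hypothetical.

```python
from typing import Dict

# Hypothetical alias table: retired concept ID -> replacement concept ID.
# These identifiers are invented for illustration only.
ALIASES: Dict[str, str] = {
    "ORG:0001": "ORG:0100",
    "ORG:0100": "ORG:0250",
}


def resolve_concept(concept_id: str, aliases: Dict[str, str]) -> str:
    """Follow alias links until reaching a concept with no replacement."""
    seen = set()
    current = concept_id
    while current in aliases:
        if current in seen:
            # Guard against a malformed table that loops back on itself.
            raise ValueError(f"alias cycle detected at {current}")
        seen.add(current)
        current = aliases[current]
    return current
```

With a table like this, a query written against an older index that mentions `ORG:0001` would be resolved to `ORG:0250`, while a concept with no alias entry is returned unchanged.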

Specific changes in our Ready-to-Use Ontologies include:

  • Re-arranging the Linguamatics Diseases hierarchy to split it into two major branches: “Diseases and Disorders” and “Signs and Symptoms” (figure 2). This allows you to focus on Symptoms or, conversely, exclude Symptoms from your results.

Figure 2 The Linguamatics Diseases ontology has been split into two major branches: "Signs and Symptoms" and "Diseases and Disorders"

  • Many gene aliases (which are actually related but non-synonymous terms) have been reviewed and excluded from the Gene/Protein ontology. This improves precision when the ontology is used by itself or in larger patterns, for example protein-protein interactions.
  • The matching of names from the Organizations by Sector ontology has been improved. This means that indexing will automatically match “Contoso Société à responsabilité limitée” to “Contoso SARL” without requiring both terms to exist in the ontology.
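One common way to make long and short legal forms of an organization name match is to normalize both to the same key before comparing them. The sketch below shows that general approach only; it is not a description of how I2E performs the matching, and the `LEGAL_FORMS` table and `normalize_org_name` function are hypothetical.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny table of legal-form expansions.
LEGAL_FORMS = {
    "société à responsabilité limitée": "sarl",
    "limited": "ltd",
    "incorporated": "inc",
}


def normalize_org_name(name: str) -> str:
    """Reduce an organization name to a comparison key."""
    # Compose accented characters so "é" always has one representation.
    text = unicodedata.normalize("NFC", name).lower()
    # Drop punctuation commonly found in abbreviations, e.g. "S.A.R.L.".
    text = re.sub(r"[.,]", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Replace each long legal form with its abbreviation.
    for long_form, short_form in LEGAL_FORMS.items():
        text = re.sub(r"\b" + re.escape(long_form) + r"\b", short_form, text)
    return text
```

Under these assumptions, “Contoso Société à responsabilité limitée”, “Contoso SARL” and “Contoso S.A.R.L.” all reduce to the same key, so only one form needs to be listed.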

Query Gold Standard Evaluation

I2E 5.4 introduces the new Query Gold Standard Evaluation feature. In use for several years at Hackathons at Linguamatics events, it is now available to all I2E users, allowing you to objectively measure the quality of your queries against a pre-determined gold standard dataset. Running the evaluation will run your query on your index, compare the results against the gold standard and generate measures of precision, recall and F-score. Evaluation can be run in either training or test mode: training mode will also show you the query results classified as true positives, false positives and false negatives (figure 3).

Figure 3 Results pages generated by evaluating a Smoking Categorization I2E query against a gold standard dataset in training mode.
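For readers unfamiliar with these measures, the sketch below shows the standard definitions of precision, recall and the balanced F-score (F1) computed from two sets of result identifiers. It illustrates the arithmetic only and makes no claim about how I2E computes or weights its scores; the `evaluate` function and the document identifiers are hypothetical.

```python
from typing import Dict, Hashable, Set


def evaluate(query_results: Set[Hashable],
             gold_standard: Set[Hashable]) -> Dict[str, float]:
    """Score query results against a gold standard using set overlap."""
    true_pos = len(query_results & gold_standard)   # found and expected
    false_pos = len(query_results - gold_standard)  # found but not expected
    false_neg = len(gold_standard - query_results)  # expected but missed

    found = true_pos + false_pos
    expected = true_pos + false_neg
    precision = true_pos / found if found else 0.0
    recall = true_pos / expected if expected else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented example: 4 results returned, 5 expected, 3 in common.
scores = evaluate({"doc1", "doc2", "doc3", "doc4"},
                  {"doc2", "doc3", "doc4", "doc5", "doc6"})
```

In this invented example there are three true positives, one false positive and two false negatives, giving a precision of 0.75, a recall of 0.6 and an F1 of about 0.67.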

Additional Indexing Features

I2E 5.4 also includes other new features that can improve document processing and indexing. These features are enabled by a new way to index file metadata in I2E 5.4: you can pass a properties file alongside your document, and the properties file is indexed to create and populate new shadow regions in your index. This mechanism improves the experience of indexing Documentum files using I2E via the ManifoldCF connectors, allowing you to link directly back to your document in Documentum from your I2E results.