I2E 5.4.1

Introducing I2E 5.4.1

I2E 5.4.1 introduces a more flexible way to run Smart Queries, a way to restrict access to subsets of an index, a cap on parallel indexing tasks so that large and small indexes can be built at the same time, updated terminologies, and various enhancements and fixes.

Blank Smart Query Slots

A great use of Smart Queries is to combine relationship extraction with document filtering. For example, you can specify the Phase and Indication for a set of clinical trials and then use the underlying query to identify inclusion and exclusion criteria. Previously, Smart Queries required a search term in each field: in this example, you had to include a Phase (or multiple Phases). In I2E 5.4.1, you can design a Smart Query with a field that is allowed to be blank. For example, you may want to make it possible to leave the Phase field blank (that is, allow clinical trials with any Phase) but still search for an Indication (or multiple Indications).
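The effect of a blank field can be pictured with a small sketch. This is plain Python for illustration only, not the I2E query model or API: the `Trial` record, `slot_matches`, and `filter_trials` are hypothetical names, and a blank field is represented as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Trial:
    """Hypothetical stand-in for a clinical trial document."""
    trial_id: str
    phase: str
    indication: str


def slot_matches(allowed: Optional[Set[str]], value: str) -> bool:
    """A blank slot (None) matches any value; a filled slot must contain it."""
    return allowed is None or value in allowed


def filter_trials(
    trials: List[Trial],
    phases: Optional[Set[str]] = None,
    indications: Optional[Set[str]] = None,
) -> List[Trial]:
    """Keep trials whose Phase and Indication satisfy their slots."""
    return [
        t for t in trials
        if slot_matches(phases, t.phase) and slot_matches(indications, t.indication)
    ]


TRIALS = [
    Trial("T1", "Phase 1", "asthma"),
    Trial("T2", "Phase 2", "asthma"),
    Trial("T3", "Phase 2", "diabetes"),
]

# Phase left blank: trials of any Phase are allowed, Indication still filters.
any_phase_asthma = filter_trials(TRIALS, phases=None, indications={"asthma"})
```

Here the blank Phase slot places no constraint on the results, while the Indication slot still narrows them, which mirrors the behaviour described above.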

The Table queries in the Resources query tree have been updated to use this feature; they now allow table headers to contain Any item.

Fig. 1 Out-of-the-box Table Extraction queries support the new ability to allow a Smart Query field to match Anything

Index Restrictions

I2E 5.4.1 introduces a general-purpose method to restrict access to specific documents in an index, either by specifying an inclusion list of document identifiers (that is, a subset of the index) or by specifying an exclusion list of document identifiers to suppress.

Fig. 2 Include and Exclude lists can be used to create index subsets or to prevent access to documents
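The difference between the two kinds of list can be shown with a minimal sketch. It is ordinary Python over a list of identifiers, not the I2E configuration format or API; the function names and the `PMID…` identifiers are made up for illustration.

```python
from typing import Iterable, List


def apply_include_list(doc_ids: Iterable[str], include: Iterable[str]) -> List[str]:
    """Inclusion list: only the listed documents remain accessible."""
    allowed = set(include)
    return [d for d in doc_ids if d in allowed]


def apply_exclude_list(doc_ids: Iterable[str], exclude: Iterable[str]) -> List[str]:
    """Exclusion list: every document except the listed ones remains accessible."""
    suppressed = set(exclude)
    return [d for d in doc_ids if d not in suppressed]


# Hypothetical document identifiers in an index.
INDEX_DOCS = ["PMID1", "PMID2", "PMID3", "PMID4"]

subset = apply_include_list(INDEX_DOCS, ["PMID2", "PMID4", "PMID9"])
without_suppressed = apply_exclude_list(INDEX_DOCS, ["PMID3"])
```

An identifier on the inclusion list that is not in the index (`PMID9` here) simply has no effect in this sketch; how I2E itself treats such entries is not stated in these notes.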

Ontology and Configuration Updates

Ontologies that are included in I2E OnDemand indexes or are downloaded via the Linguamatics Community Site have been updated with the latest source data and refinements to improve class matching within documents. MeSH, NCI Derived, Linguamatics Diseases, ChEBI, CPC, Entrez Gene, HPO, Orphanet, MedDRA, ICD9CM, ICD10CM, ICD10PCS and RxNorm all have updated source data. The Organizations by Sector ontology has a new top-level branch: Pharmacy and Pharmacy Providers.

The MeSH source data update coincides with indexing configuration changes to support source changes in MEDLINE 2019, including new Reference regions to find which documents are cited by others.

Another configuration improvement means that Organizations by Sector terms will no longer incorrectly match Author names in MEDLINE.

Multi-Lingual NLP Improvements

I2E 5.4.1 builds on the platform’s first-class multi-lingual NLP functionality to further increase the quality of results in non-English language documents. This includes improved class matching for German compound nouns and the suppression of noise at the synonym level per language. It is also now possible to see the languages included in the index, either via the Index Properties or programmatically via the API.

Managing Indexing

There is a new field in Index Templates that sets the maximum number of indexing tasks that run at the same time when parallel indexing is turned on. For example, you may want to split your corpus into 30 sub-indexes but restrict the number of parallel tasks to 7 at a time, so that other, smaller indexing jobs can run at the same time.

The pre-processor that can be used to convert a plain-text document to sectioned XML has been updated to detect a broader range of section headers.

Finally, any documents that do not get indexed correctly will now show up in a new “Skipped Input” tab in the completed Indexing Task.