How can Medicinal Chemists Overcome the Text Big Data Deluge?

As medicinal chemists strive to fill the pipeline with the best possible novel compounds, they require efficient access to the ever-expanding mass of existing information and knowledge about compounds, targets, and diseases, and about how these are related. Much of this information is buried in published journal articles, patents, reports, and internal document repositories. Posing compound-, target-, and disease-centered questions to extract and organize the data in order to explore these relationships is laborious, time-consuming, and potentially error-prone. Locating chemical structural information is especially challenging because chemicals in the literature are described by many different names: technical, trivial, proprietary, nonproprietary, generic, or trade names.

Roche pRED decided to address this problem by equipping its medicinal chemists with a chemically aware text mining tool (Artemis) that would remove the need for manual searches and data wrangling, and present the data in a user- and analytics-friendly environment for further exploration. Daniel Stoffler and Raul Rodriguez-Esteban of Roche presented this work in their talk "ARTEMIS - A Text Mining Tool for Chemists" at the Linguamatics Spring Text Mining Conference in 2017.

Natural Language Processing (NLP)-based Text Mining

Roche opted to use the Linguamatics NLP platform, I2E, to extract the relevant compound, target, and disease information from a broad range of published and internal sources into a structured, analytics-friendly format. I2E is well suited to the variability found in drug-related text and data: it uses NLP, taxonomies, thesauri, and ontologies to detect and extract drug names, targets, diseases, and their relationships, no matter how they are expressed.
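
To make the thesaurus idea concrete, here is a minimal Python sketch of mapping different names for the same compound to one preferred term. It is a toy illustration only and does not represent how I2E works internally; the `THESAURUS` entries and the `normalize` and `find_compounds` helpers are hypothetical names introduced for this example.

```python
import re
from typing import List, Optional

# Toy thesaurus (illustrative entries only): each known surface form
# maps to one preferred concept name.
THESAURUS = {
    "aspirin": "aspirin",
    "acetylsalicylic acid": "aspirin",
    "2-acetoxybenzoic acid": "aspirin",
    "paracetamol": "paracetamol",
    "acetaminophen": "paracetamol",
}


def _clean(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def normalize(name: str) -> Optional[str]:
    """Return the preferred name for a single compound name, or None if unknown."""
    return THESAURUS.get(_clean(name))


def find_compounds(text: str) -> List[str]:
    """Return the sorted preferred names of all thesaurus terms found in the text."""
    cleaned = _clean(text)
    found = set()
    for term, concept in THESAURUS.items():
        # Lookarounds keep a term from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, cleaned):
            found.add(concept)
    return sorted(found)
```

With this sketch, a sentence that mentions "Acetylsalicylic acid" and "Acetaminophen" resolves to the same two concepts as one that mentions "aspirin" and "paracetamol", which is the kind of name variability a thesaurus is meant to absorb.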

Chemical Annotation

Roche used the I2E chemically-enabled text mining solution (using ChemAxon’s chemical annotation and name-to-structure tools) to extract the maximum amount of chemical structural information from the text sources.

User- and Analytics-friendly Interface

The easy-to-use Artemis interface that Roche developed lets chemists pose pre-defined I2E compound-, target-, and disease-related questions via simple forms. The results are enhanced with journal rankings, specific medicinal chemistry journals, molecular properties for filtering results, InChIKeys for linking chemicals to public databases, and clustering of chemicals by structural similarity for faster browsing.
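
As a rough illustration of clustering by structural similarity, the sketch below groups compounds whose fingerprints have a Tanimoto similarity at or above a threshold. This is an assumption-laden toy, not the method used in Artemis: the fingerprints are made-up sets of integers rather than real chemical fingerprints, and the compound IDs, the `leader_cluster` function, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity: shared features / all features."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def leader_cluster(
    fingerprints: Dict[str, FrozenSet[int]], threshold: float = 0.5
) -> List[List[str]]:
    """Assign each compound to the first cluster whose leader is similar enough.

    A compound that matches no existing leader starts a new cluster.
    """
    clusters: List[Tuple[str, List[str]]] = []  # (leader id, member ids)
    for compound_id, fp in fingerprints.items():
        for leader_id, members in clusters:
            if tanimoto(fp, fingerprints[leader_id]) >= threshold:
                members.append(compound_id)
                break
        else:
            clusters.append((compound_id, [compound_id]))
    return [members for _, members in clusters]


# Made-up feature sets standing in for real structural fingerprints.
EXAMPLE_FINGERPRINTS = {
    "cmpd_A": frozenset({1, 2, 3, 4}),
    "cmpd_B": frozenset({1, 2, 3, 5}),
    "cmpd_C": frozenset({7, 8, 9}),
    "cmpd_D": frozenset({7, 8, 9, 10}),
}

if __name__ == "__main__":
    for group in leader_cluster(EXAMPLE_FINGERPRINTS, threshold=0.5):
        print(group)
```

The single-pass leader approach is chosen here only because it is short and easy to follow; the result depends on input order, so a real tool would likely use a more robust clustering method.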

Result sets are then displayed in a pre-defined Spotfire template, ready for immediate browsing, filtering, and exploration.

Outcome using NLP AI Technology

The NLP text mining solution extracts, organizes, and presents information on compounds, targets, and diseases and how they are related, and frees chemists from the drudgery of trawling through the external and internal literature and other sources. The Artemis user interface (a web portal to broaden access to I2E text mining) makes searching simple, and hit filtering and browsing are easy in a pre-defined, interactive Spotfire template. The result is more incisive insights and fact-based decision making, with substantially less effort.

Download and read the full Text Mining with Roche pRED case study.

To learn more about the benefits of I2E for chemically-enabled text mining, and the potential productivity gains for your company, please contact us or visit our Chemistry-enabled text mining application area.
