With new and exciting technologies, it often happens that one particular application or use case leads the way initially… and then, when the euphoria turns into commercial reality, people start looking at other applications where the new technology can also bring value. In text mining, the same holds true. Pharma companies have now been using NLP text mining technologies for many years, in areas such as target validation, gene-disease associations, clinical trial optimization, and patent analytics. As they have become comfortable and, indeed, expert in these areas, attention has turned to areas where the core technology needs to be adapted or tweaked to meet a specific requirement.
For example, when looking to apply NLP to the time-consuming and costly business of discovering novel compounds, users hit a significant issue: understanding every single component part of some of the long chemical names. Not an insurmountable problem, but one that needed time, expertise and determination.
The ChiKEL project (Chemically-Informed Knowledge Extraction from Literature), supported by the EUREKA Network (http://www.eurekanetwork.org/), aimed to provide the impetus required to resolve the problem. ChiKEL enabled existing partners Linguamatics and Chemaxon to extend the integration of their software platforms so that it can recognize novel chemical compounds, expressed either in words or, critically, as images. The net result is that researchers will be able to conduct deeper analyses, because they can extract a wider range of information from documents, which, ultimately, will fuel and accelerate new drug discovery and power the pharma industry as a whole.
