Application of automated natural language processing (NLP) workflow to enable a federated search of external biomedical content in drug discovery and development

McEntire R, Szalkowski D, Butler J, Kuo MS, Chang M, Chang M, Freeman D, McQuay S, Patel J, McGlashen M, Cornell WD, Xu JJ.

Drug Discov Today. 2016 May; 21(5):826-35

PMID: 26979546

http://www.sciencedirect.com/science/article/pii/S1359644616300757

Abstract

External content sources such as MEDLINE®, National Institutes of Health (NIH) grants and conference websites provide access to the latest breaking biomedical information, which can inform pharmaceutical and biotechnology company pipeline decisions.

The value of these sites for industry, however, is limited by their reliance on the public internet, limited synonym support, the rarity of batch searching capability and the disconnected nature of the sites.

Fortunately, many sites now offer their content for download and we have developed an automated internal workflow that uses text mining and tailored ontologies for programmatic search and knowledge extraction. We believe such an efficient and secure approach provides a competitive advantage to companies needing access to the latest information for a range of use cases and complements manually curated commercial sources.
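
As a rough illustration of the kind of workflow the abstract describes, the sketch below runs a batch, synonym-aware search over locally downloaded records from several sources. It is not the authors' implementation: the ontology entries, the sample records, and the function names (`build_patterns`, `federated_search`) are all hypothetical, and simple regular-expression matching stands in for whatever text-mining engine the paper actually used.

```python
import re
from collections import defaultdict

# Hypothetical mini-ontology: preferred term -> synonyms (illustrative only).
ONTOLOGY = {
    "tumor necrosis factor": ["TNF", "TNF-alpha", "cachectin"],
    "interleukin 6": ["IL-6", "IL6"],
}

# Hypothetical records, assumed to have been downloaded to an internal store.
DOCUMENTS = [
    {"source": "MEDLINE", "id": "doc1",
     "text": "TNF-alpha blockade reduced inflammation."},
    {"source": "NIH grants", "id": "doc2",
     "text": "We study IL6 signaling in arthritis."},
    {"source": "conference", "id": "doc3",
     "text": "A poster on kinase inhibitors."},
]


def build_patterns(ontology):
    """Compile one case-insensitive pattern per concept covering all synonyms."""
    patterns = {}
    for preferred, synonyms in ontology.items():
        # Longest terms first so "TNF-alpha" is tried before "TNF".
        terms = sorted([preferred, *synonyms], key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in terms)
        # Lookarounds keep matches from starting or ending inside a word.
        patterns[preferred] = re.compile(
            r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE
        )
    return patterns


def federated_search(documents, ontology):
    """Search every source in one pass; group hits by preferred term."""
    patterns = build_patterns(ontology)
    hits = defaultdict(list)
    for doc in documents:
        for preferred, pattern in patterns.items():
            if pattern.search(doc["text"]):
                hits[preferred].append((doc["source"], doc["id"]))
    return dict(hits)


results = federated_search(DOCUMENTS, ONTOLOGY)
for concept, matches in results.items():
    print(concept, matches)
```

Mapping every synonym back to a single preferred term is what lets one query span sources that use different vocabulary, and because the search runs over downloaded copies, no query terms leave the internal network.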