I attended a Big Data in Pharma conference recently, and very much liked a quote from Sir Muir Gray, cited by one of the speakers: "In the nineteenth century health was transformed by clear, clean water. In the twenty-first century, health will be transformed by clean clear knowledge."
This was part of a series of discussions and round tables on how we, within the Pharma industry, can best use big data, both current and legacy, to inform decisions for the discovery, development and delivery of new healthcare therapeutics. Data integration, breaking down data silos to create data assets, data interoperability, and the use of ontologies and NLP were all themes presented, with the aim of giving researchers and scientists a clean, clear view of all the appropriate knowledge for actionable decisions across the drug development pipeline.
A new publication describes how text analytics can provide one of the tools for that data interoperability ecosystem, to create a clear, clean view. McEntire et al. describe a system that combines Pipeline Pilot workflow tools, Linguamatics I2E NLP linguistics and semantics, and visualization dashboards to integrate information from key public domain sources, such as MEDLINE, OMIM, ClinicalTrials.gov, NIH grants, patents, and news feeds, as well as internal content sources.
The use of this combination of tools enabled the system to find and extract entities of interest (e.g., using ontologies for genes, diseases, drugs, organizations, and more) and to normalize each entity to a single concept, enabling data integration, clustering and analysis of results, and visualization for that "clear" view of the knowledge.
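To make the idea of normalizing entities to one concept more concrete, here is a minimal Python sketch of dictionary-based matching. It is not how I2E or the system in the paper works internally; the tiny synonym table, the function names `extract_concepts` and `count_concepts`, and the example sentences are all assumptions made up for illustration. A real ontology would hold many thousands of terms and handle far more linguistic variation.

```python
import re
from collections import Counter

# Hypothetical mini-ontology (illustrative only): synonym -> preferred concept.
SYNONYMS = {
    "tnf": "TNF",
    "tnf-alpha": "TNF",
    "tumor necrosis factor": "TNF",
    "t2dm": "Type 2 diabetes",
    "type 2 diabetes": "Type 2 diabetes",
    "type ii diabetes": "Type 2 diabetes",
}

# Try longer synonyms first so "tnf-alpha" is preferred over "tnf".
# The lookarounds stop a match from starting or ending inside a longer
# word or hyphenated term (e.g. "tnf" inside "TNF-beta").
_ALTERNATIVES = "|".join(
    re.escape(s) for s in sorted(SYNONYMS, key=len, reverse=True)
)
_PATTERN = re.compile(
    r"(?<![\w-])(" + _ALTERNATIVES + r")(?![\w-])", re.IGNORECASE
)


def extract_concepts(text):
    """Return the preferred concept for each synonym found, in text order."""
    return [SYNONYMS[m.group(1).lower()] for m in _PATTERN.finditer(text)]


def count_concepts(documents):
    """Count how many documents mention each normalized concept."""
    counts = Counter()
    for doc in documents:
        counts.update(set(extract_concepts(doc)))
    return counts


if __name__ == "__main__":
    docs = [
        "TNF-alpha levels rise in T2DM; tumor necrosis factor is a target.",
        "A trial in type II diabetes.",
    ]
    for concept, n in count_concepts(docs).items():
        print(concept, n)
```

Because every synonym collapses to one preferred label, mentions from different sources can be counted, clustered and plotted together, which is the step that makes integration across document types possible.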
The authors detail how this automated NLP workflow has been applied to specific use cases across drug discovery and development at Merck. These included:
- Alerts for novel target information around specific gene lists or therapeutic areas (such as diabetes)
- Competitive intelligence for lead optimization
- Conference abstract mining, for example for up-to-date immuno-oncology targets
- Mining of clinical trial records for site selection, protocol optimization, or endpoint biomarker identification.
This paper demonstrates that text analytics provides a critical element in the toolbox needed to enable data interoperability, and to move us towards that goal of "clean clear knowledge" across the drug discovery and development process.