Identifying Domain Experts through Text-Mining Medline - Eric Su

Eric Su emphasized the importance of thought leaders and domain experts in the pharmaceutical industry, who complement in-house knowledge and plug its gaps. Previous manual methods of finding them, which relied on personal knowledge and contacts, were too limited, so Eli Lilly decided instead to locate and rank experts by applying the Linguamatics NLP platform to text-mine Medline.

What superficially seems to be a simple data extraction task (find and extract papers that describe diseases or drugs of interest, then list and rank the authors by number of publications) is made much more complicated by the variability in formats for personal, institutional, drug, and disease names, so disambiguation is a huge challenge. Eric described in detail how they built Lua code to construct NLP queries of Medline via the easy-to-use I2E Pro interface. This took advantage of Linguamatics’ various ontologies (e.g. institutions, diseases) to help overcome the disambiguation problem, and then extracted, formatted, and ranked the output for use by scientists and researchers.
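To make the ranking step and the name-variability problem concrete, here is a minimal Python sketch. It is not the Lua code or the I2E queries described in the talk, and it does not use any Linguamatics API. The function names `normalize_author` and `rank_authors`, the record layout, and the "surname plus first initial" normalization rule are all assumptions made for illustration.

```python
from collections import Counter
import re
import unicodedata


def normalize_author(name: str) -> str:
    """Collapse common name variants to a 'surname initial' key.

    Hypothetical rule for illustration only: 'Smith, John A.', 'Smith JA'
    and 'John A. Smith' all map to 'smith j'.
    """
    # Strip accents so that 'Müller' and 'Muller' produce the same key.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    if "," in ascii_name:
        # 'Surname, Given names'
        surname, given = (part.strip() for part in ascii_name.split(",", 1))
    else:
        tokens = ascii_name.split()
        if not tokens:
            return ""
        if len(tokens) > 1 and tokens[-1].isupper() and len(tokens[-1]) <= 3:
            # 'Surname INITIALS', e.g. 'Smith JA'
            surname, given = " ".join(tokens[:-1]), tokens[-1]
        else:
            # 'Given names Surname'
            surname, given = tokens[-1], " ".join(tokens[:-1])
    surname = re.sub(r"[^a-z ]", "", surname.lower()).strip()
    initials = re.sub(r"[^a-z]", "", given.lower())
    return f"{surname} {initials[:1]}".strip()


def rank_authors(papers):
    """Rank normalized author keys by the number of papers they appear on."""
    counts = Counter()
    for paper in papers:
        # Use a set so an author is counted at most once per paper.
        keys = {normalize_author(author) for author in paper["authors"]}
        keys.discard("")
        counts.update(keys)
    # Most publications first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up toy records, not real Medline data.
papers = [
    {"pmid": "1", "authors": ["Smith, John A.", "Müller, Hans"]},
    {"pmid": "2", "authors": ["Smith JA", "Doe J"]},
    {"pmid": "3", "authors": ["John A. Smith", "Muller H"]},
]

for author, n_papers in rank_authors(papers):
    print(author, n_papers)
```

A rule this crude also shows why disambiguation is hard: reducing names to surname and initial merges distinct people who share them, which is the kind of ambiguity that additional signals such as institution names would be needed to resolve.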

This Linguamatics NLP methodology and infrastructure is potentially expandable to include other data sources beyond Medline, and can be easily deployed in multiple therapeutic areas. 

Linguamatics NLP can be used to automate the identification of domain or scientific experts in a disease area or in any other biomedical area. Many kinds of scientific question can be built into the query; it doesn’t have to be just a disease.