Posts from April 2015

Last week’s BioIT World Expo kicked off with a great keynote from Philip Bourne (Associate Director for Data Science, National Institutes of Health), setting the scene for a theme that ran throughout the conference: how can we benefit from big data analytics, or data science, for pharma R&D and delivery into healthcare? With two days of talks, 12 tracks covering cloud computing, NGS analytics, pharmaceutical R&D informatics, data visualization and exploration tools, and data security, plus a full day of workshops beforehand and a busy exhibition hall, there was plenty to see, do, take in and discuss. I attended several talks on best practice in data science by speakers from Merck, Roche, and BMS, and I was pleased to hear speakers mention text analytics, particularly natural language processing, as a key part of the overall data science solution.


We are always amazed and impressed by the inventiveness of Linguamatics customers in applying text analytics to address their information challenges.

Our annual Linguamatics Spring Users Conference showcased some of that innovation, with presentations on text mining for patent analytics, extraction of chemical pharmacokinetic and pharmacodynamic data, creating value from legacy safety reports, and integrating open-source tools for advanced entity recognition.

We had a record-breaking number of attendees this year, representing over 20 organizations and ranging from our most experienced I2E users to text mining novices. Attendees enjoyed the opportunity to experience Cambridge and share insights with one another.

Patent analytics featured in two of the presentations, demonstrating the value of NLP in extracting critical information from lengthy, dense patent documents.

Julia Heinrich (Senior Patent Analyst, Biotechnology at Bristol-Myers Squibb, Princeton, New Jersey) asked the question: “Can the infoglut of biotech patent publications be quickly reviewed to enable timely business decisions?”


Submitting a drug approval package to the FDA, whether for a New Drug Application (NDA), a Biologics License Application (BLA), or an Abbreviated New Drug Application (ANDA), is a costly process.

The final amalgamation of different reports and documents into the overview document set can involve a huge amount of manual checking and cross-checking, from the subsidiary documents to the master.

It is crucial to get the review process right.

Any errors, and the FDA can send back the whole package, delaying the application. But the manual checking involved in the review process is tedious, slow, and error-prone.
 

A delayed application can also be costly.

How much are we talking about?

While not every drug is a blockbuster, these numbers are indicative of what you could be losing: the top 20 drugs in the United States accounted for $319.9 billion in sales in 2011, so a newly launched blockbuster could make around $2Bn in its first year on the market. Spread over 365 days, that is roughly $5.5M per day.
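As a quick sanity check on that figure, here is a back-of-envelope calculation using only the numbers quoted above (the $2Bn first-year estimate is the sole input):

```python
# Back-of-envelope check of the daily revenue figure quoted above.
first_year_sales = 2_000_000_000          # ~$2Bn in the first year on the market
per_day = first_year_sales / 365          # average daily sales
per_week = per_day * 7                    # revenue at stake in a one-week delay

print(f"${per_day / 1e6:.1f}M per day")                 # -> $5.5M per day
print(f"${per_week / 1e6:.0f}M for a one-week delay")   # -> $38M
```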

If errors in the quality review hold up an NDA for even a week, that delay alone could cost tens of millions of dollars in lost sales.

So – how can text analytics improve this quality assurance process?

Linguamatics has worked with some of our top 20 pharma customers to develop an automated process to improve quality control of regulatory document submission.

The process cross-checks MedDRA coding and references to tables, flags decimal-place errors, and finds discrepancies between the summary document and the source documents. This requires advanced processing to extract information from tables in PDF documents, as well as natural language processing to analyze the free text.
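To make the idea concrete, here is a minimal sketch of what one such check might look like. This is not the Linguamatics I2E workflow: it assumes the document text has already been extracted from the PDFs (tables included), and a simple regular expression stands in for full NLP-based entity extraction, but it illustrates how a labelled numeric value in a summary can be cross-checked against its source.

```python
import re

# Hypothetical illustration: flag decimal-place discrepancies between a
# summary document and its source document, assuming both have already been
# converted to plain text. A real pipeline would use NLP and table-aware PDF
# extraction to recognize the measures; a regex over simple "label: number"
# pairs stands in for that step here.
NUM_PATTERN = re.compile(r"\b([A-Za-z]\w*)\s*:\s*(\d+\.\d+)")

def extract_values(text):
    """Map each labelled measure (e.g. 'Cmax: 12.34') to its numeric string."""
    return {label.lower(): value for label, value in NUM_PATTERN.findall(text)}

def cross_check(summary_text, source_text):
    """Yield (measure, summary_value, source_value) for every mismatch."""
    summary = extract_values(summary_text)
    source = extract_values(source_text)
    for measure, value in summary.items():
        if measure in source and source[measure] != value:
            yield measure, value, source[measure]

if __name__ == "__main__":
    summary = "Cmax: 12.34 and AUC: 98.7"
    source = "Cmax: 12.34 and AUC: 9.87"   # decimal-place slip in the source
    for measure, s_val, src_val in cross_check(summary, source):
        print(f"Mismatch for {measure!r}: summary={s_val}, source={src_val}")
```

In practice the hard part is the extraction itself, which is why table-aware PDF processing and NLP matter: once values and their contexts have been normalized, the comparison step is straightforward.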