Accelerating drug approvals with better regulatory QA

April 7, 2015

Submitting a drug approval package to the FDA, whether for an NDA, BLA or ANDA, is a costly process.

The final amalgamation of different reports and documents into the overview document set can involve a huge amount of manual checking and cross-checking, from the subsidiary documents to the master.

It is crucial to get the review process right.

If the package contains any errors, the FDA can send the whole thing back, delaying the application. Yet the manual checking involved in the review process is tedious, slow, and error-prone.

A delayed application can also be costly.

How much are we talking about?

While not every drug is a blockbuster, these numbers indicate what you could be losing: the top 20 drugs in the United States accounted for $319.9 billion in sales in 2011; so a newly launched blockbuster could make around $2Bn in its first year on the market – that’s roughly $5.5M per day.

If errors in the quality review hold up an NDA for even a week, the lost sales could run to tens of millions of dollars.
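
To make the arithmetic concrete, here is a back-of-the-envelope sketch in Python. The $2Bn first-year figure comes from the text above; spreading those sales evenly over 365 days is a simplifying assumption, and the function name is purely illustrative.

```python
def cost_of_delay(first_year_sales_usd: float, days_delayed: int) -> float:
    """Lost sales for a delay, assuming revenue accrues evenly over 365 days."""
    daily_sales = first_year_sales_usd / 365
    return daily_sales * days_delayed


daily = cost_of_delay(2e9, 1)  # one day of lost sales
week = cost_of_delay(2e9, 7)   # one week of lost sales

print(f"Per day:  ${daily / 1e6:.1f}M")
print(f"Per week: ${week / 1e6:.1f}M")
```

Under these assumptions the daily figure comes out a little under $6M, and a one-week delay at just under $40M.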

So – how can text analytics improve this quality assurance process?

Linguamatics has worked with some of our top 20 pharma customers to develop an automated process to improve quality control of regulatory document submission.

The process cross-checks MedDRA coding, references to tables, decimal place errors, and discrepancies between the summary document and source documents. This requires the use of advanced processing to extract information from tables in PDF documents as well as natural language processing to analyze the free text.

The errors that can be identified include:

  • Incorrect formatting: doubled period, incorrect number of decimal places, addition of percent sign
  • Incorrect calculation: number of patients divided by total number does not agree with percent term
  • Incorrect threshold: presence of row does not agree with table title
  • Text-Table inconsistency: numbers in the table do not agree with numbers in the accompanying text
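
As a rough illustration of the first two error types, the checks could be sketched in Python as below. This is not the Linguamatics implementation: the function names are hypothetical, the cell values are assumed to have already been extracted from the PDF tables, and the rule that a cell should not carry a percent sign is an assumed house convention.

```python
import re


def check_formatting(cell: str, expected_decimals: int) -> list[str]:
    """Return the formatting problems found in one table cell, e.g. '12..5%'."""
    problems = []
    if ".." in cell:
        problems.append("doubled period")
    if "%" in cell:
        # Assumed convention: the column header carries the unit, not the cell.
        problems.append("unexpected percent sign")
    match = re.fullmatch(r"\d+(?:\.(\d+))?%?", cell.strip())
    if match:
        decimals = len(match.group(1) or "")
        if decimals != expected_decimals:
            problems.append(
                f"expected {expected_decimals} decimal place(s), found {decimals}"
            )
    return problems


def check_percentage(n: int, total: int, reported_pct: float, decimals: int = 1) -> bool:
    """True if n / total agrees with the reported percentage at the given precision."""
    if total <= 0:
        return False
    exact = 100 * n / total
    # Accept any value within half a unit of the last reported decimal place,
    # so either rounding convention for exact halves passes.
    tolerance = 0.5 * 10 ** -decimals
    return abs(exact - reported_pct) <= tolerance + 1e-9


print(check_formatting("12..5", 1))
print(check_formatting("12.50", 1))
print(check_percentage(12, 96, 12.5))
print(check_percentage(12, 96, 21.5))
```

The percentage check compares against the unrounded ratio rather than a re-rounded value, so a correctly rounded figure is not flagged just because the source table rounded halves differently.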


Sample table and text highlighting, showing inconsistencies between data. The highlight colour lets the reviewer rapidly see where the errors are and what type they are, and then correct them appropriately.

Using advanced text mining, we are able to identify inconsistencies within FDA submission documents, across the tables and textual parts of the reports. Overall, we found that automated text analysis for quality assurance of submission documents can save many hours, or even weeks, of tedious manual checking, and may prevent a re-submission request, with potential savings of millions of dollars.
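
A text-table consistency check of the kind described above might look something like the following Python sketch. It is deliberately naive: it stands in for the natural language processing step with regular expressions, only handles sentences that mention exactly one table row and one percentage, and uses invented example data. The function name and data layout are assumptions made for illustration.

```python
import re


def find_text_table_mismatches(
    text: str, table: dict[str, float]
) -> list[tuple[str, float, float]]:
    """Compare percentages quoted in prose with the table values.

    `table` maps a row label (e.g. an adverse event term) to its percentage.
    Returns (label, value_in_text, value_in_table) for every disagreement.
    """
    mismatches = []
    # Naive sentence split; a real pipeline would use proper NLP.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        percentages = [
            float(p) for p in re.findall(r"(\d+(?:\.\d+)?)\s*%", sentence)
        ]
        mentioned = [label for label in table if label.lower() in sentence.lower()]
        # Only the unambiguous case: one label and one percentage per sentence.
        if len(mentioned) == 1 and len(percentages) == 1:
            label, quoted = mentioned[0], percentages[0]
            if quoted != table[label]:
                mismatches.append((label, quoted, table[label]))
    return mismatches


# Invented example data: the text transposes the digits of the Nausea value.
table = {"Headache": 12.5, "Nausea": 8.3}
text = "Headache was reported in 12.5% of patients. Nausea occurred in 3.8% of patients."
print(find_text_table_mismatches(text, table))
```

Each mismatch carries both the value in the text and the value in the table, which is the information a reviewer needs to decide which side to correct.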


This work was presented by Jim Dixon, Linguamatics, at the Pharmaceutical Users Software Exchange (PHUSE) Computational Science Symposium in March 2015.