IDMP compliance will require Marketing Authorization Holders (MAHs) to submit to, and maintain with, the EMA a broad range of data elements about their medicinal products. 70% of these data elements currently exist in unstructured text, spread across documents in multiple formats, styles, and languages. These data play a key role in the core operations of a pharmaceutical company and are re-used for multiple purposes across the business.
The challenge for such companies is to find a quick, accurate, and affordable way to search, extract, standardize, and structure the 300–2,000 data elements required per product for IDMP compliance.
Linguamatics NLP extracts IDMP data elements
Mundipharma Research Limited implemented a pilot project using I2E, Linguamatics' natural language processing (NLP)-based text-mining solution, to find, highlight, and extract the data elements for IDMP Iteration 1 from unstructured documents such as EMA Summary of Product Characteristics (SmPC) documents.
I2E queries were developed to extract the individual data elements using standard and customized ontologies, as well as linguistic features of SmPCs. Accuracy was evaluated against a ‘gold standard’ data set that had been manually extracted by an independent expert.
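The pilot does not state which accuracy measure was used. One common way to score text-mining output against a manually curated gold standard is to compute precision, recall, and F1 over the set of extracted data elements. The minimal Python sketch below assumes that each data element can be represented as a (field, value) pair; the field names and the `normalise`/`evaluate` helpers are hypothetical illustrations, not part of I2E or of Mundipharma's workflow.

```python
from typing import Dict, Set, Tuple

# Assumption: a data element is a (field, value) pair, e.g.
# ("pharmaceutical_dose_form", "Film-coated tablet"). Field names are hypothetical.
Element = Tuple[str, str]


def normalise(element: Element) -> Element:
    """Lower-case the value and collapse whitespace so trivial differences are not scored as errors."""
    field, value = element
    return field, " ".join(value.lower().split())


def evaluate(extracted: Set[Element], gold: Set[Element]) -> Dict[str, float]:
    """Score machine-extracted elements against a manually extracted gold standard."""
    ext = {normalise(e) for e in extracted}
    ref = {normalise(e) for e in gold}
    true_pos = len(ext & ref)  # elements found by both the machine and the expert
    precision = true_pos / len(ext) if ext else 0.0  # share of extracted elements that are correct
    recall = true_pos / len(ref) if ref else 0.0  # share of gold-standard elements that were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical example data, not taken from any real SmPC.
    gold = {
        ("active_substance", "Paracetamol"),
        ("pharmaceutical_dose_form", "Tablet"),
        ("route_of_administration", "Oral"),
    }
    extracted = {
        ("active_substance", "paracetamol"),
        ("pharmaceutical_dose_form", "Film-coated tablet"),
        ("route_of_administration", "Oral "),
    }
    print(evaluate(extracted, gold))
```

In this toy example two of the three extracted elements match the gold standard after normalisation, so precision, recall, and F1 are all 2/3. Reporting precision and recall separately shows whether errors come from spurious extractions or from missed elements.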
Jon Sanford, Head of Regulatory Information Management and Operations at Mundipharma Research, said: “We were really impressed when we saw the accuracy with which I2E had been able to extract data elements from the documents.”
Productizing text-mining workflows for IDMP compliance
Mundipharma Research is now planning to productize the text-mining workflow for IDMP, as well as exploring how to expand the use of I2E across the wider enterprise, including: