Automated identification of potential drug safety events

Leading pharma company reveals insights and better understands the AE landscape across the patient population

Learn how Agios uses text mining for automated identification of potential drug safety events

About the company   How Agios uses text mining

Agios is a biopharmaceutical company that develops drugs in the areas of cancer and rare genetic diseases. Tibsovo (ivosidenib; AG-120) is its first wholly owned, first-in-class, approved oncology precision medicine, for acute myeloid leukemia (AML). Agios has multiple investigational therapies in clinical and/or preclinical development. As for any pharmaceutical company, monitoring safety during clinical trials is critical. To that end, Agios uses the Linguamatics NLP platform to strengthen its safety monitoring processes.

Quick facts

Situation: Agios monitors all adverse events (AEs) that occur throughout its clinical trials, and wanted to improve support for clinical safety teams in the identification and understanding of serious adverse events (SAEs) in clinical trial participants. Serious events are reported using a standard form, the SAE Report Form. Manual data extraction from these SAE forms is slow and inefficient.

Solution: Agios developed a workflow to process and index the SAE forms, and used queries on the IQVIA (formerly Linguamatics) NLP platform to find AE mentions (from both structured and unstructured sections) and map them to MedDRA. The resulting structured data were loaded into the Agios clinical safety database, providing rapid reporting both to research teams and to the clinical safety teams.

Success: The information gained from the IQVIA queries was highly valuable to the clinical safety team, and was fed into additional tools for advanced statistical and network analysis, to provide further insights and support patient monitoring.


During clinical trials, investigators use SAE forms to report any potential AEs. Each form contains a large amount of valuable data, including: patient identification information; date of onset of the AE; MedDRA terms describing the event, and whether or not it is life-threatening; lab tests; concomitant medications; and medical history. These forms are either scanned image files or saved in PDF format, with the data held as unstructured text. Significant manual effort is needed to extract key data elements from these forms for the clinical safety teams to assess.

Agios wanted to reveal insights and better understand the AE landscape across the patient population by efficiently mining the available information. The Agios team was interested in both analyzing specific events that are frequent or clinically important, and also characterizing groups of patients who might need a closer follow-up or require special attention.

More specifically, Agios had three goals:

  1. To build a better understanding of the patient pool for research through rapid, effective access to the unstructured data from the SAE forms, transformed and loaded into a safety database.
  2. To use the safety database to assess the patient pool for known, anticipated AEs.
  3. To use the safety database to monitor the patient pool for high-risk priority events that are not anticipated.

The clinical experts needed actionable information from the safety database. Rapid identification of AEs occurring in patients or patient pools is important, and discrimination of whether the AE was anticipated or not, and serious or not, is critical. For example, the experts needed to understand whether an AE was caused by the study drug, by the disease (or background diseases), or by other prescribed medications. To make that decision, they needed to see all of the available information, such as incidence of the AE across all patients. They also wanted to see which AEs were in progress at a certain time point, and how they changed over time. These insights are extremely useful and can even help prevent life-threatening events by prompting timely, close follow-up where it is needed.


Agios developed a workflow to process the SAE forms, and used queries on the IQVIA NLP platform to extract all relevant patient data (see Figure 1). These queries use the natural language processing (NLP) capabilities of the platform to understand the meaning of the free text in the forms. The platform identifies negations, synonyms for diseases and medications that appear in the medical history section, dates in different formats, and measurements with units, such as drug doses.
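To make two of these capabilities concrete, the following is a minimal Python sketch of date normalization and clause-level negation detection. It is an illustration only and not how the IQVIA NLP platform works internally: the date layouts, the cue words, and the function names `normalize_date` and `is_negated` are all assumptions chosen for the example.

```python
import re
from datetime import date, datetime
from typing import Optional

# Date layouts assumed for illustration; real SAE forms may use others.
DATE_FORMATS = ("%Y-%m-%d", "%d-%b-%Y", "%d %B %Y", "%m/%d/%Y")

# Cue words treated as negating a finding that follows them in the same clause.
NEGATION_CUES = re.compile(r"\b(no|not|denies|without|negative for)\b", re.IGNORECASE)

# Punctuation and conjunctions that end the scope of a negation cue.
SCOPE_BREAKS = re.compile(r"[,;.]|\bbut\b|\bhowever\b", re.IGNORECASE)


def normalize_date(text: str) -> Optional[date]:
    """Return the date parsed from text, or None if no assumed layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def is_negated(sentence: str, term: str) -> bool:
    """Report whether a negation cue precedes term within the same clause."""
    position = sentence.lower().find(term.lower())
    if position < 0:
        return False
    # Keep only the part of the clause that leads up to the term.
    clause = SCOPE_BREAKS.split(sentence[:position])[-1]
    return NEGATION_CUES.search(clause) is not None


if __name__ == "__main__":
    print(normalize_date("12-Mar-2019"))
    print(is_negated("Patient denies chest pain but reports fever.", "chest pain"))
```

A rule-based approach like this is easy to audit but crude; a production system would need far richer linguistic handling than a cue list and a clause splitter.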

Any AE mentions (from both structured and unstructured sections) were extracted and normalized to MedDRA terms. Study drug, concomitant medications, date of onset, lab test results and other key patient attributes were also extracted from the SAE forms using the NLP platform. This allowed correlations between the events and other variables to be identified, and provided a longitudinal dataset to allow tracking of the progression or remission of events.
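The normalization step can be pictured as a dictionary lookup from verbatim phrases to a preferred term. The sketch below assumes a tiny hand-written synonym table; it is not the MedDRA dictionary (which is licensed and far larger) and not the platform's actual matching logic. The table contents and the name `normalize_terms` are hypothetical.

```python
import re

# Hypothetical synonym table: verbatim phrase -> preferred term label.
# Illustrative only; a real workflow would use the licensed MedDRA dictionary.
SYNONYM_TO_PREFERRED = {
    "fever": "Pyrexia",
    "high temperature": "Pyrexia",
    "shortness of breath": "Dyspnoea",
    "difficulty breathing": "Dyspnoea",
    "low blood pressure": "Hypotension",
}


def normalize_terms(text: str) -> list:
    """Return the sorted preferred terms whose synonyms occur in text."""
    # Lowercase and collapse whitespace so line breaks do not split a phrase.
    lowered = " ".join(text.lower().split())
    found = set()
    for synonym, preferred in SYNONYM_TO_PREFERRED.items():
        if re.search(r"\b" + re.escape(synonym) + r"\b", lowered):
            found.add(preferred)
    return sorted(found)


if __name__ == "__main__":
    print(normalize_terms("Patient had Fever and shortness of\nbreath on day 3."))
```

Mapping several surface forms to one term is what makes events comparable across forms written by different investigators.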

The resulting structured data were loaded into the Agios clinical safety database, providing rapid reporting to both research teams and clinical safety teams. The workflow enabled the queries to run automatically as new forms arrived, using the NLP platform to extract relevant information much more quickly than manual methods, and providing alerts and Excel reports to the clinical teams.


Agios was happy with the fast turnaround time of the forms and rapid access to the data via the NLP platform. A five-page SAE form could run through the OCR, indexing and querying workflow, and provide results to the clinical team within an hour, compared with tedious manual processing that could take a day.

The Linguamatics NLP platform is well suited to extract free text from complex, unstructured documents, and process these data to create a rich database record that is immediately available for further analysis.

The tabular output can be easily loaded into many other software tools (e.g. Python data science libraries, or analysis and visualization packages such as Matlab and Cytoscape), making the NLP platform well suited to integration in automated workflows and pipelines.
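As a rough illustration of that downstream step, the sketch below reads a small comma-separated export with Python's standard `csv` module and counts how many distinct patients experienced each event. The column names (`patient_id`, `preferred_term`, `onset_date`) and the sample rows are made up for the example; the platform's real export layout is not described here.

```python
import csv
import io
from collections import defaultdict

# Made-up sample export; column names and values are assumptions.
SAMPLE_EXPORT = """patient_id,preferred_term,onset_date
P001,Pyrexia,2019-03-12
P001,Dyspnoea,2019-03-14
P002,Pyrexia,2019-04-02
P002,Pyrexia,2019-04-20
"""


def incidence_by_term(rows) -> dict:
    """Count distinct patients per preferred term, ignoring repeat events."""
    patients = defaultdict(set)
    for row in rows:
        patients[row["preferred_term"]].add(row["patient_id"])
    return {term: len(ids) for term, ids in patients.items()}


def load_rows(text: str) -> list:
    """Parse a comma-separated export into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    print(incidence_by_term(load_rows(SAMPLE_EXPORT)))
```

Counting patients rather than rows keeps a patient with recurring events from inflating the incidence figure.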

One example of identification of actionable events in the AG-120 clinical trial relates to differentiation syndrome (DS). DS is a complication of first-line chemotherapy in some acute promyelocytic leukemia (APL) patients, which can be fatal if not recognized in time and treated aggressively. During an AG-120 clinical trial, three patients were diagnosed with DS based on a subset of symptoms. Using the text mining workflow it had developed, Agios was able to highlight and cluster MedDRA terms associated with DS across the patient pool (Figure 2). Using a variety of statistical methods, such as hierarchical clustering and network analysis, the team was able to characterize which AEs are most likely to indicate patients with DS; which events appear in only some cases; and which subsets of patients might be more at risk from DS than others. This enabled the clinical team to effectively monitor the patient cohort and prevent potentially fatal AEs.
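One simple way to prepare data for this kind of network analysis is to link AE terms that tend to occur in the same patients. The sketch below builds such edges using Jaccard similarity over patient sets. It is not the analysis Agios performed, whose details are not given; the function names, the 0.5 threshold, and the sample data are assumptions for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Return the size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_edges(patient_terms: dict, min_similarity: float = 0.5) -> list:
    """Return (term_a, term_b, similarity) for term pairs sharing enough patients."""
    # Invert the mapping: for each term, the set of patients who experienced it.
    patients_by_term = {}
    for patient, terms in patient_terms.items():
        for term in terms:
            patients_by_term.setdefault(term, set()).add(patient)
    edges = []
    for term_a, term_b in combinations(sorted(patients_by_term), 2):
        similarity = jaccard(patients_by_term[term_a], patients_by_term[term_b])
        if similarity >= min_similarity:
            edges.append((term_a, term_b, similarity))
    return edges


if __name__ == "__main__":
    # Made-up data: patient ID -> set of normalized AE terms.
    sample = {
        "P001": {"Pyrexia", "Dyspnoea"},
        "P002": {"Pyrexia", "Dyspnoea", "Hypotension"},
        "P003": {"Hypotension"},
    }
    print(cooccurrence_edges(sample))
```

The resulting edge list could be exported to a network tool such as Cytoscape, or the pairwise similarities could be converted to distances as input to hierarchical clustering.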

The clinical safety team was not familiar with this type of network analysis before, but found the insights it provided highly valuable, and decided to implement this analysis in other projects.


Agios was interested in questions such as: “Given an AE during a clinical trial, is this associated with the study drug that the patient is taking, related to the disease the patient has, or to other diseases/drugs from the patient’s history?” Obviously, there are different implications for each of these options, and with the Linguamatics NLP platform Agios was able to reach the right answers rapidly, ensuring better patient safety.