
Using NLP to Target Adverse Events at the FDA

Target Adverse Event Profiles for Predictive Safety in the Postmarket Setting
8th Oct 2020
Peter Schotland


Peter Schotland, Rebecca Racz, David B. Jackson, Theodoros G. Soldatos, Robert Levin, David Strauss, Keith Burkhart


We improved a previous pharmacological target adverse-event (TAE) profile model to predict adverse events (AEs) on US Food and Drug Administration (FDA) drug labels at the time of approval. The new model learns from more drugs and more features and uses a new algorithm. Comparator drugs with target activities similar to those of a drug of interest were evaluated by aggregating AEs from the FDA Adverse Event Reporting System (FAERS), FDA drug labels, and the medical literature. An ensemble machine learning model was used to evaluate FAERS case count, disproportionality scores, the percentage of comparator drug labels listing a specific AE, and the percentage of comparator drugs with reports of the event in the literature. Overall classifier performance was an F1 score of 0.71, an area under the precision-recall curve (AUPRC) of 0.78, and an area under the receiver operating characteristic curve (AUROC) of 0.87. TAE analysis continues to show promise as a method to predict AEs at the time of approval.
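The abstract names the features but not exactly how they are computed. One way to assemble them for a single drug/AE pair is sketched here under stated assumptions: comparator selection uses Jaccard similarity over sets of target identifiers with a hypothetical cutoff of 0.5, and the disproportionality score is a proportional reporting ratio (PRR) comparing the comparator group's FAERS cases against all other drugs. The `Drug` record, `tae_features`, and the toy data are hypothetical illustrations; the paper may define comparators and disproportionality differently.

```python
from dataclasses import dataclass, field


@dataclass
class Drug:
    """Hypothetical per-drug record; field names are illustrative only."""
    name: str
    targets: set                                      # molecular target identifiers
    label_aes: set = field(default_factory=set)       # AEs listed on the drug label
    literature_aes: set = field(default_factory=set)  # AEs reported in the literature
    faers_total: int = 0                              # total FAERS cases for the drug
    faers_counts: dict = field(default_factory=dict)  # AE -> FAERS case count


def jaccard(a, b):
    """Jaccard similarity of two target sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_comparators(drug, database, threshold=0.5):
    """Assumed rule: comparators have Jaccard target overlap >= threshold."""
    return [d for d in database
            if d.name != drug.name and jaccard(drug.targets, d.targets) >= threshold]


def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 table of case counts:
    a/b = comparator cases with/without the AE,
    c/d = all other drugs' cases with/without the AE.
    Returns 0.0 as a placeholder when the ratio is undefined."""
    if a + b == 0 or c == 0:
        return 0.0
    return (a / (a + b)) / (c / (c + d))


def tae_features(drug, ae, database, threshold=0.5):
    """Aggregate comparator evidence for one candidate AE of `drug`."""
    comps = select_comparators(drug, database, threshold)
    if not comps:
        return None
    comp_names = {d.name for d in comps}
    others = [d for d in database if d.name not in comp_names and d.name != drug.name]

    a = sum(d.faers_counts.get(ae, 0) for d in comps)   # comparator cases with AE
    n_comp = sum(d.faers_total for d in comps)          # all comparator cases
    c = sum(d.faers_counts.get(ae, 0) for d in others)  # background cases with AE
    n_other = sum(d.faers_total for d in others)        # all background cases

    return {
        "faers_case_count": a,
        "prr": prr(a, n_comp - a, c, n_other - c),
        "pct_comparator_labels": 100 * sum(ae in d.label_aes for d in comps) / len(comps),
        "pct_comparator_literature": 100 * sum(ae in d.literature_aes for d in comps) / len(comps),
    }


# Toy example: drug_x is newly approved, so it contributes no FAERS data.
drug_x = Drug("drug_x", {"T1", "T2"})
database = [
    drug_x,
    Drug("comp_a", {"T1", "T2"}, {"nausea"}, {"nausea"}, 100, {"nausea": 20}),
    Drug("comp_b", {"T1", "T2", "T3"}, set(), {"nausea"}, 50, {"nausea": 5}),
    Drug("unrelated", {"T9"}, set(), set(), 200, {"nausea": 10}),
]
features = tae_features(drug_x, "nausea", database)
print(features)
```

Here `comp_a` and `comp_b` pass the overlap cutoff (Jaccard 1.0 and 2/3), so the features are 25 cases, a PRR of about 3.33, 50% of comparator labels, and 100% of comparators with literature reports. In a full pipeline, one such feature row per candidate AE would presumably be passed to the ensemble classifier; that training step is not shown.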

Many adverse events (AEs; adverse drug reactions) are identified in the postmarketing period and often undergo a costly, time-consuming analysis before a safety label change or other regulatory decision is made about a product. The volume of MedWatch reports to the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) has increased to over 1.8 million per year. Automated tools that provide mechanistic insight and strengthen safety signals are needed to identify rare AEs and augment pharmacovigilance. Efforts to predict AEs have used a variety of data sources, including FAERS reports, drug labels, signaling pathways, chemical features, gene expression, literature, the electronic health record, prescription records, and social media. Several of these algorithms have demonstrated excellent performance. For example, a machine learning algorithm using multiple chemical and biological features achieved a precision of 66% and successfully predicted AEs associated with several drug withdrawals. Additionally, an ensemble method applied to social media datasets identified adverse drug events with area under the receiver operating characteristic curve values of about 0.80. Knowledge gained from these data-mining analyses will also provide important safety information for drug development.

Our previous pilot work created a model based on data from FAERS and drug labels. Molecular target adverse event (TAE) profiles were created by selecting comparator drugs whose target activity closely resembles that of the drug of interest. That model achieved a precision of 0.67, a recall of 0.81, and a specificity of 0.71. In this report, we added literature reports as another data source, along with additional features for learning, and tested an ensemble learning method to predict unlabeled AEs using data available at the time of drug approval.
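Precision, recall, and specificity, along with the F1 score and AUROC quoted in the abstract, have standard definitions. As a minimal standard-library sketch, assuming binary labels (1 = positive, 0 = negative) and real-valued model scores, they can be computed as follows. AUROC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative, which equals the area under the ROC curve. The helper names and the 0.5 decision threshold are illustrative assumptions, not taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Precision, recall (sensitivity), specificity, and F1; 0.0 when undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


def auroc(y_true, scores):
    """AUROC as P(score of a random positive > score of a random negative),
    counting ties as 1/2 (Mann-Whitney formulation). O(P*N) pairwise loop."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = AE present (positive), 0 = AE absent (negative).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
y_pred = [s >= 0.5 for s in scores]  # hypothetical 0.5 decision threshold
print(classification_metrics(y_true, y_pred))
print(auroc(y_true, scores))
```

With this toy data, precision, recall, specificity, and F1 all equal 2/3, and AUROC is 8/9 (about 0.89). AUPRC is not shown; it is commonly estimated with average precision (for example, scikit-learn's `average_precision_score`).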
