Drug Development Safety and Pharmacovigilance
Modern drug safety and pharmacovigilance dates back to the thalidomide disaster. The growing cost of drug development is driving pharmaceutical companies to identify potential safety issues earlier in the process. Valuable safety data are available in public databases and internal sources, but much of this is unstructured text. Linguamatics NLP transforms this text into actionable data that can be visualized and analyzed at every stage of the drug development process.
Modern drug safety and pharmacovigilance began in the early 1960s following the thalidomide disaster. Thalidomide, a drug designed to prevent morning sickness, was released in 1959 and resulted in over 10,000 children in 46 countries being born with birth defects.
In the wake of thalidomide, the World Health Organization (WHO) set up the Programme for International Drug Monitoring (PIDM). Today, PIDM has more than 150 participating countries, with over 16 million reports of adverse drug reactions (ADRs) collected.
In parallel, the United States Congress passed the Kefauver-Harris Drug Amendments (1962). For the first time, these laws required drug makers to prove their drugs worked safely before the Food and Drug Administration (FDA) would approve them for sale.
These changes were the start of a wave of regulatory changes designed to ensure reliable evidence of drug safety, efficacy and chemical purity prior to market release.
While a lack of clinical efficacy is the major cause of drug attrition, a poor safety profile is also a significant factor in the failure of drugs during development. This may occur at any stage in the development process, from initial drug discovery to preclinical trials, clinical trials and post-marketing surveillance (pharmacovigilance).
The diagram below shows the timing of the main safety assessment studies conducted during the drug development process.
Drug discovery typically involves highly parallelized processes for making new compounds and testing them in high-throughput screens. From this, a certain number of hits will be obtained and these will be whittled down by further analysis into a set of leads.
Preclinical development includes in vitro and in silico testing of the compounds to identify the best members of a series to take into clinical trials. This is also where the first stages of safety assessment are undertaken via toxicity testing in animals. If a drug shows promise in preclinical trials, a pharmaceutical company can request permission from the FDA to begin testing in humans (known as First-in-Man or FIM trials). This is called an Investigational New Drug (IND) application. In Europe, the European Medicines Agency (EMA) equivalent is an Investigational Medicinal Product Dossier (IMPD).
Phase 1 clinical trials are concerned primarily with establishing how a drug is absorbed, distributed, metabolized and excreted by the human body - a study known as pharmacokinetics (PK).
The dosage range of a new drug is determined by administering progressively larger doses to one or more groups of subjects, who are closely monitored for harmful side effects. The goal is to learn the maximum tolerated dose that does not produce unacceptable side effects.
Phase 2 clinical trials are designed to answer the question: does drug X improve disease Y?
Subjects in a phase 2 clinical trial may benefit from their participation if they receive an active treatment. Most phase 2 clinical studies are randomized, with subjects assigned randomly (by chance and not by choice) to receive the experimental drug, a standard treatment or placebo (harmless, inactive substance). Since larger numbers of patients receive a treatment in Phase 2 clinical trials, there is a greater chance to observe and compile information on potential side effects.
Phase 3 clinical trials are conducted at multiple centers with hundreds or thousands of patients for whom the drug is intended. Testing on large patient populations allows continuous generation of data on a drug’s safety and efficacy. As in phase 2, most phase 3 clinical trials are randomized and blinded. A drug in this phase can be studied for several years.
Once the Phase 3 clinical trials are complete, a pharmaceutical company can request FDA approval to market the drug within the USA. This is called a New Drug Application (NDA). The NDA contains all the scientific data that the company has gathered during clinical trials. Within the EU, pharmaceutical companies submit a Marketing Authorization Application (MAA).
Regulatory toxicology studies are performed to Good Laboratory Practice (GLP) standards and comprise those required by local regulatory authorities or ethics committees before a drug can be given to human subjects for the first time. Regulatory toxicology also covers the studies required to support a New Drug Application (NDA).
Overseen by the FDA or EMA, post-market surveillance is designed to ensure the safety of a drug once it is released onto the market. Pharmacovigilance is designed to ensure that regulators monitor any adverse events reported by the public, who may be suffering from a far wider range of medical conditions than those to which the drug was exposed during clinical trials.
There are a number of problems associated with the drug development process as it stands, but they can be distilled into three factors: cost, time and effectiveness.
For years, the pharmaceutical industry has relied on development cost estimates from the Tufts Center for the Study of Drug Development (TCSDD), the most recent of which (2015) puts the cost of bringing a drug from discovery to market launch at $2.9 billion. This includes actual out-of-pocket costs averaging $1.4 billion, opportunity costs of nearly $1.2 billion and the cost of post-market studies amounting to $312 million.
On average, it takes 12 years to bring a new drug to market. This is one reason why the process is so expensive, as capital costs are magnified by the amount of time that money is tied up in a single project.
Almost 90% of drugs that start testing in patients don’t reach the market because they are unsafe or ineffective, and there is a pressing need to improve the understanding of safety issues during drug discovery, development and after launch. A successful drug development process demands that potential safety issues are recognized as early as possible.
At all stages of drug development, critical data is being generated and retrieved from unstructured text. Project teams need the most comprehensive view of all relevant data, and text mining plays a key role in access to actionable insights for drug safety.
Valuable information can be gained by improved analysis and understanding of the wide variety of information available to researchers and clinicians. This data can come from internal data sources such as study reports, project reviews, clinical investigator brochures and case reports, or from external sources such as:-
Linguamatics provides access to a range of ready-to-access content options (including all of the above) via our OnDemand or Connected Data Technology services. This content is linked to the relevant domain specific ontologies, and updated weekly to ensure that you always have up-to-date information. In addition, valuable information may be found on patient forums, social media and conference abstracts.
The ability to search intelligently across the hundreds of thousands of pages contained in these disparate sources is a prerequisite for efficient decision support. However, much of this data will only be available as unstructured text.
The Linguamatics Natural Language Processing (NLP) platform can transform unstructured text into actionable (structured) data that can be rapidly visualized and analyzed at every stage of the drug development process.
Linguamatics I2E can query and extract drug names, dosages, adverse events, safety indicators and context such as species and tissue (among other things) from large document collections. Queries can be defined using keywords and linguistic expressions. I2E has powerful table processing which enables accurate data to be extracted from preclinical toxicity or drug safety summaries.
By plugging in ontologies, queries will automatically find synonyms or search for entire classes of items. Pre-defined smart queries can also be used; these are templates that hide complexity from the user by only exposing specific pre-defined options. In addition, queries can be combined to answer a set of questions simultaneously, for example by providing systematic profiles of compounds.
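To make the idea of ontology-driven synonym matching concrete, the following is a minimal Python sketch. It is not I2E's query language or API; the tiny `ONTOLOGY` dictionary, the function names and the example sentence are all hypothetical and exist only for illustration. It shows how a single query term can be expanded to its synonyms and how each hit can be mapped back to one preferred term.

```python
import re

# Hypothetical mini-ontology: preferred term -> synonyms (illustrative only,
# not taken from any real ontology or from I2E).
ONTOLOGY = {
    "hepatotoxicity": ["liver toxicity", "liver injury", "hepatic injury"],
    "nausea": ["queasiness", "feeling sick"],
}


def build_pattern(ontology):
    """Compile one regex matching every preferred term and synonym."""
    lookup = {}
    for preferred, synonyms in ontology.items():
        for term in [preferred, *synonyms]:
            lookup[term.lower()] = preferred
    # Longest terms first so multi-word synonyms win over shorter overlaps.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def find_concepts(text, ontology=ONTOLOGY):
    """Return (matched text, preferred term) pairs found in the text."""
    pattern, lookup = build_pattern(ontology)
    return [(m.group(0), lookup[m.group(0).lower()]) for m in pattern.finditer(text)]


if __name__ == "__main__":
    # Made-up sentence for demonstration.
    print(find_concepts("Liver injury and queasiness were reported."))
```

Mapping every hit to a preferred term is what allows results from differently worded documents to be grouped under one concept.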
I2E presents the structured results in a choice of formats. These include web pages with results classified by drug, dosage and adverse event. Microsoft Excel spreadsheets, XML files and network graphs are also supported and allow the user to visualize direct and indirect relationships between entities. Results can also be presented in formats suitable for export to third-party databases.
Using I2E’s unique strengths, you can provide comprehensive, precise and accurate data to end-users: capture precise relationships, find concepts in their appropriate context, normalize and extract quantitative data and data in embedded tables.
In using I2E's powerful query functionality, we can ask a variety of direct and indirect safety-related questions: what information are we looking for, and what questions need answering to ensure that a drug will be safe?
The flexibility of I2E enables you to answer these questions precisely, tailoring queries to extract exactly the information you require and then combining the results into the desired format.
During preclinical trials, the critical question pharmaceutical developers seek to answer is whether the new drug is safe to be tested in humans, which is also the primary concern of regulatory agencies.
The safety assessment starts early and - as candidates advance from discovery to preclinical trials - more extensive tests have to be performed in vitro, in silico (the rapidly growing discipline of computational toxicology) and in vivo to gain a better understanding of their pharmacodynamic (PD) and pharmacokinetic (PK) behavior and to establish their pharmacologic, safety and toxicity profile.
Preclinical trials are the final hurdle prior to clinical trials, and only 12% of the candidates advance to Phase 1 clinical trials. From this point, the success rate increases at each clinical phase, with 17% at Phase 1, 27% at Phase 2, 58% at Phase 3 and 82% at the registration phase. On average, drug discovery and preclinical development take three to six years and account for 30% of costs per drug. Source: DiMasi, J.A., “Cost of Developing a New Drug Briefing,” Tufts Center for the Study of Drug Development, Nov 18, 2014.
I2E enables early identification of potential safety issues. This is crucial to optimizing investment in R&D and avoiding failures later in the drug development process. Assessment and prediction of the potential for adverse side effects from a particular compound, lead series or biologic molecule is important both in drug discovery and preclinical trials, as well as later clinical trials.
However, much of the relevant safety information is locked up in textual documents, either from medical and scientific literature or within internal study reports. The challenge is therefore to mine available literature sources - both internal and external - and to find and extract relevant information in a timely manner. In addition, I2E can be used to hypothesize indirect relationships, for example by finding mechanisms linking a compound to an adverse effect through a protein or biological process.
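The indirect-relationship idea can be sketched as a simple join over extracted facts. The Python below is a toy illustration, not how I2E works internally. The compound names are placeholders, and the two fact lists are hand-written examples standing in for relationships that text mining might extract. The function name `indirect_links` is likewise an assumption.

```python
from collections import defaultdict

# Hand-written example facts standing in for text-mined relationships.
# Compound names are placeholders, not real study data.
compound_targets = [
    ("compound_A", "hERG"),
    ("compound_B", "COX-1"),
    ("compound_C", "unrelated_target"),
]
target_effects = [
    ("hERG", "QT prolongation"),
    ("COX-1", "gastric ulceration"),
]


def indirect_links(compound_targets, target_effects):
    """Join compound->target and target->effect facts on the shared target."""
    effects_by_target = defaultdict(list)
    for target, effect in target_effects:
        effects_by_target[target].append(effect)
    return [
        (compound, target, effect)
        for compound, target in compound_targets
        for effect in effects_by_target.get(target, [])
    ]


if __name__ == "__main__":
    for compound, target, effect in indirect_links(compound_targets, target_effects):
        print(f"{compound} -> {target} -> {effect}")
```

Each returned triple is only a hypothesis: it records that a compound and an adverse effect share an intermediate, which still needs expert review.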
Linguamatics I2E can query and extract dosages, drug names, tissues and safety indicators from large document collections, to answer questions such as:-
According to our customers, I2E reduces the time spent searching and reviewing safety and toxicity information by up to 70%.
Download the application note at the end of this page to learn more on Text Search and Mining for Safety/Toxicity.
I2E's powerful linguistic processing capabilities can be used to extract numeric dosage data associated with toxicities and adverse reactions from MEDLINE® abstracts. Read this application note to learn how to use I2E to identify potential safety issues for drugs at specific dosages.
The advantages of extracting potential safety and toxicity issues from existing literature can be enormously beneficial financially if done early enough in the drug development process. Access this webinar on extracting safety and toxicity knowledge with I2E and learn how Linguamatics' agile text mining platform can aid this process significantly.
Better access to the high value information in legacy safety reports remains a cherished goal for those involved in preclinical safety assessment. Locked away in these historical data are answers to questions such as: Has this particular organ toxicity been seen before, in what species and with what chemistry? Could new biomarker or imaging studies predict the toxicity earlier? What compounds could be leveraged to help build capabilities?
Learn how Merck developed an I2E workflow that extracted the key findings from safety assessment ante- and post-mortem reports, final reports and protocols.
Learn how Merck used I2E to review 32 chronic toxicology studies in non-rodents (22 studies in dogs and 10 in non-human primates) and 27 chronic toxicology studies in rats dosed with Merck compounds. The aim was to determine how frequently additional target organ toxicities are observed in chronic toxicology studies compared with sub-chronic studies of 3 months in duration.
Clinical trials provide the evidence on which every new drug is approved. Approximately 25% of drugs that fail in clinical trials do so for safety reasons, for example, exhibiting unacceptable toxicity levels in patients.
The regulatory landscape for clinical trials has also evolved with increased requirements for risk management plans, risk evaluation and minimization strategies. As the industry transitions from passive to active safety surveillance, there will be a greater demand for monitoring data from a wide variety of sources, much of which will only be available as unstructured text.
Many pharmaceutical and biotech companies are realizing the benefits of applying Artificial Intelligence (AI) technologies such as NLP to internal clinical safety data silos, as well as publicly available clinical safety-related data. The following three case studies provide some examples.
One top-10 pharmaceutical company has provided access to their silo of Clinical Investigator Brochures using Linguamatics Portals. This means the safety assessment teams can, with just a couple of clicks, get answers to questions such as:
Behind the scenes, the documents have been processed and indexed, ontologies applied, appropriate document regions identified, NLP queries run, key concepts such as chemicals and diseases standardized, and mutations and dosages normalized.
This workflow, and I2E’s easy-to-access Portal, means that the safety data otherwise buried in valuable internal reports can be re-used to answer critical questions for other drug development teams.
Agios is in the process of implementing an Adverse Event Reporting System (AERS). The case study shows how Agios is using data generated by Linguamatics NLP text mining to help understand the progression of adverse events in ongoing clinical trials.
AstraZeneca set out to test the hypothesis that Real World Evidence (RWE) adverse reaction information from patients could effectively supplement information from clinical trials, giving scientists and clinicians a more accurate, well-rounded description of safety data for particular treatments. The Linguamatics I2E text-mining solution played a key role in rapidly assembling comparable data sets.
AstraZeneca's case study illustrates how they were able to use I2E to examine differences in the frequency of nausea adverse reactions (ARs) between online patient self-reported data and the data from FDA Drug Product Labels.
Linguamatics I2E was included in a recent paper from the FDA on the “Use of data mining at the Food and Drug Administration” (Duggirala et al, 2016, J Am Med Inform Assoc).
This FDA review covers a very broad range of text and data mining approaches, across both FDA databases (e.g. MAUDE, VAERS) and external data such as MEDLINE®, clinical study data and social media.
The FDA review describes the use of I2E “to study clinical safety based on chemical structure information contained in medical literature. Linguamatics I2E enables custom searches using natural language processing to interpret unstructured text. The ability to predict the clinical safety of a drug based on chemical structures is becoming increasingly important, especially when adequate safety data are absent or equivocal.”
In recent years, regulatory authorities such as the FDA and EMA have placed an increased emphasis on the safety of marketed drugs, particularly the tracking and reporting of adverse events.
Pharmaceutical companies are expected to regularly screen the worldwide scientific literature for potential adverse drug reactions, at least every two weeks. The use of text mining and other tools to streamline the literature review process for pharmacovigilance is more crucial than ever in order to ensure patient safety, without overloading drug safety teams.
Eric Lewis (Safety Development Leader at GlaxoSmithKline) talked at a recent Linguamatics Text Mining Summit about the challenges of reviewing medical literature for safety signals. For example, he looked for literature for a sample of just 20 marketed products across a 300-day period. Eric found that there were on average 60 new references per day (with a total of over 11,000 documents). He found that manual review time was 1.2 to 1.6 minutes per abstract. He extrapolated this to a typical pharmaceutical company product portfolio of 200 marketed products, and showed that this volume of literature would take over 2,200 hours to review – hugely time-consuming.
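The extrapolation above can be reproduced with a few lines of arithmetic. The Python sketch below assumes that literature volume scales linearly with the number of products and uses the reported total of 11,000 documents (stated as "over 11,000", so the result is a lower bound) for the 20-product sample. The constant and function names are illustrative.

```python
# Figures taken from the talk as summarized above.
SAMPLE_PRODUCTS = 20
SAMPLE_DOCUMENTS = 11_000          # reported as "over 11,000": a lower bound
PORTFOLIO_PRODUCTS = 200
MINUTES_PER_ABSTRACT = (1.2, 1.6)  # reported manual review time range


def review_hours(sample_docs, sample_products, portfolio_products, minutes_per_doc):
    """Scale the sample volume linearly to the portfolio and convert to hours."""
    portfolio_docs = sample_docs * portfolio_products / sample_products
    return portfolio_docs * minutes_per_doc / 60


low_hours, high_hours = (
    review_hours(SAMPLE_DOCUMENTS, SAMPLE_PRODUCTS, PORTFOLIO_PRODUCTS, minutes)
    for minutes in MINUTES_PER_ABSTRACT
)

if __name__ == "__main__":
    print(f"Estimated manual review effort: {low_hours:,.0f} to {high_hours:,.0f} hours")
```

Under these assumptions, the lower end of the range corresponds to the 2,200 hours quoted, and the slower review rate pushes the estimate towards 3,000 hours.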
Eric went on to describe how, by using NLP, it is possible to use linguistic processing to focus more specifically on potential drug-related adverse events. This is achieved by searching for the most appropriate relationships between a drug and an adverse event.
Eric presented a specific search to find the adverse events associated with the selective androgen receptor modulator, Enobosarm (an investigational drug also known as MK-2866 or Ostarine). Searching manually across literature databases, Eric pulled out 132 abstracts, but manual review (3 hours) found that only about 30% of these were relevant and actually described an association with an adverse event. However, using I2E to index and query for a precise and accurate pattern took just a few minutes, clearly demonstrating the value of I2E within a pharmacovigilance application.
Organizations increasingly require auditable methods to check whether signals indicating adverse or toxicity related events appear in clinical records. If events do occur, companies need to be able to react fast to find out if they are caused by the drug, are side effects of the original disease or are the result of external factors.
Text mining can be used both to review clinical reports and to understand potential mechanisms of action. In particular, the Linguamatics NLP platform has been used to highlight different adverse event profiles at different dosages. Researchers can also search medical records for particular adverse effects, and code the effects found. The linguistic capabilities of I2E are critical in providing a distinction between new effects, a history of an effect, the lack of an effect, or the lack of a history of an effect.
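The four-way distinction described above (new effect, history of an effect, lack of an effect, lack of a history) can be illustrated with a deliberately crude cue-based classifier. This Python sketch is not I2E's linguistic processing; real clinical text needs far more robust handling of scope and syntax. The rule list, labels and example sentences are all invented for illustration.

```python
import re

# Cue patterns checked in order; the most specific cue comes first so that
# "no history of" is not mistaken for a plain "history of" or a plain "no".
RULES = [
    ("no_history", re.compile(r"\bno (?:prior |previous )?history of\b", re.IGNORECASE)),
    ("history", re.compile(r"\bhistory of\b", re.IGNORECASE)),
    ("absent", re.compile(r"\b(?:no|denies|without)\b", re.IGNORECASE)),
]


def assertion_status(sentence, effect):
    """Classify how an effect is asserted, using cues that precede its mention."""
    position = sentence.lower().find(effect.lower())
    if position == -1:
        return "not_mentioned"
    context = sentence[:position]  # only look at text before the effect
    for label, pattern in RULES:
        if pattern.search(context):
            return label
    return "present"


if __name__ == "__main__":
    # Invented example sentences.
    for text in [
        "Patient developed nausea after the second dose.",
        "History of nausea with opioids.",
        "Patient denies nausea.",
        "No history of nausea.",
    ]:
        print(f"{assertion_status(text, 'nausea'):<11} {text}")
```

Coding each mention with a status like this is what lets a reviewer separate genuinely new adverse effects from background history.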
Linguamatics I2E provides powerful linguistic tools to capture numerical data. The screenshot below shows structured normalized dose information for Olanzapine, extracted from recent MEDLINE® abstracts.
The ability to capture both dose details and causal relationships (e.g. does drug x cause adverse event y) in structured form means that safety assessment teams can quickly and effectively review literature reports for safety signals.
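As a rough illustration of what "structured, normalized dose information" means, the Python sketch below pulls dose mentions out of free text and converts them to milligrams. It is an assumption-laden toy, not I2E's extraction logic: the regular expression covers only a few unit spellings, and the unit table, function name and example sentence are invented.

```python
import re

# Conversion factors to milligrams for the few units this sketch recognizes.
TO_MG = {"mcg": 0.001, "ug": 0.001, "mg": 1.0, "g": 1000.0}

# A number, a unit, and an optional "/day" or "/d" suffix.
DOSE_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(mcg|ug|mg|g)\b(?:\s*/\s*(day|d)\b)?",
    re.IGNORECASE,
)


def extract_doses(text):
    """Return each dose mention with its value normalized to milligrams."""
    doses = []
    for match in DOSE_RE.finditer(text):
        value, unit, per_day = match.groups()
        doses.append(
            {
                "text": match.group(0),
                "mg": float(value) * TO_MG[unit.lower()],
                "per_day": per_day is not None,
            }
        )
    return doses


if __name__ == "__main__":
    # Invented sentence for demonstration.
    example = "Drug X 10 mg/day was increased to 20 mg/day; a 500 mcg test dose was tolerated."
    for dose in extract_doses(example):
        print(dose)
```

Normalizing to a single unit is what makes doses from different reports directly comparable in a table or chart.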
Real world data (RWD) is becoming increasingly available in the fields of pharmacovigilance and post-market surveillance. It provides top pharma companies, such as Pfizer, with a rich seam of data for real world data analytics. AI text mining technology, such as the Linguamatics NLP platform, allows users to extract structured information from diverse real world data sources, including Voice of the Customer (VoC) data from patient surveys, customer complaints databases and focus groups, and to regularly monitor specialized literature for potential adverse events not reported on drug labels.
For instance, Pfizer used Linguamatics NLP to categorize and tag call center feeds for key metadata such as caller demographics and reasons for calling. This allowed them to deepen their understanding of drug-disease associations by looking for information on pre-existing conditions and relating these to the potential reported side effects versus ADRs.