Pfizer improves Patent Search 10-fold with Linguamatics I2E

Intellectual property is critical in the drug discovery process. Before initiating any new project, it is important to understand the patent landscape around the disease area, check whether there is freedom to operate, and assess patentability. The business case for a project's commercial viability must cover not just the biology, such as “is there unmet medical need?”, but also “what is the IP position?”.

Streamlining patent research with natural language processing (NLP) text mining

Scientists and researchers therefore need access to the information on genes and diseases in patents. But patents can be hundreds of pages long, with complex language and interconnected facts, so manual patent research is time-consuming and costly. More and more pharma companies, such as Pfizer, are turning to NLP text mining to keep up to date with the patent literature.

Pfizer researchers use Linguamatics Life Science Platform powered by I2E to find patents relating to specific diseases. The results feed a database to visualize gene targets, invention type, competitor organizations and overall patent “relevancy”. 

The combined value of NLP and Machine Learning – a concrete example

With the rising costs of de novo drug discovery, and an increasing focus on rare diseases, there is continuous innovation in methods and solutions for finding new uses for existing drugs. I was interested to hear of a novel approach to this, published recently by Eric Su and Todd Sanger at Eli Lilly. In this paper, “Systematic drug repositioning through mining adverse event data in”, the authors describe the combined use of Natural Language Processing (NLP) and Machine Learning (ML) to extract potential new uses of existing drugs.

It’s quite astonishing how often in the last weeks and months I’ve been asked about the interplay between NLP, Artificial Intelligence (AI), and ML. It seems that everyone wants to understand more about the real potential (rather than the hype that is being shouted from the rooftops) that these tools will provide to impact healthcare, research, and many other areas of our lives, in the next decade.

So, let’s delve further into this concrete example of the combined value of NLP and ML. The innovative step here was to exclude trials for a specific indication, such as cancer, and then find trials with Serious Adverse Events (SAEs) classified as cancerous. The researchers then looked to see if the placebo arm had more cancerous SAEs. If the placebo arm had more cancer-related SAEs than the treatment arm, they hypothesized that the treatment has a positive anti-cancer effect.
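The comparison described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the authors' pipeline: the `Trial` record, its field names, and the sample numbers are all hypothetical, and comparing SAE rates per arm (rather than raw counts) is my own assumption to allow for arms of different sizes.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical, simplified trial record (not a real registry schema)."""
    trial_id: str
    indication: str
    treatment_n: int            # participants in the treatment arm
    treatment_cancer_saes: int  # cancer-classified SAEs in the treatment arm
    placebo_n: int              # participants in the placebo arm
    placebo_cancer_saes: int    # cancer-classified SAEs in the placebo arm


def candidate_signals(trials, excluded_indication="cancer"):
    """Return trials where the placebo arm has the higher cancer SAE rate."""
    hits = []
    for t in trials:
        # Step 1: exclude trials that were run for the indication itself.
        if excluded_indication in t.indication.lower():
            continue
        if t.treatment_n == 0 or t.placebo_n == 0:
            continue
        # Step 2: compare cancer SAE rates between the two arms.
        placebo_rate = t.placebo_cancer_saes / t.placebo_n
        treatment_rate = t.treatment_cancer_saes / t.treatment_n
        if placebo_rate > treatment_rate:
            hits.append((t.trial_id, placebo_rate, treatment_rate))
    return hits


# Made-up numbers, purely for illustration.
trials = [
    Trial("T1", "Type 2 diabetes", 200, 2, 200, 9),
    Trial("T2", "Breast cancer", 150, 5, 150, 12),
    Trial("T3", "Hypertension", 300, 6, 300, 4),
]
hits = candidate_signals(trials)
for trial_id, placebo_rate, treatment_rate in hits:
    print(f"{trial_id}: placebo {placebo_rate:.3f} vs treatment {treatment_rate:.3f}")
```

A flagged trial is only a hypothesis to follow up; any real analysis would also need a statistical test to check that the difference is not due to chance.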

What are the challenges facing life sciences and healthcare organisations, where text analytics can play a part?  This is one of the key questions that I ask myself and others regularly. There is so much buzz at the minute around big data, real world data, healthcare informatics, wearables; but what is really working, and what is just hype?

One of the ways we get input on this question is, of course, meeting our customers and hearing about their successes. Linguamatics hosts two user group meetings every year, and our European Spring Text Mining Conference is coming up rapidly. Held over 3 days in April, the conference gives scientists and clinicians interested in text mining the opportunity to come for hands-on training workshops, round table discussions, and a day of talks from both Linguamatics staff and our customers.

This year, our customer speakers encompass a wide range of use cases, spanning the pipeline of discovery, development, and delivery of therapeutics:

Clinical Trials text mining can speed key decisions, effective site selection and trial design 

Clinical trials form the cornerstone of evidence-based medicine, and are essential to establishing the safety and efficacy of new drugs. Each new drug, before being approved by regulatory agencies, must pass through a set of gates. At the very basic level these include phase 1 for first-in-human safety; phase 2 for efficacy and biological activity against the target; and phase 3 for safety, efficacy and effectiveness of the new therapeutic.

At each of these phases, careful planning is essential for a successful study. The clinical study protocol covers objective(s), design, methodology, statistical considerations and organization of a clinical trial, and ensures the safety of the trial subjects and integrity of the data collected.

Over recent years, clinical trial designs and procedures have become more diverse and more complex. The impact of precision medicine means trials have to be more carefully planned to ensure adequate statistical power for smaller patient groups, and adaptive, umbrella, basket and n-of-1 trials are now more frequent.

The regulatory requirements and growing complexity of clinical trials translates into more numerous and more complex eligibility criteria for study enrolment, increased site visits and required procedures, longer study duration, and more rigorous data collection requirements. From: PhRMA Biopharmaceutical Industry Profile 2016

Reading some of the FDA's review blogs, I was interested to see that "for the second consecutive year, [the FDA] approved more drugs to treat rare diseases than any previous year in our history." This is great news for the patients affected by these rare or orphan diseases, and there is of course potential to apply such drugs, and the knowledge around these diseases, across the wider population and in broader healthcare.

Text analytics can play a part in developing better understanding around the biology of these rare diseases. There's a great example of this application of text mining from Madhusudan Natarajan at Shire Pharmaceuticals. Shire develops and provides healthcare in the areas of behavioural health, gastrointestinal conditions, rare diseases, and regenerative medicine, and Madhu has presented his research using text analytics to uncover disease severity and genotype-phenotype associations for Hunter Syndrome (also known as Mucopolysaccharidosis II).

We hosted a webinar with Madhu, and in this webinar, he illustrates some of the challenges for R&D for orphan diseases, particularly around text mining for mutation and variant patterns, which can be reported in so many different ways in the literature. 

Webinar: A systematic examination of gene-disease associations through text mining approaches
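To give a feel for why mutation mentions are hard to mine, here is a minimal Python sketch that maps a few common ways of writing a protein substitution onto one normalized form. It is an illustration only, not how I2E or the webinar's workflow handles variants: the regular expression, the function names, and the example sentence are all assumptions of mine, and the pattern covers simple substitutions only.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = set(THREE_TO_ONE.values())

# Matches forms such as "R468Q", "Arg468Gln" and "p.Arg468Gln".
VARIANT = re.compile(
    r"\b(?:p\.)?([A-Z][a-z]{2}|[A-Z])(\d+)([A-Z][a-z]{2}|[A-Z])\b"
)


def to_one_letter(residue):
    """Return the one-letter code, or None if this is not an amino acid."""
    if len(residue) == 3:
        return THREE_TO_ONE.get(residue)
    return residue if residue in ONE_LETTER else None


def normalize_variants(text):
    """Find substitution mentions and rewrite each as e.g. 'R468Q'."""
    found = []
    for ref, pos, alt in VARIANT.findall(text):
        ref1, alt1 = to_one_letter(ref), to_one_letter(alt)
        if ref1 and alt1:  # drops look-alikes such as "H2O"
            found.append(f"{ref1}{pos}{alt1}")
    return found


# Illustrative sentence written for this sketch, not taken from the literature.
text = ("One report lists p.Arg468Gln, another R468Q, "
        "a third Arg468Gln; H2O is not a variant.")
variants = normalize_variants(text)
print(variants)
```

Even this toy version needs a lookup table and a filter for false positives, and it ignores deletions, insertions, frameshifts and free-text descriptions, which is where purpose-built text mining earns its keep.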