There seems to be a certain buzz around rare and orphan diseases. Following the Findacure meeting I attended last month, there are two recent events I’d like to mention.

Firstly, I attended the first Cambridge Rare Disease Network summit, held in Cambridge, UK, where a fantastic line-up of speakers from a range of professions discussed current and new initiatives in rare disease. The debates ranged from the use of next-generation sequencing for diagnostics, to crowd-sourcing both for science and funding, to drug repurposing, to the views of payers and the issues around pricing.

For me it was also a reminder, particularly from some of the parent speakers, of the impact that rare disease has on individuals and families. All too often we are so busy with the day-to-day of research and business that it's easy to lose sight of the ideal end-goal - treatments for all adults, all children, affected by these disparate and often devastating diseases.

Secondly, this month the FDA released new draft guidance “to navigate the difficult road to approval of drugs for rare diseases”.


I’m thrilled to see that Linguamatics I2E 4.3 is named as a KMWorld 2015 Trend-Setting Product. Linguamatics I2E has a proven track record in delivering best-of-breed text mining capabilities across a broad range of application areas. Its agile nature allows query strategies to be tuned to deliver the precision and recall needed for specific tasks, at enterprise scale.

According to customers, I2E gets to actionable results at least 10 times faster than a traditional keyword search. In many cases, I2E will produce successful results for projects that would otherwise be impossible or intractable.

Actionable information extracted using I2E can be presented in a variety of ways depending on your needs. NLP-based text mining provides the capability to look through unstructured text (typically large sets of documents such as scientific reports, patents, electronic healthcare records, and pathology and radiology reports) and to use sophisticated queries to automatically identify and extract structured data (concepts and associations), enabling the system to interpret the meaning of the text.
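To make the idea of turning unstructured text into structured data more concrete, here is a deliberately tiny sketch in Python. It is not I2E and does not use any Linguamatics API; the vocabularies, the relation patterns and the `extract_associations` function are all hypothetical, chosen only to illustrate how concepts and associations might be pulled out of free text as structured records.

```python
import re

# Hypothetical mini-vocabularies (assumptions for illustration only);
# a real system would rely on curated ontologies and linguistic analysis.
GENES = {"BRCA1", "EGFR", "KRAS"}
DISEASES = {"breast cancer", "lung cancer", "colorectal cancer"}
RELATION = re.compile(r"\b(associated with|mutated in|linked to)\b", re.IGNORECASE)


def extract_associations(text):
    """Return (gene, relation, disease) triples found within single sentences."""
    triples = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        relation = RELATION.search(sentence)
        if not relation:
            continue  # no relation phrase, so nothing to extract here
        lowered = sentence.lower()
        genes = [g for g in sorted(GENES) if re.search(rf"\b{g}\b", sentence)]
        diseases = [d for d in sorted(DISEASES) if d in lowered]
        # Pair every gene with every disease mentioned in the same sentence.
        for gene in genes:
            for disease in diseases:
                triples.append((gene, relation.group(1).lower(), disease))
    return triples


sample = (
    "BRCA1 is frequently mutated in breast cancer. "
    "The patient was seen on Tuesday. "
    "KRAS has been linked to colorectal cancer."
)

for triple in extract_associations(sample):
    print(triple)
```

Simple dictionary and pattern matching like this misses synonyms, negation and context, which is where linguistic processing and large curated vocabularies come in.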

 


Linguamatics I2E natural language processing technology to automatically extract clinical attributes from pathology reports across eight hospital groups in Stratified Medicine Programme.

LONDON and CAMBRIDGE, UK, September 1st, 2015 – Cancer Research UK and Linguamatics announced today they will work on a joint project to apply Linguamatics’ natural language processing (NLP) text analytics platform, I2E, to automatically extract clinical attributes from cancer pathology reports and improve annotation of clinical samples relating to Cancer Research UK’s Stratified Medicine Programme (SMP). This project will allow the analysis of detailed patient characteristics alongside large volumes of genetic data, enabling more effective research into the causes and personalised treatment of cancer.

Dr Ian Walker, Director of Clinical Research and Strategic Partnerships at Cancer Research UK, said: “Pathology reports tell us a range of important information about a patient’s cancer, but the way this data is recorded can vary widely, which makes it harder to spot trends or other significant information that could have a bearing on treatment decisions or prognosis. This collaboration should help translate these reports into more meaningful data, which should help our researchers better understand the disease and accelerate advances in personalised medicine.”


I attended the Findacure “Drug Repurposing for Rare Diseases” event last week, a small symposium with an interesting mix of attendees – academics, pharma, patient groups, vendors. The main focus was networking, inspired by a series of short talks (see the Findacure blog for more information).

  • 6,000 to 8,000 identified rare diseases (prevalence less than 5 in 10,000)
  • Only approximately 200 have licensed treatments – large unmet need
  • 1 in 17 people (6-8% of population) will develop a rare disease
  • 30-40 million people in US, 30-40 million in Europe
  • 75% of all rare diseases affect children

With the changing landscape from “blockbuster” to more personalised “nichebuster” therapeutics, and the incentives provided by regulatory bodies (such as FDA’s Orphan Drug Designation), rare diseases are an increasing focus of many of Linguamatics’ pharma and biotech customers.

So, I hear you ask – how does text analytics fit into rare disease drug discovery? It’s simple: information associated with rare diseases is essential at many stages of drug discovery and development, and this essential information is often buried in unstructured text, spread across different data sources with differing formats, vocabularies, etc.


Giving a presentation on NLP text mining a couple of weeks ago*, I was asked whether our text analytics solution can help with one of the extra Vs of big data – Veracity. This is a much-discussed topic at the moment and, after Volume, Velocity and Variety, seems to be the most important of the additional Vs (see Seth Grimes’ blog for a good discussion of some more “wanna-Vs”).

Veracity, when it comes to data and decision making, can mean many things:

  • Does my conclusion make sense?
  • Is this particular data point accurate?
  • Do I trust this publication?
  • Is this assertion evidenced reliably?

 - but the bottom line is, if I am making an important business decision, how can I be sure it’s made using the best possible data?

This is obviously a tricky question, and it has been thrown into public view over recent years by studies that tried to replicate critical experimental data and found reproducibility frighteningly low (e.g. PLoS). So, how can a text analytics tool shed any light in such a minefield?

Scientists in the United States spend $28 billion each year on basic biomedical research that cannot be repeated successfully. That is the conclusion of a study published on 9 June 2015 in PLoS Biology that attempts to quantify the causes, and costs, of irreproducibility.