There's been a lot of excitement around recent studies in immuno-oncology, in which the body's immune defences are harnessed to fight cancer. Experts consider it the most exciting advance since the development of chemotherapy half a century ago.

Many of our customers are involved in anti-cancer approaches based on modulating the immunosuppressive properties of immune cells, and are using I2E to help generate insight around immuno-oncology and the tumor microenvironment (TME). Cancers can be viewed as complex ‘rogue’ organs, in which malignant cells are surrounded by blood vessels and a variety of other cells, including immune cells (such as lymphocytes), fibroblasts, and more. The tumor cells and the surrounding non-transformed cells interact constantly, and developing a better understanding of these TME interactions is a valuable approach in immuno-oncology drug development.

Knowledge in this field is growing so rapidly that it is very difficult for scientists to capture it manually, because of both the volume of publications and the variety and complexity of the information.

Challenges include ensuring a thorough search that captures relationships between genes/proteins and the variety of cellular actors they affect or correlate with. These cellular actors include many of the immune system cells currently under investigation for immunotherapeutic approaches to oncology.

I2E provides the capability to find and extract these interactions from textual data, including capture of negation where needed. I2E allows efficient and effective searches over millions of text documents, and can harmonize the output to enable computational post-processing and visualization of these complex data.
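To make the kind of extraction described above concrete, here is a minimal sketch in Python of dictionary-based co-occurrence between gene/protein and immune-cell terms within a sentence, with a crude negation check. It is an illustration of the task only, not of how I2E works internally; the term lists, negation cues, and example text are toy assumptions.

```python
import re

# Toy dictionaries standing in for curated gene/protein and cell-type vocabularies.
GENES = {"PD-L1", "CTLA-4", "TGF-beta", "IDO1"}
CELL_TYPES = {"regulatory T cells", "tumor-associated macrophages",
              "myeloid-derived suppressor cells", "natural killer cells"}
NEGATION_CUES = {"not", "no", "without", "lack of", "failed to"}

def extract_relations(sentence: str):
    """Return (gene, cell_type, negated) triples co-occurring in one sentence."""
    lowered = sentence.lower()
    negated = any(re.search(rf"\b{re.escape(cue)}\b", lowered) for cue in NEGATION_CUES)
    genes = [g for g in GENES if g.lower() in lowered]
    cells = [c for c in CELL_TYPES if c.lower() in lowered]
    return [(g, c, negated) for g in genes for c in cells]

text = ("PD-L1 expression was increased in tumor-associated macrophages. "
        "IDO1 was not detected in natural killer cells.")
for sentence in re.split(r"(?<=[.!?])\s+", text):
    for gene, cell, negated in extract_relations(sentence):
        print(f"{gene} -- {cell} (negated: {negated})")
```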


Tom Schmidt, Managing Editor, IDG Strategic Marketing Services, interviewed Dr. Jane Reed, Head of Life Science Strategy at Linguamatics, on how pharma and biotech companies use text analytics to reduce the time and cost of their clinical trials and get drugs to market faster.

A commonly cited statistic is that over 80% of data lies in unstructured text. Often, the way people write things, whether in patents, healthcare records, or scientific literature, makes it hard to pull out the nuggets that will help with decisions, whether those concern the real-world value of your product, regulatory compliance, or many other areas. Because so much data is unstructured, text analytics has a part to play in addressing many of these problems.



With the ongoing focus on outcomes-based payment models in healthcare, pharmaceutical companies face strong pressure to demonstrate not just the safety and efficacy of a new treatment, but also its cost effectiveness and comparative effectiveness. This means they must show that their agent is not only better than placebo but also better than other agents. Comparative effectiveness of any particular treatment can be established by interventional clinical trials, observational real-world evidence studies, or systematic review and meta-analysis. Access to ongoing and past clinical trials via trial registries provides much valuable information, but effective search can be hindered by inconsistent search vocabularies and the difficulty of searching unstructured text.

Merck recently published a paper demonstrating the success of a text-mining pipeline that overcomes these issues and extracts key information for comparative effectiveness research from clinical trial registries. Researchers in the Informatics IT group wanted to search clinical trial registries (NIH ClinicalTrials.gov, the WHO International Clinical Trials Registry Platform (ICTRP), and Citeline Trialtrove) and synthesize comparative effectiveness data for a set of Merck drugs, in order to:


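To give a feel for the registry-search step of such a workflow, here is a rough sketch that queries the public ClinicalTrials.gov registry (a simple illustration, not Merck's published pipeline). The v2 endpoint, parameter names, and JSON field paths are assumptions based on the current public API, and the drug and condition in the example query are purely illustrative.

```python
import requests

def search_trials(drug: str, condition: str, max_studies: int = 20):
    """Yield (NCT ID, brief title) for registry studies matching a drug and condition."""
    resp = requests.get(
        "https://clinicaltrials.gov/api/v2/studies",   # assumed public v2 endpoint
        params={
            "query.intr": drug,        # intervention/treatment search term
            "query.cond": condition,   # condition/disease search term
            "pageSize": max_studies,
        },
        timeout=30,
    )
    resp.raise_for_status()
    for study in resp.json().get("studies", []):
        ident = study["protocolSection"]["identificationModule"]
        yield ident["nctId"], ident.get("briefTitle", "")

# Illustrative query only; downstream steps would normalize drug names, pull out
# comparator arms and outcomes, and feed them into comparative effectiveness analysis.
for nct_id, title in search_trials("pembrolizumab", "non-small cell lung cancer"):
    print(nct_id, title)
```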
I spent an informative and enjoyable day at the Findacure Scientific Conference last week, on Rare Disease Day, 29th February 2016. One of the aims of the charity Findacure is to find new cures for rare diseases by repurposing existing medicines, and Dr Rick Thompson gave an excellent introduction to the problem, with an example of cost-of-illness modelling for Congenital Hyperinsulinism (CHI). This brought up some of the key challenges for modelling and understanding rare diseases that recurred again and again across the day:

  • Limited background information, e.g. on the epidemiology and clinical burden of the disease
  • Paucity of knowledge about the natural history of the disease, and of its heterogeneity
  • Little or no data on the economic burden of the disease

The talks were varied, ranging from the cost effectiveness of potential drug repurposing programmes and the promise of big data and the ‘omics revolution in identifying suitable candidates for rare diseases, to how collaborations between academia, patient bodies, the pharma industry, and rare disease charities are advancing discoveries and developments in certain areas.


Much of the work of researchers builds on previous discoveries, possibly best expressed by Isaac Newton: "If I have seen further, it is by standing on the shoulders of giants". In fact, one definition of research is: "a systematic investigation of sources in order to establish facts and reach new conclusions". To some extent, then, text analytics is a key tool for research, to enable users to see further and to reach new conclusions, by gaining a comprehensive and systematic view of what has already been found.

Clinical research is surely an area where re-use of data is of great scientific value. Using existing data to see further can speed up drug development and thereby enhance patient care. Linguamatics have many customers using I2E to extract existing information from past and ongoing clinical trials.

One example of data re-use is shown by Eric Su, Principal Research Scientist at Eli Lilly and Company. Eric uses I2E to extract summary statistics on clinical endpoints for therapeutic areas such as oncology and diabetes, to feed into clinical trial design and competitive environment analysis.
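As a hedged sketch of what the downstream aggregation might look like, the snippet below rolls extracted endpoint values up into per-drug summary statistics. The records are invented placeholders, not results from Lilly's actual analysis, and the column names are assumptions for illustration only.

```python
import pandas as pd

# Invented placeholder records standing in for endpoint values extracted from trial
# documents; in practice these would come from the text-mining output.
extracted = pd.DataFrame([
    {"drug": "drug_A", "endpoint": "median PFS (months)", "value": 6.2},
    {"drug": "drug_A", "endpoint": "median PFS (months)", "value": 7.1},
    {"drug": "drug_B", "endpoint": "median PFS (months)", "value": 5.4},
    {"drug": "drug_B", "endpoint": "median PFS (months)", "value": 5.9},
])

# Summary statistics per drug and endpoint, suitable for feeding into trial design
# and competitive environment comparisons.
summary = (extracted
           .groupby(["drug", "endpoint"])["value"]
           .agg(["count", "mean", "min", "max"])
           .reset_index())
print(summary)
```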