Over the past few months, several publications have used Linguamatics I2E to extract key information for a variety of projects. We are constantly amazed by the inventiveness of our users, who apply text analytics across the bench-to-bedside continuum, and these publications are no exception. Using the natural language processing power of I2E, researchers can answer their questions rapidly and extract the results they need with high precision and good recall. A more standard keyword search, by contrast, returns a set of documents that they then have to read.
Let’s start with Hornbeck et al., “PhosphoSitePlus, 2014: mutations, PTMs and recalibrations”. PhosphoSitePlus is an online systems biology resource for the study of protein post-translational modifications (PTMs), including phosphorylation, ubiquitination, acetylation and methylation. It is provided by Cell Signaling Technology (CST), who have been users of I2E for several years. In the paper, the authors describe the value of integrating data on protein modifications from high-throughput mass spectrometry studies with high-quality data from manual curation of the published low-throughput (LTP) scientific literature.
The authors say: “The use of I2E, a powerful natural language processing software application, to identify articles and highlight information for manual curation, has significantly increased the efficiency and throughput of our LTP curation efforts, and made recuration of selected information a realistic and less time-consuming option.” The CST scientists rescore and recurate PTM assignments, reviewing for coherence and reliability – and use of these data can “provide actionable insights into the interplay between disease mutations and cell signalling mechanisms.”
A paper by a group from Roche, Zhang et al., “Pathway reporter genes define molecular phenotypes of human cells”, described a new approach to understanding the effect of diseases or drugs on biological systems: looking for “molecular phenotypes”, or fingerprints, which are patterns of differential gene expression triggered by a change to the cell. Here, text analytics played a small role in the project: it was used, along with other tools, to compile a panel of over 900 human pathway reporter genes representing 154 human signalling and metabolic networks. These genes were then used to gain understanding of cardiomyocyte development (relevant to diabetic cardiomyopathy) and to assess common toxicity mechanisms (relevant to the mechanistic basis of adverse drug events).
The last one I wanted to highlight moves away from the realms of genes and cells and into the analysis of co-prescription trends and drug-drug interactions (Sutherland et al., “Co-prescription trends in a large cohort of subjects predict substantial drug-drug interactions”). A better understanding of drug-drug interactions is increasingly important for good healthcare delivery, as more and more patients, particularly the elderly, routinely take multiple medications, and the huge number of potential combinations makes it impractical to test them all for safety in clinical trials. In this study, the authors used prescription data from NHANES surveys to find which drugs or drug classes were most routinely prescribed together, and then used I2E to search MEDLINE for a set of 133 co-prescribed drugs to assess the availability of clinical knowledge about potential drug-drug interactions. The authors found that over 30% of older adults take 5 or more drugs, but that these combinations were largely unique. Of the co-prescribed pairs, a large percentage were not mentioned together in any MEDLINE record, demonstrating a need for further study. The authors conclude that these data show “that personalized medicine is indeed the norm, as patients taking several medications are experiencing unique pharmacotherapy”, and yet there is little published research on either the efficacy or the safety of these combinations.
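To make the shape of that analysis concrete, here is a minimal Python sketch of the two steps: counting which drug pairs are taken together, and then flagging pairs that have no matching literature record. This is an illustration only, not the authors' method. The function names, the toy prescription data and the `literature_pairs` set are all hypothetical, and the set simply stands in for whatever a text-mining search over MEDLINE would return.

```python
from collections import Counter
from itertools import combinations


def count_coprescribed_pairs(prescriptions):
    """Count how many people take each unordered pair of drugs together."""
    pair_counts = Counter()
    for drugs in prescriptions.values():
        # Deduplicate and sort so (a, b) and (b, a) count as the same pair.
        unique = sorted(set(drugs))
        pair_counts.update(combinations(unique, 2))
    return pair_counts


def pairs_without_literature(pair_counts, literature_pairs):
    """Return co-prescribed pairs, most common first, that are absent
    from the set of pairs found together in the literature."""
    known = {tuple(sorted(pair)) for pair in literature_pairs}
    return [pair for pair, _ in pair_counts.most_common() if pair not in known]


# Hypothetical toy data: person id -> list of drugs taken.
prescriptions = {
    "p1": ["metformin", "lisinopril", "simvastatin"],
    "p2": ["metformin", "lisinopril"],
    "p3": ["simvastatin", "lisinopril", "aspirin"],
}

# Hypothetical stand-in for pairs mentioned together in a literature search.
literature_pairs = [("lisinopril", "metformin")]

pair_counts = count_coprescribed_pairs(prescriptions)
unstudied = pairs_without_literature(pair_counts, literature_pairs)

for pair in unstudied:
    print(pair, pair_counts[pair])
```

In a real study the literature step is the hard part, since drug names appear under many synonyms and brand names, which is where text analytics comes in.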
What do these three studies have in common? Each uses text analytics, not as the only tool or necessarily even the major tool, but as part of an integrated analysis of data to answer focused and specific questions, whether those questions relate to protein modification, molecular patterns of genes in pathways, or drug interactions and potential adverse events. And I wonder, where will Linguamatics I2E be used next?