All publications

Text Mining for Clinical Support

2019 107(4): 603–605

Hartmann J, Van Keuren L.


Background: In 2013, the Dahlgren Memorial Library (DML) at the Georgetown University Medical Center began using text mining software to enable its clinical informationists to quickly retrieve specific, relevant information from MEDLINE abstracts while on patient rounds.

Long-term Risk of Colorectal Cancer and Related Death After Adenoma Removal in a Large, Community-based Population

2019 Oct Gastroenterology In Press, Journal Pre-proof

Lee J, Jensen C, Levin T, Doubeni C, Zauber A, Chubak J, Kamineni A, Schottinger J, Ghai N, Udaltsova N, Zhao W, Fireman B, Quesenberry C, Orav J, Sugg Skinner C, Halm E, Corley D


Ontology Mapping for Semantically Enabled Applications

2019 Oct, vol. 24, Issue 10, Pages 2068-2075.

Harrow I, Balakrishnan R, Jimenez-Ruiz E, Jupp S, Lomax J, Reed J, Romacker M, Senger C, Splendiani A, Wilson J, Woollard P.


Accurate Identification of Colonoscopy Quality and Polyp Findings Using Natural Language Processing

2019 Jan;53(1):e25-e30.

Lee JK, Jensen CD, Levin TR, Zauber AG, Doubeni CA, Zhao WK, Corley DA.



The aim of this study was to test the ability of a commercially available natural language processing (NLP) tool to accurately extract examination quality-related and large polyp information from colonoscopy reports with varying report formats.

Rx Data News: Impact of Advanced Data Technologies on Pharma R&D

Originally published in Rx Data News

Author: Jane Reed

Published: 19th June 2019

Jane Z. Reed, Ph.D., is Linguamatics’ head of life science strategy and is responsible for developing the strategic vision for Linguamatics’ growing product portfolio and business development in the life science market.

Automatically identifying social isolation from clinical narratives for patients with prostate cancer

(2019) 19:43 

Vivienne J Zhu, Leslie A Lenert, Brian E Bunnell, Jihad S Obeid, Melanie Jefferson and Chanita Hughes Halbert.

A small molecule inhibitor of mutant IDH2 rescues cardiomyopathy in a D-2-hydroxyglutaric aciduria type II mouse model

Wang F, Travins J, Lin Z, Si Y, Chen Y, Powe J, Murray S, Zhu D, Artin E, Gross S, Santiago S, Steadman M, Kernytsky A, Straley K, Lu C, Pop A, Struys EA, Jansen EE, Salomons GS, David MD, Quivoron C, Penard-Lacronique V, Regan KS, Liu W, Dang L, Yang H, Silverman L, Agresta S, Dorsch M, Biller S, Yen K, Cang Y, Su SM, Jin S.

J Inherit Metab Dis. 2016 Jul; 39(6): 807–820

PMID: 27469509

The promises of quantitative systems pharmacology modelling for drug development

Knight-Schrijver VR, Chelliah V, Cucurull-Sanchez L, Le Novère N.

Comput Struct Biotechnol J. 2016 Sep;14:363-370

PMID: 27761201


Recent growth in annual new therapeutic entity (NTE) approvals by the U.S. Food and Drug Administration (FDA) suggests a positive trend in current research and development (R&D) output.

The role of chronic toxicology studies in revealing new toxicities

Galijatovic-Idrizbegovic A, Miller JE, Cornell WD, Butler JA, Wollenberg GK, Sistare FD, DeGeorge JJ.

Regul Toxicol Pharmacol. 2016 Oct; pii: S0273-2300(16)30299-9

PMID: 27769827


Chronic (>3 months) preclinical toxicology studies are conducted to support the safe conduct of clinical trials exceeding 3 months in duration.

Reflection of successful anticancer drug development processes in the literature

Heinemann F, Huber T, Meisel C, Bundschus M, Leser U.

Drug Discov Today. 2016 Nov 16

PMID: 27443674

Use of data mining at the Food and Drug Administration

Duggirala HJ, Tonning JM, Smith E, Bright RA, Baker JD, Ball R, Bell C, Bright-Ponte SJ, Botsis T, Bouri K, Boyer M, Burkhart K, Condrey GS, Chen JJ, Chirtel S, Filice RW, Francis H, Jiang H, Levine J, Martin D, Oladipo T, O'Neill R, Palmer LA, Paredes A, Rochester G, Sholtes D, Szarfman A, Wong HL, Xu Z, Kass-Hout T.

J Am Med Inform Assoc. 2016 Mar;23(2):428-34


Making sense of big data in health research: Towards an EU action plan

Auffray C, Balling R, Barroso I, Bencze L, Benson M, Bergeron J, Bernal-Delgado E, Blomberg N, Bock C, Conesa A, Del Signore S, Delogne C, Devilee P, Di Meglio A, Eijkemans M, Flicek P, Graf N, Grimm V, Guchelaar HJ, Guo YK, Gut IG, Hanbury A, Hanif S, Hilgers RD, Honrado Á, Hose DR, Houwing-Duistermaat J, Hubbard T, Janacek SH, Karanikas H, Kievits T, Kohler M, Kremer A, Lanfear J, Lengauer T, Maes E, Meert T, Müller W, Nickel D, Oledzki P, Pedersen B, Petkovic M, Pliakos K, Rattray M, I Màs JR, Schneider R, Sengstag T, Serra-Picamal X, Spek W, Vaas LA, van Batenburg O, Vandelaer M, Varnai P, Villoslada P, Vizcaíno JA, Wubbe JP, Zanetti G

Genome Med. 2016 Jun 23;8(1):71


PMC ID: PMC4919856

Application of automated natural language processing (NLP) workflow to enable a federated search of external biomedical content in drug discovery development

McEntire R, Szalkowski D, Butler J, Kuo MS, Chang M, Chang M, Freeman D, McQuay S, Patel J, McGlashen M, Cornell WD, Xu JJ.

Drug Discov Today. 2016 May; 21(5):826-35

PMID: 26979546

Developing Timely Insights on Comparative Effectiveness Research with a Text Mining Pipeline

Chang M, Chang M, Reed JZ, Milward D, Xu JJ, Cornell WD

Drug Discov Today. 2016 Mar; 21(3):473-80

PMID: 26854423


Comparative effectiveness research (CER) provides evidence for the relative effectiveness and risks of different treatment options and informs decisions made by healthcare providers, payers, and pharmaceutical companies.

Co-prescription trends in a large cohort of subjects predict substantial drug-drug interactions

Sutherland JJ, Daly TM, Liu X, Goldstein K, Johnston JA, Ryan TP

PLoS One. 2015 Mar 4; 10(3):e0118991

PMID: 25739022

Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2

Stubbs A, Kotfila C, Xu H, Uzuner Ö

J Biomed Inform. 2015 Dec; 58 Suppl:S67-77

PMID: 26210362


The second track of the 2014 i2b2/UTHealth natural language processing shared task focused on identifying medical risk factors related to Coronary Artery Disease (CAD) in the narratives of longitudinal medical records of diabetic patients.

Biocuration with insufficient resources and fixed timelines

Rodriguez-Esteban R

Database (Oxford). 2015 Dec; bav113: pp1-9

PMID: 26708987


Biological curation, or biocuration, is often studied from the perspective of creating and maintaining databases that have the goal of mapping and tracking certain areas of biology.

PhosphoSitePlus, 2014: mutations, PTMs and recalibrations

Hornbeck PV, Zhang B, Murray B, Kornhauser JM, Latham V, Skrzypek E

Nucleic Acids Res. 2015 Jan; 43(Database issue):D512-20

PMID: 25514926

Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge

Cormack J, Nath C, Milward D, Raja K, Jonnalagadda SR

J Biomed Inform. 2015 Dec; 58 Suppl:S120-7

PMID: 26209007

Pathway reporter genes define molecular phenotypes of human cells

Zhang JD, Küng E, Boess F, Certa U, Ebeling M

BMC Genomics. 2015 Apr 24; 16:342

PMID: 25903797


Background: The phenotype of a living cell is determined by its pattern of active signaling networks, giving rise to a "molecular phenotype" associated with differential gene expression.

Using natural language processing and machine learning to identify gout flares from electronic clinical notes

Zheng C, Rashid N, Wu YL, Koblick R, Lin AT, Levy GD, Cheetham TC.

Arthritis Care Res (Hoboken). 2014 Nov; 66(11):1740-8

PMID: 24664671

Mining gene-centric relationships from literature: the roles of gene mutation and gene expression in supporting drug discovery

Tari L, Patel J, Küntzer J, Li Y, Peng Z, Wang Y, Aguiar L, Cai J

Int J Data Mining Bioinformatics. 2014 Sep; 10(4):357-373


Precise Medication Extraction using Agile Text Mining

Shivade C, Cormack J, Milward D

Proc 5th Int Workshop Health Text Mining Information Analysis (Louhi), EACL. 2014 Apr; pp.75–79


A Novel Catechol-O-Methyltransferase Variant Associated with Human Disc Degeneration

Gruber HE, Sha W, Brouwer CR, Steuerwald N, Hoelscher GL, Hanley EN

Int J Med Sci. 2014 May; 11(7):748-53

PMID: 24904231

A Non-Technical Journey into the World of Big Data: an Introduction

Fishleigh J

Legal Inform Mgmt. 2014 Jun; 14(02): 149-151




In this article, Jackie Fishleigh introduces the issue of Big Data, defines the subject, and explains the key issues surrounding it and how information professionals are involved.

Human post-mortem synapse proteome integrity screening for proteomic studies of postsynaptic complexes

Bayés À, Collins MO, Galtrey CM, Simonnet C, Roy M, Croning MD, Gou G, van de Lagemaat LN, Milward D, Whittle IR, Smith C, Choudhary JS, Grant SG.

Mol Brain. 2014 Nov; 7:88

PMID: 25429717

Discovery of novel biomarkers and phenotypes by semantic technologies

Trugenberger CA, Wälti C, Peregrim D, Sharp ME, Bureeva S.

BMC Bioinformatics. 2013 Feb; 14:51

PMID: 23402646

Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources

Rebholz-Schuhmann D, Kafkas S, Kim JH, Li C, Jimeno Yepes A, Hoehndorf R, Backofen R, Lewin I.

J Biomed Semantics. 2013 Oct; 4(1):28

PMID: 24112383

Integration of software tools in patent analysis

Masiakowski P, Wang S

World Patent Inform. 2013 Jan; 35(2): 97-104



Modern patent information analysis requires, in addition to profound domain knowledge, sophisticated and specialized computer software tools. Integration of such resources can be a challenging task.

Automated identification of pneumonia in chest radiograph reports in critically ill patients

Liu V, Clark MP, Mendoza M, Saket R, Gardner MN, Turk BJ, Escobar GJ.

BMC Med Inform Decis Mak. 2013 Aug; 13:90

PMID: 23947340


Prior studies demonstrate the suitability of natural language processing (NLP) for identifying pneumonia in chest radiograph (CXR) reports; however, few evaluate this approach in intensive care unit (ICU) patients.

Deriving an English Biomedical Silver Standard Corpus for CLEF-ER

Lewin I, Clematide S

CLEF 2013: Evaluation Labs and Workshop: Online Working Notes. 2013 Sep


Entity recognition in parallel multi-lingual biomedical corpora: the CLEF-ER laboratory overview

Dietrich Rebholz-Schuhmann, Simon Clematide, Fabio Rinaldi, Senay Kafkas, Erik M. van Mulligen, Chinh Bui, Johannes Hellrich, Ian Lewin, David Milward, Michael Poprat, Antonio Jimeno-Yepes, Udo Hahn, Jan A. Kors

Lecture Notes in Computer Science. 2013; 8138:353-367


A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions

Davis AP, Wiegers TC, Roberts PM, King BL, Lay JM, Lennon-Hopkins K, Sciaky D, Johnson R, Keating H, Greene N, Hernandez R, McConnell KJ, Enayetallah AE, Mattingly CJ.

Database (Oxford). 2013 Nov; 2013:bat080

PMID: 24288140

Clarifying the social media blur

Milward D, Singh G

Information Outlook. 2012 Mar; 16(2): 10-13



By using powerful filters to extract key information, library professionals can mine noisy ‘big data’ and help their organizations understand and influence stakeholders.

PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse

Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M.

Nucleic Acids Res. 2012 Jan; 40(Database issue):D261-70

PMID: 22135298

Text mining for the biocuration workflow

Hirschman L, Burns GA, Krallinger M, Arighi C, Cohen KB, Valencia A, Wu CH, Chatr-Aryamontri A, Dowell KG, Huala E, Lourenço A, Nash R, Veuthey AL, Wiegers T, Winter AG.

Database (Oxford). 2012 Apr; 2012:bas020

PMID: 22513129

Dynamic changes in the microRNA expression profile reveal multiple regulatory mechanisms in the spinal nerve ligation model of neuropathic pain

von Schack D, Agostino MJ, Murray BS, Li Y, Reddy PS, Chen J, Choe SE, Strassle BW, Li C, Bates B, Zhang L, Hu H, Kotnis S, Bingham B, Liu W, Whiteside GT, Samad TA, Kennedy JD, Ajit SK.

PLoS One. 2011 Mar; 6(3):e17670

PMID: 21423802

Assessment of NER solutions against the first and second CALBC Silver Standard Corpus

Rebholz-Schuhmann D, Jimeno Yepes A, Li C, Kafkas S, Lewin I, Kang N, Corbett P, Milward D, Buyko E, Beisswanger E, Hornbostel K, Kouznetsov A, Witte R, Laurila JB, Baker CJ, Kuo CJ, Clematide S, Rinaldi F, Farkas R, Móra G, Hara K, Furlong LI, Rautschka M, Neves ML, Pascual-Montano A, Wei Q, Collier N, Chowdhury MF, Lavelli A, Berlanga R, Morante R, Van Asch V, Daelemans W, Marina JL, van Mulligen E, Kors J, Hahn U.

J Biomed Semantics. 2011 Oct; 2 Suppl 5:S11

PMID: 22166494

Enhancing patent landscape analysis with visualization output

Yang YY, Akers L, Barcelon Yang C, Klose T, Pavlek S

World Patent Inform. 2010 Sep; 32(3): 203-220



Abstract: A patent landscape analysis can be defined as a state-of-the-art patent search that provides graphic representations of information from search results.

CALBC silver standard corpus

Rebholz-Schuhmann D, Jimeno Yepes AJ, Van Mulligen EM, Kang N, Kors J, Milward D, Corbett P, Buyko E, Beisswanger E, Hahn U.

J Bioinform Comput Biol. 2010 Feb; 8(1):163-79

PMID: 20183881

An in silico analysis of microRNAs: mining the miRNAome

Murray BS, Choe SE, Woods M, Ryan TE, Liu W.

Mol Biosyst. 2010 Oct; 6(10):1853-62

PMID: 20539892


Systematic analysis of literature- and experimentally-derived datasets using text mining with ontological enrichment and network modeling revealed global trends in the microRNA (miRNA) interactome.

Transcriptional pathway signatures predict MEK addiction and response to selumetinib (AZD6244)

Dry JR, Pavey S, Pratilas CA, Harbron C, Runswick S, Hodgson D, Chresta C, McCormack R, Byrne N, Cockerill M, Graham A, Beran G, Cassidy A, Haggerty C, Brown H, Ellison G, Dering J, Taylor BS, Stark M, Bonazzi V, Ravishankar S, Packer L, Xing F, Solit DB, Finn RS, Rosen N, Hayward NK, French T, Smith PD.

Cancer Res. 2010 Mar 15; 70(6):2264-73

PMID: 20215513

Identifying and classifying biomedical perturbations in text

Rodriguez-Esteban R, Roberts PM, Crawford ME.

Nucleic Acids Res. 2009 Feb; 37(3):771-7

PMID: 19074486

The CALBC Silver Standard Corpus - Harmonizing multiple semantic annotations in a large biomedical corpus

Rebholz-Schuhmann D, Jimeno Yepes AJ, Van Mulligen EM, Kang N, Kors J, Milward D, Corbett P, Hahn U.

Proc 3rd Int Symp Languages Biology Medicine. 2009 Nov; pp64-72


Mining protein-protein interactions from published literature using Linguamatics I2E

Bandy J, Milward D, McQuay S.

Methods Mol Biol. 2009; 563:3-13

PMID: 19597777


Natural language processing (NLP) technology can be used to rapidly extract protein-protein interactions from large collections of published literature. In this chapter we will work through a case study using MEDLINE biomedical abstracts (1) to find how a specific set of 50 genes interact with each other.

Information needs and the role of text mining in drug development

Roberts PM, Hayes WS.

Pac Symp Biocomput. 2008:pp592-603

PMID: 18229718


Drug development generates information needs from groups throughout a company. Knowing where to look for high-quality information is essential for minimizing costs and remaining competitive.

Mapping similarities in mTOR pathway perturbations in mouse lupus nephritis models and human lupus nephritis

Reddy PS, Legault HM, Sypek JP, Collins MJ, Goad E, Goldman SJ, Liu W, Murray S, Dorner AJ, O'Toole M.

Arthritis Res Ther. 2008; 10(6):R127

PMID: 18980674

Text Data Mining using Interactive Information Extraction

Milward D, Milligan P

BioLINK SIG Text Mining Workshop, ISMB/ECCB 2007



Flexible Text Mining Strategies for Drug Discovery

Milward D, Blaschke C, Neefs JM, Ott MC, Verbeeck R, Stubbs A.

Proc 2nd Int Symp Semantic Mining BioMedicine. 2006; pp101-104


Ontology-based interactive extraction from scientific abstracts

Milward D, Bjäreland M, Hayes W, Maxwell M, Oberg L, Tilford N, Thomas J, Hale R, Knight S, Barnes J.

Comp Funct Genomics. 2005; 6(1-2):67-71

PMID: 18629299


Over recent years, there has been a growing interest in extracting information automatically or semi-automatically from the scientific literature.

Ontology-based Interactive Information Extraction from Scientific Abstracts

Milward D, Bjäreland M, Hayes W, Maxwell M, Oberg L, Tilford N, Thomas J, Hale R, Knight S, Barnes J.

BioLINK SIG Text Mining Workshop, ISMB/ECCB 2004


Text Mining for Drug Discovery

Fickett J, Hayes W.

European Pharmaceutical Contractor, Autumn 2004


Automatic extraction of protein interactions from scientific abstracts

Thomas J, Milward D, Ouzounis C, Pulman S, Carroll M.

Pac Symp Biocomput. 2000:pp541-52

PMID: 10902201

From Information Retrieval to Information Extraction

Milward D, Thomas J

Proc ACL-2000 Workshop Recent Advances Natural Language Processing Information Retrieval. 2000 Oct; pp85-97



This paper describes a system which enables users to create on-the-fly queries which involve not just keywords but also sortal constraints and linguistic constraints.