
Linguamatics Text Mining Summit 2018

Portsmouth, New Hampshire, New Castle, United States

The Linguamatics Text Mining Summit 2018 will take place on October 15 - 17, 2018 at the beautiful Wentworth by the Sea in New Castle, New Hampshire, USA. Join the conversation on Twitter using #TMS18.

Speakers include Eli Lilly, Mercy, Bristol-Myers Squibb, Atrius, Novo Nordisk, Sanofi, Regeneron Pharmaceuticals, Secure Exchange Solutions, GSK. More speakers will be announced soon! Interested in speaking? Please email

Why attend?

This event is the ideal opportunity for experienced, new and potential I2E users to:

  • Network and exchange ideas with peers and text mining experts

  • Understand the challenges other pharmaceutical and healthcare professionals are facing, and explore solutions to these challenges

  • Gain hands-on experience of NLP text-mining and where it can fit into your organization, through workshops and training

Want to find out more?

Linguamatics Text Mining Summit 2018 Brochure

Learn more about the Linguamatics Text Mining Summit 2018.

You can also read and download the final TMS agenda.

Presentations and Abstracts

Eric Su, Principal Research Scientist at Eli Lilly

Presentation: Mining ClinicalTrials.gov to Support Clinical Trial Design

Abstract: ClinicalTrials.gov has grown to include over 280,000 trials by 2018, from over 200 countries globally.  Much knowledge can be learned from ClinicalTrials.gov to assist in trial planning, protocol design, and more. However, manual review of all relevant trials is not feasible with such a large database.  I2E has the capability of extracting data and text with high precision and recall.  I will present two examples of data extraction from ClinicalTrials.gov to support clinical trial design at Lilly.

Yun Yun Yang, Senior Patent Analyst at Bristol-Myers Squibb

Presentation: A Multiple Step Approach for Finding Clinical Trials For GPCR Target Class

Abstract: G-protein coupled receptors (GPCRs) represent a large class of important therapeutic targets.  The first objective of this work was to provide therapeutic area trend analysis, based on clinical trial information, on GPCR target classes.  The second objective was to identify clinical trial information associated with specific GPCR targets.  We used the value-added information that Cortellis provides around drugs and target-based actions, and applied I2E disease ontologies to achieve our objectives.  This presentation provides a framework for finding clinical trials associated with a target class or specific targets. These targets can then be categorized into therapeutic areas of interest and visualized for trend analysis.

Thierry Breyette, Associate Director, Information Analytics at Novo Nordisk

Presentation: Using NLP at Novo Nordisk to Generate Actionable Insights from Real World Data

Abstract: The density and variability of the information landscape are making it increasingly difficult to identify meaningful trends in data.  Traditional data sources such as clinical trial data and publication data are one piece of an increasingly complex information puzzle.  As data capture and publishing platforms explode, newer and highly varied data sources are available for analysis, including internally generated data, social data, patient data, clinician data, market data, hospital data, etc. Many of these data are in unstructured, textual format, making them difficult to extract and analyse using traditional search methods. Building a forward-looking analytics framework to tackle these new data challenges requires both extensible and flexible tools, and creative thinking.

At Novo Nordisk we are using advanced tools and technologies such as natural language processing (NLP) to gain real value from sources including call centre feeds, information from medical liaisons and health care providers. These enable us to identify macro and micro healthcare market trends in the US, detect patterns in clinical trial protocol deviations, and discern patterns in patient sentiment, compliance, routines, behaviors, and overall treatment satisfaction and outcomes. The talk will focus on our approach to these projects, the outcome and impact.

Cheng Zhu, Senior Scientist, Translational Bioinformatics and Research Informatics at Sanofi

Presentation: Integrated text mining approaches for drug target new indication search

Abstract: There is strong interest within biopharma in expanding the therapeutic base of drugs in the pipeline. One way to address this is to identify novel drug target-disease associations that can be considered as potential new indications. The traditional approach can be described as an ad hoc process that relies on expert knowledge and experimental observations, which are usually limited and time consuming. Text mining provides a powerful approach that enables us to quickly and systematically discover the hidden linkages between targets and new indications. In this presentation, we will introduce several text mining strategies and use cases for using I2E to interconnect drug targets and diseases.  Our approaches attempt to capture and integrate all the available information on drug targets from the literature, including genetic causal or risk mutations for diseases, and the underlying causal pathways, cell types and clinical phenotypes that are shared by targets and diseases.  The approaches allow us to quickly identify new indication opportunities from target-disease pairs that have various commonalities based on broad scientific, medical and strategic values. We have applied the approaches in several disease areas, such as respiratory and rare diseases, as proof of principle.

Peng Zhang, Sr Staff Scientist, Target Information Group, Regeneron Pharmaceuticals, Inc.

Presentation: Knowledge Management and Data Integration – Building A Resource for GPCR Drug Discovery

Abstract: The Target Information Group (TIG) at Regeneron uses a wide variety of public and proprietary information sources to help Regeneron researchers address their scientific questions in target and disease biology. The G-protein coupled receptor (GPCR) family has traditionally been the biggest class of targets for drug discovery in the pharmaceutical industry, with efforts mostly focused on small-molecule drugs. As a biologics-focused company, Regeneron has decided to reassess the landscape of this drug discovery space and to identify potential opportunities for developing novel or better drugs by leveraging our unique VelociSuite® technologies. A web-based dashboard was built to provide a focused information portal for all GPCRs from human and mouse. The integrated information includes GPCR hierarchical family classifications, known natural ligands and signal transduction mechanisms. This dashboard also contains all drugs that are known to target GPCRs and provides ways to quickly drill down for more information according to their molecular type, development status, mechanism of action, etc. Linguamatics I2E was used to mine large-scale clinical trial information related to these known drugs, and the results provided unique insights into the potential reasons behind development status changes. This resource has been widely used within the company and has provided a starting point for various drug discovery projects.

Craig Monsen, Chief Medical Information Officer at Atrius Health

Presentation: Operationalizing NLP to support value-based care at Atrius Health

Abstract: Healthcare providers are facing an urgent need to streamline operations while improving quality of care and patient satisfaction. With a wealth of technology hype around AI, Natural Language Processing (NLP) and big data, how are providers to know what investments to make and how to bring these technologies into production use? During this talk, we bring together leading healthcare organization, Atrius Health, and NLP experts, Linguamatics, to explore the practical uses of NLP in healthcare and give real-life examples of how Atrius Health has implemented processes to improve clinical documentation, identify at-risk patients and streamline their ACO reporting.

Tom McGraw, Senior Vice President of Product Development for Secure Exchange Solutions (SES)

Presentation: SPOT for Clinical Review

Abstract: Secure Exchange Solutions (SES) launched its SPOT for Clinical Review product in June of this year.  SPOT for Clinical Review leverages the natural language capabilities in the Linguamatics I2E product to read medical records and other freeform text and normalized data to ascertain whether a payer’s prior authorization, claims adjudication, or other requirements have been met.  SPOT leverages SES’ position as a leader in digital communication of clinical healthcare information and is designed to speed up medical review, lower its costs, and improve consistency.  Designed for the healthcare payer market, the product has also received interest from the healthcare provider, workers compensation carrier, and life insurance markets.  An overview of SPOT for Clinical Review and the application of payers’ rules in several areas will be presented.

Chaya Duraiswami, Associate Fellow, Compliance Solutions Manager, GSK

Presentation: Use of Text Analytics to Enable Data-Driven Risk Management

Abstract: Biopharm Product Development and Supply (BPDS) within GSK utilizes a data-driven approach to risk management, consolidating internal data feeds from Deviations, Corrective and preventative actions (CAPAs), Risks, and Response to questions (RTQs) to form a Data Lake. The Data Lake also receives external data feeds from FDA Warning Letters (483s), Biological License Application (BLA) Review Reports, White Papers and industry benchmark repositories that add to the broader context of relevancy. This establishes a broad knowledge base for analysing risk rating and frequency, understanding risk relation to industry practices and applying thresholds for risk management (accept, transfer, mitigate, accept). The Data Lake of unstructured or semi-structured data is structured using Linguamatics, which then enables the extraction of intelligence (concepts and sentiments) embedded in the data. The value proposition is further maximized with simple-to-understand visualizations (that are easy to drill down) and sustainable (updated contemporaneously), scalable reporting of risks, their analysis, and recommendations to act. This presentation will demonstrate how Linguamatics, which uses Natural Language Processing and Machine Learning algorithms, can be used for addressing emerging concerns.

Kerry Bommarito, Director Data Science at Mercy

Presentation: Mercy’s Experience: Natural Language Processing.

Stuart Murray, Research Fellow/Director of Informatics, Agios Pharmaceuticals

Presentation: Using I2E to Accelerate Tool Compound Discovery for Chemical Genetic Screens

Abstract: Genetic changes such as mutations in cancers create potential druggable Achilles Heels. At Agios we use genomics, proteomics and metabolomics data to build an understanding of how these genetic changes perturb metabolic biology in a tumour. This insight is used to design screening strategies to identify potential new drug targets. One such strategy is a Chemical Genetics Screen. At Agios a Chemical Genetics Screen explores the function of metabolic proteins and pathways in cancer cells through the use of chemical libraries of small molecules (tool compounds). In this presentation, we will highlight results from using I2E to build libraries of tool compounds that have known activities and defined characteristics. We then used these tool compounds in a medium-throughput chemical genetic screen across a broad cancer cell line panel. The outcome of the screen revealed important genes or pathways involved in cell growth and proliferation. We will highlight the benefits of using this kind of approach to speed the identification of potential new genes of interest and the development of tools to validate new anti-cancer targets.


What does the event include?

  • Customer presentations featuring best practice, case studies and insights on practical approaches to text mining and knowledge discovery
  • Presentations covering what's new in I2E, looking ahead at developments in the pipeline and future directions for text mining and knowledge discovery
  • Roundtable Discussions covering important topics and challenges in the field of text mining and knowledge discovery
  • Opportunities to network with peers and with Linguamatics experts
  • Hands-on workshops, giving new and experienced users the opportunity to explore the full capabilities of I2E, and discuss best practice in consultation with Linguamatics experts
  • Healthcare track to discuss best practice and use of NLP in healthcare
  • User certification training and attendance certification
  • Evening social events
  • Partner presentations and exhibits
  • Meals and refreshments provided during the conference are included with the registration fee

Speakers and Bios

Eric Su

Principal Research Scientist at Eli Lilly

Eric Su received a B.S. in biochemistry from Peking University in 1984 and a Ph.D. in molecular biology from UC Berkeley in 1991.  After his post-doctoral research at Dana-Farber Cancer Institute/Harvard University and working at Vysis Inc. (Abbott), he joined Eli Lilly and Company in 1997.  At Lilly, Eric discovered and filed patent applications on ~100 novel therapeutic protein candidates (through data mining and cloning; Biology, 1997-2004), led the Bioinformatics team (Biology, 2004-2006), initiated text mining in the Data Mining and Advanced Analytics groups (Statistics, 2007-2017), and is now working in the newly formed Text Mining and NLP group in the Advanced Analytics and Data Science organization (IT, 2017-present).

Kerry Bommarito

Director Data Science at Mercy

Full bio coming soon

Yun Yun Yang

Senior Patent Analyst at Bristol-Myers Squibb

Yun Yun Yang is a Senior Patent Analyst in the Scientific Information & Patent Analysis Group, Intellectual Property/Law Department, Bristol-Myers Squibb Company (BMS). Yun Yun received her PhD degree in Organic Chemistry from Beijing Normal University. She then conducted her postdoctoral training in the School of Medicine at the University of Pennsylvania. Before joining BMS, Yun Yun worked as a chemistry editor at the Institute for Scientific Information (ISI, currently Thomson Reuters) and as a senior patent information scientist at DuPont and DuPont Pharmaceutical (acquired by BMS in 2001). In addition to her operational function as a senior patent analyst, where she is responsible for providing legally significant patent information including freedom-to-operate, patentability, and due diligence, Yun Yun takes the lead in exploring Natural Language Processing (NLP) tools for patent landscape analysis, which has led to three publications in the journal World Patent Information. Yun Yun received the 2014 PIUG (Patent Information Users Group) Stu Kaback Business Impact Award, which recognized her achievement in text mining for patent landscape analysis for a kinase technology platform.

Craig Monsen

Medical Director of Analytics at Atrius

Craig Monsen is a board-certified internist and clinical informaticist. He serves as the Medical Director of Analytics and Reporting at Atrius Health, is a fellow at the University of Washington's Department of Biomedical Informatics and Medical Education, and has co-founded two health IT startups. His current work includes operationalizing advanced analytics to enable the transformation to value-based care as well as building new products that help clinicians make sense of the deluge of patient data they face today.

Prior to medical training at the Brigham and Women's Hospital, he received Highest Honors in engineering and computer science at Harvard and completed his MD at Johns Hopkins, where he served as the President of the Medical Student Senate.

He writes and has been an invited speaker on topics of consumer engagement, applied predictive analytics, and health care interoperability and continues to cultivate his interest in leveraging emerging technologies to support the quadruple aim in healthcare.

Thierry Breyette

Associate Director, Information Analytics at Novo Nordisk

Thierry Breyette is the Senior Manager of Information Analytics at Novo Nordisk Inc., located in Princeton, NJ. Currently, Thierry’s primary focus is on real-world, market, and clinical trial data. Thierry is influenced by design thinking principles and enjoys exploring novel solutions to information problems. In his work, Thierry utilizes a combination of tools and techniques, including natural language processing, information visualization for discovery and presentation, and descriptive and predictive analyses. Some of Thierry’s past projects include working on social media analysis and digital opinion leader identification, identifying macro and micro healthcare market trends in the US, and detecting patterns in clinical trial protocol deviations.

Cheng Zhu

Senior Scientist, Translational Bioinformatics and Research Informatics at Sanofi

Dr. Cheng Zhu is currently a senior translational bioinformatics scientist at Sanofi Genzyme, where he provides integrated informatics analysis and computational solutions for multiple translational programs. Cheng received his Ph.D. degree in computer science from the University of Cincinnati in 2013, and began his research on rare disease informatics at Cincinnati Children's Hospital Medical Center in 2009. Cheng joined Sanofi Genzyme in 2013, where he has led or supported multiple drug discovery projects, especially on target discovery and validation in disease areas such as Rare, Neurological, Immunological and Inflammatory diseases.

Peng Zhang

Sr Staff Scientist, Target Information Group, Regeneron Pharmaceuticals, Inc.

Dr. Peng Zhang is currently a Senior Staff Scientist at Regeneron Pharmaceuticals. His work focuses on using informatics approaches to support early target identification and target validation. Previously he worked at Lundbeck Research USA and obtained his Ph.D. in pharmacology from Rutgers University.

Tom McGraw

SVP, Product Development and Government Healthcare, Secure Exchange Solutions

Tom McGraw is Senior Vice President of Product Development for Secure Exchange Solutions (SES).  SES is enabling simple, secure, and scalable connections between healthcare community members and advancing the benefit of electronic clinical information for payers and providers.  Tom served from 2012 to 2017 as CEO of Noridian Healthcare Solutions, a company that administers over $50 Billion in annual services for the Medicare program.  Tom previously served in operations and business development leadership roles for Amerigroup, Optum, MAXIMUS, and the Virginia Department of Medical Assistance Services.  He received an MA in Economics from Old Dominion University and a BS from the US Naval Academy.

Chaya Duraiswami

Associate Fellow, Compliance Solutions Manager, GSK

Chaya Duraiswami is a pharmaceutical scientist with 20 years of experience, with a passion for discovering solutions, technology & medicines to help patients. At GSK, Chaya has contributed to the development of over 10 clinical candidates across several therapeutic areas and target classes. She has also had numerous leadership experiences in Drug Discovery and Development, and across a diverse set of functional areas, including: ADME-Tox modeling, QSAR and predictive modeling, designing time-dependent and allosteric inhibitors in lead optimization projects, providing strategic review of emerging computational technologies, and leading external innovation projects.

More recently, in 2014, Chaya moved to Biopharm R&D to lead the ‘Non-Product Quality’ Compliance and Risk Management programs, and has since expanded her role to applying text mining and promoting modeling approaches in Biopharm CMC development. She is PMP & ADP certified.

Stuart Murray

Research Fellow/Director of Informatics

Full bio coming soon

6 reasons to attend

  1. Gain first-hand knowledge and experience on how structured and unstructured content can be mined to uncover valuable information
  2. Gain hands-on experience of NLP text mining through workshops and training
  3. Understand the challenges other pharmaceutical and healthcare professionals are facing and explore solutions to these challenges
  4. Gain a better understanding of NLP text mining and where it can fit into your organization
  5. Network and exchange ideas with peers and text mining experts
  6. Join the I2E Certificate Program and earn yourself an I2E Query User Certificate

I2E Certificate Program

At this year's event, you can be one of the first people to qualify for the I2E Query User Certificate. This exciting opportunity will allow you to validate, extend and improve your I2E skills.

The Level 1 Query User Certificate will be open to those who attend the “Introduction to I2E” hands-on workshops that will take place at the TMS this October, as well as more established users, who have already attended the “Introduction to I2E” training. 

You will receive a certificate of attendance for our training sessions to show the value you’re bringing to your organization. It’s free to join in as part of your registration.

Learn more

The I2E Query User Certificate will focus on using and editing basic queries and Resource queries to:

  • Create simple queries with different constraints, morphological variants, preferred terms and alternative lists
  • Use classes to improve recall and precision of queries with linguistic classes, ontologies, and pattern ontologies
  • Work with results by using limits, output formats and displays
  • Use Resource queries to answer common questions

Venue, Travel and Accommodation


The Linguamatics Text Mining Summit will be held at the Wentworth by the Sea Hotel in New Castle, New Hampshire. Overlooking the Atlantic Ocean from the island of New Castle, Wentworth by the Sea, a Marriott Hotel & Spa, welcomes guests to one of the last grand Portsmouth hotels. This AAA Four-Diamond retreat features the three original Victorian towers constructed in the 1870s, along with 161 stately guest rooms and suites that blend the hotel’s historic elegance with luxurious, modern amenities.


Getting to Wentworth by the Sea

Directions to Wentworth by the Sea can be found on the hotel's website.

Transportation from Boston-Logan Airport

C & J Bus Lines serves Logan Airport. The Portsmouth terminal is at Pease International Tradeport — 7 miles from the Wentworth by the Sea. Visitors will need to take a taxi to Wentworth. The bus schedule can be viewed on their website.

Tickets can be booked online.


Linguamatics has negotiated a special rate of $249.00/night for the conference. To book a room, please visit the link below. The conference rate is valid through September 21, 2018.

Book your group rate for Linguamatics Text Mining Summit 2018

