
Linguamatics Text Mining Summit 2016

297 Shore Road, Chatham, Massachusetts, United States


This year, the Text Mining Summit will take place on October 17–19, 2016, at the renowned, award-winning Chatham Bars Inn in historic Chatham, MA, close to the Atlantic Ocean and Nantucket Sound on Cape Cod.


See the Agenda, Speaker Bios and Presentation Abstracts


What does the conference include?

  • Customer presentations featuring best practice, case studies and insights on practical approaches to text mining and knowledge discovery
  • Presentations covering what's new in I2E, looking ahead at developments in the pipeline and future directions for text mining and knowledge discovery
  • Roundtable Discussions covering important topics and challenges in the field of text mining and knowledge discovery
  • Opportunities to network with peers and with Linguamatics experts
  • Hands-on workshops, giving new and experienced users the opportunity to explore the full capabilities of I2E, and discuss best practice in consultation with Linguamatics experts
  • Healthcare track to discuss best practice and use of NLP in healthcare
  • Evening social events
  • Partner presentations and exhibits
  • Meals and refreshments provided during the conference are included with the registration fee

Reasons to attend

  1. Gain first-hand knowledge and experience of how structured and unstructured content can be mined to uncover valuable information
  2. Gain hands-on experience of NLP text mining through workshops and training
  3. Understand the challenges other pharmaceutical and healthcare professionals are facing and explore solutions to these challenges
  4. Gain a better understanding of NLP text mining and where it can fit into your organization
  5. Network and exchange ideas with peers and text mining experts


Monday, October 17

11:00am–12:00pm Registration in the Monomoy Foyer

12:00pm–1:00pm Lunch at the STARS restaurant

1:00pm–5:00pm Training workshops: Session 1

  • 1A Introduction to I2E: Tony Wu, Eldridge Room
  • 1B Introduction to I2E—Healthcare focus: Erin Tavano, Alden Room
  • Linguamatics I2E Query Hackathon: David Milward, Monomoy Meeting House

6:15pm onward Evening social event at The Beach House

Tuesday, October 18

8:30am–9:00am Registration in the Monomoy Foyer

9:00am start Main day presentations in the Monomoy Meeting House

12:25pm–1:35pm Group photo, then lunch at the STARS restaurant

1:35pm–2:35pm Roundtable discussions

6:15pm onward Evening social event at The Beach House

Wednesday, October 19

9:00am–11:50am Presentations in the Monomoy Meeting House

12:00pm–1:00pm Lunch at the STARS restaurant

1:00pm–5:00pm Training workshops: Session 2

  • 2A I2E hints and tips—ask the expert: Jim Dixon, Eldridge Room
  • 2C Exploiting new I2E features: David Milward, Alden Room
  • 2D Developing applications for I2E: Paul Milligan, Monomoy Meeting House

5:00pm End of Conference


Thierry Breyette

Novo Nordisk

Thierry is the Senior Manager of Information Analytics at Novo Nordisk Inc., located in Princeton, NJ. His primary data focus areas are currently real-world, market, and clinical trial data. Thierry is influenced by design thinking principles and enjoys exploring novel solutions to information problems. In his work, he uses a combination of tools and techniques, including natural language processing, information visualization for discovery and presentation, and descriptive and predictive analytics.

Some of Thierry’s past projects include social media analysis and digital opinion leader identification, identifying macro and micro healthcare market trends in the US, and detecting patterns in clinical trial protocol deviations.

When Thierry is not playing around with data, he enjoys spending time with his wife and two daughters, admiring vintage cars, collecting transit maps, and sipping the occasional whisky. 

Presenting on generating actionable insights from real world data.

Eric Su


Eric Su is a Principal Research Scientist at Eli Lilly and Company. He has research experience in molecular biology, bioinformatics, data mining and text mining. Eric mines various texts, including scientific literature, surveys, and call center scripts. He received a Ph.D. from UC Berkeley, and did post-doctoral research at DFCI/Harvard University.

Yoni Dvorkis

Atrius Health

Yoni Dvorkis, MPH, CHDA is a Data Consultant with ten years of experience in health care, data analytics, information technology, and health informatics. He is a key member of the Department of Population Health at Atrius and is certified as a Health Data Analyst by the American Health Information Management Association. Yoni received his Master's in Public Health from the Harvard School of Public Health and his Bachelor of Arts in Mathematics with a minor in Economics from Tufts University.

Rick Lewis


Eric (Rick) Lewis MD is a Medical Director in the GlaxoSmithKline department of Global Clinical Safety and Pharmacovigilance.  He has worked in the pharmaceutical industry for over 30 years and has development experience in clinical research, biometrics and data management, clinical pharmacology and drug safety.

Nina Mian


Nina Mian (MSc, MBA), Head of Biomedical Informatics, AstraZeneca

Nina leads the Biomedical Informatics group, part of the Advanced Analytics Centre at AstraZeneca. This global team pioneers the use of novel visualization methods and advanced analytical tools to improve drug development decision-making. Nina’s research interests include initiatives centred around creating patient-centred insights. Nina holds degrees in Biochemistry and Informatics, and an MBA from Manchester Business School, UK.


Walt Niemczura

Drexel University College of Medicine

Walt Niemczura is the Director of Application Development for Drexel University’s Department of Information & Resource Technology (IRT). In his role, Walt oversees application development and several major third-party products, including Microsoft SharePoint, Vanderbilt’s REDCap (Research Electronic Data Capture), and Linguamatics I2E Natural Language Processor. Walt’s team will also be responsible for programs that employ Allscripts Unity, the application interface to Drexel’s EMR system. Although assigned primarily to the college of medicine, Walt and his team provide support for I2E and REDCap across the entire University.

Walt previously served on committees that governed the selection and use of Drexel’s NLP solution (I2E) and secured data repository (REDCap), and currently serves on the college of medicine’s data warehousing committee.

Walt is a graduate of Drexel University and has over thirty years’ experience in application development, including real time systems and large database applications, and information technology management. 

Stuart Murray

Agios Pharmaceuticals

Stuart Murray is an associate director at Agios Pharmaceuticals.  He plays a key role in developing and employing knowledge-driven, integrated data analytics to support ongoing research and clinical programs in the fields of cancer metabolism, metabolic immunology and rare genetic diseases.  Prior to Agios, Stuart spent 14 years in large Pharma in Oncology and Systems Biology groups.  Stuart received a Ph.D. in biochemistry and genetics from the University of Newcastle-upon-Tyne and did postdoctoral research at Albert Einstein College of Medicine, NY.

Tracy Edinger

OHSU Knight Cancer Institute

Tracy Edinger, ND, MS, is an NLP Data Scientist at Oregon Health & Science University’s Knight Cancer Institute, where she is part of a team developing data pipelines to support oncology research. Following her clinical training, she completed a postdoctoral research fellowship and MS in clinical informatics at OHSU, with a research focus on retrieving patient cohorts from clinical text.

David Milward


David Milward is chief technology officer (CTO) at Linguamatics. He is a pioneer of interactive text mining, and a founder of Linguamatics. He has over 20 years of experience in product development, consultancy and research in natural language processing (NLP). After receiving a PhD from the University of Cambridge, he was a researcher and lecturer at the University of Edinburgh. He has published in the areas of information extraction, spoken dialogue, parsing, syntax and semantics.

Jane Reed


Jane Reed is head of life science strategy at Linguamatics. She is responsible for developing the strategic vision for Linguamatics’ growing product portfolio and business development in the life science domain. Jane has extensive experience in life sciences informatics. She worked for more than 15 years in vendor companies supplying data products, data integration and analysis, and consultancy to pharma and biotech, with roles at Instem, BioWisdom, Incyte, and Hexagen. Before moving into the life science industry, Jane worked in academia, holding post-doctoral positions in genetics and genomics.

John Brimacombe


John Brimacombe is a serial entrepreneur and experienced investor. After graduating in Law and Computer Science from Trinity College, Cambridge, he founded Jobstream Group plc, which provides specialist ERP software to the international financial services sector and was acquired by Microgen Plc (MCGN.L). Brimacombe subsequently co-founded pioneering mobile entertainment start-up nGame Ltd., which was acquired by Hands-On Mobile Inc. He served as President/COO of Hands-On Mobile for over two years, leading the company through seven major M&A transactions and massive global expansion. Brimacombe currently chairs Linguamatics, the enterprise natural language search tools provider; is a Partner at Sussex Place Ventures, the venture-capital arm of London Business School; is a seed investor in multiple US and UK start-ups; and has extensive experience as a non-executive director, from start-ups to public markets.

Simon Beaulah


Simon Beaulah is Senior Director, Healthcare, and is responsible for Linguamatics’ healthcare products and solutions, including applications in the areas of clinical risk models, population health, and medical research. Previously, Simon was Marketing Director, Translational Medicine at IDBS/InforSense, where he was responsible for the company’s market analysis, product marketing, and go-to-market strategy in healthcare analytics and translational medicine. Prior to IDBS, he was Director of Product Management at BioWisdom, where he was responsible for delivering customer projects using the company’s ontology products. He also worked as a senior product manager at LION Bioscience and Synomics, and as a software developer at the UK’s Biotechnology and Biological Sciences Research Council. Simon has degrees from Aston University and Cranfield Institute of Technology, UK.

Guy Singh


Guy Singh is senior manager, product and strategic alliances, at Linguamatics. He has a joint role that spans managing both the I2E product and the partner ecosystem around it. In his product role, he is responsible for product management, marketing, and strategy guidance for I2E, an award-winning, agile, high-performance enterprise text mining platform. In his strategic alliances role, he is responsible for recruiting and managing partners across technology, content, and services for text mining. Guy joined Linguamatics from information intelligence organization I2 (now part of IBM), where he was responsible for the launch of their first ever search-based product. He has held posts in R&D, Engineering, Product Marketing and Management at Vodafone, Baltimore Technologies, Oracle and IXI. With over 20 years of experience in the IT industry working in a wide range of environments, his achievements include building a new multi-million dollar search product line from scratch and managing product businesses with turnovers in excess of $40 million.

Gabriel Escobar

Kaiser Permanente

Gabriel J. Escobar, MD, is a research scientist at the Kaiser Permanente Northern California Division of Research; director of the Division of Research Systems Research Initiative (a research program focusing on adult hospital processes and outcomes); and Regional Director for Hospital Operations Research for Kaiser Permanente Northern California, in which capacity he works to improve Kaiser Permanente's internal reporting and quality measurement. Dr. Escobar received his medical degree from Yale University School of Medicine; completed his pediatrics residency at University of California, San Francisco; and was a Robert Wood Johnson Clinical Scholar at Stanford University School of Medicine.

Dr. Escobar's research now focuses on the processes and outcomes in the care of hospitalized adults. His interests include risk adjustment, predictive modeling, severity-of-illness scoring, the use of comprehensive inpatient and outpatient electronic medical records for health services research, and the use of real-time decision support tools that are embedded in the electronic record. Between 1991 and 2001, Dr. Escobar developed a research program in neonatology for Kaiser Permanente Northern California. In 2001, he began development of his current research program, and in 2009 he turned over the neonatology research program to Dr. Michael Kuzniewicz at the Division of Research so that he could focus his work on adult hospital research. Dr. Escobar also practices at the Kaiser Permanente Medical Centers in Walnut Creek and Antioch, where he works in the neonatal intensive care nursery and as a hospital-based pediatrician.

Glenn Abastillas

National Cancer Institute at the National Institutes of Health

Glenn is a computational linguist with a background in medical NLP development and research. Currently, he works with the National Cancer Institute at the National Institutes of Health to bolster oncology research by extracting biomarker-related features from big unstructured textual data for machine learning and other types of analysis. His personal linguistic research investigates the phenomenon and recognition of code-switching in unstructured texts containing two or more language systems. He also speaks more than eight languages, including English, Cebuano, Spanish, German, Portuguese, French, Arabic, Norwegian and Sign Language.

Marina Matatova

National Cancer Institute at the National Institutes of Health

Marina is an informatics program manager and information technology specialist focusing on surveillance informatics initiatives for the Surveillance Research Program with the National Cancer Institute at the National Institutes of Health. She has spent the past decade working on health informatics and data analytics initiatives focusing on cancer research. She enjoys uncovering fundamental user needs and creating informatics solutions that drive change.

Presentations: Tuesday, October 18 & Wednesday, October 19

Welcome address

John will welcome attendees with an update on company progress and industry trends. He will introduce and set the context for the forthcoming major I2E 5.0 release and also talk about the critical importance of the I2E user community and initiatives to better support users.

Using Natural Language Processing for Mining the Electronic Medical Record

By Walt Niemczura, Drexel University

An overview of the process used to extract information from Drexel’s EMR system for natural language processing, including de-identification and delivery to a secured environment. The presentation will include examples that highlight the success of using Linguamatics I2E to reduce labor hours and improve results for research and operations.

“You make me sick” – how text mining can help

By Nina Mian, AstraZeneca

Clinicians rely on drug product labels, derived from clinical trial data, as a primary source of adverse reaction (AR) information to inform their prescribing. Nausea is a common and often debilitating AR associated with many medicines. Yet under-reporting of a subjective AR such as nausea may give rise to a disparity between clinical trial results and real-world occurrence.

Through AstraZeneca’s collaboration with PatientsLikeMe (PLM) [1], this research aims to examine differences in nausea AR frequencies between patients’ online self-reported data and drug product label data extracted using I2E. The results of this analysis will help to determine whether PLM data could provide a supplementary source of AR frequencies in addition to drug product labels.

This is an update to the work presented by James Loudon-Griffiths at the Linguamatics spring conference 2016.

1. AstraZeneca. 2015. AstraZeneca And PatientsLikeMe Announce Global Research Collaboration. [Press release]. [Accessed 13 Apr. 2016]. Available from:

Applications of I2E to Clinical Safety and Pharmacovigilance

By Eric Lewis, GSK

The Clinical Safety and Pharmacovigilance departments of major pharmaceutical companies are information-intensive operations. Their workflow is also highly regulated and subject to auditing by Regulatory Authorities. A major pharma company can be responsible for the pharmacovigilance of hundreds of products, whether in development, marketed, or generic. The case load (CIOMS reports) for a major pharmaceutical company is on the order of hundreds of thousands of adverse event reports sent to regulatory agencies every year. In addition to these individual reports, the Sponsor company is also responsible for reviewing the medical literature for evidence of new “safety signals”. Under ICH regulations, this must be done every week for marketed products and also (though less frequently) for products in development.

A sample of 20 marketed products from a major pharmaceutical company was tracked, and the number of literature references received during a defined period was ascertained. Over a 218-day period, 13,125 new references (some duplicates) were retrieved from two databases (Embase and Searchlight). This equates to approximately 60 references per day for the 20 products. Assuming this sample is representative, a Big Pharma company with 200 marketed molecules could receive approximately 220,000 “new” references in a given year. So the question arises: is there a “new signal”?
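The scaling behind this estimate can be checked step by step. The worked arithmetic below assumes, as the abstract does, that the 20-product sample is representative, and additionally assumes that reference volume grows linearly with the number of marketed products:

```latex
\begin{aligned}
r_{20}  &= \frac{13{,}125 \text{ references}}{218 \text{ days}} \approx 60.2 \text{ references/day} \\
r_{200} &= \frac{200}{20} \times r_{20} \approx 602 \text{ references/day} \\
R_{\text{year}} &\approx 602 \times 365 = 219{,}730 \approx 220{,}000 \text{ references/year}
\end{aligned}
```

Because the retrieved set includes some duplicates, this figure is best read as an upper bound on unique new references.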

Another, and perhaps more important, factor is the need for development teams to answer questions in “real time”. This is not unlike the need of medical teams on rounds to answer questions pertinent to patient management.

Experience with I2E has demonstrated that a text mining tool can effectively sift through the mountains of free-text information for which pharmaceutical companies are legally responsible. The advantages of I2E, though, extend well beyond the routine and required signal detection exercise. The greater value of a tool like I2E is in its efficiency and the ability to find answers to questions and thereby create knowledge. Relational database tools can be employed to further index and curate this knowledge for future easy access.

Generating Actionable Insights from Real World Data

By Thierry Breyette, Novo Nordisk

The density and variability of the information landscape is making it increasingly difficult to identify meaningful trends in data.  Traditional data sources such as clinical trial data and publication data are one piece of an increasingly complex information puzzle.

As data capture and publishing platforms explode, newer and highly varied data sources are available for analysis, including internally generated data, social data, patient data, clinician data, market data, hospital data, etc. Building a forward-looking analytics framework to tackle these new data challenges requires both extensible, flexible tools and creative thinking.

This talk will cover some of the real-world data projects we have worked on: social media analysis and digital opinion leader identification; identifying macro and micro healthcare market trends in the US; detecting patterns in clinical trial protocol deviations; gleaning clinical insights from discussions between US-based medical liaisons and health care providers; and analyzing patient and caregiver ethnographic data to discern patterns in patient sentiment, compliance, routines, behaviors, and overall treatment satisfaction and outcomes. The discussion will focus on our approach to each of these projects, and on their outcomes and impact.

Natural Language Processing Validation of High Risk Patient Characteristics using Linguamatics I2E

By Yoni Dvorkis, Atrius Health

Under the guidance of our new CEO, Atrius Health is committed to returning joy to the practice of medicine for clinicians and staff. Providers currently feel overburdened with documentation and often work long hours after their time in clinic in order to document in patients’ charts.

Structured data elements are useful to the organization for reporting purposes; however, documenting structured data can often be time-consuming and less fulfilling than writing free-text progress notes, which are richer in detail and convey more about the patient’s history and course of treatment. To fulfill this goal, there is a need to investigate tools that can mine free-text data to yield valuable insights that can be aggregated for large populations.

To that end, Atrius Health has partnered with Linguamatics to pilot its I2E software, which uses Natural Language Processing techniques to extract valuable insights from free text and turn them into structured data that can then be reported for all patients.

We tested this software using three high risk cohorts: patients with COPD, patients with CHF, and patients who need assistance with Activities of Daily Living (ADLs). We then compared the results from NLP to our internally validated algorithms.

Specifically we report:

COPD: 84% agreement between NLP vs. internal algorithm that relies on primary diagnoses captured during encounters (Kappa coefficient = 0.66)

CHF: 88% agreement between NLP and chart review, identifying patients with ejection fractions less than 40%. (Kappa coefficient = 0.59)

ADL: 67% agreement identifying patients who need help with bathing, the ADL with the highest volume for our patients (Kappa coefficient = 0.04)

The results show reliable agreement rates between the two sources which we found encouraging. The ADL analysis demonstrated a smaller agreement rate when we compared to an internal assessment conducted by Case Managers, however that opened a new discussion as to whether Case Managers are accurately tracking this information over time as they tend to conduct their assessment only upon enrolling patients into Case Management programs.
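The gap between raw agreement and kappa in the ADL result (67% agreement, kappa 0.04) comes from how Cohen's kappa is defined: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance given how often each method flags patients. The Python sketch below illustrates this with made-up counts, not the Atrius data; the `AgreementTable` type and function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgreementTable:
    """2x2 table comparing NLP output with a reference method (hypothetical counts)."""

    both_yes: int  # flagged by NLP and by the reference method
    nlp_only: int  # flagged by NLP only
    ref_only: int  # flagged by the reference method only
    both_no: int   # flagged by neither

    @property
    def total(self) -> int:
        return self.both_yes + self.nlp_only + self.ref_only + self.both_no


def observed_agreement(t: AgreementTable) -> float:
    """p_o: fraction of patients on which the two methods agree."""
    return (t.both_yes + t.both_no) / t.total


def cohens_kappa(t: AgreementTable) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_e is chance agreement."""
    n = t.total
    p_o = observed_agreement(t)
    # Marginal rate at which each method flags a patient.
    nlp_yes = (t.both_yes + t.nlp_only) / n
    ref_yes = (t.both_yes + t.ref_only) / n
    # Agreement expected if the two methods flagged patients independently.
    p_e = nlp_yes * ref_yes + (1 - nlp_yes) * (1 - ref_yes)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Made-up counts, NOT the Atrius results: each method flags 20 of 100
    # patients, but they overlap on only 4.
    table = AgreementTable(both_yes=4, nlp_only=16, ref_only=16, both_no=64)
    print(f"observed agreement: {observed_agreement(table):.2f}")
    print(f"Cohen's kappa:      {cohens_kappa(table):.3f}")
```

With these hypothetical counts the two methods agree on 68% of patients, yet because each flags only 20% of them, chance alone predicts 68% agreement, so kappa is essentially zero. A high agreement rate on a relatively uncommon finding can therefore coexist with a low kappa.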

We are currently investigating whether NLP can extract valuable insight on patients with pulmonary nodules from scanned chest CT radiology reports, in order to document potential misdiagnoses.

Systematic Drug Repositioning Using Text Mining Tools on

By Eric Su, Lilly

Drug repositioning (aka drug repurposing) is the process of discovering new indications (diseases) for marketed drugs. Historically, such discoveries were the result of serendipity. However, the rapid growth in electronic clinical data and text mining tools makes it feasible to repurpose drugs systematically. A case of mining data using I2E and PolyAnalyst is presented here. The I2E query extracts “Serious Adverse Events” (SAE) data from randomized trials in which the disease synonyms appear not in the “Condition” region but in the SAE region. Using a statistical algorithm, a PolyAnalyst workflow ranks the drugs for which the drug arm has fewer SAEs (of the disease) than the control arm. Hypotheses can then be generated for new uses of these drugs. One outcome of the presented I2E-PolyAnalyst workflow is the hypothesis that vitamin K1 might help prevent cancer.
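The ranking idea can be illustrated with a simple stand-in. The abstract does not describe the statistical algorithm used in the PolyAnalyst workflow, so the Python sketch below assumes a plain pooled rate difference between arms; the `TrialSAE` schema and `rank_candidates` function are hypothetical, and a real analysis would add a significance test and account for differences between trials.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrialSAE:
    """Hypothetical record: one disease-term SAE tally from one randomized trial."""

    drug: str
    drug_events: int     # drug-arm participants with the SAE
    drug_n: int          # drug-arm participants
    control_events: int  # control-arm participants with the SAE
    control_n: int       # control-arm participants


def rank_candidates(trials: list[TrialSAE]) -> list[tuple[str, float]]:
    """Pool trials per drug; return (drug, rate difference) for drugs whose
    arm had a lower SAE rate than control, most protective first.

    The rate difference is drug-arm rate minus control-arm rate, so more
    negative values indicate a larger apparent protective effect.
    """
    pooled: dict[str, list[int]] = {}
    for t in trials:
        counts = pooled.setdefault(t.drug, [0, 0, 0, 0])
        counts[0] += t.drug_events
        counts[1] += t.drug_n
        counts[2] += t.control_events
        counts[3] += t.control_n

    candidates = []
    for drug, (d_events, d_n, c_events, c_n) in pooled.items():
        diff = d_events / d_n - c_events / c_n
        if diff < 0:  # keep only drugs where the disease was rarer on the drug
            candidates.append((drug, diff))
    return sorted(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    demo = [
        TrialSAE("drug_a", 2, 100, 10, 100),
        TrialSAE("drug_b", 5, 100, 6, 100),
        TrialSAE("drug_c", 8, 100, 4, 100),  # more SAEs on drug: excluded
    ]
    for name, diff in rank_candidates(demo):
        print(f"{name}: {diff:+.3f}")
```

In this made-up example, a drug whose arms show the SAE in 2 of 100 treated versus 10 of 100 control participants ranks ahead of one showing 5 versus 6, and a drug with more SAEs in the treated arm is dropped. Each surviving drug is only a hypothesis to be tested further, as the abstract notes.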

Automated identification of potential Drug Safety Events

By Stuart Murray, Agios

As Agios continues to advance programs in the clinic, it is essential that we develop an efficient and compliant system to manage safety data responsibly. To accomplish this, Agios is implementing an Adverse Event Reporting System (AERS). The AERS will support the collection of appropriate data in accordance with FDA and global regulations.

Natural Language Processing (NLP) is being used in multiple places in our workflow: to mine AE reports, to extract case data from call center records, and to assist with initial coding of reported events and WHO drugs. We are using data generated by NLP to help us understand the progression of AEs in our ongoing clinical trials.

Future innovations: R&D update

David Milward, Linguamatics

Over the last few years, the I2E core platform has become both more powerful and more configurable. I2E 5.0 now allows unstructured or semi-structured text in different languages to be queried as if it were structured, despite variability in how concepts and relationships are expressed. However, as I2E becomes more powerful, we need to make it as easy as possible to reuse existing work, allowing new queries to be built more easily from components of old queries (i.e., compositionally).

For the next couple of releases, usability and compositionality will be the main focus. We will demonstrate an example of each: a more convenient output editor for both queries and multi-queries, and the ability to embed one query inside another. We will also cover other upcoming developments, such as spelling correction and incremental charts.

Finally, we will show ideas for a simpler interface that could make smart queries easier to exploit by occasional users.

I2E in Healthcare: recent developments and use cases

Simon Beaulah, Linguamatics

The rapid growth of electronic health records (EHRs) provides an abundant source of valuable data, with the potential to discover insights about patients and their response to treatment. However, with up to 80% of the richest information held in unstructured text, hospitals and medical researchers need better ways to leverage this vital information. This talk will highlight key customer applications where I2E is supporting precision medicine, population health and clinical research. It will also cover new product developments to support real-time and large-scale mining of patient data, a new forum for sharing best practices and Q&A, and the results of the 2016 Hackathon.

I2E in Life Sciences: recent developments and use cases

Jane Reed, Linguamatics

We are in the Information Age, and, as with the Agricultural and Industrial revolutions, we need the right tools to work with the raw material. In our case, the raw material is unstructured data, and I2E is a powerful tool to extract the value from unstructured data, and enable users to gain actionable insights for key decision-making. I2E’s flexibility means it can be beneficial in many applications and use cases. This talk will provide an overview of some recent customer use cases from a range of different disciplines, and highlight some of the solution areas where significant benefit has been found.

What's new in I2E in 2016?

Guy Singh, Linguamatics

This presentation provides an overview of I2E 5.0, the major product release of 2016. As well as summarizing all the features within I2E 5.0, the presentation will focus on some significant new capabilities introduced to the user community: EASL (Extraction And Search Language, a new query language for I2E), Normalisation, Range Search, and Charting enhancements. It will also outline major improvements to the content and infrastructure of our cloud-based service, I2E OnDemand. The talk includes an introduction to a couple of major new developments designed for the I2E user community: the Linguamatics Community Forum and the Linguamatics Developer Program.

NLP Implementation at Knight Cancer Institute

Tracy Edinger, Oregon Health & Science University

The Knight Cancer Institute (KCI) is a National Cancer Institute-designated cancer center at Oregon Health & Science University where researchers focus on cancer biology, leukemia and other blood cancers, solid tumors, and early detection. The Translational Research Hub (TRH) was created to support KCI by developing a central resource for secondary use of clinical data for cancer research. The TRH uses Linguamatics I2E to extract structured data from progress notes, discharge summaries, radiology reports, and pathology reports. This data is combined with structured data from other sources to provide information for researchers. It is also used to support clinical trial recruitment by identifying patients who are most likely to meet eligibility criteria.

Use of Natural Language Processing to Support Efforts to Predict and Prevent Non-Elective Rehospitalization: Promises and Challenges

Gabriel J. Escobar, MD, Regional Director for Hospital Operations Research

Non-elective rehospitalization remains a major problem for patients and health care organizations. The Centers for Medicare and Medicaid Services continue applying financial sanctions to hospitals that have "excess" rehospitalizations for many conditions. One major obstacle to decreasing such rehospitalizations is the difficulty in predicting which patients are at high enough risk to warrant special interventions. Although non-clinical predictors for rehospitalization (e.g., social support, functional status) may be important adjuncts for predictive models, large datasets containing these predictors as well as clinical ones do not exist. In theory, at least some information on these predictors could be extracted from free-text notes in the hospital record. In this report, Dr. Escobar will describe an effort to employ natural language processing to extract such information from a cohort of 360,036 adults who experienced 609,393 hospitalizations at 21 Kaiser Permanente Northern California hospitals between 6/1/10 and 12/31/13. This report will include (a) a description of how his team developed an ontology, (b) the query strategy employed to instantiate this ontology, (c) an audit process comparing NLP to a gold standard (data from patient interviews), and (d) promises and challenges identified when data were analyzed.

Cancer Surveillance: The Next Generation. Advancement through NLP

Marina Matatova, NIH, and Glenn Abastillas, IMS Health

Roundtables: Tuesday, October 18

Hands-on Training: Monday, October 17 & Wednesday, October 19

Hands-on training workshops run by our text mining experts are available for total beginners, intermediate users and advanced users.

Access the Hands-On Training Workshop selection guide, schedule and descriptions.

Linguamatics I2E Query Hackathon: Monday, October 17


Following a highly successful first Healthcare Hackathon last year, Linguamatics will be hosting another Hackathon at the 2016 Text Mining Summit in Cape Cod. This half-day event, using the Linguamatics I2E platform, will allow teams to solve real-life challenges. Sign up for the 2016 Hackathon to gain a greater understanding of query strategies and learn new skills from colleagues and Linguamatics NLP text mining experts.

The challenge

There is significant interest and scope for the application of NLP to improve clinical trials. Identification of patient populations is difficult to scale due to complex eligibility criteria and medical records, which make manual chart review a long and laborious process. Many trials suffer from slow accrual and may miss target numbers because of these issues.

The 2016 Hackathon will focus on a Diabetes clinical trial definition that the Linguamatics Health Science Center (LHSC) wishes to run.

The first part of the challenge will be to assess diabetes-related criteria from funded clinical trials. Writing a successful grant is challenging, and competition for funds increases yearly. It is beneficial to see which trials are receiving funding, to compare against your own grant application. This process will involve mining clinical trials to help ensure your success.

The second part of the challenge will be to mine a patient population for matches to the LHSC clinical trial and to export values associated with the eligibility criteria of the trial. The resulting data set will enable a much faster identification of potential trial subjects to ensure successful recruitment. An unannotated set of medical transcripts will be provided to participants to evaluate and build queries against.

Both sets of data will have been pre-indexed with appropriate disease and medication ontologies and will have undergone any region-detection preprocessing that the data set requires. In addition, a small set of example medical transcripts and clinical trials will be provided with annotations, a “gold standard”, to demonstrate how results should appear.


At the end of the session we will present some hints and tips for getting good results.

An unseen, annotated test data set will be used to assess each team’s efforts, and the best results will be presented the next day.

Who should attend?

The I2E Query Hackathon is for existing I2E users rather than new users. Participation in the Hackathon is free, but registration is required and spaces are limited, so please sign up early. If you plan to stay on for the rest of the Text Mining Summit, you will need to register for it separately.

Entries will be submitted at the end of the day and judged by David Milward, Linguamatics CTO, and Elizabeth Marshall, Director, Clinical Analytics.

Past comments

"One of the best summits and training sessions I have ever attended. The pioneering NLP efforts, and users responsiveness is unmatched by any other text mining (NLP) vendor. The Linguamatics approach & I2E system is relatively intuitive, easy to manage, powerful and useful."

"Both workshops were very useful. I enjoyed the interactive format."

"This is such a great meeting, it's so good to hear from other people all the different ways in which they're using I2E."

"I have been going to conferences for 15 years and this one is the best one."

"Congrats on a great user meeting - the talks were quite excellent."

"Extremely useful conference."
