Linguamatics Text Mining Summit 2016
What does the conference include?
- Customer presentations featuring best practice, case studies and insights on practical approaches to text mining and knowledge discovery
- Presentations covering what's new in I2E, looking ahead at developments in the pipeline and future directions for text mining and knowledge discovery
- Roundtable Discussions covering important topics and challenges in the field of text mining and knowledge discovery
- Opportunities to network with peers and with Linguamatics experts
- Hands-on workshops, giving new and experienced users the opportunity to explore the full capabilities of I2E, and discuss best practice in consultation with Linguamatics experts
- Healthcare track to discuss best practice and use of NLP in healthcare
- Evening social events
- Partner presentations and exhibits
- Meals and refreshments provided during the conference are included with the registration fee
Reasons to attend
- Gain first-hand knowledge and experience on how structured and unstructured content can be mined to uncover valuable information
- Gain hands-on experience of NLP text mining through workshops and training
- Understand the challenges other pharmaceutical and healthcare professionals are facing and explore solutions to these challenges
- Gain a better understanding of NLP text mining and where it can fit into your organization
- Network and exchange ideas with peers and text mining experts
Monday, October 17
- 11:00AM - 12:00PM: Registration
- 12:00PM - 1:00PM: Lunch
- 1:00PM - 5:00PM: Hackathon
- 1:00PM - 5:00PM: Workshop A: Introduction to I2E - General
- 1:00PM - 5:00PM: Workshop B: Introduction to I2E - Healthcare
*Preliminary agenda - subject to change
Tuesday, October 18
- 9:00AM - 12:00PM: Main Presentations
- 12:00PM - 1:00PM: Lunch
- 1:00PM - 2:00PM: Roundtable Discussions
- 2:00PM - 5:00PM: Main Presentations (continued)
*Preliminary agenda - subject to change
Wednesday, October 19
- 9:00AM - 12:00PM: Main Presentations (continued)
- 12:00PM - 1:00PM: Lunch
- 1:00PM - 2:00PM: Roundtable Presentations
- 2:00PM - 5:00PM: Workshop A: I2E Hints and Tips - Ask the Expert
- 2:00PM - 5:00PM: Workshop B: Exploiting new I2E Features
- 2:00PM - 5:00PM: Workshop C: Developing Applications for I2E
- 5:00PM: Meeting Adjourned
*Preliminary agenda - subject to change
Thierry is the Senior Manager of Information Analytics at Novo Nordisk Inc., located in Princeton, NJ. Currently, Thierry’s primary data focuses are real world, market, and clinical trial data. Thierry is influenced by design thinking principles and enjoys exploring novel solutions to information problems. In his work, Thierry utilizes a combination of tools and techniques, including natural language processing, information visualization for discovery and presentation, and conducting descriptive and predictive analyses.
Some of Thierry’s past projects include working on social media analysis and digital opinion leader identification, identifying macro and micro healthcare market trends in the US, and detecting patterns in clinical trial protocol deviations.
When Thierry is not playing around with data, he enjoys spending time with his wife and two daughters, admiring vintage cars, collecting transit maps, and sipping the occasional whisky.
Presenting on generating actionable insights from real world data.
Eric Su is a Principal Research Scientist at Eli Lilly and Company. He has research experience in molecular biology, bioinformatics, data mining and text mining. Eric mines various texts including scientific literature, surveys, and call center scripts. He received a Ph.D. from UC Berkeley, and did post-doctoral research at DFCI/Harvard University.
Yoni Dvorkis, MPH, CHDA is a Data Consultant with ten years of experience in health care, data analytics, information technology, and health informatics. He is a key member of the Department of Population Health at Atrius and is certified as a Health Data Analyst by the American Health Information Management Association. Yoni received his Master's in Public Health from the Harvard School of Public Health and his Bachelor of Arts in Mathematics with a minor in Economics from Tufts University.
Eric (Rick) Lewis MD is a Medical Director in the GlaxoSmithKline department of Global Clinical Safety and Pharmacovigilance. He has worked in the pharmaceutical industry for over 30 years and has development experience in clinical research, biometrics and data management, clinical pharmacology and drug safety.
Nina Mian (MSc, MBA), Head of Biomedical Informatics, AstraZeneca
Nina leads the Biomedical Informatics group, part of the Advanced Analytics Centre at AstraZeneca. This global team pioneers the use of novel visualization methods and advanced analytical tools to improve drug development decision-making. Nina’s research interests include initiatives centred around creating patient-centred insights. Nina holds degrees in Biochemistry, Informatics, and an MBA from Manchester Business School, UK.
Drexel University College of Medicine
Walt Niemczura is the Director of Application Development for Drexel University Department of Information & Resource Technology (IRT). In his role, Walt oversees application development and several major third-party products including Microsoft SharePoint, Vanderbilt’s REDCap (Research Electronic Data Capture), and Linguamatics I2E Natural Language Processor. Walt’s team will also be responsible for programs that employ Allscripts Unity, the application interface to Drexel’s EMR system. Although assigned primarily to the College of Medicine, Walt and his team provide support for I2E and REDCap across the entire University.
Walt previously served on committees that governed the selection and use of Drexel’s NLP solution (I2E) and secured data repository (REDCap), and currently serves on the college of medicine’s data warehousing committee.
Walt is a graduate of Drexel University and has over thirty years’ experience in application development, including real time systems and large database applications, and information technology management.
Stuart is speaking on Automated identification of potential Drug Safety Events.
OHSU Knight Cancer Institute
Tracy Edinger, ND, MS, is an NLP Data Scientist at Oregon Health & Science University’s Knight Cancer Institute, where she is part of a team developing data pipelines to support oncology research. Following her clinical training, she completed a postdoctoral research fellowship and MS in clinical informatics at OHSU, with a research focus on retrieving patient cohorts from clinical text.
David Milward is chief technology officer (CTO) at Linguamatics. He is a pioneer of interactive text mining, and a founder of Linguamatics. He has over 20 years' experience of product development, consultancy and research in natural language processing (NLP). After receiving a PhD from the University of Cambridge, he was a researcher and lecturer at the University of Edinburgh. He has published in the areas of information extraction, spoken dialogue, parsing, syntax and semantics.
Jane Reed is the head of life science strategy. She is responsible for developing the strategic vision for Linguamatics’ growing product portfolio and business development in the life science domain. Jane has extensive experience in life sciences informatics. She worked for more than 15 years in vendor companies supplying data products, data integration and analysis and consultancy to pharma and biotech - with roles at Instem, BioWisdom, Incyte, and Hexagen. Before moving into the life science industry, Jane worked in academia with post-docs in genetics and genomics.
John Brimacombe is a serial entrepreneur. After graduating in Law and Computer Science from Trinity College, Cambridge, he founded Jobstream Group plc, which provides specialist ERP software to the international financial services sector. Brimacombe subsequently co-founded pioneering mobile entertainment start-up nGame Ltd., which was acquired by Hands-On Mobile Inc. He served as President/COO of HandsOn Mobile for over 2 years, leading the company through 7 major M&A transactions and massive global expansion. In addition to an ongoing commitment to Jobstream, Brimacombe currently chairs the enterprise natural language search tools provider, Linguamatics, is a Partner at Sussex Place Ventures, the venture-capital arm of the London Business School, is a seed-investor in multiple US and UK start-ups and has extensive experience as a non-executive director from start-up to Public markets.
Simon Beaulah is director of healthcare strategy at Linguamatics. With so much unstructured text associated with patient records and a major market requirement for natural language processing (NLP) in healthcare, this is a key focus area for Linguamatics. Simon is leading the development and marketing of healthcare products based on I2E that improve patient outcomes, reduce risk and enhance insight. With extensive experience in translational medicine, healthcare analytics and ontology building, Simon is well placed to work with the Linguamatics team to deliver the expected growth in this market for use of NLP. Simon has been working in life science and healthcare informatics for more than 20 years, initially in research and over the past 14 years for informatics vendors including Synomics, LION bioscience, BioWisdom, InforSense, and IDBS. Simon has degrees from Aston University and Cranfield Institute of Technology.
Guy Singh is senior manager, product and strategic alliances at Linguamatics. He has a joint role that spans both managing the I2E product and the partner ecosystem around it. In his product role, he is responsible for product management, marketing and strategy guidance of I2E, an award-winning, agile, high-performance enterprise text mining product. His strategic alliances role holds responsibility for recruiting and managing partners across technology, content and services for text mining. Guy joined Linguamatics from information intelligence organization I2 (now part of IBM), where he was responsible for the launch of their first ever search-based product. He has held posts in R&D, Engineering, Product Marketing and Management at Vodafone, Baltimore Technologies, Oracle and IXI. With over 20 years of experience in the IT industry working in a wide range of environments, his achievements include building a new multi-million-dollar search product line from scratch and managing product businesses with turnovers in excess of $40 million.
Gabriel J. Escobar, MD, is a research scientist at the Kaiser Permanente Northern California Division of Research; director of the Division of Research Systems Research Initiative (a research program focusing on adult hospital processes and outcomes); and Regional Director for Hospital Operations Research for Kaiser Permanente Northern California, in which capacity he works to improve Kaiser Permanente's internal reporting and quality measurement. Dr. Escobar received his medical degree from Yale University School of Medicine; completed his pediatrics residency at University of California, San Francisco; and was a Robert Wood Johnson Clinical Scholar at Stanford University School of Medicine.
Dr. Escobar's research now focuses on the processes and outcomes in the care of hospitalized adults. His interests include risk adjustment, predictive modeling, severity-of-illness scoring, the use of comprehensive inpatient and outpatient electronic medical records for health services research, and the use of real-time decision support tools that are embedded in the electronic record. Between 1991 and 2001, Dr. Escobar developed a research program in neonatology for Kaiser Permanente Northern California. In 2001, he began development of his current research program, and in 2009 he turned over the neonatology research program to Dr. Michael Kuzniewicz at the Division of Research so that he could focus his work on adult hospital research. Dr. Escobar also practices at the Kaiser Permanente Medical centers in Walnut Creek and Antioch, where he works in the neonatal intensive care nursery and as a hospital-based pediatrician.
National Cancer Institute at the National Institutes of Health
Glenn is a computational linguist with a background in medical NLP development and research. Currently, he works with the National Cancer Institute at the National Institutes of Health to bolster oncology research by extracting biomarker related features from big unstructured textual data for machine learning and other types of analysis. His personal linguistic research investigates the phenomenon and recognition of code-switching in unstructured texts containing two or more language systems. He also speaks more than 8 languages including English, Cebuano, Spanish, German, Portuguese, French, Arabic, Norwegian and Sign Language.
National Cancer Institute at the National Institutes of Health
Marina is an informatics program manager and information technology specialist focusing on surveillance informatics initiatives for the Surveillance Research Program with the National Cancer Institute at the National Institutes of Health. She has spent the past decade working on health informatics and data analytics initiatives focusing on cancer research. She enjoys uncovering fundamental user needs and creating informatics solutions that drive change.
Presentations: Tuesday, October 18 & Wednesday, October 19
By John Brimacombe, Executive Chairman, Linguamatics
John will welcome attendees with an update on company progress and industry trends. He will introduce and set the context for the forthcoming major I2E 5.0 release and also talk about the critical importance of the I2E user community and initiatives to better support users.
Using Natural Language Processing for Mining the Electronic Medical Record
By Walt Niemczura, Drexel University
An overview of the process employed to extract information from Drexel’s EMR system for processing via natural language processing, the de-identification process, and delivery to a secured environment. The presentation will include examples that highlight the success of using Linguamatics I2E to reduce labor hours and improve results for research and operational success.
“You make me sick” – how text mining can help
By Nina Mian, AstraZeneca
Clinicians rely on drug product labels, derived from clinical trial data, as a primary source of Adverse Reaction (AR) information to inform their prescribing. Nausea is a common and often debilitating AR associated with many medicines. Yet under-reporting of a subjective AR such as nausea may give rise to a disparity between clinical trial results and real-world occurrence.
Through AstraZeneca’s collaboration with PatientsLikeMe (PLM), this research aims to examine differences in nausea AR frequencies between patients' online self-reported data and drug product label data extracted using I2E. The results of this analysis will help to determine whether PLM data could provide a supplementary data source for AR frequencies, in addition to the drug product labels.
This is an update to the work presented at the Linguamatics spring conference 2016 by James Loudon-Griffiths.
1. AstraZeneca. 2015. AstraZeneca And PatientsLikeMe Announce Global Research Collaboration. [Press release]. [Accessed 13 Apr. 2016]. Available from: www.astrazeneca.com
Applications of I2E to Clinical Safety and Pharmacovigilance
By Eric Lewis, GSK
The Clinical Safety and Pharmacovigilance departments of major pharmaceutical companies are information intensive operations. Their workflow is also highly regulated and subject to auditing by Regulatory Authorities. A major pharma company can be responsible for the pharmacovigilance of hundreds of products; products which are in development, marketed and generic. The case load (CIOMS reports) for a major pharmaceutical company is on the order of 100,000s of adverse event reports sent to regulatory agencies every year. In addition to these individual reports the Sponsor company is also responsible for reviewing the medical literature for evidence of new “safety signals”. This must be done according to ICH regulations every week for marketed products and also (though less frequently) for products in development.
A sample of 20 marketed products for a major pharmaceutical company was tracked and the number of literature references received during a defined period was ascertained. Over a 218-day period, 13,125 new references (some duplicates) were retrieved from two databases (Embase & Searchlight). This equates to approximately 60 references per day. Assuming this sample is representative, a Big Pharma company with 200 marketed molecules could receive approximately 220,000 “new” references in a given year. So the question arises: is there a “new signal”?
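The extrapolation above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the reference volume scales linearly with the number of products and that a year has 365 days; neither assumption is stated in the abstract, and the function names are illustrative only.

```python
# Figures quoted in the abstract.
SAMPLE_PRODUCTS = 20
SAMPLE_DAYS = 218
SAMPLE_REFERENCES = 13125


def references_per_day(references: int, days: int) -> float:
    """Average number of references received per day for the whole sample."""
    return references / days


def projected_annual_references(
    references: int,
    days: int,
    sample_products: int,
    target_products: int,
    days_per_year: int = 365,  # assumption: calendar year, not working days
) -> float:
    """Scale the sample rate linearly to a larger portfolio (an assumption)."""
    per_product_per_day = references / days / sample_products
    return per_product_per_day * target_products * days_per_year


daily = references_per_day(SAMPLE_REFERENCES, SAMPLE_DAYS)
annual = projected_annual_references(
    SAMPLE_REFERENCES, SAMPLE_DAYS, SAMPLE_PRODUCTS, target_products=200
)

print(f"{daily:.1f} references per day for the 20-product sample")
print(f"{annual:,.0f} projected references per year for 200 products")
```

Under these assumptions the projection lands within a fraction of a percent of the 220,000 figure quoted above.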
Another and perhaps more important factor is the need for development teams to answer questions in “real-time”. This is not unlike the need of medical teams during rounds having a desire to answer questions pertinent to patient management.
Experience with I2E has demonstrated that a text mining tool can effectively sift through the mountains of free text information for which pharmaceutical companies are legally responsible. The advantages of I2E, though, extend well beyond the routine and required signal detection exercise. The greater value of a tool like I2E is in its efficiency and the ability to find answers to questions and thereby create knowledge. Relational database tools can be employed to further index and curate this knowledge for future easy access.
Generating Actionable Insights from Real World Data
By Thierry Breyette, Novo Nordisk
The density and variability of the information landscape is making it increasingly difficult to identify meaningful trends in data. Traditional data sources such as clinical trial data and publication data are one piece of an increasingly complex information puzzle.
As data capture and publishing platforms explode, newer and highly varied data sources are available for analysis, including internally generated data, social data, patient data, clinician data, market data, hospital data, etc. Building a forward looking analytics framework to tackle these new data challenges requires both extensible and flexible tools, and creative thinking.
This talk will cover some of the real world data projects we have worked on: social media analysis and digital opinion leader identification, identifying macro and micro healthcare market trends in the US, detecting patterns in clinical trial protocol deviations, gleaning clinical insights based on discussions between US-based medical liaisons and health care providers, and patient and caregiver ethnographic data to discern patterns in patient sentiment, compliance, routines, behaviors, and overall treatment satisfaction and outcomes. The discussion will focus on our approach to each of these projects, the outcome and impact.
Natural Language Processing Validation of High Risk Patient Characteristics using Linguamatics I2E
By Yoni Dvorkis, Atrius Health
Under the guidance of our new CEO, Atrius Health is committed to returning joy to the practice of medicine for clinicians and staff. Providers currently feel overburdened with documentation and often work long hours after their time in clinic in order to document in patients’ charts.
Structured data elements are useful to the organization for reporting purposes; however, documenting structured data can often be time-consuming and less fulfilling than writing free-text progress notes, which are richer in detail and convey more about the patient’s history and course of treatment. To fulfill this goal, there is a need to investigate tools that can mine free text data to yield valuable insights that can be aggregated for large populations.
To that end, Atrius Health has partnered with Linguamatics to pilot their I2E software that uses Natural Language Processing techniques to extract valuable insights from free text to create a structure that can then be reported for all patients.
We tested this software using three high risk cohorts: patients with COPD, patients with CHF, and patients who need assistance with Activities of Daily Living (ADLs). We then compared the results from NLP to our internally validated algorithms.
Specifically we report:
COPD: 84% agreement between NLP vs. internal algorithm that relies on primary diagnoses captured during encounters (Kappa coefficient = 0.66)
CHF: 88% agreement between NLP and chart review, identifying patients with ejection fractions less than 40% (Kappa coefficient = 0.59)
ADL: 67% agreement identifying patients who need help with bathing, the ADL with the highest volume for our patients (Kappa coefficient = 0.04)
The results show reliable agreement rates between the two sources, which we found encouraging. The ADL analysis showed a lower agreement rate when compared with an internal assessment conducted by Case Managers; however, that opened a new discussion as to whether Case Managers are accurately tracking this information over time, as they tend to conduct their assessment only upon enrolling patients into Case Management programs.
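For readers less familiar with the two statistics reported above, the following is a minimal sketch of how percent agreement and Cohen's kappa are computed from a 2x2 comparison of NLP output against a reference source. The counts are hypothetical and chosen only for illustration; they are not Atrius Health data, and the function name is an assumption of this sketch.

```python
def agreement_and_kappa(both_yes: int, nlp_only: int, ref_only: int, both_no: int):
    """Return (observed agreement, Cohen's kappa) for a 2x2 comparison table."""
    total = both_yes + nlp_only + ref_only + both_no
    observed = (both_yes + both_no) / total

    # Marginal "yes" rates for each source.
    nlp_yes = (both_yes + nlp_only) / total
    ref_yes = (both_yes + ref_only) / total

    # Agreement expected by chance if the two sources were independent.
    expected = nlp_yes * ref_yes + (1 - nlp_yes) * (1 - ref_yes)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical, balanced cohort: agreement and kappa are both fairly high.
obs_balanced, kappa_balanced = agreement_and_kappa(40, 10, 10, 40)

# Hypothetical, skewed cohort: most patients are "no" in both sources,
# so raw agreement looks respectable while kappa is close to zero.
obs_skewed, kappa_skewed = agreement_and_kappa(4, 14, 14, 68)

print(f"balanced: agreement={obs_balanced:.0%}, kappa={kappa_balanced:.2f}")
print(f"skewed:   agreement={obs_skewed:.0%}, kappa={kappa_skewed:.2f}")
```

Because kappa discounts agreement expected by chance, a cohort in which most patients fall into one category can show a high raw agreement rate alongside a kappa near zero, which is one possible reading of the ADL result.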
We are currently investigating whether NLP can extract valuable insight on patients with pulmonary nodules from scanned radiology chest CT scans in order to document potential misdiagnosis.
Systematic Drug Repositioning Using Text Mining Tools on ClinicalTrials.gov
By Eric Su, Lilly
Drug repositioning (aka drug repurposing) is the process of discovering new indications (diseases) for marketed drugs. Historically, such discoveries were the results of serendipity. However, the rapid growth in electronic clinical data and text mining tools makes it feasible to systematically repurpose drugs. A case of mining ClinicalTrials.gov data using I2E and PolyAnalyst is presented here. The I2E query extracts “Serious Adverse Events” (SAE) data from randomized trials where the disease synonyms are not in the “Condition” but in the SAE region. Through a statistical algorithm, a PolyAnalyst workflow ranks the drugs where the drug arm has fewer SAEs (disease) than the control arm. Hypotheses could then be generated for the new use of these drugs. One outcome of the presented I2E-PolyAnalyst workflow is the hypothesis that vitamin K1 might help prevent cancer.
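The abstract does not say which statistical algorithm the PolyAnalyst workflow uses. As an illustration of the ranking idea only, the sketch below substitutes a one-sided two-proportion z-test and orders drugs by how strongly the drug arm shows fewer disease events than the control arm. The class, function names, drug labels and counts are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TrialArmCounts:
    """Disease events reported as SAEs in each arm of one trial (hypothetical)."""
    drug: str
    drug_events: int
    drug_n: int
    control_events: int
    control_n: int


def one_sided_p_value(t: TrialArmCounts) -> float:
    """P-value for 'drug arm event rate < control arm event rate' (z-test)."""
    p_drug = t.drug_events / t.drug_n
    p_ctrl = t.control_events / t.control_n
    pooled = (t.drug_events + t.control_events) / (t.drug_n + t.control_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / t.drug_n + 1 / t.control_n))
    if se == 0:
        return 1.0  # no events in either arm: no evidence either way
    z = (p_drug - p_ctrl) / se
    # Lower-tail probability of the standard normal distribution.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def rank_candidates(trials: list[TrialArmCounts]) -> list[TrialArmCounts]:
    """Keep trials where the drug arm has the lower event rate; strongest first."""
    fewer = [
        t for t in trials
        if t.drug_events / t.drug_n < t.control_events / t.control_n
    ]
    return sorted(fewer, key=one_sided_p_value)


# Hypothetical counts, for illustration only.
trials = [
    TrialArmCounts("drug_B", 12, 500, 15, 500),
    TrialArmCounts("drug_C", 30, 800, 10, 800),   # more events on drug: excluded
    TrialArmCounts("drug_A", 5, 1000, 20, 1000),
]

ranked = rank_candidates(trials)
for t in ranked:
    print(t.drug, f"p={one_sided_p_value(t):.4f}")
```

A ranking like this only produces hypotheses. Screening many drug and disease pairs at once would also call for a multiple-testing correction, and small event counts may be better served by an exact test.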
Automated identification of potential Drug Safety Events
By Stuart Murray, Agios
Future innovations: R&D update
David Milward, Linguamatics
Over the last few years, the I2E core platform has become both more powerful and more configurable. I2E 5.0 now allows unstructured or semi-structured text in different languages to be queried as if it were structured, despite variability in how concepts and relationships are expressed. However, as I2E becomes more powerful, we need to make it as easy as possible to re-use existing work, allowing new queries to be built more easily from components of old queries, i.e. compositionally.
For the next couple of releases, usability and compositionality will be the main focus. We will demonstrate an example of each: a more convenient output editor for both queries and multi queries, and the ability to embed one query inside another. We will also cover other upcoming developments, such as spelling correction and incremental charts.
Finally, we will show ideas for a simpler interface that could make smart queries easier to exploit by occasional users.
I2E in Healthcare: recent developments and use cases
Simon Beaulah, Linguamatics
The rapid growth of electronic health records (EHRs) provides an abundant source of valuable data, with the potential to discover insights about patients and their response to treatment. However, with up to 80% of the richest information within the unstructured text, hospitals and medical researchers need better ways to leverage this vital information. This talk will highlight key customer applications where I2E is supporting precision medicine, population health and clinical research. It will also cover new product developments to support real-time and large-scale mining of patient data, a new forum for sharing best practices and Q&A, and the results of the Hackathon 2016.
I2E in Life Sciences: recent developments and use cases
Jane Reed, Linguamatics
What's new in I2E in 2016?
Guy Singh, Linguamatics
This presentation provides an overview of the major product release, I2E 5.0, in 2016. As well as providing a summary of all the features within I2E 5.0, the presentation will focus on some significant new capabilities introduced to the user community: EASL (Extraction And Search Language - new query language for I2E), Normalisation, Range Search and Charting enhancements. A summary of major improvements to the content and infrastructure on our cloud-based service, I2E OnDemand, will also be outlined. The talk includes an introduction to a couple of major new developments designed for the I2E user community, the Linguamatics Community Forum and the Linguamatics Developer Program.
NLP Implementation at Knight Cancer Institute
Tracy Edinger, Oregon Health & Science University
The Knight Cancer Institute (KCI) is a National Cancer Institute-designated cancer center at Oregon Health & Science University where researchers focus on cancer biology, leukemia and other blood cancers, solid tumors, and early detection. The Translational Research Hub (TRH) was created to support KCI by developing a central resource for secondary use of clinical data for cancer research. The TRH uses Linguamatics I2E to extract structured data from progress notes, discharge summaries, radiology reports, and pathology reports. This data is combined with structured data from other sources to provide information for researchers. It is also used to support clinical trial recruitment by identifying patients who are most likely to meet eligibility criteria.
Use of Natural Language Processing to Support Efforts to Predict and Prevent Non-Elective Rehospitalization: Promises and Challenges
Gabriel J. Escobar, MD, Regional Director for Hospital Operations Research
Non-elective rehospitalization remains a major problem for patients and health care organizations. The Centers for Medicare and Medicaid Services continue applying financial sanctions to hospitals that have "excess" rehospitalizations for many conditions. One major obstacle to decreasing such rehospitalizations is the difficulty in predicting which patients are at high enough risk to warrant special interventions. Although non-clinical predictors for rehospitalization (e.g., social support, functional status) may be important adjuncts for predictive models, large datasets containing these predictors as well as clinical ones do not exist. In theory, at least some information on these predictors could be extracted from free text notes in the hospital record. In this report, Dr. Escobar will describe an effort to employ natural language processing to extract such information from a cohort of 360,036 adults who experienced 609,393 hospitalizations at 21 Kaiser Permanente Northern California hospitals from 6/1/10-12/31/13. This report will include (a) a description of how his team developed an ontology, (b) the query strategy employed to instantiate this ontology, (c) an audit process comparing NLP to a gold standard (data from patient interviews), and (d) promises and challenges identified when data were analyzed.
Roundtables: Tuesday, October 18
Hands-on Training: Monday, October 17 & Wednesday, October 19
We have hands-on training workshops run by our text mining experts available for total beginners, intermediate users and also advanced users.
Linguamatics I2E Query Hackathon: Monday, October 17
Following a highly successful first Healthcare Hackathon last year, Linguamatics will be hosting another Hackathon at the 2016 Text Mining Summit in Cape Cod. This half-day event, using the Linguamatics I2E platform, will allow teams to solve real-life challenges. Sign up for the 2016 Hackathon to gain a greater understanding of query strategies and learn new skills from colleagues and Linguamatics NLP text mining experts.
There is significant interest and scope for the application of NLP to improve clinical trials. Identification of patient populations is difficult to scale due to complex eligibility criteria and medical records, which make manual chart review a long and laborious process. Many trials suffer from slow accrual and may miss target numbers because of these issues.
The 2016 Hackathon will focus on a Diabetes clinical trial definition that the Linguamatics Health Science Center (LHSC) wish to run.
The first part of the challenge will be to assess diabetes-related criteria from funded clinical trials. Writing a successful grant is challenging, and competition for funds increases yearly. It’s beneficial to see which trials are receiving funding to compare with your grant application. This process will involve mining ClinicalTrials.gov to ensure your success.
The second part of the challenge will be to mine a patient population for matches to the LHSC clinical trial and to export values associated with the eligibility criteria of the trial. The resulting data set will enable a much faster identification of potential trial subjects to ensure successful recruitment. An unannotated set of medical transcripts will be provided to participants to evaluate and build queries against.
Both sets of data will have been pre-indexed with appropriate disease and medication ontologies and any region detection preprocessing that the data set requires. In addition, a small set of example medical transcripts and clinical trials will be provided with annotations, a “gold standard”, to demonstrate how results should appear.
At the end of the session we will present some hints and tips for getting good results.
An unseen annotated test data set will be used to assess each team’s efforts, and the best results will be presented the next day.
Who should attend?
The I2E Query Hackathon is for existing I2E users rather than new users. Participation for the Hackathon is free but registration is required and spaces are limited so please sign up early. If you plan to stay on to attend the rest of the Text Mining Summit you will need to sign up to that.
Entries will be submitted at the end of the day and judged by Linguamatics CTO David Milward and Elizabeth Marshall, Director, Clinical Analytics.
"One of the best summits and training sessions I have ever attended. The pioneering NLP efforts, and users responsiveness is unmatched by any other text mining (NLP) vendor. The Linguamatics approach & I2E system is relatively intuitive, easy to manage, powerful and useful.”
"Both workshops were very useful. I enjoyed the interactive format"
"This is such a great meeting, it's so good to hear from other people all the different ways in which they're using I2E"
"I have been going to conferences for 15 years and this one is the best one”
"Congrats on a great user meeting - the talks were quite excellent."
"Extremely useful conference."