
This year, Linguamatics NLP celebrates its 20th anniversary. From a team of four founders in 2001 to being part of an 80,000+ organization, Linguamatics has built an impressive client base including 19 of the top 20 pharma companies, the US FDA and many well-known healthcare organizations. Today, Linguamatics NLP is an award-winning natural language processing platform for the life sciences and healthcare industries.
In recognition of this significant milestone, we interviewed the founders, Roger Hale, David Milward and James Thomas, to find out more about the past, present and future of Linguamatics and NLP.
What were your motivations for starting Linguamatics?
[David] NLP had been an academic research area for many years, and we felt it was ready for some real applications.
We had seen a need for extracting relationships from scientific literature and could see that people were using things like regular expressions to do this. We felt there was not only a need but that we could build technology good enough to satisfy that need.
James and I had also worked on combining information extraction and information retrieval ("From Information Retrieval to Information Extraction"). We thought we could provide an interactive approach so that end users could get structured answers to the thousands of different questions they might want to ask, rather than just finding documents with a search engine or finding pre-defined relationships with a fixed NLP pipeline.
Where did the names “Linguamatics” and “i2e” come from?
[James] We chose the name using some kind of transferable voting system from a list that we came up with between us. I proposed Lingomatics, and I think Roger tweaked it to Linguamatics because of another company called Lingomotors that was around at the time. The only other choice I can recall from the voting was something like Arti-q-late.
Early on, the technology we were working on was called IEIR (Information Extraction/Information Retrieval) and later IIE (Interactive Information Extraction). I2E comes from IIE in the same way that WWWC gives W3C for the World Wide Web Consortium.
What was your vision for Linguamatics 20 years ago and how does it differ from where it is now?
[James] One way that things differ is that in the beginning we thought we’d develop a single NLP engine that could serve as the core of both search and spoken dialog systems. The speech side was dropped in favor of search, but the idea was good – look at Google Assistant, Siri, Alexa and the like.
[David] We initially worked on information extraction and spoken dialogue systems (e.g. the kind of technology now found in Google Assistant, Siri and Alexa). Of the two, we felt information extraction could be ready in the near future, and we wanted to exploit this early. We initially had EU funding for spoken dialogue research, so we continued to work on it in the background and intended to launch it later. Looking back, information extraction has been challenging enough, but the slow progress on the dialogue aspects of conversational systems (in contrast to the rapid progress in speech recognition technology) means we might not have been so wrong: we just expected things to happen within a couple of years rather than a couple of decades. Progress always seems faster from outside an area than from within it!
Although the focus over the last twenty years has been on natural language understanding, we now have natural language questioning on our roadmap for the Insights Hub, so the original two sides of the business might finally come back together.
What was the need for a company like Linguamatics when it started in 2001, and is there more of a need for it now (i.e. the need for NLP in commercial areas)?
[Roger] There was an emerging demand in the pharmaceutical sector. There was a huge literature resource in published articles and internal reports, etc. that could be used in areas like drug development and drug safety, but there was just too much of it and it was growing too quickly for researchers to read and keep on top of. Using machines to “read” the literature was an obvious way forward but a difficult problem to solve.
AstraZeneca (AZ) was the first, and is one of Linguamatics' longest-serving, customers. How has Linguamatics been able to win and retain such a large company?
[David] Pharma, especially in the US, has been a good early adopter of technology from small companies. We were surprised how little people questioned us about our size and track record in the early days. Our longevity is due to the flexibility of the i2e platform and its ability to provide real value.
Linguamatics works with 19 of the top 20 pharma companies; what’s been the driving force with these companies to invest in Linguamatics?
[David] We gained a good reputation in pharma of being the NLP system that actually works. Other systems made large promises but for various reasons did not deliver good results.
[Roger] The quality of our technology and our expertise in life sciences and healthcare. Early on, Linguamatics was unique. Today, the range of problems Linguamatics addresses is broader than that of other products. It solves problems that other products cannot.
What would you say the three key milestones of the company are?
[David] Our first hands-on software demo workshop at the European Bioinformatics Institute. I think we had about 20 machines running i2e. To our relief the i2e software worked throughout, but Windows wasn’t quite as good at that point and went wrong on two machines.
Our first customers: AstraZeneca in pharma, and Kaiser Permanente in healthcare.
Starting to bring new people in who have contributed so much to the development and direction of Linguamatics. The first employee who expects to be paid at the end of the month is a big step for any start-up, but also the first person for each of the specialties whether sales, application specialist, marketing, testing, HR, NLP specialist etc.
How does Linguamatics continue to keep up with the fast-paced pharma and healthcare markets?
[David] Listening and reacting to our customers has always been key, but also innovating beyond their immediate concerns.
How does the team continue to be innovative?
[David] In the past we built everything from scratch. Now, however, there is a huge open source community, so we are more likely to build on external components. To support this, we are doing more evaluation work to see which techniques can be brought into production, either to help refine our own technology and terminologies or to plug into the system directly (which requires not just good precision and recall but also decent performance). For example, we have been using deep learning techniques to spot mentions of diseases that are missed by our Diseases terminology. We have also been using deep learning techniques directly for anonymization of data, e.g. to spot the names of people within text.
What did NLP look like 20 years ago, and how do you think it will evolve (e.g. over the next 5-10 years)?
[David] 20 years ago there were a couple of specialty information extraction vendors. The NLP field is now growing fast and changing more quickly than ever before. Deep learning techniques are well established and showing great results for tasks like person name recognition. There are now also well-established open source toolkits that allow people to put together their own systems for specific purposes. The major cloud vendors have also recently entered the market to provide markup for general healthcare information (e.g. diagnoses and medications). The applications of NLP are so wide that there remains a lot of space in the market, whether competing directly or covering more specialist application areas.
How has the NLP market changed over the past 20 years and what has remained constant?
[David] There has been a real shift from customers deciding whether to use NLP to deciding which NLP system to use. The key influencers in purchasing are also changing and may now be ML specialists, data scientists or NLP specialists.
Do you have a funny anecdote/surprising fact you could share about Linguamatics or the team?
[James] I was responsible for the practice of naming our computers after superheroes and villains. There probably wasn't much forethought in that choice; it was made when we were setting up the first couple of servers and needed names. If I remember right, they were Batman and Dr X, and I think I misremembered Dr X as a hero, but he turned out to be an Action Man villain.
What, for you, has been Linguamatics’ biggest achievement – what’s your personal moment you’re most proud of whilst working for LM?
[James] It sounds a bit cheesy but enabling people to do good in the world is an amazing achievement for the company.
On a personal level, I'm proud of the teams I built and the way I strove to provide a good environment to work in and motivation for the work we were doing.
[David] We can be very proud of our role in getting NLP into real use, making a difference in drug discovery and patient care over the last 20 years.
What was your first celebration as team Linguamatics?
[David] I think getting our first pharmaceutical customer still seems the most important celebration.
[Roger] Starting the company.
