Scientific papers are mainly written in English, so it is not surprising that most scientific text mining has concentrated on just one language. However, as the use of text mining has broadened, moving from early research through to clinical development and post-marketing, there is an increasing need to deal with other languages. In the pharmaceutical sector, this need appears in projects ranging across voice of the customer, analysis of sales reports, adverse event monitoring, patent analysis, and quality checking of regulatory submission documents. In healthcare, hospitals often have a multinational presence and need to collect information from records written in several languages.
Multilingual processing not only allows text mining in other languages (for example, a French medic analysing French electronic medical records), but also makes it easier to mine foreign-language documents, or to mine across different languages. Two examples:
- An English-speaking researcher can mine Chinese text using concepts defined through their English synonyms, extract the relationships of interest, and then use a tool such as Google Translate to show the evidence within the original text.
- A French medic can automatically link their medical records with relevant clinical trials described in English.
Linguamatics recognized this growing need and, with I2E 4.4, has provided a platform that can handle multiple languages. It can even handle cases such as patent documents, where a single document contains text written in several languages, ensuring that an adverse event synonym such as the English verb "die" does not match the German definite article "die".
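The idea behind the "die" example can be illustrated with a minimal sketch. It assumes each text segment has already been tagged with its language (for instance from markup in the patent document or by a language detector), and it uses a small hypothetical synonym table (`SYNONYMS`, `find_concepts` and `Segment` are illustrative names). It is not how I2E itself is implemented; it only shows why applying each vocabulary to text in its own language avoids the false hit.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    lang: str  # language code, assumed to be assigned upstream
    text: str


# Hypothetical vocabulary: each synonym is tied to the language it belongs to.
SYNONYMS = {
    "en": {"die": "Death", "death": "Death", "rash": "Rash"},
    "de": {"tod": "Death", "ausschlag": "Rash"},
}


def find_concepts(segments):
    """Match synonyms only against segments written in the synonym's language."""
    hits = []
    for seg in segments:
        vocab = SYNONYMS.get(seg.lang, {})
        for term, concept in vocab.items():
            # Whole-word, case-insensitive match of the synonym.
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, seg.text, flags=re.IGNORECASE):
                hits.append((seg.lang, m.group(0), concept))
    return hits


doc = [
    Segment("en", "Two patients may die of the reaction."),
    Segment("de", "Die Patienten zeigten einen Ausschlag."),
]
print(find_concepts(doc))
# → [('en', 'die', 'Death'), ('de', 'Ausschlag', 'Rash')]
```

A naive matcher that applied the English vocabulary to every segment would also report the German "Die" as a Death concept; scoping the lookup by segment language is what prevents that.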