Life sciences researchers who employ text mining and NLP (natural language processing) techniques to extract discrete facts and insights from scientific articles have generally relied on MEDLINE abstracts to define a corpus.
But there is increasing interest in mining full-text articles, with researchers experimenting on corpora sourced from Open Access repositories like PubMed Central (PMC). Organizations are eager to take advantage of the unique benefits full text provides, and rightly so.
Full-text content provides insights that researchers could not obtain from abstracts alone. Here are three central benefits of mining a full-text corpus:
Volume. Full-text articles include more named entities, and more relationships between those entities, than their corresponding abstracts. This is intuitive given the length of an abstract compared with its full-text article. A study published in the Journal of Biomedical Informatics makes the point quantitatively: only 7.84% of the scientific claims made in full-text articles are found in their abstracts.[i]