MeSHy: Mining unanticipated PubMed information using frequencies of occurrences and concurrences of MeSH terms

J Biomed Inform. 2011 Dec;44(6):919-26. doi: 10.1016/j.jbi.2011.05.009. Epub 2011 Jun 13.

Abstract

Motivation: PubMed is the most widely used database of biomedical literature. To the user's detriment, however, the ranking of the documents retrieved for a query is not content-based, and important semantic information, in the form of assigned Medical Subject Headings (MeSH) terms, is neither readily presented nor productively used. The motivation behind this work was the discovery of unanticipated information through the appropriate ranking of MeSH term pairs and, indirectly, of documents. Such information can be useful in guiding novel research and following promising trends.

Methods: A web-based tool called MeSHy was developed that implements a mainly statistical algorithm. The algorithm takes into account the frequencies of occurrence and concurrence of MeSH terms in retrieved PubMed documents, together with their semantic similarities, to create MeSH term pairs. These pairs are then scored and ranked, with emphasis on those that occur unexpectedly frequently or infrequently.
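
As a rough illustration of the kind of pair scoring described in Methods, the following Python sketch counts how often each MeSH term and each term pair appears across a set of retrieved documents, then ranks the pairs by how far their observed concurrence departs from what independence would predict. The scoring formula (log2 of observed over expected concurrence) is an assumption chosen for illustration and is not MeSHy's published formula; the semantic-similarity step is omitted, and the function name `score_mesh_pairs` and the input shape are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def score_mesh_pairs(docs):
    """Rank MeSH term pairs by unexpectedness of their concurrence.

    docs: mapping of document ID -> iterable of MeSH terms (hypothetical input shape).
    Returns a list of (pair, score, doc_ids), sorted by |score| descending.
    """
    n_docs = len(docs)
    term_counts = Counter()   # number of documents containing each term
    pair_counts = Counter()   # number of documents containing both terms
    pair_docs = {}            # documents supporting each pair

    for doc_id, terms in docs.items():
        unique = sorted(set(terms))          # canonical order, no duplicates
        term_counts.update(unique)
        for pair in combinations(unique, 2):
            pair_counts[pair] += 1
            pair_docs.setdefault(pair, []).append(doc_id)

    results = []
    for (a, b), observed in pair_counts.items():
        # Expected concurrence if the two terms were assigned independently
        # (an illustrative assumption, not the tool's actual model).
        expected = term_counts[a] * term_counts[b] / n_docs
        score = log2(observed / expected)    # > 0: more frequent than expected
        results.append(((a, b), score, pair_docs[(a, b)]))

    # Large positive and large negative scores are both "unexpected".
    results.sort(key=lambda r: abs(r[1]), reverse=True)
    return results
```

Pairs that never co-occur are not scored in this sketch, because only pairs observed in at least one document are enumerated; a fuller treatment of unexpectedly infrequent pairs would need some form of smoothing.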

Results: MeSHy presents results through an online interactive interface that supports further manipulation through filtering and sorting. The results include the MeSH term pairs, along with their MeSH categories, scores, and document IDs, all of which are hyperlinked for convenience. To highlight the applicability of the tool, we report the findings of an expert in the pharmacology field on querying the molecularly targeted drug imatinib and nutrition-related flavonoids. To the best of our knowledge, MeSHy is the first publicly available tool able to directly provide such a different perspective on the complex nature of published work.

Implementation and availability: MeSHy is implemented in Perl and served by Apache2 at http://bat.ina.certh.gr/tools/meshy/; all major browsers are supported.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining*
  • Internet
  • MEDLINE / statistics & numerical data
  • Medical Subject Headings*
  • PubMed* / statistics & numerical data
  • Software
  • User-Computer Interface