MELODI: Mining Enriched Literature Objects to Derive Intermediates

Int J Epidemiol. 2018 Jan 12;47(2):369-379. doi: 10.1093/ije/dyx251. Online ahead of print.

Abstract

Background: The scientific literature contains a wealth of information from different fields on potential disease mechanisms. However, the quantity and diversity of published research make it very difficult to identify and prioritize mechanisms for further analytical evaluation. Applying data mining approaches to the literature offers the potential to identify and prioritize mechanisms for more focused and detailed analysis.

Methods: Here we present MELODI, a literature mining platform that can identify mechanistic pathways between any two biomedical concepts.
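The idea of deriving intermediates between two concepts can be illustrated with a minimal sketch. It assumes, purely for illustration, that the literature for each concept has already been reduced to (subject, predicate, object) triples; the function name `find_intermediates`, the triple layout, and the placeholder terms are hypothetical and are not taken from MELODI itself.

```python
from collections import defaultdict


def find_intermediates(triples_a, triples_b):
    """Link triples from concept A's literature to triples from concept B's
    literature wherever the object of the first equals the subject of the second.

    Returns a dict mapping each shared (intermediate) term to the list of
    (triple_from_a, triple_from_b) pairs that it connects.
    """
    # Index concept B's triples by subject for constant-time lookup.
    by_subject = defaultdict(list)
    for triple in triples_b:
        by_subject[triple[0]].append(triple)

    paths = defaultdict(list)
    for triple in triples_a:
        intermediate = triple[2]  # object of the A-side triple
        for match in by_subject.get(intermediate, []):
            paths[intermediate].append((triple, match))
    return dict(paths)


# Placeholder data (hypothetical terms, not real literature findings).
triples_a = [
    ("exposure", "INTERACTS_WITH", "gene_X"),
    ("exposure", "STIMULATES", "gene_Y"),
]
triples_b = [
    ("gene_X", "ASSOCIATED_WITH", "outcome"),
    ("gene_Z", "CAUSES", "outcome"),
]

paths = find_intermediates(triples_a, triples_b)
for term, links in paths.items():
    print(term, links)
```

In this toy input only `gene_X` appears on both sides, so it is the single candidate intermediate. A real analysis would also need to rank candidates, for example by how strongly each term is enriched in the two literature sets.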

Results: Two case studies demonstrate the potential uses of MELODI and how it can generate hypotheses for further investigation. First, an analysis of the ETS-related gene ERG and prostate cancer derives the intermediate transcription factor SP1, recently confirmed to interact physically with ERG. Second, examining the relationship between a new potential risk factor and pancreatic cancer identifies possible mechanistic insights that can be studied in vitro.

Conclusions: We have demonstrated the possible applications of MELODI, including two case studies. MELODI has been implemented as a Python/Django web application, and is freely available to use at www.melodi.biocompute.org.uk.

Keywords: Data mining; publications; risk factors.