Processing biological literature with customizable Web services supporting interoperable formats

Rafal Rak; Riza Theresa Batista-Navarro; Jacob Carter; Andrew Rowley; Sophia Ananiadou

doi:10.1093/database/bau064

Database (Oxford). 2014 Jul 8:2014:bau064. doi: 10.1093/database/bau064. Print 2014.

Authors

Rafal Rak¹, Riza Theresa Batista-Navarro², Jacob Carter³, Andrew Rowley³, Sophia Ananiadou³

Affiliations

¹ National Centre for Text Mining, School of Computer Science, University of Manchester, M1 7DN, UK and Department of Computer Science, University of the Philippines Diliman, Philippines 1101. Email: rafal.rak@manchester.ac.uk
² National Centre for Text Mining, School of Computer Science, University of Manchester, M1 7DN, UK and Department of Computer Science, University of the Philippines Diliman, Philippines 1101.
³ National Centre for Text Mining, School of Computer Science, University of Manchester, M1 7DN, UK and Department of Computer Science, University of the Philippines Diliman, Philippines 1101.

Abstract

Web services have become a popular means of interconnecting solutions for processing a body of scientific literature. This has fuelled research on high-level data exchange formats suitable for a given domain and ensuring the interoperability of Web services. In this article, we focus on the biological domain and consider four interoperability formats, BioC, BioNLP, XMI and RDF, that represent domain-specific and generic representations and include well-established as well as emerging specifications. We use the formats in the context of customizable Web services created in our Web-based, text-mining workbench Argo that features an ever-growing library of elementary analytics and capabilities to build and deploy Web services straight from a convenient graphical user interface. We demonstrate a 2-fold customization of Web services: by building task-specific processing pipelines from a repository of available analytics, and by configuring services to accept and produce a combination of input and output data interchange formats. We provide qualitative evaluation of the formats as well as quantitative evaluation of automatic analytics. The latter was carried out as part of our participation in the fourth edition of the BioCreative challenge. Our analytics built into Web services for recognizing biochemical concepts in BioC collections achieved the highest combined scores out of 10 participating teams. Database URL: http://argo.nactem.ac.uk.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Biomedical Research*
Computational Biology
Data Mining / methods*
Databases, Factual*
Internet*
Molecular Sequence Annotation
PubMed
Publications*
Software*
User-Computer Interface
Workflow

Grants and funding

Wellcome Trust/United Kingdom