Software engineering principles to improve quality and performance of R software

PeerJ Comput Sci. 2019 Feb 4;5:e175. doi: 10.7717/peerj-cs.175. eCollection 2019.

Abstract

Today's computational researchers are expected to be highly proficient in using software to solve a wide variety of problems, from processing large datasets to developing personalized treatment strategies from a growing number of options. Researchers are well versed in their own fields, but may lack formal training and appropriate mentorship in software engineering principles. Two major themes covered neither in most university coursework nor in the current literature are software testing and software optimization. Through a survey of all packages currently available on the Comprehensive R Archive Network (CRAN), we show that reproducible and replicable software tests are frequently unavailable and that many packages do not appear to employ software performance and optimization tools and techniques. Using examples from an existing R package, we demonstrate powerful testing and optimization techniques that can improve the quality of any researcher's software.

Keywords: Case study; Data science; Optimization; Profiling; R language; Reproducible research; Software engineering; Statistical computing; Unit testing.

Grants and funding

The University of Colorado Data Science to Patient Value initiative provided funding for this work. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.