GrimoireLab: A toolset for software development analytics

PeerJ Comput Sci. 2021 Jul 9:7:e601. doi: 10.7717/peerj-cs.601. eCollection 2021.

Abstract

Background: After many years of research on software repositories, the knowledge for building mature, reusable tools that perform data retrieval, storage and basic analytics is readily available. However, there is still room for improvement in the area of reusable tools implementing this knowledge.

Goal: To produce a reusable toolset supporting the most common tasks when retrieving, curating and visualizing data from software repositories, allowing for the easy reproduction of data sets ready for more complex analytics, and sparing the researcher or analyst most of the tasks that can be automated.

Method: Use our experience in building tools in this domain to identify a collection of scenarios where a reusable toolset would be convenient, and the main components of such a toolset. Then build those components, and refine them incrementally using the feedback from their use in commercial, community-based, and academic environments.

Results: GrimoireLab, an efficient toolset composed of five main components, supporting about 30 different kinds of data sources related to software development. It has been tested in many environments, for performing different kinds of studies, and providing different kinds of services. It features a common API for accessing the retrieved data, facilities for relating items from different data sources, semi-structured storage for easing later analysis and reproduction, and basic facilities for visualization, preliminary analysis and drill-down in the data. It is also modular, making it easy to support new kinds of data sources and analysis.

Conclusions: We present a mature toolset, widely tested in the field, that can help to improve the situation in the area of reusable tools for mining software repositories. We show some scenarios where it has already been used. We expect it will help to reduce the effort for doing studies or providing services in this area, leading to advances in reproducibility and comparison of results.

Keywords: Datasets; Empirical software engineering; Mining software repositories; Software analytics; Software development; Software development visualization; Toolset.

Grants and funding

This work is supported by Ministerio de Ciencia y Tecnología of Spain under Project BugBirth, RTI2018-101963-B-I00 (Retos) and Grimoire as a Service, RTC-2017-6554-7 (Retos Colaboracion), and by Ministerio de Economia y Competitividad of Spain under Grant PTQ-15-07709 (Torres Quevedo). There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.