Looking for related posts on GitHub discussions

PeerJ Comput Sci. 2023 Nov 9;9:e1567. doi: 10.7717/peerj-cs.1567. eCollection 2023.

Abstract

Software teams increasingly adopt different tools and communication channels to support collaborative software development and coordinate tasks. Among such resources, software development forums have become widely used by developers. Such environments enable developers to get and share technical information quickly. In line with this trend, GitHub announced GitHub Discussions, a native forum to facilitate collaborative discussions between users and members of communities hosted on the platform. Since GitHub Discussions is a software development forum, it faces challenges similar to those faced by other systems used for asynchronous communication, including the problems caused by related posts (duplicated and near-duplicated posts). These related posts can add noise to the platform and compromise project knowledge sharing. Hence, this article addresses the problem of detecting related posts on GitHub Discussions. To this end, we propose the RD-Detector, an approach based on a pre-trained, general-purpose Sentence-BERT model. We evaluated RD-Detector using data from three communities hosted on GitHub. Our dataset comprises 16,048 discussion posts. Three maintainers and three Software Engineering (SE) researchers manually evaluated the RD-Detector results, which reached 77-100% precision and 66% recall. In addition, maintainers pointed out practical applications of the approach, such as providing knowledge to support merging discussion posts and converting posts into comments on other related posts. Maintainers can benefit from RD-Detector to address the labor-intensive task of manually detecting related posts.
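To illustrate the general idea of flagging related posts from sentence embeddings, the following is a minimal sketch, not the RD-Detector itself. It assumes each post has already been mapped to an embedding vector (for example by a Sentence-BERT model) and that relatedness is scored with cosine similarity against a fixed threshold. The abstract does not state how RD-Detector scores or thresholds candidates, so the function names, the toy vectors, and the threshold value of 0.9 are all hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def find_related_pairs(embeddings, threshold):
    """Return (post_a, post_b, score) for every pair scoring >= threshold.

    `embeddings` maps a post identifier to its embedding vector.
    Results are sorted from most to least similar.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy 3-dimensional vectors standing in for real sentence embeddings
# (hypothetical data; real Sentence-BERT vectors have hundreds of dimensions).
toy_embeddings = {
    "post-1": [0.9, 0.1, 0.0],
    "post-2": [0.8, 0.2, 0.1],
    "post-3": [0.0, 0.1, 0.9],
}

# The threshold is an assumed value chosen for this toy example only.
related = find_related_pairs(toy_embeddings, threshold=0.9)
for post_a, post_b, score in related:
    print(f"{post_a} ~ {post_b}: {score:.3f}")
```

With these toy vectors only post-1 and post-2 exceed the threshold. Comparing all pairs is quadratic in the number of posts, so a dataset of the size reported here would likely need batching or an approximate nearest-neighbor index in practice.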

Keywords: Communication tool; GitHub Discussions; Knowledge sharing; Related posts; Sentence-BERT; Software teams interaction.

Grants and funding

This work was supported by CNPq through processes number 314174/2020-6 and 313067/2020-1, CAPES financial code 001, FAPESP under grant 2020/05191-2, and FAPEAM through process number 062.00150/2020. This research was carried out within the scope of the Samsung-UFAM Project for Education and Research (SUPER), according to Article 48 of Decree number 6.008/2006 (SUFRAMA). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.