Software defect prediction using learning to rank approach

Sci Rep. 2023 Nov 2;13(1):18885. doi: 10.1038/s41598-023-45915-5.

Abstract

Software defect prediction (SDP) plays a significant role in detecting the most likely defective software modules and optimizing the allocation of testing resources. In practice, though, project managers must not only identify defective modules but also rank them in a specific order to optimize resource allocation and minimize testing costs, especially for projects with limited budgets. This vital task can be accomplished using Learning to Rank (LTR), a machine learning methodology that pursues two important tasks: prediction and learning. Although LTR is most commonly used in information retrieval, it has also proven effective for other problems, such as SDP. In defect prediction, LTR is mainly used to predict and rank the most likely buggy modules based on their bug count or bug density. This research paper conducts a comprehensive comparison study of the behavior of eight selected LTR models using two target variables: bug count and bug density. It also studies the effect of imbalance learning and feature selection on the employed LTR models. The models are empirically evaluated using the Fault Percentile Average (FPA). Our results show that using bug count as the ranking criterion produces higher scores and more stable results across multiple experimental settings. Moreover, imbalance learning has a positive impact when bug density is the target, but a negative impact for bug count. Lastly, feature selection yields no significant improvement for bug density and no impact for bug count. We therefore conclude that combining feature selection or imbalance learning with LTR does not yield superior or significant results.
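For context, the Fault Percentile Average named above is commonly defined as follows (this formulation follows the defect-prediction literature and is supplied here for reference, not quoted from the paper): with K modules ranked so that module K is predicted most defective, n_k actual bugs in module k, and N total bugs,

$$\mathrm{FPA} = \frac{1}{K}\sum_{m=1}^{K}\frac{1}{N}\sum_{k=K-m+1}^{K} n_k,$$

i.e., the average, over all cut-offs m, of the proportion of bugs contained in the top-m ranked modules. The sketch below illustrates the idea under assumptions not taken from the paper: the data is synthetic, and a pointwise random-forest regressor stands in for the eight LTR models actually compared. It shows how bug count can serve as the ranking target and how a ranking is scored with FPA.

```python
# Minimal, illustrative sketch of pointwise learning-to-rank for defect
# prediction, evaluated with Fault Percentile Average (FPA). The feature
# matrix, bug counts, and model choice are placeholders, not the paper's
# actual datasets or LTR models.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def fpa(actual_bugs, predicted_scores):
    """FPA: mean, over all cut-offs m, of the fraction of total bugs
    found in the top-m predicted modules."""
    order = np.argsort(predicted_scores)[::-1]       # best-ranked module first
    bugs_in_rank_order = np.asarray(actual_bugs, dtype=float)[order]
    total = bugs_in_rank_order.sum()
    if total == 0:
        return 0.0
    # Cumulative bug proportion after each cut-off, averaged over cut-offs.
    return float(np.mean(np.cumsum(bugs_in_rank_order) / total))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                       # synthetic module metrics
bug_count = rng.poisson(np.exp(X[:, 0]))             # synthetic bug counts

X_tr, X_te, y_tr, y_te = train_test_split(X, bug_count, random_state=0)

# Pointwise LTR: regress the ranking target, then sort modules by the score.
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
scores = model.predict(X_te)

print(f"FPA on held-out modules: {fpa(y_te, scores):.3f}")
```

A bug-density variant of this sketch would divide the bug count by a module size measure (e.g., lines of code) before training, so that the ranker orders modules by defects per unit of code rather than raw defect totals.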