Novel method of building train and test sets for evaluation of machine learning models related to software bugs assignment

Lukasz Chmielowski; Michal Kucharzak; Robert Burduk

doi:10.1038/s41598-023-48617-0


Sci Rep. 2023 Dec 6;13(1):21512. doi: 10.1038/s41598-023-48617-0.

Authors

Lukasz Chmielowski¹ ², Michal Kucharzak³ ⁴, Robert Burduk⁴

Affiliations

¹ Nokia Solutions and Networks sp. z o.o., 02-685, Warsaw, Poland. lukasz.chmielowski@nokia.com.
² Wroclaw University of Science and Technology, 50-370, Wroclaw, Poland.
³ Nokia Solutions and Networks sp. z o.o., 02-685, Warsaw, Poland.
⁴ Wroclaw University of Science and Technology, 50-370, Wroclaw, Poland.

Abstract

Many tools are now used to handle bug reports, feature requests, support questions and similar issues that arise during software development or maintenance, and some of them use machine learning techniques. The introduction reviews the fundamental methods used to evaluate machine learning models. This paper points out weak points of the metrics currently used for evaluation in the specific context of software development, especially bug reports. The main drawback of state-of-the-art approaches is that they disregard time dependencies, which should be taken into account when creating train and test sets because they may affect the results. An extensive literature review found no article that uses time dependencies to evaluate machine learning models in software development applications, such as machine learning solutions supporting bug tracking systems. This paper introduces a novel solution that is free of these drawbacks. Experiments showed that the introduced method is effective and that it yields results significantly different from those obtained with state-of-the-art methods.
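The abstract does not spell out the proposed procedure, so what follows is only a generic illustration of the distinction it draws, not the authors' method. It is a minimal Python sketch, assuming a hypothetical `BugReport` record that carries a creation date. The sketch contrasts a time-agnostic random split, where the training set may contain reports filed after some test reports, with a simple chronological split, where every training report precedes every test report. All names (`BugReport`, `random_split`, `chronological_split`, `leaks_future`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
import random


@dataclass(frozen=True)
class BugReport:
    # Hypothetical minimal record: only the fields needed to illustrate splitting.
    report_id: int
    created: date
    assignee: str


def random_split(reports, test_fraction, seed=0):
    """Time-agnostic split: shuffles reports, ignoring creation dates."""
    shuffled = list(reports)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def chronological_split(reports, test_fraction):
    """Time-aware split: the newest reports form the test set, so every
    training report is created no later than any test report."""
    ordered = sorted(reports, key=lambda r: (r.created, r.report_id))
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]  # (train, test)


def leaks_future(train, test):
    """True if some training report is newer than the oldest test report,
    i.e. the model would be trained on information from the 'future'."""
    if not train or not test:
        return False
    return max(r.created for r in train) > min(r.created for r in test)


if __name__ == "__main__":
    # Hypothetical data: ten reports filed on consecutive days.
    reports = [BugReport(i, date(2023, 1, 1 + i), f"dev{i % 3}") for i in range(10)]
    train, test = chronological_split(reports, 0.2)
    print([r.report_id for r in test])  # [8, 9]
    print(leaks_future(train, test))    # False
    # A random split of the same data will usually report True here.
    print(leaks_future(*random_split(reports, 0.2)))
```

The `leaks_future` check makes the concern in the abstract concrete: with shuffled splits, evaluation can reward a model for using reports that would not yet exist at prediction time.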