Clone detection for business process models

PeerJ Comput Sci. 2022 Aug 23:8:e1046. doi: 10.7717/peerj-cs.1046. eCollection 2022.

Abstract

Models are key in software engineering, especially with the rise of model-driven software engineering. One such use of modeling is in business process modeling, where models are used to represent processes in enterprises. As the number of these process models grow in repositories, it leads to an increasing management and maintenance cost. Clone detection is a means that may provide various benefits such as repository management, data prepossessing, filtering, refactoring, and process family detection. In model clone detection, highly similar model fragments are mined from larger model repositories. In this study, we have extended SAMOS (Statistical Analysis of Models) framework for clone detection of business process models. The framework has been developed to support different types of analytics on models, including clone detection. We present the underlying techniques utilized in the framework, as well as our approach in extending the framework. We perform three experimental evaluations to demonstrate the effectiveness of our approach. We first compare our tool against the Apromore toolset for a pairwise model similarity using a synthetic model mutation dataset. As indicated by the results, SAMOS seems to outperform Apromore in the coverage of the metrics in pairwise similarity of models. Later, we do a comparative analysis of the tools on model clone detection using a dataset derived from the SAP Reference Model Collection. In this case, the results show a better precision for Apromore, while a higher recall measure for SAMOS. Finally, we show the additional capabilities of our approach for different model scoping styles through another set of experimental evaluations.

Keywords: Business process models; Clustering; Model analytics; Model clone detection; Model-driven engineering; Repository mining; Software maintenance; Vector space model.

Grants and funding

This research is funded by ECSEL, the Electronic Components and Systems for European Leadership Joint Undertaking under grant agreement No 826452 (Arrowhead Tools project), supported by the European Union Horizon 2020 research and innovation programme and by the member states. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.