An inversion-based clustering approach for complex clusters

Mohammad Mahdi Barati Jozan; Aynaz Lotfata; Howard J Hamilton; Hamed Tabesh

doi:10.1186/s13104-024-06791-y

BMC Res Notes. 2024 May 12;17(1):133. doi: 10.1186/s13104-024-06791-y.

Authors

Mohammad Mahdi Barati Jozan¹, Aynaz Lotfata², Howard J Hamilton³, Hamed Tabesh⁴

Affiliations

¹ Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
² Department of Pathology, Microbiology, and Immunology, School of Veterinary Medicine, University of California, Davis, USA.
³ Department of Computer Science, University of Regina, Regina, SK, Canada.
⁴ Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran. Tabeshh@mums.ac.ir.

Abstract

Background: The choice of an appropriate similarity measure plays a pivotal role in the effectiveness of clustering algorithms. However, many conventional measures rely solely on feature values to evaluate the similarity between objects to be clustered. Furthermore, the assumption of feature independence, while valid in certain scenarios, does not hold true for all real-world problems. Hence, considering alternative similarity measures that account for inter-dependencies among features can enhance the effectiveness of clustering in various applications.

Methods: In this paper, we present the Inv measure, a novel similarity measure founded on the concept of inversion. The Inv measure considers the significance of each feature, the values of all features of an object, and the feature values of other objects, yielding a comprehensive and precise evaluation of similarity. To assess the performance of our proposed clustering approach, which incorporates the Inv measure, we evaluate it on simulated data using the adjusted Rand index.
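The adjusted Rand index (ARI) compares a clustering against known ground-truth labels and corrects the plain Rand index for chance agreement. It equals 1 for identical partitions (up to relabeling) and is close to 0, possibly negative, for random labelings. The sketch below is a minimal from-scratch version of the standard Hubert and Arabie formulation, not code from the paper; the function name `adjusted_rand_index` is our own, and scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same objects."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two objects: no pairs to disagree on, treat as perfect agreement.
        return 1.0

    # Contingency table: number of objects in each (true cluster, predicted cluster) cell.
    contingency = Counter(zip(labels_true, labels_pred))
    # Pairs placed together in both partitions.
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    # Pairs placed together in the true partition (row sums) and predicted one (column sums).
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / total_pairs  # chance-expected agreement
    max_index = (sum_rows + sum_cols) / 2         # upper bound on agreement
    if max_index == expected:
        # Only happens when both partitions are trivial and identical
        # (all singletons, or one single cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabeled
    print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # partial agreement
```

The first call prints `1.0` because ARI ignores label names; the second prints about `0.571` (exactly 4/7). Because the index is chance-corrected, a clustering of highly overlapped simulated data can score near 0 even when the raw Rand index looks high, which makes ARI a reasonable choice for comparing methods on complex clusters.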

Results: The simulation results strongly indicate that inversion-based clustering outperforms other methods in scenarios where clusters are complex, i.e., where they appear highly overlapping. This suggests that the proposed approach is practical and effective, making it a valuable choice for applications involving complex clusters across various domains.

Conclusions: The inversion-based clustering approach may hold significant value in the healthcare industry, offering possible benefits in tasks such as hospital ranking, treatment improvement, and high-risk patient identification. In social media analysis, it may prove valuable for trend detection, sentiment analysis, and user profiling. E-commerce platforms may be able to use the approach for product recommendation and customer segmentation. The manufacturing sector may benefit from improved quality control, process optimization, and predictive maintenance. Additionally, the approach may be applied to traffic management and fleet optimization in the transportation domain. Its versatility and effectiveness make it a promising solution across diverse fields, particularly for data analysis tasks in which clusters are complex and overlapping.

Keywords: Adjusted Rand index; Clustering algorithm; Inversion-based similarity measure; Overlapping clusters.

MeSH terms

Algorithms*
Cluster Analysis
Computer Simulation
Humans