Speeding-Up Association Rule Mining With Inverted Index Compression

IEEE Trans Cybern. 2016 Dec;46(12):3059-3072. doi: 10.1109/TCYB.2015.2496175. Epub 2016 Jan 19.

Abstract

The growing interest in data storage has made the data size to be exponentially increased, hampering the process of knowledge discovery from these large volumes of high-dimensional and heterogeneous data. In recent years, many efficient algorithms for mining data associations have been proposed, facing up time and main memory requirements. Nevertheless, this mining process could still become hard when the number of items and records is extremely high. In this paper, the goal is not to propose new efficient algorithms but a new data structure that could be used by a variety of existing algorithms without modifying its original schema. Thus, our aim is to speed up the association rule mining process regardless the algorithm used to this end, enabling the performance of efficient implementations to be enhanced. The structure simplifies, reorganizes, and speeds up the data access by sorting data by means of a shuffling strategy based on the hamming distance, which achieve similar values to be closer, and considering both an inverted index mapping and a run length encoding compression. In the experimental study, we explore the bounds of the algorithms' performance by using a wide number of data sets that comprise either thousands or millions of both items and records. The results demonstrate the utility of the proposed data structure in enhancing the algorithms' runtime orders of magnitude, and substantially reducing both the auxiliary and the main memory requirements.