Selected Data Mining Tools for Data Analysis in Distributed Environment

Entropy (Basel). 2022 Oct 1;24(10):1401. doi: 10.3390/e24101401.

Abstract

In this paper, we deal with distributed data represented either as a finite set T of decision tables with equal sets of attributes or a finite set I of information systems with equal sets of attributes. In the former case, we discuss a way to the study decision trees common to all tables from the set T: building a decision table in which the set of decision trees coincides with the set of decision trees common to all tables from T. We show when we can build such a decision table and how to build it in a polynomial time. If we have such a table, we can apply various decision tree learning algorithms to it. We extend the considered approach to the study of test (reducts) and decision rules common to all tables from T. In the latter case, we discuss a way to study the association rules common to all information systems from the set I: building a joint information system for which the set of true association rules that are realizable for a given row ρ and have a given attribute a on the right-hand side coincides with the set of association rules that are true for all information systems from I, have the attribute a on the right-hand side, and are realizable for the row ρ. We then show how to build a joint information system in a polynomial time. When we build such an information system, we can apply various association rule learning algorithms to it.

Keywords: association rules; decision rules; decision tables; decision trees; distributed data; information systems; reducts; tests.

Grants and funding

Research funded by King Abdullah University of Science and Technology.