Rate Distortion Theory for Descriptive Statistics

Entropy (Basel). 2023 Mar 5;25(3):456. doi: 10.3390/e25030456.

Abstract

Rate distortion theory was developed for optimizing lossy compression of data, but it also has applications in statistics. In this paper, we illustrate how rate distortion theory can be used to analyze various datasets. The analysis involves testing, identification of outliers, choice of compression rate, calculation of optimal reconstruction points, and assigning "descriptive confidence regions" to the reconstruction points. We study four models or datasets of increasing complexity: clustering, Gaussian models, linear regression, and a dataset describing orientations of early Islamic mosques. These examples illustrate how rate distortion analysis may serve as a common framework for handling different statistical problems.
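To make the trade-off between compression rate and reconstruction points concrete, the following is a minimal sketch, not the paper's method. It assumes one-dimensional data, squared-error distortion, and a fixed-rate quantizer with k reconstruction points, so that the rate is taken as log2(k) bits. The reconstruction points are found with a plain Lloyd iteration, a technique chosen here for illustration. The names lloyd_quantizer and rate_distortion_table are hypothetical.

```python
import math


def lloyd_quantizer(data, k, iterations=50):
    """Return (reconstruction_points, mean_squared_distortion) for 1-D data."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic start (an assumption): k points spread across the sorted sample.
    points = [xs[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each observation goes to its nearest reconstruction point.
        cells = [[] for _ in points]
        for x in xs:
            nearest = min(range(k), key=lambda j: (x - points[j]) ** 2)
            cells[nearest].append(x)
        # Update step: each point moves to the mean of its cell
        # (an empty cell keeps its previous point).
        new_points = [sum(c) / len(c) if c else p for c, p in zip(cells, points)]
        if new_points == points:
            break
        points = new_points
    distortion = sum(min((x - p) ** 2 for p in points) for x in xs) / n
    return points, distortion


def rate_distortion_table(data, max_k):
    """List (k, rate in bits, distortion) for k = 1..max_k reconstruction points."""
    rows = []
    for k in range(1, max_k + 1):
        _, distortion = lloyd_quantizer(data, k)
        rows.append((k, math.log2(k), distortion))
    return rows


if __name__ == "__main__":
    sample = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]  # illustrative values only
    for k, rate, distortion in rate_distortion_table(sample, 3):
        print(f"k={k}  rate={rate:.3f} bits  distortion={distortion:.5f}")
```

On a sample with two well-separated groups, the distortion drops sharply when going from one to two reconstruction points, which is the kind of feature a choice of compression rate would look for. Lloyd iteration only finds a local optimum, so the values it reports are upper bounds on the distortion achievable at each rate.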

Keywords: Anscombe quartet; Gaussian mixture models; clustering; descriptive statistics; early Islam; linear regression; outlier detection; qibla; quantizer; rate distortion theory.

Grants and funding

This research received no external funding.