LIC criterion for optimal subset selection in distributed interval estimation

J Appl Stat. 2022 Mar 24;50(9):1900-1920. doi: 10.1080/02664763.2022.2053949. eCollection 2023.

Abstract

Distributed interval estimation in linear regression may be computationally infeasible for big data, which are typically stored across different computer servers or in the cloud. A further challenge is that the results of distributed estimation may still contain redundant information about the population characteristics of the data. To address this, we develop an optimization procedure that selects the best subset from the collection of data subsets and then performs interval estimation on that subset in the context of linear regression. The procedure is derived by minimizing the length of the final interval estimator while maximizing the information retained in the selected data subset; hence it is named the LIC criterion. We study the theoretical performance of the LIC criterion and evaluate it through a simulation study and a real data analysis.
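The abstract does not state the exact form of the LIC criterion, so the following is only a rough illustration of the idea of trading off interval length against retained information when choosing one subset. It assumes simple linear regression with one predictor, a normal-quantile approximation to the 95% interval for the slope, S_xx (proportional to the Fisher information about the slope) as the information measure, and a hypothetical weighted score. The names `summarize`, `select_subset`, the `weight` parameter and the scoring rule are all assumptions, not the authors' criterion.

```python
import math
import random
from dataclasses import dataclass

# Assumption: normal approximation to the t quantile for a 95% interval.
Z_975 = 1.959963984540054


@dataclass
class SubsetSummary:
    index: int
    slope: float
    length: float  # width of the approximate 95% interval for the slope
    info: float    # S_xx, proportional to the Fisher information for the slope


def summarize(index, xs, ys):
    """Fit y = a + b*x by least squares on one subset (hypothetical helper)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))      # residual standard error
    se = s / math.sqrt(sxx)           # standard error of the slope
    return SubsetSummary(index, slope, 2 * Z_975 * se, sxx)


def select_subset(subsets, weight=0.5):
    """Pick the subset with the lowest hypothetical score:
    weight * (normalized length) - (1 - weight) * (normalized information)."""
    summaries = [summarize(i, xs, ys) for i, (xs, ys) in enumerate(subsets)]
    max_len = max(s.length for s in summaries)
    max_info = max(s.info for s in summaries)

    def score(s):
        return weight * s.length / max_len - (1 - weight) * s.info / max_info

    return min(summaries, key=score)


# Usage: four simulated subsets whose predictors have different spreads.
rng = random.Random(0)
subsets = []
for scale in (1.0, 2.0, 4.0, 8.0):
    xs = [rng.uniform(0.0, scale) for _ in range(200)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    subsets.append((xs, ys))

best = select_subset(subsets)
print(best.index, round(best.slope, 3), round(best.length, 3))
```

In this one-predictor setting the interval width is proportional to 1/sqrt(S_xx), so both terms favor the subset with the widest predictor spread; with several predictors one might instead use a quantity such as det(XᵀX) as the information measure, which is again an assumption rather than the paper's definition.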

Keywords: 62H12; 62J05; 68W15; Distributed estimation; LIC criterion; distributed linear regression; optimal subset selection.

Grants and funding

This work was supported by grants from the Natural Science Foundation of Shandong Province under project IDs ZR2020MA022 and 2020KJI003.