Sequential linear regression with online standardized data

PLoS One. 2018 Jan 18;13(1):e0191186. doi: 10.1371/journal.pone.0191186. eCollection 2018.

Abstract

The present study addresses sequential least-squares multidimensional linear regression, particularly for a data stream, using a stochastic approximation process. To avoid the numerical explosion that can occur, and to reduce computing time so that as many arriving observations as possible are taken into account, we propose using a process with online standardized data instead of raw data, together with several observations per step or all observations up to the current step. We define three processes with online standardized data and study their almost sure convergence: a classical process with a variable step-size and a varying number of observations per step, an averaged process with a constant step-size and a varying number of observations per step, and a process with a variable or constant step-size that uses all observations up to the current step. Their convergence is obtained under more general assumptions than the classical ones. These processes are compared to classical processes on 11 datasets, first for a fixed total number of observations used and then for a fixed processing time. Analyses indicate that the third process typically yields the best results.
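
To make the idea of online standardization concrete, the following is a minimal sketch and not the authors' exact processes. It assumes one observation per step, running means and variances updated with Welford's method, and a decreasing step-size of the form c / n^alpha. The class name, the `demo` helper, the default values of `c` and `alpha`, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


class OnlineStandardizedSGD:
    """Illustrative stochastic-gradient linear regression on online-standardized data."""

    def __init__(self, dim, c=0.5, alpha=0.75):
        self.dim = dim
        self.c = c                      # step-size scale (assumed value)
        self.alpha = alpha              # step-size decay exponent (assumed value)
        self.n = 0                      # number of observations seen so far
        self.mean = [0.0] * (dim + 1)   # running means of x_1..x_dim, then y
        self.m2 = [0.0] * (dim + 1)     # running sums of squared deviations
        self.theta = [0.0] * dim        # coefficients in standardized space

    def _std(self, j):
        # Running (population) standard deviation; fall back to 1 when undefined.
        if self.n < 2:
            return 1.0
        s = math.sqrt(self.m2[j] / self.n)
        return s if s > 1e-12 else 1.0

    def update(self, x, y):
        # 1) Update running means and variances (Welford's method).
        self.n += 1
        for j, v in enumerate(list(x) + [y]):
            d = v - self.mean[j]
            self.mean[j] += d / self.n
            self.m2[j] += d * (v - self.mean[j])
        # 2) Standardize the new observation with the current estimates.
        z = [(x[j] - self.mean[j]) / self._std(j) for j in range(self.dim)]
        t = (y - self.mean[self.dim]) / self._std(self.dim)
        # 3) One stochastic-gradient step with a decreasing step-size.
        step = self.c / (self.n ** self.alpha)
        err = sum(th * zj for th, zj in zip(self.theta, z)) - t
        self.theta = [th - step * err * zj for th, zj in zip(self.theta, z)]

    def raw_coefficients(self):
        # Map standardized coefficients back to the scale of the raw data.
        sy = self._std(self.dim)
        beta = [self.theta[j] * sy / self._std(j) for j in range(self.dim)]
        intercept = self.mean[self.dim] - sum(
            b * self.mean[j] for j, b in enumerate(beta)
        )
        return beta, intercept


def demo(seed=0, n=20000):
    """Fit synthetic data y = 0.5*x1 - 3*x2 + 5 + noise with differently scaled features."""
    rng = random.Random(seed)
    model = OnlineStandardizedSGD(dim=2)
    for _ in range(n):
        x = [rng.uniform(0.0, 100.0), rng.uniform(0.0, 1.0)]
        y = 0.5 * x[0] - 3.0 * x[1] + 5.0 + rng.gauss(0.0, 0.1)
        model.update(x, y)
    return model.raw_coefficients()
```

Calling `demo()` returns the estimated raw-scale coefficients and intercept. Because the gradient step is taken on standardized values, the same step-size schedule can be used even though the two features differ in scale by a factor of 100, which is the kind of situation in which a process on raw data may become numerically unstable.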

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Data Interpretation, Statistical
  • Linear Models*
  • Stochastic Processes

Grants and funding

This work is supported by a public grant overseen by the French National Research Agency (ANR) as part of the second “Investissements d’Avenir” programme (reference: ANR-15-RHU-0004). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.