A methodology for treating missing data applied to daily rainfall data in the Candelaro River Basin (Italy)

Environ Monit Assess. 2010 Jan;160(1-4):1-22. doi: 10.1007/s10661-008-0653-3.

Abstract

Environmental time series are often affected by the "presence" of missing data, but when dealing statistically with data, the need to fill in the gaps estimating the missing values must be considered. At present, a large number of statistical techniques are available to achieve this objective; they range from very simple methods, such as using the sample mean, to very sophisticated ones, such as multiple imputation. A brand new methodology for missing data estimation is proposed, which tries to merge the obvious advantages of the simplest techniques (e.g. their vocation to be easily implemented) with the strength of the newest techniques. The proposed method consists in the application of two consecutive stages: once it has been ascertained that a specific monitoring station is affected by missing data, the "most similar" monitoring stations are identified among neighbouring stations on the basis of a suitable similarity coefficient; in the second stage, a regressive method is applied in order to estimate the missing data. In this paper, four different regressive methods are applied and compared, in order to determine which is the most reliable for filling in the gaps, using rainfall data series measured in the Candelaro River Basin located in South Italy.

MeSH terms

  • Environmental Monitoring / methods*
  • Italy
  • Rain*
  • Rivers*
  • Statistics as Topic / methods*