Modeling and comparison of count data containing zero values: a case study of Setipinna taty in the south inshore of Zhejiang, China

Environ Sci Pollut Res Int. 2021 Sep;28(34):46827-46837. doi: 10.1007/s11356-021-13440-5. Epub 2021 Mar 20.

Abstract

To make effective use of fishery count data containing zero values, this study used survey data on Setipinna taty collected in the southern inshore waters of Zhejiang, China, from 2017 to 2019. Environmental factors such as water temperature, water depth, and salinity were selected as covariates to build and compare four models: a generalized additive model with a Tweedie distribution (Tweedie-GAM), a two-stage GAM, an ad hoc method, and a generalized additive mixed model (GAMM). The results showed that zero values made up a high proportion of the observations at each station. The two-stage GAM had a higher proportion of deviance explained, with its two component models, GAM I and GAM II, explaining 19.6% and 60.4% of the deviance, respectively. Cross-validation showed that the two-stage GAM performed best, with the highest R², the lowest mean absolute error, and a relatively small root mean square error. The abundance of S. taty in the southern inshore waters of Zhejiang was highest at around 21°C in spring and around 18°C in autumn, and it peaked at a water depth of about 20 m. Spatially, high abundance of S. taty occurred mostly in coastal waters south of 28°N. Future research should fit and compare models across samples with different proportions of zero values and include more environmental factors, so as to identify an optimal model more accurately and provide a reference for the conservation of fishery resources.
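The abstract compares models by deviance explained and by cross-validated R², mean absolute error, and root mean square error. The sketch below illustrates those quantities and how a two-stage model yields one prediction per station. It assumes the common delta-model structure, in which the first stage (here taken to be GAM I) estimates the probability of a non-zero catch and the second stage (GAM II) estimates the expected abundance given a non-zero catch; the abstract does not spell out this structure, and the function names and example values are hypothetical, not taken from the study. Fitting the GAMs themselves is outside the scope of this stdlib-only sketch.

```python
import math
from typing import Sequence


def delta_prediction(p_presence: Sequence[float], mean_positive: Sequence[float]) -> list[float]:
    """Combine the two stages of an assumed delta (two-stage) model.

    Stage I: probability that the catch is non-zero.
    Stage II: expected abundance given a non-zero catch.
    The unconditional expected abundance is their product.
    """
    if len(p_presence) != len(mean_positive):
        raise ValueError("stage I and stage II predictions must have equal length")
    return [p * mu for p, mu in zip(p_presence, mean_positive)]


def deviance_explained(null_deviance: float, residual_deviance: float) -> float:
    """Proportion of the null deviance explained by a fitted model (0 to 1)."""
    return (null_deviance - residual_deviance) / null_deviance


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (one common definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error of held-out predictions."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of held-out predictions."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


if __name__ == "__main__":
    # Hypothetical held-out stations; the observed counts contain several zeros.
    observed = [0, 0, 3, 0, 12, 5]
    p_presence = [0.1, 0.2, 0.7, 0.3, 0.9, 0.6]     # hypothetical stage I output
    mean_positive = [4.0, 2.0, 4.0, 3.0, 12.0, 8.0]  # hypothetical stage II output
    predicted = delta_prediction(p_presence, mean_positive)
    print("predicted:", [round(x, 2) for x in predicted])
    print("R2:", round(r_squared(observed, predicted), 3))
    print("MAE:", round(mae(observed, predicted), 3))
    print("RMSE:", round(rmse(observed, predicted), 3))
```

Multiplying the two stages gives a single expected abundance per station, which is what allows a two-stage model to be scored with the same R², MAE, and RMSE as single-stage models such as the Tweedie-GAM. Note that some studies define R² as the squared correlation between observed and predicted values instead; the variant used in the original study is not stated in the abstract.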

Keywords: Count data; Generalized additive model; Optimal model; Setipinna taty; Two-stage GAM; Zero values.

Publication types

  • Review

MeSH terms

  • Animals
  • China
  • Fisheries*
  • Fishes*
  • Salinity
  • Seasons
  • Temperature