A Robust and Low Computational Cost Pitch Estimation Method

Desheng Wang; Yangjie Wei; Yi Wang; Jing Wang

doi:10.3390/s22166026

Sensors (Basel). 2022 Aug 12;22(16):6026. doi: 10.3390/s22166026.

Authors

Desheng Wang¹, Yangjie Wei¹, Yi Wang¹, Jing Wang²

Affiliations

¹ Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China.
² School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, China.

Abstract

Pitch estimation is widely used in speech and audio signal processing. However, the harmonic structure models currently used for pitch estimation cannot always match the harmonic distribution of actual signals. Owing to the structure of the vocal tract, the acoustic nature of musical instruments, and spectral leakage, the harmonic frequencies of speech and audio signals often deviate slightly from integer multiples of the pitch. This paper starts from the summation of residual harmonics (SRH) method and makes two main modifications. First, the constraint that spectral peaks lie at strict integer multiples of the pitch is relaxed to allow slight deviation, which helps capture harmonics. Second, a main pitch segment extension scheme with low computational cost is proposed to exploit the smooth prior of pitch more efficiently. In addition, the pitch segment extension scheme is integrated into the SRH method's voiced/unvoiced decision to reduce short-term errors. Accuracy comparisons against ten pitch estimation methods show that the proposed method has better overall accuracy and robustness. Time cost experiments show that the time cost of the proposed method is around 1/8 that of the state-of-the-art fast NLS method on the experimental computer.

Keywords: harmonic structure; harmonic summation (HS); pitch estimation; smooth prior.
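
The first modification described in the abstract, relaxing the strict integer-multiple constraint on harmonic positions, can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how the deviation is bounded, so the fixed search window `tol` (in frequency bins), the function names `srh_score` and `estimate_pitch`, and the toy spectrum are all assumptions made for illustration. The score follows the commonly cited SRH form, E(f) + sum over k >= 2 of [E(k f) - E((k - 1/2) f)], applied here to a synthetic amplitude spectrum rather than to a real linear-prediction residual.

```python
import numpy as np


def _value_near(spectrum, center, tol):
    """Largest spectral amplitude within +/- tol bins of `center`.

    With tol=0 this is the amplitude at the nearest bin (strict constraint).
    Positions outside the spectrum contribute 0.
    """
    idx = int(round(center))
    lo = max(idx - tol, 0)
    hi = min(idx + tol, len(spectrum) - 1)
    if lo > hi:
        return 0.0
    return float(spectrum[lo:hi + 1].max())


def srh_score(spectrum, f0, n_harmonics=5, tol=0):
    """SRH-style score for a candidate pitch f0, given in bins.

    Harmonic terms are searched within +/- tol bins (hypothetical relaxation);
    inter-harmonic terms are read at their nominal position.
    """
    score = _value_near(spectrum, f0, tol)
    for k in range(2, n_harmonics + 1):
        score += _value_near(spectrum, k * f0, tol)
        score -= _value_near(spectrum, (k - 0.5) * f0, 0)
    return score


def estimate_pitch(spectrum, candidates, n_harmonics=5, tol=0):
    """Return the candidate with the highest score (first one on ties)."""
    scores = [srh_score(spectrum, f, n_harmonics, tol) for f in candidates]
    return candidates[int(np.argmax(scores))]


# Toy spectrum with 1 Hz bins: pitch at 100 Hz, harmonics slightly off
# the integer multiples 200, 300, 400, 500.
spec = np.zeros(1000)
for position in (100, 202, 299, 403, 498):
    spec[position] = 1.0

candidates = list(range(60, 151))
print("strict score at 100 Hz: ", srh_score(spec, 100, tol=0))
print("relaxed score at 100 Hz:", srh_score(spec, 100, tol=3))
print("relaxed pitch estimate: ", estimate_pitch(spec, candidates, tol=3))
```

In this toy case the strict score at 100 Hz only counts the fundamental, because every higher harmonic misses its nominal bin, while the relaxed search recovers all five peaks. A fixed window is the simplest choice; a window that grows with the harmonic index would be an equally plausible reading of "slight deviation".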

MeSH terms

Computers
Signal Processing, Computer-Assisted
Voice*

Grants and funding

61973059/Natural Science Foundation of China