Harmonic-aware tri-path convolution recurrent network for singing voice separation

Yih-Liang Shen; Ya-Ching Lai; Tai-Shih Chi

doi:10.1121/10.0019997

JASA Express Lett. 2023 Jul 1;3(7):074801. doi: 10.1121/10.0019997.

Authors

Yih-Liang Shen¹, Ya-Ching Lai¹, Tai-Shih Chi¹

Affiliation

¹ Department of Electronics and Electrical Engineering, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan. yihliang.ee06@nycu.edu.tw; r7.ee08@nycu.edu.tw; tschi@nycu.edu.tw

PMID: 37404168

Abstract

Temporal coherence and spectral regularity are critical cues for human auditory streaming and are exploited by many sound separation models. Examples include the Conv-TasNet model, which focuses on temporal coherence by using short-length kernels to analyze sound, and the dual-path convolution recurrent network (DPCRN) model, which uses two recurrent neural networks (RNNs) to analyze general patterns along the temporal and spectral dimensions of a spectrogram. This work extends DPCRN with an inter-band RNN, yielding a harmonic-aware tri-path convolution recurrent network. Evaluation results on public datasets show that this addition further improves the separation performance of DPCRN.
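
The three paths described above can be pictured as three different ways of slicing a spectrogram into sequences for a recurrent layer to scan. The following is a minimal illustrative sketch, not the authors' implementation. The names `sequences_for_path` and `toy_rnn` are hypothetical. The abstract does not say how bands are formed for the inter-band RNN, so the equal-width band split used here is an assumption made only for illustration, and the scalar recurrence stands in for a trained RNN layer.

```python
import math


def sequences_for_path(spec, path, n_bands=2):
    """Return the 1-D sequences a recurrent layer would scan for one path.

    spec is a list of T time frames, each a list of F frequency-bin values.
    """
    n_frames, n_bins = len(spec), len(spec[0])
    if path == "intra_frame":
        # Spectral path: scan along frequency inside each time frame.
        return [list(frame) for frame in spec]
    if path == "inter_frame":
        # Temporal path: scan along time for each frequency bin.
        return [[spec[t][f] for t in range(n_frames)] for f in range(n_bins)]
    if path == "inter_band":
        # ASSUMPTION: equal-width bands; scan across bands at the same
        # within-band offset. The paper's actual band layout may differ.
        if n_bins % n_bands:
            raise ValueError("n_bins must be divisible by n_bands")
        width = n_bins // n_bands
        return [[spec[t][b * width + k] for b in range(n_bands)]
                for t in range(n_frames) for k in range(width)]
    raise ValueError(f"unknown path: {path}")


def toy_rnn(seq, w_in=0.5, w_rec=0.5):
    """Scalar Elman-style recurrence (hypothetical stand-in for an RNN layer)."""
    h, out = 0.0, []
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h)
        out.append(h)
    return out


# Toy "spectrogram": 2 time frames x 4 frequency bins.
spec = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0]]

for path in ("intra_frame", "inter_frame", "inter_band"):
    seqs = sequences_for_path(spec, path)
    hidden = [toy_rnn(s) for s in seqs]
    print(path, seqs, len(hidden))
```

In this sketch the first two paths correspond to the spectral and temporal scans attributed to DPCRN, and the third adds a scan across frequency bands, which is one plausible reading of how an inter-band RNN could relate spectrally distant components such as harmonics.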