Multiple-instrument polyphonic music transcription using a temporally constrained shift-invariant model

J Acoust Soc Am. 2013 Mar;133(3):1727-41. doi: 10.1121/1.4790351.

Abstract

This work proposes a method for automatic transcription of polyphonic music that models the temporal evolution of musical tones. The model extends the shift-invariant probabilistic latent component analysis method by supporting spectral templates that correspond to sound states such as attack, sustain, and decay. The order of these templates is controlled using hidden Markov model-based temporal constraints. In addition, the model can exploit multiple templates per pitch and per instrument source. The shift-invariant aspect of the model makes it suitable for music signals that exhibit frequency modulations or tuning changes. Pitch-wise hidden Markov models are also used in a postprocessing step for note tracking. For training, sound state templates were extracted for various orchestral instruments using isolated note samples. The proposed transcription system was tested on multiple-instrument recordings from various datasets. Experimental results show that the proposed model outperforms a non-temporally constrained model as well as several state-of-the-art transcription systems on the same experiments.
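To illustrate the kind of temporal constraint the abstract describes, the sketch below decodes a sequence of sound states (attack, sustain, decay) with a left-to-right hidden Markov model, so states can only stay or advance, never go back. This is a minimal toy illustration, not the paper's actual model: the transition matrix, state names, and per-frame likelihoods are invented for demonstration, and in the paper the likelihoods would come from the shift-invariant PLCA template activations.

```python
import numpy as np

# Illustrative left-to-right HMM over sound states; all numbers are
# made up for this sketch, not taken from the paper.
STATES = ["attack", "sustain", "decay"]

# Transitions: each state may persist or move forward, never backward.
log_A = np.log(np.array([
    [0.6, 0.4, 0.0],   # attack -> attack or sustain
    [0.0, 0.8, 0.2],   # sustain -> sustain or decay
    [0.0, 0.0, 1.0],   # decay is absorbing
]) + 1e-12)
log_pi = np.log(np.array([1.0, 0.0, 0.0]) + 1e-12)  # notes start in attack

def viterbi(log_lik):
    """Most likely state path given per-frame log-likelihoods (T x S)."""
    T, S = log_lik.shape
    delta = np.zeros((T, S))          # best path score ending in each state
    psi = np.zeros((T, S), dtype=int) # backpointers
    delta[0] = log_pi + log_lik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_lik[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy per-frame evidence favouring attack, then sustain, then decay.
log_lik = np.log(np.array([
    [0.8, 0.1, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
]))
path = viterbi(log_lik)
print([STATES[s] for s in path])
```

Because the transition matrix forbids backward moves, the decoded path is always monotone (attack before sustain before decay), which is the ordering constraint the temporally constrained model enforces on its sound-state templates.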

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Markov Chains
  • Models, Theoretical*
  • Music*
  • Pattern Recognition, Automated
  • Signal Processing, Computer-Assisted*
  • Sound Spectrography
  • Sound*
  • Time Factors