Robust Real-Time Music Transcription with a Compositional Hierarchical Model

Matevž Pesek; Aleš Leonardis; Matija Marolt

doi:10.1371/journal.pone.0169411

PLoS One. 2017 Jan 3;12(1):e0169411. doi: 10.1371/journal.pone.0169411. eCollection 2017.

Authors

Matevž Pesek¹, Aleš Leonardis¹,², Matija Marolt¹

Affiliations

¹ University of Ljubljana, Faculty of Computer and Information Science, Laboratory for computer graphics and multimedia, Ljubljana, Slovenia.
² University of Birmingham, School of Computer Science, Centre for Computational Neuroscience and Cognitive Robotics, Birmingham, United Kingdom of Great Britain and Northern Ireland.

Abstract

The paper presents a new compositional hierarchical model for robust music transcription. Its main features are unsupervised learning of a hierarchical representation of the input data; transparency, which enables insight into the learned representation; and robustness and speed, which make it suitable for real-world and real-time use. The model consists of multiple layers, each composed of a number of parts. The hierarchical nature of the model corresponds well to hierarchical structures in music. Parts in lower layers correspond to low-level concepts (e.g. tone partials), while parts in higher layers combine lower-level representations into more complex concepts (tones, chords). The layers are learned in an unsupervised manner from music signals: parts in each layer are compositions of parts from previous layers, with statistical co-occurrences driving the learning process. In the paper, we present the model's structure and compare it to other hierarchical approaches in the field of music information retrieval. We evaluate the model's performance on multiple fundamental frequency estimation. Finally, we elaborate on extensions of the model towards other music information retrieval tasks.
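
The idea of forming higher-layer parts from statistical co-occurrences of lower-layer parts can be illustrated with a toy sketch. This is not the authors' algorithm: the function name `learn_compositions`, the `min_count` threshold, the restriction to pairs, and the part labels are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def learn_compositions(frames, min_count=2):
    """Return pairs of lower-layer parts co-active in at least min_count frames.

    Hypothetical helper: a simple frequency threshold stands in for
    whatever co-occurrence statistics the actual model uses.
    """
    pair_counts = Counter()
    for active in frames:
        # Count each unordered pair of simultaneously active parts once per frame.
        pair_counts.update(combinations(sorted(active), 2))
    return sorted(pair for pair, n in pair_counts.items() if n >= min_count)


# Toy input: each frame lists the active lower-layer parts
# (made-up labels for partials of one tone, plus a spurious "noise" part).
frames = [
    {"f0", "2f0", "3f0"},
    {"f0", "2f0"},
    {"f0", "3f0", "noise"},
    {"2f0", "noise"},
]
compositions = learn_compositions(frames, min_count=2)
print(compositions)
```

In this toy input, only the pairs that recur across frames are kept as candidate compositions, while pairs involving the spurious part are discarded because they occur once each.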

MeSH terms

Algorithms
Humans
Learning*
Models, Statistical
Models, Theoretical
Music*
Perception
Pitch Perception*
Software*

Grants and funding

The author(s) received no specific funding for this work.