Sparse feature learning for instrument identification: Effects of sampling and pooling methods

J Acoust Soc Am. 2016 May;139(5):2290. doi: 10.1121/1.4946988.

Abstract

Feature learning for music applications has recently received considerable attention from many researchers. This paper reports on the sparse feature learning algorithm for musical instrument identification, and in particular, focuses on the effects of the frame sampling techniques for dictionary learning and the pooling methods for feature aggregation. To this end, two frame sampling techniques are examined that are fixed and proportional random sampling. Furthermore, the effect of using onset frame was analyzed for both of proposed sampling methods. Regarding summarization of the feature activation, a standard deviation pooling method is used and compared with the commonly used max- and average-pooling techniques. Using more than 47 000 recordings of 24 instruments from various performers, playing styles, and dynamics, a number of tuning parameters are experimented including the analysis frame size, the dictionary size, and the type of frequency scaling as well as the different sampling and pooling methods. The results show that the combination of proportional sampling and standard deviation pooling achieve the best overall performance of 95.62% while the optimal parameter set varies among the instrument classes.

Publication types

  • Research Support, Non-U.S. Gov't