Kiñit classification in Ethiopian chants, Azmaris and modern music: A new dataset and CNN benchmark

Ephrem Afele Retta; Richard Sutcliffe; Eiad Almekhlafi; Yosef Kefyalew Enku; Eyob Alemu; Tigist Demssice Gemechu; Michael Abebe Berwo; Mustafa Mhamed; Jun Feng

doi:10.1371/journal.pone.0284560

PLoS One. 2023 Apr 20;18(4):e0284560. doi: 10.1371/journal.pone.0284560. eCollection 2023.

Authors

Ephrem Afele Retta¹, Richard Sutcliffe¹ ², Eiad Almekhlafi¹, Yosef Kefyalew Enku³, Eyob Alemu⁴, Tigist Demssice Gemechu⁵, Michael Abebe Berwo⁶, Mustafa Mhamed¹, Jun Feng¹

Affiliations

¹ School of Information Science and Technology, Northwest University, Xi'an, China.
² School of Computer Science and Electronic Engineering, University of Essex, Colchester, United Kingdom.
³ School of Telecommunications Engineering, Xidian University, Xi'an, China.
⁴ School of Computer Science and Technology, Xidian University, Xi'an, China.
⁵ School of Information and Civil Engineering, Chang'an University, Xi'an, China.
⁶ School of Information Science and Technology, Chang'an University, Xi'an, China.

Abstract

In this paper, we create EMIR, the first-ever Music Information Retrieval dataset for Ethiopian music. EMIR is freely available for research purposes and contains 600 sample recordings of Orthodox Tewahedo chants, traditional Azmari songs and contemporary Ethiopian secular music. Each sample is classified by five expert judges into one of four well-known Ethiopian Kiñits: Tizita, Bati, Ambassel and Anchihoye. Each Kiñit uses its own pentatonic scale and also has its own stylistic characteristics. Thus, Kiñit classification needs to combine scale identification with genre recognition. After describing the dataset, we present the Ethio Kiñits Model (EKM), based on VGG, for classifying the EMIR clips. In Experiment 1, we investigated whether Filterbank, Mel-spectrogram, Chroma, or Mel-frequency cepstral coefficient (MFCC) features work best for Kiñit classification using EKM. MFCC was found to be superior and was therefore adopted for Experiment 2, where the performance of EKM models using MFCC was compared across three different audio sample lengths. A length of 3 s gave the best results. In Experiment 3, EKM and four existing models (AlexNet, ResNet50, VGG16 and LSTM) were compared on the EMIR dataset. EKM was found to have the best accuracy (95.00%) as well as the fastest training time. However, the performance of VGG16 (93.00%) was found not to be significantly worse (P < 0.01). We hope this work will encourage others to explore Ethiopian music and to experiment with other models for Kiñit classification.
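To make the feature pipeline named in the abstract more concrete, the following is a minimal NumPy sketch of cutting a recording into 3 s segments and computing MFCCs for one segment. It is an illustration only, not the authors' implementation: the sample rate, FFT size, hop length, number of mel filters, number of coefficients, and all function names are assumptions that the abstract does not specify.

```python
import numpy as np

def split_into_segments(signal, sample_rate, seconds=3.0):
    """Cut a 1-D waveform into non-overlapping segments; the short tail is dropped."""
    seg_len = int(round(seconds * sample_rate))
    n_segments = len(signal) // seg_len
    return signal[: n_segments * seg_len].reshape(n_segments, seg_len)

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = hz_points[i : i + 3]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc(segment, sample_rate, n_fft=512, hop=256, n_filters=40, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix for one segment (all sizes are assumed)."""
    # Slice the segment into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(segment) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = segment[idx] * np.hamming(n_fft)
    # Power spectrum, then log mel-filterbank energies.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # Unnormalised DCT-II over the filter axis; keep the first n_coeffs coefficients.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)
    basis = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * k[:, None])
    return log_mel @ basis.T

# Usage with a synthetic 10 s tone standing in for a recording (16 kHz is an assumption).
sample_rate = 16000
t = np.arange(10 * sample_rate) / sample_rate
recording = np.sin(2.0 * np.pi * 440.0 * t)
segments = split_into_segments(recording, sample_rate, seconds=3.0)
features = mfcc(segments[0], sample_rate)
```

Each resulting matrix (frames by coefficients) could then be fed to a CNN classifier as a single-channel image; the log mel energies computed midway correspond roughly to the Filterbank and Mel-spectrogram alternatives the abstract compares against.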

Copyright: © 2023 Retta et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Benchmarking / classification
Datasets as Topic / classification
Ethiopia
Humans
Music*
Singing*

Grants and funding

This work was supported by the National Key Research and Development Program of China under grant 2020YFC1523300. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.