Evaluation of aspiration problems in L2 English pronunciation employing machine learning

Magdalena Piotrowska; Andrzej Czyżewski; Tomasz Ciszewski; Gražina Korvel; Adam Kurowski; Bożena Kostek

doi:10.1121/10.0005480

J Acoust Soc Am. 2021 Jul;150(1):120. doi: 10.1121/10.0005480.

Authors

Magdalena Piotrowska¹, Andrzej Czyżewski², Tomasz Ciszewski³, Gražina Korvel⁴, Adam Kurowski², Bożena Kostek⁵

Affiliations

¹ AGH University of Science and Technology, Faculty of Mechanical Engineering and Robotics, Department of Mechanics and Vibroacoustics, Al. Mickiewicza 30, 30-059 Krakow, Poland.
² Department of Multimedia Systems, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Gdańsk, Poland.
³ Institute of English and American Studies, Faculty of Languages, University of Gdańsk, Gdańsk, Poland.
⁴ Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania.
⁵ Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Gdańsk, Poland.

PMID: 34340465

Abstract

The approach proposed in this study includes methods dedicated specifically to detecting allophonic variation in English. The study aims to find an efficient method for automatically evaluating aspiration in the pronunciation of Polish second-language (L2) English speakers when whole words are analyzed rather than individual allophones extracted from words. Sample words containing aspirated and unaspirated allophones were prepared by experts in English phonetics and phonology. The resulting datasets include recordings of words pronounced by nine native English speakers with a standard southern British accent and by 20 Polish L2 English users. Complete, unedited words serve as input to feature extraction and to classification algorithms such as k-nearest neighbors, naive Bayes, long short-term memory, and a convolutional neural network (CNN). Several signal representations, including low-level audio features, the so-called mid-term and feature-trajectory representations, and spectrograms, are tested for their usefulness in detecting aspiration. The results show high potential for automated evaluation of pronunciation focused on a particular phonological feature (aspiration) when classifiers analyze whole words. In addition, the CNN returns satisfactory results for the automated classification of words containing aspirated and unaspirated allophones produced by Polish L2 speakers.
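As a rough illustration of the whole-word idea described in the abstract, the sketch below computes frame-level features over an entire word, summarizes them as mid-term statistics, and classifies the result with k-nearest neighbors. It is not the authors' pipeline: the two features (short-time energy and zero-crossing rate), the frame sizes, the value of k, and the synthetic `toy_word` signals (a noise burst standing in for aspiration, followed by a sine "vowel") are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def frame_features(signal, frame_len=200, hop=100):
    """Return (energy, zero-crossing rate) for each frame of the whole word."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats

def mid_term_features(signal):
    """Summarize the frame features by their mean and standard deviation."""
    summary = []
    for column in zip(*frame_features(signal)):
        mean = sum(column) / len(column)
        var = sum((v - mean) ** 2 for v in column) / len(column)
        summary.extend([mean, math.sqrt(var)])
    return summary

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

def toy_word(aspirated, rng, sample_rate=8000):
    """Hypothetical stand-in for a recording: optional noise burst, then a tone."""
    burst = [rng.uniform(-0.5, 0.5) for _ in range(480 if aspirated else 0)]
    f0 = rng.uniform(120, 180)
    vowel = [0.5 * math.sin(2 * math.pi * f0 * n / sample_rate) for n in range(1600)]
    return burst + vowel

rng = random.Random(0)  # fixed seed so the toy data are reproducible
labels = ["aspirated", "unaspirated"] * 10
train_x = [mid_term_features(toy_word(label == "aspirated", rng)) for label in labels]

query = mid_term_features(toy_word(True, rng))
print(knn_predict(train_x, labels, query))
```

With real recordings, the toy signals would be replaced by audio samples of whole words, and the feature set and classifier settings would need to be chosen and validated on held-out speakers.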

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bayes Theorem
Language*
Machine Learning
Multilingualism*
Phonetics