Evaluating the Language ENvironment Analysis System for Korean

Margarethe McDonald; Taeahn Kwon; Hyunji Kim; Youngki Lee; Eon-Suk Ko

doi:10.1044/2020_JSLHR-20-00489

J Speech Lang Hear Res. 2021 Mar 17;64(3):792-808. doi: 10.1044/2020_JSLHR-20-00489. Epub 2021 Mar 2.

Authors

Margarethe McDonald¹, Taeahn Kwon², Hyunji Kim³, Youngki Lee⁴, Eon-Suk Ko³

Affiliations

¹ Department of Communication Sciences and Disorders, University of Wisconsin-Madison.
² Department of Computer Science, Yonsei University, Seoul, South Korea.
³ Department of English Language and Literature, Chosun University, Gwangju, South Korea.
⁴ Department of Computer Science and Engineering, Seoul National University, South Korea.

PMID: 33651954

Abstract

Purpose: The algorithm of the Language ENvironment Analysis (LENA) system for calculating language environment measures was trained on American English; thus, its validity with other languages cannot be assumed. This article evaluates the accuracy of the LENA system applied to Korean.

Method: We sampled sixty 5-min recording clips involving 38 key children aged 7-18 months from a larger data set. We establish the identification error rate, precision, and recall of LENA classification compared to human coders. We then examine the correlation between standard LENA measures of adult word count, child vocalization count, and conversational turn count and human counts of the same measures.

Results: Our identification error rate (64% or 67%), including false alarm, confusion, and misses, was similar to the rate found in Cristia, Lavechin, et al. (2020). The correlation between LENA and human counts for adult word count (r = .78 or .79) was similar to that found in the other studies, but the same measure for child vocalization count (r = .34-.47) was lower than the value in Cristia, Lavechin, et al., though it fell within ranges found in other non-European languages. The correlation between LENA and human conversational turn count was not high (r = .36-.47), similar to the findings in other studies.

Conclusions: LENA technology is about as reliable for Korean language environments as it is for other non-English language environments. Factors affecting the accuracy of diarization include speakers' pitch, duration of utterances, age, and the presence of noise and electronic sounds.
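The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The abstract states only that the identification error rate includes false alarm, confusion, and misses; dividing their sum by the total duration of human-annotated (reference) speech is an assumption based on how this metric is commonly defined in diarization evaluation. All function and parameter names are hypothetical, and no values from the study are used.

```python
from math import sqrt


def identification_error_rate(false_alarm, confusion, miss, reference_total):
    """Sum of the three error types over total reference duration.

    The denominator is an assumption: the abstract only lists the
    error components. All arguments share one time unit (e.g. seconds).
    """
    if reference_total <= 0:
        raise ValueError("reference_total must be positive")
    return (false_alarm + confusion + miss) / reference_total


def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall for one speaker category.

    true_pos: amount labeled X by both the system and the human coder;
    false_pos: labeled X by the system only;
    false_neg: labeled X by the human coder only.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def pearson_r(xs, ys):
    """Pearson correlation between paired counts (e.g. system vs. human)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

Under this definition the error rate can exceed 100% when false alarms are frequent, because the numerator counts system output that has no counterpart in the reference annotation.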

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adult
Child
Communication
Humans
Language Development*
Language*
Republic of Korea