Evaluating the Language ENvironment Analysis System for Korean

J Speech Lang Hear Res. 2021 Mar 17;64(3):792-808. doi: 10.1044/2020_JSLHR-20-00489. Epub 2021 Mar 2.

Abstract

Purpose: The algorithm of the Language ENvironment Analysis (LENA) system for calculating language environment measures was trained on American English; thus, its validity with other languages cannot be assumed. This article evaluates the accuracy of the LENA system applied to Korean.

Method: We sampled sixty 5-min recording clips involving 38 key children aged 7-18 months from a larger data set. We establish the identification error rate, precision, and recall of LENA classification compared to human coders. We then examine the correlation between standard LENA measures of adult word count, child vocalization count, and conversational turn count and human counts of the same measures.

Results: Our identification error rate (64% or 67%), including false alarm, confusion, and misses, was similar to the rate found in Cristia, Lavechin, et al. (2020). The correlation between LENA and human counts for adult word count (r = .78 or .79) was similar to that found in the other studies, but the same measure for child vocalization count (r = .34-.47) was lower than the value in Cristia, Lavechin, et al., though it fell within ranges found in other non-European languages. The correlation between LENA and human conversational turn count was not high (r = .36-.47), similar to the findings in other studies.

Conclusions: LENA technology is about as reliable for Korean language environments as it is for other non-English language environments. Factors affecting the accuracy of diarization include speakers' pitch, duration of utterances, age, and the presence of noise and electronic sounds.
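
To make the evaluation measures named in the abstract concrete, the following is a minimal Python sketch of how an identification error rate, per-class precision and recall, and a Pearson correlation might be computed. It is not the authors' pipeline. It assumes that the human and LENA annotations have been aligned into equal-length sequences of per-frame labels, and that the identification error rate is the sum of false alarm, miss, and confusion frames divided by the number of reference speech frames. The labels `FEM`, `CHI`, and `SIL`, and all function names, are hypothetical and chosen only for illustration.

```python
from math import sqrt

SILENCE = "SIL"  # hypothetical label for frames with no speaker


def identification_error_rate(reference, hypothesis):
    """(false alarms + misses + confusions) / reference speech frames.

    Assumed frame-based definition; both inputs are equal-length label lists.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    false_alarm = miss = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref != SILENCE:
            total += 1
            if hyp == SILENCE:
                miss += 1          # human heard speech, system heard nothing
            elif hyp != ref:
                confusion += 1     # both heard speech, different speaker class
        elif hyp != SILENCE:
            false_alarm += 1       # system heard speech where the human did not
    if total == 0:
        raise ValueError("reference contains no speech frames")
    return (false_alarm + miss + confusion) / total


def precision_recall(reference, hypothesis, label):
    """Precision and recall of the system for one speaker class."""
    true_pos = sum(1 for r, h in zip(reference, hypothesis) if r == h == label)
    hyp_count = sum(1 for h in hypothesis if h == label)
    ref_count = sum(1 for r in reference if r == label)
    precision = true_pos / hyp_count if hyp_count else float("nan")
    recall = true_pos / ref_count if ref_count else float("nan")
    return precision, recall


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of counts."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy data, invented for illustration only (not from the study).
human = ["FEM", "FEM", "CHI", "SIL", "SIL", "CHI"]
lena = ["FEM", "CHI", "CHI", "FEM", "SIL", "SIL"]
ier = identification_error_rate(human, lena)
chi_precision, chi_recall = precision_recall(human, lena, "CHI")
```

Because false alarms are counted in the numerator but not in the denominator, an error rate defined this way can exceed 100%. The correlation function would be applied per measure, for example to paired lists of LENA and human adult word counts, one pair per clip.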

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Child
  • Communication
  • Humans
  • Language Development*
  • Language*
  • Republic of Korea