A Semi-automated Approach for Bengali Neologism

SN Comput Sci. 2023;4(5):428. doi: 10.1007/s42979-023-01866-2. Epub 2023 Jun 7.

Abstract

Neologisms are newly coined words or phrases adopted by a language; their formation is a slow but ongoing process that occurs in all languages. Rarely used or obsolete words are sometimes also considered neologisms. Certain events, such as wars, the emergence of new diseases, or advancements like computers and the internet, can trigger the creation of new words. The COVID-19 pandemic is one such event: it rapidly led to an explosion of neologisms relating to the disease and to several other social contexts. Even the term COVID-19 is itself newly coined. Studying and quantifying such adaptation or change is essential from a linguistic perspective. However, identifying newly coined terms computationally is a challenging task. The standard tools and techniques for finding newly coined terms in English-like languages may not be suitable for Bengali and other Indic languages. This study uses a semi-automated approach to investigate the emergence or modification of words in the Bengali language amidst the COVID-19 pandemic. To this end, a Bengali web corpus of COVID-19-related articles was compiled from various Bengali web sources. The current experiment focuses solely on COVID-19-related neologisms, but the method can be adapted for general purposes and extended to other languages.
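As a rough illustration of what the automated half of a semi-automated pipeline could look like, the sketch below counts Bengali tokens in a small corpus and keeps those missing from a reference word list, leaving the final judgment to a human reviewer. This is not the authors' method, which the abstract does not detail: the function name `candidate_neologisms`, the `reference_lexicon` and `min_count` parameters, and the lexicon-exclusion heuristic are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a token is a maximal run of characters from the Bengali
# Unicode block (U+0980 to U+09FF). Real Bengali tokenization is harder
# (zero-width joiners, inflectional suffixes, mixed-script words).
BENGALI_TOKEN = re.compile(r"[\u0980-\u09FF]+")


def candidate_neologisms(documents, reference_lexicon, min_count=2):
    """Return (word, count) pairs for words absent from the reference lexicon.

    Hypothetical helper: the output is a list of candidates for manual
    review, not a list of confirmed neologisms.
    """
    counts = Counter()
    for text in documents:
        counts.update(BENGALI_TOKEN.findall(text))

    candidates = [
        (word, count)
        for word, count in counts.items()
        if count >= min_count and word not in reference_lexicon
    ]
    # Most frequent first; ties broken by code-point order for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates


if __name__ == "__main__":
    corona, virus, lockdown, and_ = "করোনা", "ভাইরাস", "লকডাউন", "এবং"
    docs = [
        f"{corona} {virus} {and_} {lockdown}",
        f"{lockdown} {corona} {lockdown}",
    ]
    known_words = {virus, and_}  # stand-in for a pre-pandemic word list
    for word, count in candidate_neologisms(docs, known_words):
        print(word, count)
```

A plain lexicon lookup of this kind treats every inflected form of a known word as new, which is one reason a manual review step (or a Bengali stemmer, not shown here) would be needed before any candidate is accepted.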

Keywords: Bengali; COVID-19; Corpus; Language; Linguistic analysis; Neologisms; Word formation.