Mixed Script Identification Using Automated DNN Hyperparameter Optimization

Muhammad Yasir; Li Chen; Amna Khatoon; Muhammad Amir Malik; Fazeel Abid

doi:10.1155/2021/8415333

Comput Intell Neurosci. 2021 Dec 10:2021:8415333. doi: 10.1155/2021/8415333. eCollection 2021.

Authors

Muhammad Yasir¹, Li Chen¹, Amna Khatoon², Muhammad Amir Malik³, Fazeel Abid⁴

Affiliations

¹ School of Information Science and Technology, Northwest University, Xi'an, Shaanxi, China.
² Department of Information Engineering, Chang'an University, Xi'an, Shaanxi, China.
³ Department of Computer Science, Islamic International University, Islamabad, Pakistan.
⁴ Department of Information System, University of Management and Technology, Lahore, Pakistan.

Abstract

Mixed script identification is a hindrance for automated natural language processing (NLP) systems. Mixing cursive scripts of different languages is challenging because NLP methods such as part-of-speech (POS) tagging and word sense disambiguation degrade on noisy text. This study tackles mixed script identification for a code-mixed dataset consisting of Roman Urdu, Hindi, Saraiki, Bengali, and English. The language identification model is trained using word vectorization and recurrent neural network (RNN) variants. Through experimental investigation, four architectures are optimized for the task: Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), Gated Recurrent Unit (GRU), and Bidirectional Gated Recurrent Unit (Bi-GRU). The highest accuracy, 90.17%, was achieved by Bi-GRU using learned word class features together with GloVe embeddings. The study also addresses issues that arise in multilingual environments, such as Roman words merged with English characters, generative spellings, and phonetic typing.
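
To make the Bi-GRU setup named in the abstract more concrete, the following is a minimal, illustrative sketch of a bidirectional GRU forward pass that assigns each token a probability distribution over the five languages. It is not the authors' implementation. Several things here are assumptions: per-token tagging (the abstract does not say whether labels are per word or per sentence), random vectors standing in for pretrained GloVe embeddings, untrained random weights, omitted bias terms, and all class names, function names, and dimensions. No hyperparameter optimization is shown.

```python
import math
import random

# Label set taken from the abstract; everything else below is hypothetical.
LANGS = ["Roman Urdu", "Hindi", "Saraiki", "Bengali", "English"]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Single GRU layer with untrained random weights and no bias terms."""

    def __init__(self, rng, input_dim, hidden_dim):
        # One input and one recurrent matrix per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(rng, hidden_dim, input_dim) for g in "zrh"}
        self.U = {g: rand_matrix(rng, hidden_dim, hidden_dim) for g in "zrh"}
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        z = sigmoid([a + b for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))])
        r = sigmoid([a + b for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))])
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, xs):
        """Return the hidden state after each input vector."""
        h = [0.0] * self.hidden_dim
        states = []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states


class BiGRUClassifier:
    def __init__(self, vocab, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        # ASSUMPTION: random vectors stand in for pretrained GloVe embeddings.
        self.embed = {w.lower(): [rng.uniform(-1, 1) for _ in range(embed_dim)] for w in vocab}
        self.unk = [0.0] * embed_dim  # out-of-vocabulary tokens map to zeros
        self.fwd = GRUCell(rng, embed_dim, hidden_dim)
        self.bwd = GRUCell(rng, embed_dim, hidden_dim)
        self.out = rand_matrix(rng, len(LANGS), 2 * hidden_dim)

    def predict_proba(self, tokens):
        xs = [self.embed.get(t.lower(), self.unk) for t in tokens]
        f = self.fwd.run(xs)
        # Run the backward GRU on the reversed sequence, then realign to token order.
        b = self.bwd.run(xs[::-1])[::-1]
        return [softmax(matvec(self.out, fs + bs)) for fs, bs in zip(f, b)]


# Hypothetical code-mixed sentence (Roman Urdu with one English word).
tokens = "mujhe yeh movie bohat pasand hai".split()
model = BiGRUClassifier(vocab=tokens)
probs = model.predict_proba(tokens)
for tok, p in zip(tokens, probs):
    print(tok, LANGS[p.index(max(p))])
```

Because the weights are untrained, the printed labels are arbitrary; the sketch only shows how forward and backward hidden states are concatenated per token before classification. In practice a framework layer with trained weights would replace the hand-written cell.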

MeSH terms

Natural Language Processing*
Research Design