What does Chinese BERT learn about syntactic knowledge?

PeerJ Comput Sci. 2023 Jul 26;9:e1478. doi: 10.7717/peerj-cs.1478. eCollection 2023.

Abstract

Pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) have been applied to a wide range of natural language processing (NLP) tasks and have obtained significantly positive results. A growing body of research has investigated why BERT is so effective and what linguistic knowledge it is able to learn. However, most of these works have focused almost exclusively on English. Few studies have explored the linguistic information, particularly syntactic information, that BERT learns in Chinese, which is written as sequences of characters. In this study, we adopted several probing methods to identify syntactic knowledge stored in the attention heads and hidden states of Chinese BERT. The results suggest that individual heads and combinations of heads do well at encoding specific and overall syntactic relations, respectively. The hidden representations of each layer also contain syntactic information to different degrees. We further analyzed models of Chinese BERT fine-tuned for different tasks, covering all linguistic levels. Our results suggest that these fine-tuned models reflect changes in how language structure is preserved. These findings help explain why Chinese BERT shows such large improvements across many language-processing tasks.
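To make the attention-head probing described above concrete, the minimal sketch below shows one common way such probing is done: extracting the attention maps of a single head of Chinese BERT and reading off, for each token, the position it attends to most strongly, which can then be compared against gold dependency arcs over a treebank. This is an illustrative sketch only, not the authors' code; it assumes the HuggingFace transformers library and the public bert-base-chinese checkpoint, and the example sentence and the chosen layer/head indices are hypothetical.

```python
# Sketch of attention-head probing for syntax (illustrative, not the paper's code).
# Assumes: HuggingFace transformers, the public "bert-base-chinese" checkpoint,
# and a hypothetical example sentence and layer/head choice.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese", output_attentions=True)
model.eval()

sentence = "他喜欢读书"  # "He likes reading books" (hypothetical example)
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple of 12 layers, each of shape
# [batch, num_heads, seq_len, seq_len] for bert-base-chinese.
layer, head = 7, 3  # hypothetical head to inspect
attn = outputs.attentions[layer][0, head]  # [seq_len, seq_len]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# For each token, the position it attends to most strongly. A head is said to
# "encode" a syntactic relation if, over a treebank, these argmax positions
# frequently coincide with the token's dependency head.
predicted_heads = attn.argmax(dim=-1)
for i, tok in enumerate(tokens):
    print(f"{tok:>6} -> {tokens[predicted_heads[i]]}")
```

In practice, this per-head accuracy would be averaged over an annotated corpus and compared across all layer/head pairs, and hidden-state probing would instead train a lightweight classifier on each layer's representations.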

Keywords: BERT; Chinese; Fine-tune; NLP; Syntax.

Grants and funding

This work was supported by the Major Program of the National Social Science Fund of China (18ZDA238), the Tsinghua University Initiative Scientific Research Program (2019THZWJC38), Beihang University Sponsored Projects for Core Young Researchers in the Disciplines of Social Sciences and Humanities (KG16183801) and the Tianjin Postgraduate Scientific Research Innovation Program (No. 2022BKY024). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.