Digital Hebrew Paleography: Script Types and Modes

J Imaging. 2022 May 21;8(5):143. doi: 10.3390/jimaging8050143.

Abstract

Paleography is the study of ancient and medieval handwriting. It is essential for understanding, authenticating, and dating historical texts. Many handwritten manuscripts in archives and libraries have yet to be classified. Human experts can process only a limited number of manuscripts; therefore, there is a need for an automatic tool for script type classification. In this study, we use a deep-learning methodology to classify medieval Hebrew manuscripts into 14 classes based on their script style and mode. Hebrew paleography recognizes six regional styles and three graphical modes of scripts. We experiment with several input image representations and network architectures to determine the most appropriate ones, and we explore several approaches to script classification. We obtained the highest accuracy using a hierarchical classification approach. At the first level, the regional style of the script is classified. Then, the patch is passed to the corresponding model at the second level to determine the graphical mode. In addition, we explore the use of soft labels to define a value, which we call the squareness value, that indicates the squareness/cursiveness of the script. We show how the graphical mode labels can be redefined using the squareness value. This redefinition increases the classification accuracy significantly. Finally, we show that the automatic classification is on par with a human expert paleographer.
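The two-level routing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the classifiers are stand-in functions that return label probabilities, the names `classify_hierarchical`, `squareness`, `style_a`, and `style_b` are hypothetical, and the squareness formula (a weighted average of mode probabilities with weights 1.0, 0.5, and 0.0) is an assumption made only for illustration, since the abstract does not give the definition.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Patch = Sequence[float]  # stand-in for an image patch
Classifier = Callable[[Patch], Dict[str, float]]  # maps a patch to label probabilities


def argmax_label(probs: Dict[str, float]) -> str:
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)


def classify_hierarchical(
    patch: Patch,
    style_model: Classifier,
    mode_models: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Level 1 picks the regional style; level 2 uses that style's own mode model."""
    style = argmax_label(style_model(patch))
    mode = argmax_label(mode_models[style](patch))
    return style, mode


def squareness(
    mode_probs: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Assumed definition: probability-weighted average of per-mode weights."""
    if weights is None:
        # Illustrative weights, not taken from the paper.
        weights = {"square": 1.0, "semi-square": 0.5, "cursive": 0.0}
    total = sum(mode_probs.values())
    return sum(weights[m] * p for m, p in mode_probs.items()) / total


# Toy stand-ins for trained networks (hypothetical outputs).
def toy_style_model(patch: Patch) -> Dict[str, float]:
    return {"style_a": 0.8, "style_b": 0.2}


toy_mode_models: Dict[str, Classifier] = {
    "style_a": lambda patch: {"square": 0.1, "semi-square": 0.3, "cursive": 0.6},
    "style_b": lambda patch: {"square": 0.7, "semi-square": 0.3},
}

style, mode = classify_hierarchical([0.0], toy_style_model, toy_mode_models)
print(style, mode)
```

One consequence of this design is that each second-level model only needs to separate the modes that actually occur within its own regional style, which is consistent with the 14 classes being fewer than the 18 combinations of six styles and three modes.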

Keywords: Hebrew medieval manuscripts; convolutional neural network; deep-learning based classification; digital paleography; handwritten style analysis; script type classification.

Grants and funding

The participation of Vasyutinsky Shapira in this project is funded by the Israeli Ministry of Science, Technology and Space, Yuval Ne’eman scholarship n. 3-16784.