A Deep Learning Approach for Arabic Manuscripts Classification

Sensors (Basel). 2023 Sep 28;23(19):8133. doi: 10.3390/s23198133.

Abstract

For centuries, libraries worldwide have preserved ancient manuscripts due to their immense historical and cultural value. However, over time, both natural and human-made factors have led to the degradation of many ancient Arabic manuscripts, causing the loss of significant information, such as authorship, titles, or subjects, rendering them as unknown manuscripts. Although catalog cards attached to these manuscripts might contain some of the missing details, these cards have degraded significantly in quality over the decades within libraries. This paper presents a framework for identifying these unknown ancient Arabic manuscripts by processing the catalog cards associated with them. Given the challenges posed by the degradation of these cards, simple optical character recognition (OCR) is often insufficient. The proposed framework uses deep learning architecture to identify unknown manuscripts within a collection of ancient Arabic documents. This involves locating, extracting, and classifying the text from these catalog cards, along with implementing processes for region-of-interest identification, rotation correction, feature extraction, and classification. The results demonstrate the effectiveness of the proposed method, achieving an accuracy rate of 92.5%, compared to 83.5% with classical image classification and 81.5% with OCR alone.

Keywords: Arabic manuscript; Faster R-CNN; LSTM; Region of Interest (RoI); deep learning.

Grants and funding

This research received no external funding.