Arabic handwritten alphabets, words and paragraphs per user (AHAWP) dataset

Data Brief. 2022 Feb 13;41:107947. doi: 10.1016/j.dib.2022.107947. eCollection 2022 Apr.

Abstract

This article presents AHAWP, a dataset of handwritten Arabic alphabets, words and paragraphs. The dataset contains 65 different Arabic alphabets (including beginning, middle, end and regular variations), 10 different Arabic words (which together encompass all Arabic alphabets) and 3 different paragraphs. The dataset was collected anonymously from 82 different users. Each user was asked to write each alphabet and word 10 times. A userid uniquely but anonymously identifies the writer of each alphabet, word and paragraph. In total, the dataset consists of 53,199 alphabet images, 8,144 word images and 241 paragraph images. The dataset can be used for optical handwriting recognition of alphabets and words, and for writer identification (or verification) of handwritten Arabic text. It can also be used to evaluate how a user's writing style for an isolated alphabet differs from the same alphabet written as part of a word or paragraph. The dataset is publicly available at https://data.mendeley.com/datasets/2h76672znt/1.
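
As a quick sanity check on the figures above, the following minimal sketch compares the reported image totals with the nominal totals that would result if every user had completed every task. The constants come from the abstract; the assumption that each paragraph was written once per user is ours, since the abstract states a repeat count only for alphabets and words.

```python
# Figures taken from the abstract.
NUM_USERS = 82
REPEATS = 10          # each alphabet and word was written 10 times per user
NUM_ALPHABETS = 65
NUM_WORDS = 10
NUM_PARAGRAPHS = 3

reported = {"alphabets": 53199, "words": 8144, "paragraphs": 241}

# Nominal totals if every user completed every task.
# Assumption (not stated in the abstract): each paragraph is written once per user.
nominal = {
    "alphabets": NUM_ALPHABETS * REPEATS * NUM_USERS,
    "words": NUM_WORDS * REPEATS * NUM_USERS,
    "paragraphs": NUM_PARAGRAPHS * NUM_USERS,
}

# Difference between nominal and reported totals for each category.
shortfall = {kind: nominal[kind] - reported[kind] for kind in reported}

for kind in reported:
    print(f"{kind}: nominal={nominal[kind]}, "
          f"reported={reported[kind]}, difference={shortfall[kind]}")
```

Under these assumptions the reported totals fall slightly below the nominal ones, which would be consistent with a small number of missing or excluded samples. The abstract does not state the reason.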

Keywords: Arabic Text recognition; Handwritten Arabic alphabets; Handwritten Arabic paragraphs; Handwritten Arabic words; Writer identification.