A Novel Adaptive Deskewing Algorithm for Document Images

Sensors (Basel). 2022 Oct 18;22(20):7944. doi: 10.3390/s22207944.

Abstract

Scanned documents are often skewed, which can seriously degrade the performance of Optical Character Recognition (OCR). It is therefore necessary to correct the skew before analyzing the information in a document image. In this article, we propose a novel adaptive deskewing algorithm for document images that mainly consists of Skeleton Line Detection (SKLD), Piecewise Projection Profile (PPP), Morphological Clustering (MC), and an image classification method. The image type is first determined from the image's layout features, and the image is then deskewed adaptively with the correction method suited to that type. Our method achieves 97.6% accuracy on the Document Image Skew Estimation Contest (DISEC'2013) dataset and 80.1% accuracy on the PubLayNet dataset. Extensive experiments further demonstrate the superiority of the proposed algorithm.
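The abstract names a Piecewise Projection Profile (PPP) estimator but gives no details of it. The sketch below illustrates only the underlying projection-profile idea, and it uses a plain global search over the whole page rather than the authors' piecewise variant. It assumes a binarized image where nonzero pixels are ink. For each candidate angle, every ink pixel is projected onto the direction normal to lines at that angle, and the resulting histogram is scored. When the candidate matches the text-line skew, the ink piles up into a few sharp bins. The function name `estimate_skew_projection`, its parameters `max_angle` and `step`, and the sum-of-squares score are illustrative assumptions. Other common choices include the profile variance or the squared differences of adjacent bins.

```python
import numpy as np


def estimate_skew_projection(binary, max_angle=10.0, step=0.1):
    """Hypothetical global projection-profile skew estimator (not the paper's PPP).

    binary: 2-D array; nonzero entries are treated as ink pixels.
    Returns the angle in degrees whose projection profile is sharpest.
    Positive angles mean text lines descend to the right in array
    coordinates (the row index grows with the column index).
    """
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return 0.0  # no ink: nothing to estimate
    ys = ys.astype(float)
    xs = xs.astype(float)

    best_angle, best_score = 0.0, -np.inf
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        theta = np.deg2rad(angle)
        # Pixels on a line row = y0 + tan(theta) * col all map to cos(theta) * y0,
        # so a correct angle collapses each text line into one value.
        proj = ys * np.cos(theta) - xs * np.sin(theta)
        # One-pixel-wide bins spanning all projected values.
        edges = np.arange(proj.min(), proj.max() + 2.0)
        hist, _ = np.histogram(proj, bins=edges)
        # Sum of squared bin counts: larger when ink is concentrated in few bins.
        score = float(np.sum(hist.astype(float) ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

The angular resolution of this brute-force search is limited by `step` and by the page width. Once the lines drift less than one pixel across the page, neighboring angles score about the same. A real system would typically refine the search coarse-to-fine, and it might use a different estimator for pages where projection profiles are unreliable. Choosing an estimator by page type is the motivation the abstract gives for its classification step.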

Keywords: adaptive strategy; deskewing; document image; image classification; skew estimation.

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Image Processing, Computer-Assisted / methods
  • Pattern Recognition, Automated* / methods