Using Paper Texture for Choosing a Suitable Algorithm for Scanned Document Image Binarization

J Imaging. 2022 Oct 5;8(10):272. doi: 10.3390/jimaging8100272.

Abstract

The intrinsic features of documents, such as paper color, texture, aging, translucency, and the kind of printing, typing, or handwriting, are important in deciding how their images should be processed and enhanced. Image binarization is the process of producing a monochromatic image from its color version. It is a key step in the document processing pipeline. The recent Quality-Time Binarization Competitions for documents have shown that no single binarization algorithm is good for every kind of document image. This paper uses a sample of the paper texture of scanned historical documents as the main document feature to select which of 63 widely used algorithms, each applied to five different versions of the input image (315 document image-binarization schemes in total), provides a reasonable quality-time trade-off.
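
As a rough illustration of the counts above, the following Python sketch enumerates the 63 × 5 = 315 schemes and shows what a single, very simple binarization step looks like. The algorithm and input-version names are hypothetical placeholders, since the abstract gives only the counts, and the fixed global threshold is a stand-in chosen for brevity, not one of the methods assessed in the paper.

```python
from itertools import product

# Hypothetical placeholder names: the abstract states only the counts
# (63 algorithms, 5 versions of each input image), not their identities.
ALGORITHMS = [f"algorithm_{i:02d}" for i in range(1, 64)]
INPUT_VERSIONS = [f"input_version_{j}" for j in range(1, 6)]


def enumerate_schemes(algorithms, versions):
    """Pair every algorithm with every input version (one scheme per pair)."""
    return list(product(algorithms, versions))


def binarize_global(gray, threshold=128):
    """Illustrative fixed global threshold (an assumption, not the paper's method).

    `gray` is a list of rows of integer intensities in 0..255.
    Pixels at or above the threshold become 1 (paper), the rest 0 (ink).
    """
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


schemes = enumerate_schemes(ALGORITHMS, INPUT_VERSIONS)
print(len(schemes))  # 315

tiny_image = [[0, 200], [128, 127]]
print(binarize_global(tiny_image))  # [[0, 1], [1, 0]]
```

In a selection setting such as the one described, each of the 315 pairs would be scored for quality and processing time on documents with a given paper texture. That scoring step is not sketched here because the abstract does not specify it.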

Keywords: DIB dataset; binarization algorithms; binarization competitions; document binarization; historical documents; scanned documents.

Grants and funding

The research reported in this paper was mainly sponsored by the RD&I project Callidus Academy, signed between the Universidade do Estado do Amazonas (UEA) and Callidus Indústria through the Lei de Informática/SUFRAMA. Rafael Dueire Lins was also partly sponsored by CNPq, Brazil.