MSdB-NMF: MultiSpectral Document Image Binarization Framework via Non-negative Matrix Factorization Approach

IEEE Trans Image Process. 2020 Sep 17:PP. doi: 10.1109/TIP.2020.3023613. Online ahead of print.

Abstract

In this paper, we propose a novel method for multispectral document image binarization (MSdB) through the Non-negative Matrix Factorization (NMF) approach. We propose a three-step MSdB-NMF framework: i) an NMF-based feature extraction algorithm that introduces a new optimization problem; ii) a post-processing method; and iii) application of any existing gray/RGB binarization scheme. In the first step, we extract N features out of B spectral bands (N < B) and their corresponding coefficient matrix. We introduce a novel objective formulation that considers robustness (related to noise and various types of degradation) and sparseness (related to the ratio of text pixels to background pixels). We employ multiplicative updating rules to solve the proposed minimization problem and prove the convergence of the proposed feature extraction algorithm. In the second step, we select an appropriate feature vector or, equivalently, the corresponding coefficient vector. We propose to select it either visually or automatically via a post-processing method, which uses benchmark binarization methods as a baseline. In the last step, we apply existing binarization methods such as Sauvola and Howe to the selected coefficient vector. Our proposed binarization framework is applicable to any kind of MS or hyperspectral (HS) document image and requires no prior knowledge, such as side information about the spectral bands of the MS/HS document image. We evaluate our proposed binarization framework on two MS document image datasets. The experimental results confirm that our proposed framework outperforms several state-of-the-art binarization schemes, including the winner of the MS-TEx-2015 contest.
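
The feature extraction step can be illustrated with a minimal sketch, assuming the plain Frobenius-norm NMF objective and the classical Lee-Seung multiplicative updates. This is not the robust and sparse objective proposed in the paper, whose exact form the abstract does not give. The function name `nmf_multiplicative`, the synthetic data cube, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def nmf_multiplicative(X, n_features, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix X (B x P) as W (B x N) @ H (N x P).

    Uses the classical Lee-Seung multiplicative updates for the
    Frobenius-norm objective (an assumption: the paper's own objective
    adds robustness and sparseness terms not reproduced here).
    """
    rng = np.random.default_rng(seed)
    n_bands, n_pixels = X.shape
    W = rng.random((n_bands, n_features)) + eps     # features (basis)
    H = rng.random((n_features, n_pixels)) + eps    # coefficient matrix
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

# Synthetic stand-in for an MS document image: rows x cols pixels, B bands.
rows, cols, bands = 16, 16, 8
cube = np.random.default_rng(1).random((rows, cols, bands))

# Arrange the cube as a B x P matrix (one row per spectral band).
X = cube.reshape(-1, bands).T

# Extract N = 3 features out of B = 8 bands (N < B).
W, H, errors = nmf_multiplicative(X, n_features=3)

# Each row of H is one coefficient vector; reshaped, it is a gray-level
# image that an existing binarization method could take as input.
coeff_image = H[0].reshape(rows, cols)
```

In this sketch the first row of `H` is picked arbitrarily; the framework described above instead selects the coefficient vector visually or through its post-processing step before binarization.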