Robust table recognition for printed document images

Math Biosci Eng. 2020 Apr 23;17(4):3203-3223. doi: 10.3934/mbe.2020182.

Abstract

The recognition and analysis of tables in printed document images is a popular research topic in pattern recognition and image processing. Existing table recognition methods usually require a high degree of regularity, and their robustness still needs significant improvement. This paper presents a robust table recognition system consisting of three main parts: image preprocessing, cell location based on contour mutual exclusion, and recognition of printed Chinese characters with a deep learning network. Based on these algorithms, a table recognition app has been developed that converts captured images into editable text in real time. The app was evaluated on a dataset of 105 images. The test results show that it recognizes high-quality tables well, and that its recognition rate on low-quality tables with distortion and blur reaches 81%, considerably higher than those of existing methods. This work may offer insights into the application of table recognition and analysis algorithms.
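The abstract lists binarization as part of image preprocessing but does not say which binarization algorithm the authors use. For reference only, the sketch below shows Otsu's global thresholding, a standard binarization technique. It is an assumed stand-in, not the paper's method. The function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be a grayscale image given as nested lists of 8-bit integers.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the threshold t in 0..255 that maximizes between-class variance.

    Assumption: pixels are 8-bit grayscale values. Pixels <= t form the
    dark class, pixels > t form the light class.
    """
    # Build a 256-bin histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_bg = 0.0      # running sum of gray levels in the dark class
    weight_bg = 0     # running pixel count of the dark class
    best_t, best_var = 0, -1.0

    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no dark pixels yet, variance undefined
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is in the dark class, nothing left to split
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (up to a constant factor of 1 / total^2).
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t

    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map each pixel to 0 (ink) or 255 (background) using one global Otsu threshold."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]


if __name__ == "__main__":
    sample = [[10, 10, 200], [200, 10, 200]]
    print(binarize(sample))
```

A single global threshold can struggle with uneven lighting and blur in photographed documents, which is one reason adaptive (local) thresholding is often preferred for camera-captured images.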

Keywords: binarization algorithm; character recognition; deep learning; recurrent neural network; table image recognition.

Publication types

  • Research Support, Non-U.S. Gov't