Historical Text Line Segmentation Using Deep Learning Algorithms: Mask-RCNN against U-Net Networks

Florian Côme Fizaine; Patrick Bard; Michel Paindavoine; Cécile Robin; Edouard Bouyé; Raphaël Lefèvre; Annie Vinter

doi:10.3390/jimaging10030065

J Imaging. 2024 Mar 5;10(3):65. doi: 10.3390/jimaging10030065.

Authors

Florian Côme Fizaine¹,², Patrick Bard¹, Michel Paindavoine¹, Cécile Robin²,³, Edouard Bouyé², Raphaël Lefèvre⁴, Annie Vinter¹

Affiliations

¹ LEAD-CNRS, Université de Bourgogne, 21000 Dijon, France.
² Archives Départementales de Côte d'Or, 21000 Dijon, France.
³ Institut National du Patrimoine, 75002 Paris, France.
⁴ Société Nationale des Chemins de fer Français, 93200 Saint Denis, France.

Abstract

Text line segmentation is a necessary preliminary step for most text transcription algorithms. The leading deep learning networks used in this context (ARU-Net, dhSegment, and Doc-UFCN) are based on the U-Net architecture. They are efficient, but they share the same design principle: they produce pixel-level masks and therefore require a post-processing step to obtain instance (e.g., text line) segmentation. In the present work, we test the advantages of Mask-RCNN, which is designed to perform instance segmentation directly. This work is the first to directly compare Mask-RCNN- and U-Net-based networks on text line segmentation of historical documents, and it shows the superiority of the former over the latter. Three studies were conducted: the first compared these networks on different historical databases, the second compared Mask-RCNN with Doc-UFCN on a private historical database, and the third compared the handwritten text recognition (HTR) performance obtained with the tested networks. The results showed that Mask-RCNN outperformed ARU-Net, dhSegment, and Doc-UFCN on relevant line segmentation metrics; that performance evaluation should not focus on the raw masks generated by the networks; that light post-processing of the masks is an efficient and simple way to improve evaluation; and that Mask-RCNN leads to better HTR performance.
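To illustrate what turning a pixel-level mask into text line instances can involve, here is a minimal sketch assuming the simplest possible post-processing: 4-connected component labeling of a binary line mask, followed by removal of tiny fragments. The function names `label_lines` and `remove_small_instances` and the `min_pixels` threshold are hypothetical; this is not the post-processing used by ARU-Net, dhSegment, Doc-UFCN, or the authors.

```python
from collections import deque


def label_lines(mask):
    """Split a binary text-line mask into instances (4-connected components).

    mask: list of rows, each a list of 0/1 values (1 = text-line pixel).
    Returns (labels, count): labels has the same shape as mask, with each
    connected region numbered 1..count and background left at 0.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                # New unlabeled foreground pixel: flood-fill its region (BFS).
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def remove_small_instances(labels, count, min_pixels):
    """Drop instances smaller than min_pixels and renumber the rest 1..k."""
    sizes = [0] * (count + 1)
    for row in labels:
        for v in row:
            sizes[v] += 1
    # Map surviving ids to consecutive new ids; small fragments map to 0.
    remap = [0] * (count + 1)
    new_id = 0
    for old in range(1, count + 1):
        if sizes[old] >= min_pixels:
            new_id += 1
            remap[old] = new_id
    cleaned = [[remap[v] for v in row] for row in labels]
    return cleaned, new_id


# Toy mask: two text lines plus one isolated noise pixel.
example_mask = [
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
labels, n = label_lines(example_mask)
cleaned, m = remove_small_instances(labels, n, min_pixels=2)
print(n, m)  # 3 2
```

A limitation of this kind of post-processing is that touching or overlapping lines in the mask merge into a single component. Instance-level models such as Mask-RCNN avoid this by predicting a separate mask for each detected object.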

Keywords: Mask-RCNN; U-Net; deep learning; historical document analysis; instance segmentation; line segmentation.

Grants and funding

This research received no external funding.