Understanding the application of handwritten text recognition technology in heritage contexts: a systematic review of Transkribus in published research

Arch Sci (Dordr). 2022;22(3):367-392. doi: 10.1007/s10502-022-09397-0. Epub 2022 Jun 17.

Abstract

Handwritten Text Recognition (HTR) technology is now a mature machine learning tool that is being integrated into the digitisation processes of libraries and archives, speeding up the transcription of primary sources and facilitating full-text searching and analysis of historic texts at scale. However, research into how HTR is changing our information environment is scant. This paper presents a systematic literature review of how researchers are using one particular HTR platform, Transkribus, to identify the domains where HTR is applied, the approaches taken, and how the technology is understood. A total of 381 papers published from 2015 to 2020 were gathered from Google Scholar, Scopus, and Web of Science, then grouped and coded into categories using quantitative and qualitative approaches. Published research that mentions Transkribus is international and growing rapidly. Transkribus features primarily in archival and library science publications, while a long tail of broad and eclectic disciplines, including history, computer science, citizen science, law and education, demonstrates the wider applicability of the tool. The most common paper categories were humanities applications (67%), technological (25%), users (5%) and tutorials (3%). This paper presents the first overarching review of HTR as featured in published research, while also elucidating how HTR is affecting the information environment.

Keywords: Artificial intelligence; Digital library; Digitisation; Handwritten text recognition; Systematic literature review; Transkribus.