Document Liveness Challenge Dataset (DLC-2021)

Dmitry V Polevoy; Irina V Sigareva; Daria M Ershova; Vladimir V Arlazarov; Dmitry P Nikolaev; Zuheng Ming; Muhammad Muzzamil Luqman; Jean-Christophe Burie

doi:10.3390/jimaging8070181

J Imaging. 2022 Jun 28;8(7):181. doi: 10.3390/jimaging8070181.

Authors

Dmitry V Polevoy¹ ² ³, Irina V Sigareva¹ ⁴, Daria M Ershova¹ ⁵, Vladimir V Arlazarov¹ ², Dmitry P Nikolaev¹ ⁶, Zuheng Ming⁷, Muhammad Muzzamil Luqman⁷, Jean-Christophe Burie⁷

Affiliations

¹ Smart Engines Service LLC, 117312 Moscow, Russia.
² Federal Research Center "Computer Science and Control" RAS, 119333 Moscow, Russia.
³ National University of Science and Technology MISIS, 119049 Moscow, Russia.
⁴ Moscow Institute of Physics and Technology, 141701 Dolgoprodny, Russia.
⁵ Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, 119991 Moscow, Russia.
⁶ Institute for Information Transmission Problems (Kharkevich Institute) RAS, 127051 Moscow, Russia.
⁷ L3i Laboratory, La Rochelle University, 17042 La Rochelle, France.

Abstract

Various government and commercial services, including, but not limited to, e-government, fintech, banking, and sharing economy services, widely use smartphones to simplify service access and user authorization. Many organizations in these areas use identity document analysis systems to improve the entry of users' personal data. Such systems are tasked not only with recognizing and extracting ID document data but also with preventing fraud, by detecting document forgery or by checking whether the document is genuine. Modern systems of this kind are often expected to operate in unconstrained environments. A significant amount of research has been published on mobile ID document analysis, but the main difficulty for such research is the lack of public datasets, because the subject is protected by security requirements. In this paper, we present the DLC-2021 dataset, which consists of 1424 video clips captured in a wide range of real-world conditions and is focused on tasks relating to ID document forensics. The novelty of the dataset is that it contains video shots of color laminated mock ID documents, color unlaminated copies, grayscale unlaminated copies, and screen recaptures of the documents. The proposed dataset complies with the GDPR because it contains images of synthetic IDs with generated owner photos and artificial personal information. Benchmark baselines are provided for tasks such as screen recapture detection and glare detection. The data presented are openly available in Zenodo.

Keywords: document analysis; document anti-fraud; document forgery detection; document recognition; identity documents; liveness detection; mobile recognition; open data; screen recapture detection.

Grants and funding

This research received no external funding.