From text to structured data: Converting a word-processed floristic checklist into Darwin Core Archive format

PhytoKeys. 2012;(9):1-13. doi: 10.3897/phytokeys.9.2770. Epub 2012 Jan 30.

Abstract

This paper describes a pilot project to convert a conventional floristic checklist, written in a standard word-processing program, into structured data in the Darwin Core Archive format. After peer review and editorial acceptance, the final revised version of the checklist was converted into a Darwin Core Archive by means of regular expressions and then published both in human-readable form, as a traditional botanical publication, and as Darwin Core Archive data files. The data were published and indexed through the Global Biodiversity Information Facility (GBIF) Integrated Publishing Toolkit (IPT), and significant portions of the text of the paper were used to describe the metadata on IPT. After publication, the data will become available through the GBIF infrastructure and can be re-used on their own or collated with other data.

Keywords: Darwin Core Archive; data mining; taxonomic checklists.
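
The regular-expression conversion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' actual workflow: the checklist layout (a family name on its own line, followed by one "Genus epithet Authorship" entry per line), the two patterns, the sample text, and the `taxon-N` identifiers are all assumptions made for illustration. Only the column names are taken from the Darwin Core vocabulary. A complete Darwin Core Archive would also need a `meta.xml` descriptor and a metadata file, which are omitted here.

```python
import csv
import io
import re

# Assumed layout (hypothetical, not taken from the paper):
# a family heading such as "Pinaceae" on its own line,
# then one species entry per line: "Genus epithet Authorship".
FAMILY = re.compile(r"^(?P<family>[A-Z][a-z]+aceae)$")
SPECIES = re.compile(
    r"^(?P<genus>[A-Z][a-z]+) (?P<epithet>[a-z-]+) (?P<authorship>.+)$"
)

# Darwin Core terms used as column headers of the core data file.
COLUMNS = [
    "taxonID",
    "family",
    "genus",
    "specificEpithet",
    "scientificNameAuthorship",
    "scientificName",
    "taxonRank",
]

# Invented sample text standing in for the word-processed checklist.
SAMPLE = """\
Ranunculaceae
Ranunculus acris L.
Anemone nemorosa L.

Pinaceae
Abies alba Mill.
"""


def checklist_to_rows(text):
    """Turn checklist text into a list of Darwin Core style records."""
    rows = []
    family = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        fam = FAMILY.match(line)
        if fam:
            # Remember the current family for the entries that follow.
            family = fam["family"]
            continue
        sp = SPECIES.match(line)
        if sp is None:
            # Lines that fit neither pattern are skipped in this sketch.
            continue
        rows.append({
            "taxonID": f"taxon-{len(rows) + 1}",  # hypothetical local identifier
            "family": family,
            "genus": sp["genus"],
            "specificEpithet": sp["epithet"],
            "scientificNameAuthorship": sp["authorship"],
            "scientificName": f"{sp['genus']} {sp['epithet']} {sp['authorship']}",
            "taxonRank": "species",
        })
    return rows


def rows_to_tsv(rows):
    """Serialize the records as tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `rows_to_tsv(checklist_to_rows(SAMPLE))` yields a tab-delimited table of the kind that could serve as the core data file of an archive. A real checklist would need further patterns for infraspecific ranks, synonyms, and distribution notes, and in practice unmatched lines should be logged for manual review rather than silently skipped.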