dc.contributor.author
Dillen, Mathias
dc.contributor.author
Groom, Quentin
dc.contributor.author
Chagnoux, Simon
dc.contributor.author
Güntsch, Anton
dc.contributor.author
Hardisty, Alex
dc.contributor.author
Haston, Elspeth
dc.contributor.author
Livermore, Laurence
dc.contributor.author
Runnel, Veljo
dc.contributor.author
Schulman, Leif
dc.contributor.author
Willemse, Luc
dc.date.accessioned
2019-07-19T09:32:34Z
dc.date.available
2019-07-19T09:32:34Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/25126
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-2881
dc.description.abstract
Background
More and more herbaria are digitising their collections. Images of specimens are made available online to facilitate access to them and allow extraction of information from them. Transcription of the data written on specimens is critical for general discoverability and enables incorporation into large aggregated research datasets. Different methods, such as crowdsourcing and artificial intelligence, are being developed to optimise transcription, but herbarium specimens pose difficulties in data extraction for many reasons.
New information
To provide developers of transcription methods with a means of optimisation, we have compiled a benchmark dataset of 1,800 herbarium specimen images with corresponding transcribed data. These images originate from nine different collections and include specimens that reflect the multiple potential obstacles that transcription methods may encounter, such as differences in language, text format (printed or handwritten), specimen age and nomenclatural type status. We are making these specimens available with a Creative Commons Zero licence waiver and with permanent online storage of the data. By doing this, we are minimising the obstacles to the use of these images for transcription training. This benchmark dataset of images may also be used where a defined and documented set of herbarium specimens is needed, such as for the extraction of morphological traits, handwriting recognition and colour analysis of specimens.
en
dc.format.extent
15 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
herbarium specimen
en
dc.subject
digitization
en
dc.subject
transcription
en
dc.subject.ddc
500 Natural sciences and mathematics::580 Plants (Botany)::580 Plants (Botany)
dc.title
A benchmark dataset of herbarium specimen images with label data
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
e31817
dcterms.bibliographicCitation.doi
10.3897/BDJ.7.e31817
dcterms.bibliographicCitation.journaltitle
Biodiversity Data Journal
dcterms.bibliographicCitation.volume
7
dcterms.bibliographicCitation.url
https://doi.org/10.3897/BDJ.7.e31817
refubium.affiliation
Botanischer Garten und Botanisches Museum Berlin-Dahlem (BGBM)
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.issn
1314-2836
dcterms.isPartOf.eissn
1314-2828
refubium.resourceType.provider
WoS-Alert