dc.contributor.author
Miranda, Fábio M.
dc.contributor.author
Azevedo, Vasco C.
dc.contributor.author
Ramos, Rommel J.
dc.contributor.author
Renard, Bernhard Y.
dc.contributor.author
Piro, Vitor C.
dc.date.accessioned
2024-07-03T06:44:07Z
dc.date.available
2024-07-03T06:44:07Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/44069
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-43778
dc.description.abstract
Background
Fungi play a key role in several important ecological functions, ranging from organic matter decomposition to symbiotic associations with plants. Moreover, fungi naturally inhabit the human body and can be beneficial when administered as probiotics. In mycology, the internal transcribed spacer (ITS) region was adopted as the universal marker for classifying fungi. Hence, an accurate and robust method for ITS classification is desirable not only for better diversity estimation, but also for gaining deeper insight into the dynamics of environmental communities and, ultimately, understanding whether the abundance of certain species correlates with health and disease. Although many methods have been proposed for taxonomic classification, to the best of our knowledge, none of them fully explores the taxonomic tree hierarchy when building its models. This, in turn, leads to lower generalization power and a higher risk of classification errors.
Results
Here we introduce HiTaC, a robust hierarchical machine learning model for accurate ITS classification, which requires a small amount of training data and can handle imbalanced datasets. HiTaC was thoroughly evaluated with the established TAXXI benchmark and correctly classified fungal ITS sequences of varying lengths across a range of identity differences between the training and test data. HiTaC outperforms state-of-the-art methods when trained on noisy data, consistently achieving higher F1-score and sensitivity across different taxonomic ranks and improving sensitivity by 6.9 percentage points over the top methods on the noisiest dataset available in TAXXI.
Conclusions
HiTaC is publicly available on the Python Package Index, Bioconda and Docker Hub. It is released under the new BSD license, allowing free use in academia and industry. Source code and documentation, including installation and usage instructions, are available at https://gitlab.com/dacs-hpi/hitac.
en
dc.format.extent
13 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Local Hierarchical Classification
en
dc.subject.ddc
000 Computer science, information & general works::000 Computer science, knowledge & systems::004 Data processing & computer science
dc.title
HiTaC: a hierarchical taxonomic classifier for fungal ITS sequences compatible with QIIME2
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
228
dcterms.bibliographicCitation.doi
10.1186/s12859-024-05839-x
dcterms.bibliographicCitation.journaltitle
BMC Bioinformatics
dcterms.bibliographicCitation.number
1
dcterms.bibliographicCitation.volume
25
dcterms.bibliographicCitation.url
https://doi.org/10.1186/s12859-024-05839-x
refubium.affiliation
Mathematics and Computer Science
refubium.affiliation.other
Institute of Computer Science
refubium.funding
Springer Nature DEAL
refubium.note.author
This publication was funded by the Open Access publication fund of Freie Universität Berlin.
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
1471-2105