ganon: precise metagenomics classification against large and up-to-date sets of reference sequences

Piro, Vitor C.; Dadi, Temesgen H.; Seiler, Enrico; Reinert, Knut; Renard, Bernhard Y.

doi:10.1093/bioinformatics/btaa458


Metadata

dc.contributor.author

Piro, Vitor C.

dc.contributor.author

Dadi, Temesgen H.

dc.contributor.author

Seiler, Enrico

dc.contributor.author

Reinert, Knut

dc.contributor.author

Renard, Bernhard Y.

dc.date.accessioned

2020-12-09T09:14:06Z

dc.date.available

2020-12-09T09:14:06Z

dc.date.issued

2020

dc.identifier.uri

https://refubium.fu-berlin.de/handle/fub188/29016

dc.identifier.uri

http://dx.doi.org/10.17169/refubium-28766

dc.description.abstract

Motivation: The exponential growth of assembled genome sequences greatly benefits metagenomics studies. However, currently available methods struggle to manage the increasing number of sequences and their frequent updates. Indexing the current RefSeq can take days and hundreds of GB of memory on large servers. So far, few methods address these issues, and even though many can theoretically handle large numbers of references, their time and memory requirements are prohibitive in practice. As a result, many studies that require sequence classification use indices that are often outdated and almost never truly up to date.

Results: Motivated by these limitations, we created ganon, a k-mer-based read classification tool that uses Interleaved Bloom Filters in conjunction with taxonomic clustering and a k-mer counting/filtering scheme. Ganon provides an efficient method for indexing references and keeping them updated: it requires less than 55 min to index the complete RefSeq of bacteria, archaea, fungi and viruses, and it can keep these indices up to date in a fraction of the time needed to create them. Because ganon can query against very large reference sets, it classifies significantly more reads and identifies more species than similar methods. When classifying a high-complexity CAMI challenge dataset against complete genomes from RefSeq, ganon shows strongly increased precision with equal or better sensitivity compared with state-of-the-art tools. With the same dataset against the complete RefSeq, ganon improved the F1-score by 65% at the genus level. It supports taxonomy- and assembly-level classification, multiple indices and hierarchical classification.
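The abstract names Interleaved Bloom Filters and a k-mer counting/filtering scheme but does not describe them. The following is a minimal Python sketch of the general technique, not ganon's implementation. It assumes one bin per reference group (for example a taxonomic cluster), a BLAKE2b-based hashing scheme chosen only for illustration, and a q-gram-lemma threshold for the minimum number of matching k-mers. All class and function names (`InterleavedBloomFilter`, `count_hits`, `classify`) are hypothetical.

```python
import hashlib
from typing import Iterator, List


class InterleavedBloomFilter:
    """Toy interleaved Bloom filter (IBF): one Bloom filter per bin, stored so that
    each bit position ("row") holds one bit for every bin side by side."""

    def __init__(self, num_bins: int, num_rows: int, num_hashes: int) -> None:
        self.num_bins = num_bins
        self.num_rows = num_rows
        self.num_hashes = num_hashes
        # Each row is a Python int used as a bit vector with one bit per bin.
        self.rows = [0] * num_rows

    def _positions(self, kmer: str) -> Iterator[int]:
        # Assumed hashing scheme (illustration only): h row indices from seeded BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_rows

    def insert(self, kmer: str, bin_id: int) -> None:
        # Set this bin's bit in each of the h rows selected for the k-mer.
        for row in self._positions(kmer):
            self.rows[row] |= 1 << bin_id

    def query(self, kmer: str) -> int:
        # AND the selected rows: bit j stays set only if bin j may contain the k-mer.
        result = (1 << self.num_bins) - 1
        for row in self._positions(kmer):
            result &= self.rows[row]
        return result


def kmers(seq: str, k: int) -> Iterator[str]:
    # All overlapping substrings of length k.
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_hits(ibf: InterleavedBloomFilter, read: str, k: int) -> List[int]:
    # Count, per bin, how many of the read's k-mers the filter reports as present.
    counts = [0] * ibf.num_bins
    for kmer in kmers(read, k):
        hits = ibf.query(kmer)
        for b in range(ibf.num_bins):
            if (hits >> b) & 1:
                counts[b] += 1
    return counts


def classify(ibf: InterleavedBloomFilter, read: str, k: int, max_errors: int) -> List[int]:
    # q-gram lemma: with at most e errors, at least (n - k + 1) - e*k k-mers still match.
    threshold = max(len(read) - k + 1 - max_errors * k, 1)
    counts = count_hits(ibf, read, k)
    return [b for b, c in enumerate(counts) if c >= threshold]


if __name__ == "__main__":
    k = 5
    ibf = InterleavedBloomFilter(num_bins=2, num_rows=4096, num_hashes=3)
    references = {0: "ACGTTGCAAGGCTTACGATCGA", 1: "TTTTGGGGCCCCAAAATTTTGG"}
    for bin_id, ref in references.items():
        for kmer in kmers(ref, k):
            ibf.insert(kmer, bin_id)
    read = "ACGTTGCAAGGCTT"  # exact substring of reference 0
    print(count_hits(ibf, read, k), classify(ibf, read, k, max_errors=0))
```

Interleaving means a single lookup returns a membership bit for every bin at once. Because a Bloom filter has no false negatives, all 10 k-mers of the example read are counted for bin 0. False positives can only add hits to other bins, and the count threshold keeps a few spurious hits from producing an assignment.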

dc.format.extent

9 pages

dc.language

eng

dc.rights.uri

https://creativecommons.org/licenses/by-nc/4.0/

dc.subject

algorithms

dc.subject

alignment

dc.subject

metagenomics classification

dc.subject.ddc

000 Computer science, information and general works::000 Computer science, knowledge and systems::000 Computer science, information and general works

dc.title

ganon: precise metagenomics classification against large and up-to-date sets of reference sequences

dc.type

Scientific article

dcterms.bibliographicCitation.doi

10.1093/bioinformatics/btaa458

dcterms.bibliographicCitation.journaltitle

Bioinformatics

dcterms.bibliographicCitation.number

Supplement_1

dcterms.bibliographicCitation.volume

dcterms.bibliographicCitation.url

https://doi.org/10.1093/bioinformatics/btaa458

refubium.affiliation

Mathematics and Computer Science

refubium.affiliation.other

Institute of Bioinformatics

refubium.resourceType.isindependentpub

dcterms.accessRights.openaire

open access

dcterms.isPartOf.issn

1367-4803

dcterms.isPartOf.eissn

1460-2059

refubium.resourceType.provider

WoS-Alert


This item appears in:

Dokumente FU

Files in this item

btaa458.pdf

Size: 1.069MB

Format: PDF

Checksum (MD5): 944d78722287d2257be107b5e3556352


Refubium - Repository of Freie Universität Berlin
