dc.contributor.author
Seiler, Enrico
dc.contributor.author
Mehringer, Svenja
dc.contributor.author
Darvish, Mitra
dc.contributor.author
Turc, Etienne
dc.contributor.author
Reinert, Knut
dc.date.accessioned
2021-11-09T13:03:15Z
dc.date.available
2021-11-09T13:03:15Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/32629
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-32353
dc.description.abstract
We present Raptor, a system for approximately searching many queries, such as next-generation sequencing reads or transcripts, in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the interleaved Bloom filter (IBF) as a set membership data structure, and probabilistic thresholding for minimizers. Our approach allows compression and partitioning of the IBF to enable the effective use of secondary memory. We test and show the performance and limitations of the new features using simulated and real datasets. Our data structure can be used to accelerate various core bioinformatics applications. We show this by re-implementing the distributed read mapping tool DREAM-Yara.
en
dc.format.extent
19 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
bioinformatics
en
dc.subject
high-performance computing in bioinformatics
en
dc.subject.ddc
500 Natural sciences and mathematics::570 Life sciences; biology::570 Life sciences; biology
dc.title
Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
102782
dcterms.bibliographicCitation.doi
10.1016/j.isci.2021.102782
dcterms.bibliographicCitation.journaltitle
iScience
dcterms.bibliographicCitation.number
7
dcterms.bibliographicCitation.volume
24
dcterms.bibliographicCitation.url
https://doi.org/10.1016/j.isci.2021.102782
refubium.affiliation
Mathematik und Informatik
refubium.affiliation.other
Institut für Informatik
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2589-0042
refubium.resourceType.provider
WoS-Alert