Needle: a fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments

Darvish, Mitra; Seiler, Enrico; Mehringer, Svenja; Rahn, Rene; Reinert, Knut

doi:10.1093/bioinformatics/btac492

Main title:

Needle: a fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments

Author(s):

Darvish, Mitra; Seiler, Enrico; Mehringer, Svenja; Rahn, Rene; Reinert, Knut

Year of publication:

2022

Release date:

2022-10-06T09:02:48Z

Abstract:

Motivation

The ever-growing size of sequencing data is a major bottleneck in bioinformatics as the advances of hardware development cannot keep up with the data growth. Therefore, an enormous amount of data is collected but rarely ever reused, because it is nearly impossible to find meaningful experiments in the stream of raw data.

Results

As a solution, we propose Needle, a fast and space-efficient index which can be built for thousands of experiments in <2 h and can estimate the quantification of a transcript in these experiments in seconds, thereby outperforming its competitors. The basic idea of the Needle index is to create multiple interleaved Bloom filters that each store a set of representative k-mers depending on their multiplicity in the raw data. This is then used to quantify the query.
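
The idea of several interleaved Bloom filters, one per multiplicity level, can be illustrated with a small sketch. This is not the Needle implementation: the class and function names, the threshold list, the rule of storing each k-mer at the largest threshold not exceeding its count, and the 50% hit fraction used at query time are all assumptions made for illustration, and plain k-mers stand in for the "representative k-mers" of the abstract.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class InterleavedBloomFilter:
    """Toy interleaved Bloom filter: each hash position holds one bit per experiment."""

    def __init__(self, num_bins, bin_size, num_hashes=2):
        self.num_bins = num_bins      # number of experiments (assumed name)
        self.bin_size = bin_size      # hash positions per experiment
        self.num_hashes = num_hashes
        self.rows = [0] * bin_size    # bit b of rows[p] belongs to experiment b

    def _positions(self, kmer):
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.bin_size

    def insert(self, kmer, bin_index):
        for pos in self._positions(kmer):
            self.rows[pos] |= 1 << bin_index

    def query(self, kmer):
        """Bitmask of experiments that may contain kmer (false positives possible)."""
        mask = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            mask &= self.rows[pos]
        return mask


def build_index(experiments, thresholds, k, bin_size=4096):
    """One filter per threshold; thresholds are assumed sorted in ascending order."""
    levels = [InterleavedBloomFilter(len(experiments), bin_size) for _ in thresholds]
    for bin_index, reads in enumerate(experiments):
        counts = Counter(km for read in reads for km in kmers(read, k))
        for km, count in counts.items():
            eligible = [i for i, t in enumerate(thresholds) if t <= count]
            if eligible:  # assumed rule: store at the largest threshold <= count
                levels[eligible[-1]].insert(km, bin_index)
    return levels


def estimate(levels, thresholds, transcript, k, min_fraction=0.5):
    """Per experiment, report the highest threshold at which enough query k-mers hit."""
    query = kmers(transcript, k)
    num_bins = levels[0].num_bins
    hits = [0] * num_bins
    estimates = [0] * num_bins
    for level in range(len(thresholds) - 1, -1, -1):  # highest level first
        for km in query:
            mask = levels[level].query(km)
            for b in range(num_bins):
                if (mask >> b) & 1:
                    hits[b] += 1
        for b in range(num_bins):
            if estimates[b] == 0 and hits[b] >= min_fraction * len(query):
                estimates[b] = thresholds[level]
    return estimates


# Hypothetical toy data: three "experiments" given as lists of reads.
transcript = "ACGTTGCAAGCT"
experiments = [[transcript] * 4, [transcript], ["GGGGGGGGGGGG"]]
thresholds = [1, 2, 4]
levels = build_index(experiments, thresholds, k=5)
result = estimate(levels, thresholds, transcript, k=5)
print(result)
```

In this toy setting, the estimate for an experiment is the coarse threshold level at which at least half of the query's k-mers have been found, and 0 means the transcript was not detected. Because hits are accumulated from the highest level downwards, a transcript whose k-mers are spread over several levels is assigned the lowest level needed to reach the required fraction.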

Identifier:

https://refubium.fu-berlin.de/handle/fub188/35923
http://dx.doi.org/10.17169/refubium-35638

Part of identifier:

e-ISSN (online): 1460-2059

Language:

English

Keywords:

bioinformatics
sequencing data
quantification

DDC classification:

004 Data processing; computer science

Publication type:

Scientific article