dc.contributor.author
Darvish, Mitra
dc.contributor.author
Seiler, Enrico
dc.contributor.author
Mehringer, Svenja
dc.contributor.author
Rahn, Rene
dc.contributor.author
Reinert, Knut
dc.date.accessioned
2022-10-06T09:02:48Z
dc.date.available
2022-10-06T09:02:48Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/35923
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-35638
dc.description.abstract
Motivation
The ever-growing size of sequencing data is a major bottleneck in bioinformatics, as advances in hardware cannot keep up with the data growth. As a result, an enormous amount of data is collected but rarely reused, because it is nearly impossible to find meaningful experiments in the stream of raw data.
Results
As a solution, we propose Needle, a fast and space-efficient index that can be built for thousands of experiments in <2 h and can estimate the quantification of a transcript in these experiments in seconds, thereby outperforming its competitors. The basic idea of the Needle index is to create multiple interleaved Bloom filters, each storing a set of representative k-mers depending on their multiplicity in the raw data. These filters are then used to quantify the query.
en
dc.format.extent
9 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
bioinformatics
en
dc.subject
sequencing data
en
dc.subject
quantification
en
dc.subject.ddc
000 Computer science, information & general works::000 Computer science, knowledge & systems::004 Data processing & computer science
dc.title
Needle: a fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
btac492
dcterms.bibliographicCitation.doi
10.1093/bioinformatics/btac492
dcterms.bibliographicCitation.journaltitle
Bioinformatics
dcterms.bibliographicCitation.number
17
dcterms.bibliographicCitation.pagestart
4100
dcterms.bibliographicCitation.pageend
4108
dcterms.bibliographicCitation.volume
38
dcterms.bibliographicCitation.url
https://doi.org/10.1093/bioinformatics/btac492
refubium.affiliation
Mathematik und Informatik
refubium.affiliation.other
Institut für Informatik
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
1460-2059
refubium.resourceType.provider
WoS-Alert