dc.contributor.author
Schwab, Remy M.
dc.contributor.author
Gottlieb, Simon Gene
dc.contributor.author
Reinert, Knut
dc.date.accessioned
2025-05-05T05:33:43Z
dc.date.available
2025-05-05T05:33:43Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/47521
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-47239
dc.description.abstract
The scale of modern datasets has necessitated innovations to solve even the most traditional and fundamental of computational problems. Set membership and set cardinality are both examples of simple queries that, for large enough datasets, quickly become challenging using traditional approaches. Interestingly, we find a need for these innovations within the field of biology. Despite the vast differences among living organisms, there exist functions so critical to life that they are conserved in the genomes and proteomes across seemingly unrelated species. Regular expressions (regexes) can serve as a convenient way to represent these conserved sequences compactly. However, despite the strong theoretical foundation and maturity of tools available, the state-of-the-art regex search falls short of what is necessary to quickly scan an enormous database of biological sequences. In this work, we describe a novel algorithm for regex search that reduces the search space by leveraging a recently developed compact data structure for set membership, the hierarchical interleaved bloom filter. We show that the runtime of our method combined with a traditional search outperforms state-of-the-art tools.
en
dc.format.extent
13 Seiten
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
index-accelerated search
en
dc.subject
highly conserved motifs
en
dc.subject.ddc
000 Informatik, Informationswissenschaft, allgemeine Werke::000 Informatik, Wissen, Systeme::004 Datenverarbeitung; Informatik
dc.title
TetRex: a novel algorithm for index-accelerated search of highly conserved motifs
dc.type
Wissenschaftlicher Artikel
dcterms.bibliographicCitation.articlenumber
lqaf039
dcterms.bibliographicCitation.doi
10.1093/nargab/lqaf039
dcterms.bibliographicCitation.journaltitle
NAR Genomics and Bioinformatics
dcterms.bibliographicCitation.number
2
dcterms.bibliographicCitation.volume
7
dcterms.bibliographicCitation.url
https://doi.org/10.1093/nargab/lqaf039
refubium.affiliation
Mathematik und Informatik
refubium.affiliation.other
Institut für Informatik

refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2631-9268
refubium.resourceType.provider
WoS-Alert