dc.contributor.author
Pan, Chenxu
dc.contributor.author
Reinert, Knut
dc.date.accessioned
2024-05-14T11:38:11Z
dc.date.available
2024-05-14T11:38:11Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/43533
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-43249
dc.description.abstract
Motivation
The minimizer concept is a data structure for sequence sketching. The standard canonical minimizer selects a subset of k-mers from a given DNA sequence by simultaneously comparing the forward and reverse k-mers in a window according to a predefined selection scheme. It is widely employed in sequence analysis tasks such as read mapping and assembly. k-mer density, k-mer repetitiveness (e.g. k-mer bias), and computational efficiency are three critical measures for minimizer selection schemes. However, minimizer variants trade these measures off against one another. High-performance minimizer algorithms are always required to be generic, effective, and efficient.
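For orientation, the standard canonical minimizer described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the lexicographic order as the selection scheme, takes the reverse k-mer to be the reverse complement, breaks ties by choosing the leftmost k-mer, and uses hypothetical names (`reverse_complement`, `canonical_minimizers`) and parameters (`k` for the k-mer length, `w` for the number of k-mers per window) that do not come from the record.

```python
# Illustrative sketch only: a standard canonical minimizer under lexicographic order.
# Function names and the parameters k and w are assumptions, not taken from the paper.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string over the alphabet ACGT."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical_minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, canonical k-mer) pairs chosen over windows of w k-mers."""
    # Canonical k-mer at each position: the smaller of the forward k-mer
    # and its reverse complement under lexicographic order.
    canon = []
    for i in range(len(seq) - k + 1):
        forward = seq[i:i + k]
        canon.append(min(forward, reverse_complement(forward)))

    selected: list[tuple[int, str]] = []
    for start in range(len(canon) - w + 1):
        window = canon[start:start + w]
        # min() returns the first minimal element, so ties go to the leftmost k-mer.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        # Consecutive windows often pick the same position; record it once.
        if not selected or selected[-1][0] != pos:
            selected.append((pos, canon[pos]))
    return selected
```

Calling `canonical_minimizers("ACGTACGT", 3, 2)` selects one canonical 3-mer per window of two consecutive 3-mers and reports each selected position once. The refined operator proposed in the article is not reproduced here, since the record does not describe its definition.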
Results
We propose a simple minimizer operator as a refinement of the standard canonical minimizer. It takes only a few operations to compute, yet it can improve k-mer repetitiveness, especially for the lexicographic order. It also applies to other selection schemes based on total orders (e.g. random orders). Moreover, it is computationally efficient, and its density is close to that of the standard minimizer. The refined minimizer may benefit high-performance applications such as binning and read mapping.
Availability and implementation
The source code of the benchmark in this work is available in the GitHub repository https://github.com/xp3i4/mini_benchmark
en
dc.format.extent
8 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
minimizer concept
en
dc.subject
data structure
en
dc.subject
sequence sketching
en
dc.subject.ddc
500 Natural sciences and mathematics::570 Life sciences; Biology::570 Life sciences; Biology
dc.title
A simple refined DNA minimizer operator enables 2-fold faster computation
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
btae045
dcterms.bibliographicCitation.doi
10.1093/bioinformatics/btae045
dcterms.bibliographicCitation.journaltitle
Bioinformatics
dcterms.bibliographicCitation.number
2
dcterms.bibliographicCitation.volume
40
dcterms.bibliographicCitation.url
https://doi.org/10.1093/bioinformatics/btae045
refubium.affiliation
Mathematik und Informatik
refubium.affiliation.other
Institut für Informatik

refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
1367-4811
refubium.resourceType.provider
WoS-Alert