dc.contributor.author
Gottlieb, Simon Gene
dc.contributor.author
Reinert, Knut
dc.date.accessioned
2025-04-09T07:35:36Z
dc.date.available
2025-04-09T07:35:36Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/47238
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-46956
dc.description.abstract
Finding approximate occurrences of a query in a text using a full-text index is a central problem in stringology with many applications, especially in bioinformatics. The recent work has shown significant speed-ups by combining bidirectional indices and employing variations of search schemes. Search schemes partition a query and describe how to search the resulting parts with a given error bound. The performance of search schemes can be approximated by the node count, which represents an upper bound of the number of search steps. Finding optimum search schemes is a difficult combinatorial optimization problem that becomes hard to solve for four and more errors. This paper improves on a few topics important to search scheme based searches: First, we show how search schemes can be used to model previously published approximate search strategies such as suffix filters, 01*0-seeds, or the pigeonhole principle. This unifies these strategies in the search scheme framework, makes them easily comparable and results in novel search schemes that allow for any number of errors. Second, we present a search scheme construction heuristic, which is on par with optimum search schemes and has a better node count than any known search scheme for equal or above four errors. Finally, using the different search schemes, we show that the node count measure is not an ideal performance metric and therefore propose an improved performance metric called the weighted node count, which approximates a search algorithm’s run time much more accurately.
en
dc.format.extent
9 Seiten
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
full-text index
en
dc.subject
bioinformatics
en
dc.subject.ddc
000 Informatik, Informationswissenschaft, allgemeine Werke::000 Informatik, Wissen, Systeme::004 Datenverarbeitung; Informatik
dc.title
SeArcH schemes for Approximate stRing mAtching
dc.type
Wissenschaftlicher Artikel
dcterms.bibliographicCitation.articlenumber
lqaf025
dcterms.bibliographicCitation.doi
10.1093/nargab/lqaf025
dcterms.bibliographicCitation.journaltitle
NAR Genomics and Bioinformatics
dcterms.bibliographicCitation.number
1
dcterms.bibliographicCitation.volume
7
dcterms.bibliographicCitation.url
https://doi.org/10.1093/nargab/lqaf025
refubium.affiliation
Mathematik und Informatik
refubium.affiliation.other
Institut für Informatik

refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2631-9268
refubium.resourceType.provider
WoS-Alert