dc.contributor.author
Winkler, Jörg
dc.date.accessioned
2023-09-18T12:47:02Z
dc.date.available
2023-09-18T12:47:02Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/40867
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-40588
dc.description.abstract
The function of non-coding RNA sequences is largely determined by their spatial conformation. This is the secondary structure of the molecule, which is formed by Watson–Crick interactions between nucleotides. Hence, modern RNA alignment algorithms routinely take structural information into account. Essential tasks for discovering yet unknown RNA families and inferring their possible functions are the structural alignment of RNAs and the subsequent search of the derived structural motifs. These tasks demand a lot of computational resources, especially for aligning many long sequences, and it therefore requires efficient algorithms that utilize modern hardware when available. A subset of the secondary structures contains pseudoknots, which are overlapping interactions that add additional complexity to the analysis and are often ignored in available software.
In this thesis, I present LaRA 2 and MaRs, two SeqAn-based software tools that implement algorithms for finding sequence-structure motifs in genomic sequences. In contrast to other programs, my tools can handle arbitrary pseudoknots. They use multithreading for parallel execution and are implemented in modern C++ code for maximal longevity and performance.
LaRA2 is significantly faster than comparable software for accurate pairwise and multiple alignments of structured RNA sequences. It uses a new heuristic for computing a lower boundary to the solution and employs vectorization techniques for speeding up the time-critical parts of the algorithm.
MaRs can be applied in a workflow right after LaRA2 and derives sequence-structure motifs from the structural alignments. The motifs are descriptors of the RNA sequences’ properties and drive the search for homologs in genomic sequences. MaRs employs a bidirectional index on the genomic sequences and an optimized multithreaded search strategy for finding the matches really fast. The use of a thread pool, effective pruning strategies, and a low memory footprint ensure that MaRs is capable of processing extremely large data sets.
en
dc.format.extent
XIII, 137 Seiten
dc.rights.uri
https://creativecommons.org/licenses/by-nc/4.0/
dc.subject
Bioinformatics
en
dc.subject
RNA secondary structure
en
dc.subject
Alignment algorithm
en
dc.subject.ddc
000 Informatik, Informationswissenschaft, allgemeine Werke::000 Informatik, Wissen, Systeme::005 Computerprogrammierung, Programme, Daten
dc.title
Algorithms for finding RNA sequence-structure motifs
dc.contributor.gender
male
dc.contributor.firstReferee
Reinert, Knut
dc.contributor.furtherReferee
Marsico, Annalisa
dc.date.accepted
2023-09-08
dc.identifier.urn
urn:nbn:de:kobv:188-refubium-40867-6
dc.title.translated
Algorithmen zum Finden von RNA Sequenz-Struktur-Motiven
ger
refubium.affiliation
Mathematik und Informatik
dcterms.accessRights.dnb
free
dcterms.accessRights.openaire
open access