A combinatorial approach to RNA sequence-structure alignments

Bauer, Markus

A combinatorial approach to RNA sequence-structure alignments

Metadaten

dc.contributor.author

Bauer, Markus

dc.date.accessioned

2018-06-07T18:16:21Z

dc.date.available

2008-07-30T14:02:27.686Z

dc.date.issued

2008

dc.identifier.uri

https://refubium.fu-berlin.de/handle/fub188/4820

dc.identifier.uri

http://dx.doi.org/10.17169/refubium-9019

dc.description.abstract

Until a couple of years ago the scientific mainstream held that genetic information, stored as DNA strands, is transcribed to RNA, and RNA sequences are in turn translated to proteins, the actual functional units in the cell. RNA was generally believed to be a helper molecule in the cell until the beginning of the new millennium. This view changed. We see the potential of RNA as one of the key cellular players. In this thesis we present a novel framework for computing sequence-structure alignments of RNA sequences. Our contribution is twofold: first, we give a graph-theoretic model for the computation of multiple sequence-structure alignments. We phrase the model as an integer linear program (ILP) and show how we can relax the ILP such that we are able to compute optimal or near-optimal solutions for the original problem. In a subsequent step, we augment the initial model with stacking energies. Stacking base pairs greatly contribute to the energetic stability of the overall structure and should therefore be additionally rewarded. We extend the original ILP such that stacking energies are incorporated. Second, we give extensive computational results on real data from the RFAM database. We compare the performance of truly multiple sum-of-pairs sequence-structure alignments to heuristic sequence-structure alignments. We show that the objective function value of the sum-of-pairs model is generally higher compared to the heuristically inferred alignments. At the same time, we sketch the computational limits for the sum-of-pairs multiple sequence-structure model. The computational costs for computing exact multiple sequence-structure alignments are generally very high. To validate our approach on a larger test set, we run two implementations that take two sequences as their input. LaRA and SLaRA---based on the initial and the stack model---compute all pairwise sequence-structure alignments and use the external program TCOFFEE to infer a consistency-based multiple sequence-structure alignment. Additionally, we run the progressive versions PLaRA and PSLaRA on the same input data set. Our experiments on the BRAliBase benchmark set show that our tools are top-ranked for all input classes. Furthermore, our implementations need less running time compared to similar approaches. Subsequently, we compare two different algorithms for computing the optimal value of the Lagrangian dual and show that in our test setting the conceptually easier subgradient method is superior to the bundle method. Finally, we incorporate our Lagrangian relaxation approach into a branch-and-bound framework. We show for which instances we are able to compute provably optimal solutions and compare our results with previously published results of a branch-and-bound approach for the related quadratic knapsack problem.

dc.description.abstract

Wissenschaftliche Entdeckungen der letzten Jahre haben die Molekulargenetik revolutioniert: bis dahin ging man von einem linearen Informationsfluss aus, in dem DNA zu RNA, und RNA in Proteine uebersetzt wird. RNA nahm dabei die Rolle eines Hilfsmolekuels ein, das selbst---bis auf wenige Ausnahmen--- keinerlei katalytische Eigenschaften hat. In den letzten Jahren zeigte sich jedoch, dass man von einer viel komplexeren Organisation der zellulaeren Prozesse ausgehen muss: Nichtkodierende RNA-Sequenzen, d.h. RNA-Sequenzen die keine Proteine kodieren, spielen dabei eine wesentliche Rolle. Bei der Analyse von RNA-Sequenzen ist es wichtig, Strukturinformation zu beachten, da die sogenannte Sekundaerstruktur, und nicht so sehr die eigentliche Sequenzinformation erhalten bleibt. Alignmentprogramme von divergenten RNA- Sequenzen muessen deshalb Strukturinformation miteinbeziehen, um zuverlaessige Alignments zu erstellen. In dieser Arbeit stellen wir ein neues Modell fuer das Berechnen von multiplen Sequenz-Struktur-Alignments von RNA-Sequenzen vor. Wir beschreiben Struktur-Alignments als graphentheoretisches Problem und zeigen danach, wie man dieses Modell als ganzzahliges lineares Programm (ILP) formulieren kann. Wir relaxieren das ILP im Folgenden in einer Lagrangeschen Weise, d.h. wir verschieben eine Klasse von Bedingungen---versehen mit einem Strafterm-Vektor---in die Zielfunktion und loesen das resultierende ILP. Zusaetzlich beschreiben wir eine Erweiterung des ILPs, bei der sogenannte Stackingenergien in die Berechnung des Sequenz-Struktur-Alignments einfließen. Im Rahmen einer umfangreichen Auswertung vergleichen wir die Implementierungen unserer Modelle mit zahlreichen anderen aktuellen Programmen. Unsere Programme liefern auf einem kuerzlich publizierten Benchmark-Datensatz die besten Ergebnisse fuer alle Klassen von Eingabedaten. Zusaetzlich geben wir einen Vergleich zwischen dem Subgradienten-Verfahren und der Buendel-Methode zum Loesen des dualen Problems. Wir koennen zeigen, dass fuer Standard- Eingabeinstanzen das Subgradienten-Verfahren normalerweise bessere Ergebnisse liefert. Den Abschluss der praktischen Auswertung bildet die Beschreibung eines Branch-und-Bound-Verfahrens, das---gegeben die Schranken aus dem Subgradienten-Verfahren---beweisbar optimale Loesungen berechnet. Wir zeigen, dass der Anwendungsrahmen dieses Ansatzes in etwa dem entspricht, was fuer das verwandte Problem des quadratischen Rucksackproblems publiziert wurde.

dc.format.extent

117 S.

dc.language

eng

dc.rights.uri

http://www.fu-berlin.de/sites/refubium/rechtliches/Nutzungsbedingungen

dc.subject.ddc

500 Naturwissenschaften und Mathematik

dc.subject.ddc

000 Informatik, Informationswissenschaft, allgemeine Werke::000 Informatik, Wissen, Systeme::004 Datenverarbeitung; Informatik

dc.subject.ddc

500 Naturwissenschaften und Mathematik::510 Mathematik

dc.title

A combinatorial approach to RNA sequence-structure alignments

dc.type

Dissertation

dcterms.format

Text

dc.contributor.contact

mbauer@inf.fu-berlin.de

dc.contributor.gender

dc.contributor.firstReferee

Prof. Dr. Knut Reinert

dc.contributor.furtherReferee

Dr. Gunnar W. Klau

dc.contributor.furtherReferee

Prof. Dr. Peter F. Stadler

dc.date.accepted

2008-07-10

dc.identifier.urn

urn:nbn:de:kobv:188-fudissthesis000000004656-4

dc.title.translated

Ein kombinatorischer Ansatz für RNA Sequenz-Struktur-Alignments

refubium.affiliation

Mathematik und Informatik

refubium.mycore.fudocsId

FUDISS_thesis_000000004656

refubium.mycore.derivateId

FUDISS_derivate_000000004148

dcterms.accessRights.dnb

free

dcterms.accessRights.openaire

open access

Zur Kurzanzeige

Das Dokument erscheint in:

Dissertationen FU

Dateien zu dieser Ressource

thesis.pdf

Größe: 1.566MB

Format: PDF

Prüfsumme (MD5): 09a163d821220635aa11a4d124a9f939

Öffnen

A combinatorial approach to RNA sequence-structure alignments

Refubium - Repositorium der Freien Universität Berlin

A combinatorial approach to RNA sequence-structure alignments

Metadaten

Das Dokument erscheint in:

Dateien zu dieser Ressource

Metadaten exportieren