Robust algorithms for improved reproducible ChIP-seq and ChIP-nexus peak calling

Hansen, Peter

Metadata

dc.contributor.author

Hansen, Peter

dc.date.accessioned

2019-05-24T08:24:55Z

dc.date.available

2019-05-24T08:24:55Z

dc.date.issued

2019

dc.identifier.uri

https://refubium.fu-berlin.de/handle/fub188/24639

dc.identifier.uri

http://dx.doi.org/10.17169/refubium-2402

dc.description.abstract

Transcriptional regulation of gene expression is a central topic in biological research. The DNA sequencing methodology ChIP-seq is used to infer binding sites of transcription factors and histone proteins in a genome-wide fashion. The ChIP-nexus protocol is a further development of ChIP-seq that allows binding sites to be predicted with much greater accuracy. This thesis is about the primary analysis of ChIP-seq and ChIP-nexus data. The first part provides an application example in which ChIP-seq was used to elucidate the pathomechanism underlying the phenotype of a patient with severe hand and foot malformations who carries a mutation in the gene encoding the transcription factor HOXD13. The second part introduces the ChIP-seq peak caller Q, which addresses shortcomings of the recommended software that were identified in the course of practical applications. Improvements in efficiency and reproducibility were verified within the framework of the ENCODE standards using 38 publicly available datasets. Furthermore, Q was used to characterize a signature of RNA polymerase II and histone modification H3K4me3 peaks that is consistent with the concept of paused open promoters. The final part presents Q-nexus, the first bespoke software package for the analysis of ChIP-nexus data. For this purpose, the ChIP-seq peak caller Q was extended with the additional modules required for ChIP-nexus analysis. The software makes use of the random barcodes introduced with the ChIP-nexus protocol and was used to characterize specific binding patterns of transcription factors at binding sites. Finally, Q-nexus was compared to two other peak callers with respect to reproducibility of peak calling.

dc.description.abstract

This thesis is about the development of bioinformatics methods and software for predicting DNA-protein interactions from ChIP-seq and ChIP-nexus data. The regulation of gene expression is a central topic in the life sciences. The cells of a human organism contain the same genetic information in the form of DNA, yet different cell types differ in shape and function. At the molecular level, cell types differ mainly in which of the roughly 30,000 genes are active. For a gene to become active, its genetic information must be translated into functional molecules (predominantly proteins). The first step of this process is called transcription and takes place directly at the DNA in the cell nucleus. DNA-binding proteins, such as transcription factors or histone proteins, therefore play an important role in the regulation of transcription. Low-cost high-throughput methods for sequencing DNA, commonly referred to as next-generation sequencing (NGS), are now also applied to questions that go beyond merely determining base sequences. One example of an NGS application is ChIP-seq, which can be used to determine protein-DNA interactions genome-wide for a given target protein. ChIP-nexus is a further development of ChIP-seq with markedly higher resolution. In general, NGS data are very large, and how the data must be analyzed depends on the underlying experimental protocol. This requires efficient algorithms that implement tailored solutions and typically also include statistical models. For this thesis, a number of innovative algorithms were developed that address various subproblems in predicting protein-DNA interactions from ChIP-seq and ChIP-nexus data.
For example, the saturation of genomic regions with mapped NGS reads, which can be uniquely assigned to positions in the genome on the basis of sequence identity, was modeled statistically within the framework of the classical occupancy problem in order to score ChIP-seq peaks. The degree of saturation is an alternative to conventional read depth and is applicable to NGS applications beyond ChIP-seq. In addition, extensive software was developed for this thesis and made available on the developer platform GitHub, accompanied by two publications in the journals Genome Research and BMC Genomics: http://charite.github.io/Q/. This software has already been discussed and used by the scientific community.
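
The abstract refers to the classical occupancy problem as the basis for a saturation measure. The following is only a minimal sketch of that textbook model, not the scoring used in Q: it assumes reads land uniformly and independently on the positions of a region, and the function names `expected_occupied` and `saturation` are hypothetical, chosen for illustration.

```python
def expected_occupied(num_positions: int, num_reads: int) -> float:
    """Expected number of distinct positions covered by at least one read.

    Classical occupancy model (an assumption of this sketch): each of
    num_reads reads falls uniformly and independently on one of
    num_positions positions. A given position stays empty with
    probability (1 - 1/N)^n, so the expected number of occupied
    positions is N * (1 - (1 - 1/N)^n).
    """
    if num_positions <= 0:
        raise ValueError("num_positions must be positive")
    if num_reads < 0:
        raise ValueError("num_reads must be non-negative")
    p_empty = (1.0 - 1.0 / num_positions) ** num_reads
    return num_positions * (1.0 - p_empty)


def saturation(occupied_positions: int, num_positions: int) -> float:
    """Fraction of positions in a region that carry at least one read."""
    if num_positions <= 0:
        raise ValueError("num_positions must be positive")
    return occupied_positions / num_positions
```

Under this model, comparing the observed saturation of a region with `expected_occupied(N, n) / N` indicates whether reads are spread over more or fewer distinct positions than uniform placement would predict; how the thesis turns this into a peak score is not stated in this record.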

dc.format.extent

91 pages

dc.language

eng

dc.rights.uri

http://www.fu-berlin.de/sites/refubium/rechtliches/Nutzungsbedingungen

dc.subject

ChIP-seq

dc.subject

peak calling

dc.subject

data analysis

dc.subject

reproducibility

dc.subject.ddc

000 Computer science, information, and general works::000 Computer Science, knowledge, systems::004 Data processing and Computer science

dc.title

Robust algorithms for improved reproducible ChIP-seq and ChIP-nexus peak calling

dc.type

Dissertation

dcterms.format

Text

dc.contributor.gender

male

dc.contributor.firstReferee

Robinson, Peter

dc.contributor.furtherReferee

Andrade, Miguel

dc.date.accepted

2019-04-11

dc.identifier.urn

urn:nbn:de:kobv:188-refubium-24639-8

dc.title.translated

Robuste Algorithmen für eine verbesserte reproduzierbare Vorhersage von Proteinbindungstellen

refubium.affiliation

Mathematik und Informatik

dcterms.accessRights.dnb

free

dcterms.accessRights.openaire

open access

dcterms.accessRights.proquest

This Item appears in the following Collection(s)

Dissertationen FU

Files in This Item

thesis_Hansen.pdf

Size: 7.421MB

Format: PDF

Checksum (MD5): f08bbd24d5b19b4249541f8b921ee3d5

Refubium - Freie Universität Berlin Repository
