dc.contributor.author
Bansal, Vikas
dc.date.accessioned
2018-06-07T21:36:48Z
dc.date.available
2016-08-08T10:21:55.892Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/8178
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-12377
dc.description.abstract
The advent of high-throughput sequencing (HTS) technology has greatly
accelerated research in life sciences. Due to its low cost and high
efficiency, it is nowadays commonly used to answer various biological
questions. In general, in HTS, the sequence of millions of DNA fragments is
determined in parallel and these fragments can in turn be generated using
different sequencing methods. With the rapid advancement of HTS technologies,
their applications seem almost endless; for example, it is now possible to
sequence an entire genome in less than one day. Besides whole-genome
sequencing, HTS has various other applications like targeted resequencing,
quantification of gene expression profiles (RNA-seq) and genome-wide
identification of protein-DNA interactions such as transcription factor
binding sites or chromatin histone marks (ChIP-seq). However, the analysis of
the massive datasets generated by HTS is only possible with sophisticated
bioinformatics methods. In this thesis, I present computational
approaches for analyzing data obtained by targeted DNA resequencing, RNA-seq
and ChIP-seq, aimed at answering biological questions regarding cardiac
disease and skeletal muscle development. First, a novel copy number variation
(CNV) calling method was developed to identify individual disease-relevant
CNVs using exome or targeted resequencing data of small sets of samples.
Detecting CNVs from targeted resequencing data is difficult due to non-uniform
read-depth between captured regions. Moreover, a method was needed to detect
personalized CNVs from a small cohort of patients without using controls. Thus,
we developed such a method and evaluated it using publicly available data of
eight HapMap samples, and subsequently applied it to a small number of
Tetralogy of Fallot (TOF) patients. In addition to our method, we used two
publicly available tools, ExomeDepth and CoNIFER. ExomeDepth identified
more CNVs in the HapMap samples than CoNIFER and our method; however,
its positive predictive value was very low. Therefore, we decided not to use
ExomeDepth for detecting CNVs in the TOF patients. Compared to CoNIFER, we
identified more CNVs in both the HapMap samples and our TOF cohort.
In the TOF cohort (comprising eight cases), we found four copy number gains in
three patients. All four gains could be validated and, in addition, the three
genes affected by CNVs were found to be important regulators of heart
development (NOTCH1, ISL1) or were located in a region already associated with
cardiac malformations (PRODH). The second study presented in this thesis was
focused on the stable enrichment patterns of histone modifications (H3K4me2
and H3K4me3) in combination with a tissue-specific transcription factor (MyoD)
that regulate myogenic differentiation. Here, we found specific H3K4me2/3
profiles on muscle-relevant genes. In general, the average profile of H3K4me3
was enriched directly downstream of transcription start sites, whereas H3K4me2
was located further over the gene body. Furthermore, our study revealed
significantly stronger binding of MyoD to this particular subset of genes, with
MyoD playing a predominantly repressive role. Interestingly, the results suggested
that MyoD binds and down-regulates Patz1 during myogenic differentiation,
which might provide an important regulatory mechanism to promote myogenic
differentiation. Finally, a pipeline was developed to identify differential
exon usage from RNA-seq data, with the intention of identifying the exons that
are excluded or included. Almost a decade ago, the Sperling lab identified
Dpf3 (also known as Baf45c) as a chromatin remodeling factor whose expression
was significantly up-regulated in the right ventricle of TOF patients. It was
shown that Dpf3 is specifically expressed in heart and somites and binds
methylated and acetylated lysine residues of histones 3 and 4. Moreover, it is
known that several proteins that bind chromatin histone modifications
interact with splicing factors. Thus, to dissect the role of Dpf3 in splicing,
we compared gene expression profiles (mRNA-seq) generated from the right and
left ventricle as well as skeletal muscle of Dpf3 knockout and wild-type mice.
The established pipeline for the identification of differential
exon usage is based on the estimation of percent-spliced-in (PSI) values. The results
suggested that Dpf3 might not play a significant role in splicing; however,
further investigations are required. In summary, within this thesis, I have
developed and applied different computational methods for analyzing CNVs in
small cohorts of patients, patterns of histone modifications and differential
exon usage.
en
dc.format.extent
vi, 153 pages
dc.rights.uri
http://www.fu-berlin.de/sites/refubium/rechtliches/Nutzungsbedingungen
dc.subject
High-throughput sequencing
dc.subject
copy number variations
dc.subject
Tetralogy of Fallot
dc.subject.ddc
000 Computer science, information, general works::000 Computer science, knowledge, systems::005 Computer programming, programs, data
dc.subject.ddc
500 Natural sciences and mathematics
dc.subject.ddc
500 Natural sciences and mathematics::570 Life sciences; biology::570 Life sciences; biology
dc.subject.ddc
500 Natural sciences and mathematics::570 Life sciences; biology::576 Genetics and evolution
dc.title
Computational Analysis of High-Throughput Sequencing Data in Cardiac Disease
and Skeletal Muscle Development
dc.contributor.firstReferee
Prof. Dr. Martin Vingron
dc.contributor.furtherReferee
Prof. Dr. Silke Rickert-Sperling
dc.date.accepted
2016-07-21
dc.identifier.urn
urn:nbn:de:kobv:188-fudissthesis000000102646-8
dc.title.translated
Computerbasierte Analyse von Hochdurchsatz-Sequenzierungsdaten hinsichtlich
der Herzerkrankung und Skelettmuskelentwicklung
de
refubium.affiliation
Mathematik und Informatik
de
refubium.mycore.fudocsId
FUDISS_thesis_000000102646
refubium.mycore.derivateId
FUDISS_derivate_000000019812
dcterms.accessRights.dnb
free
dcterms.accessRights.openaire
open access