Computational Analysis of High-Throughput Sequencing Data in Cardiac Disease and Skeletal Muscle Development

Bansal, Vikas

dc.contributor.author

Bansal, Vikas

dc.date.accessioned

2018-06-07T21:36:48Z

dc.date.available

2016-08-08T10:21:55.892Z

dc.date.issued

2016

dc.identifier.uri

https://refubium.fu-berlin.de/handle/fub188/8178

dc.identifier.uri

http://dx.doi.org/10.17169/refubium-12377

dc.description.abstract

The advent of high-throughput sequencing (HTS) technology has greatly accelerated research in the life sciences. Owing to its low cost and high efficiency, it is now commonly used to answer a wide range of biological questions. In HTS, the sequences of millions of DNA fragments are determined in parallel, and these fragments can in turn be generated using different sequencing methods. With the rapid advancement of HTS technologies, their applications seem almost endless; for example, it is now possible to sequence an entire genome in less than one day. Besides whole-genome sequencing, HTS has various other applications, including targeted resequencing, quantification of gene expression profiles (RNA-seq) and genome-wide identification of protein-DNA interactions such as transcription factor binding sites or chromatin histone marks (ChIP-seq). However, the analysis of the massive datasets generated by HTS is only possible with sophisticated bioinformatics methods. In this thesis, I present computational approaches for analyzing data obtained by targeted DNA resequencing, RNA-seq and ChIP-seq, aimed at answering biological questions regarding cardiac disease and skeletal muscle development.
First, a novel copy number variation (CNV) calling method was developed to identify individual disease-relevant CNVs using exome or targeted resequencing data from small sets of samples. Detecting CNVs from targeted resequencing data is difficult because read depth is non-uniform between captured regions. Moreover, a method was needed to detect personalized CNVs in a small cohort of patients without using controls. We therefore developed such a method, evaluated it using publicly available data from eight HapMap samples, and subsequently applied it to a small number of Tetralogy of Fallot (TOF) patients. In addition to our method, we used two publicly available tools, ExomeDepth and CoNIFER. ExomeDepth identified more CNVs in the HapMap samples than CoNIFER and our method; however, its positive predictive value was very low. Therefore, we decided not to use ExomeDepth for detecting CNVs in the TOF patients. Compared to CoNIFER, our method identified more CNVs in both the HapMap samples and our TOF cohort. In the TOF cohort (comprising eight cases), we found four copy number gains in three patients. All four gains could be validated and, in addition, the three genes affected by CNVs were found to be important regulators of heart development (NOTCH1, ISL1) or were located in a region already associated with cardiac malformations (PRODH).
The second study presented in this thesis focused on the stable enrichment patterns of histone modifications (H3K4me2 and H3K4me3), in combination with a tissue-specific transcription factor (MyoD), that regulate myogenic differentiation. Here, we found specific H3K4me2/3 profiles on muscle-relevant genes. In general, the average profile of H3K4me3 was enriched directly downstream of transcription start sites, whereas H3K4me2 was located further over the gene body. Furthermore, our study revealed significantly stronger binding of MyoD to this particular subset of genes, with a predominantly repressive role of MyoD. Interestingly, the results suggested that MyoD binds and down-regulates Patz1 during myogenic differentiation, which might provide an important regulatory mechanism to promote myogenic differentiation.
Finally, a pipeline was developed to identify differential exon usage from RNA-seq data, with the aim of identifying exons that are excluded or included. Almost a decade ago, the Sperling lab identified Dpf3 (also known as Baf45c) as a chromatin remodeling factor whose expression was significantly up-regulated in the right ventricle of TOF patients. It was shown that Dpf3 is specifically expressed in heart and somites and binds methylated and acetylated lysine residues of histones 3 and 4. Moreover, it is known that several proteins that bind chromatin histone modifications interact with splicing factors. Thus, to dissect the role of Dpf3 in splicing, we compared gene expression profiles (mRNA-seq) generated from the right and left ventricles as well as skeletal muscle of Dpf3 knockout and wild-type mice. The established pipeline for identifying differential exon usage is based on the estimation of percent-spliced-in (PSI). The results suggested that Dpf3 might not play a significant role in splicing; however, further investigations are required. In summary, within this thesis, I have developed and applied different computational methods for analyzing CNVs in small cohorts of patients, patterns of histone modifications and differential exon usage.

en

dc.format.extent

vi, 153 pages

dc.language

eng

dc.rights.uri

http://www.fu-berlin.de/sites/refubium/rechtliches/Nutzungsbedingungen

dc.subject

High-throughput sequencing

dc.subject

copy number variations

dc.subject

MyoD

dc.subject

Patz1

dc.subject

Epigenetics

dc.subject

Tetralogy of Fallot

dc.subject.ddc

000 Computer science, information science, general works::000 Computer science, knowledge, systems::005 Computer programming, programs, data

dc.subject.ddc

500 Natural sciences and mathematics

dc.subject.ddc

500 Natural sciences and mathematics::570 Life sciences; biology::570 Life sciences; biology

dc.subject.ddc

500 Natural sciences and mathematics::570 Life sciences; biology::576 Genetics and evolution

dc.title

Computational Analysis of High-Throughput Sequencing Data in Cardiac Disease and Skeletal Muscle Development

dc.type

Dissertation

dcterms.format

Text

de

dc.contributor.gender

m

dc.contributor.firstReferee

Prof. Dr. Martin Vingron

dc.contributor.furtherReferee

Prof. Dr. Silke Rickert-Sperling

dc.date.accepted

2016-07-21

dc.identifier.urn

urn:nbn:de:kobv:188-fudissthesis000000102646-8

dc.title.translated

Computerbasierte Analyse von Hochdurchsatz-Sequenzierungsdaten hinsichtlich der Herzerkrankung und Skelettmuskelentwicklung

de

refubium.affiliation

Mathematik und Informatik

de

refubium.mycore.fudocsId

FUDISS_thesis_000000102646

refubium.mycore.derivateId

FUDISS_derivate_000000019812

dcterms.accessRights.dnb

free

dcterms.accessRights.openaire

open access

Refubium - Freie Universität Berlin Repository
