The advent of high-throughput sequencing (HTS) technology has greatly accelerated research in the life sciences. Due to its low cost and high efficiency, it is now commonly used to answer a wide range of biological questions. In HTS, the sequences of millions of DNA fragments are determined in parallel, and these fragments can in turn be generated using different sequencing methods. With the rapid advancement of HTS technologies, their applications seem almost endless; for example, it is now possible to sequence an entire genome in less than one day. Besides whole-genome sequencing, HTS has various other applications, such as targeted resequencing, quantification of gene expression profiles (RNA-seq) and genome-wide identification of protein-DNA interactions such as transcription factor binding sites or chromatin histone marks (ChIP-seq). However, the analysis of the massive datasets generated by HTS is only possible with sophisticated bioinformatics methods. In this thesis, I present computational approaches for analyzing data obtained by targeted DNA resequencing, RNA-seq and ChIP-seq, aimed at answering biological questions regarding cardiac disease and skeletal muscle development. First, a novel copy number variation (CNV) calling method was developed to identify individual disease-relevant CNVs using exome or targeted resequencing data from small sets of samples. Detecting CNVs from targeted resequencing data is difficult due to non-uniform read depth between captured regions. Moreover, a method was needed to detect personalized CNVs in a small cohort of patients without using controls. We therefore developed such a method, evaluated it using publicly available data from eight HapMap samples, and subsequently applied it to a small number of Tetralogy of Fallot (TOF) patients. In addition to our method, we used two publicly available tools, ExomeDepth and CoNIFER.
ExomeDepth identified more CNVs in the HapMap samples than CoNIFER and our method; however, its positive predictive value was very low. Therefore, we decided not to use ExomeDepth for detecting CNVs in the TOF patients. Compared to CoNIFER, our method identified more CNVs in both the HapMap samples and our TOF cohort. In the TOF cohort (comprising eight cases), we found four copy number gains in three patients. All four gains could be validated and, in addition, the three genes affected by CNVs were found to be important regulators of heart development (NOTCH1, ISL1) or were located in a region already associated with cardiac malformations (PRODH). The second study presented in this thesis focused on the stable enrichment patterns of histone modifications (H3K4me2 and H3K4me3) in combination with a tissue-specific transcription factor (MyoD) that regulate myogenic differentiation. Here, we found specific H3K4me2/3 profiles on muscle-relevant genes. In general, the average profile of H3K4me3 was enriched directly downstream of transcription start sites, whereas H3K4me2 extended further over the gene body. Furthermore, our study revealed significantly stronger binding of MyoD to this particular subset of genes, with MyoD playing a predominantly repressive role. Interestingly, the results suggested that MyoD binds and down-regulates Patz1 during myogenic differentiation, which might provide an important regulatory mechanism to promote myogenic differentiation. Finally, a pipeline was developed to identify differential exon usage from RNA-seq data, with the intention of identifying exons that are excluded or included. Almost a decade ago, the Sperling lab identified Dpf3 (also known as Baf45c) as a chromatin remodeling factor whose expression was significantly up-regulated in the right ventricle of TOF patients. It was shown that Dpf3 is specifically expressed in heart and somites and binds methylated and acetylated lysine residues of histones 3 and 4.
Moreover, it is known that several proteins that bind histone modifications interact with splicing factors. Thus, to dissect the role of Dpf3 in splicing, we compared gene expression profiles (mRNA-seq) generated from the right and left ventricle as well as skeletal muscle of Dpf3 knockout and wild-type mice. The established pipeline for identifying differential exon usage is based on the estimation of percent-spliced-in (PSI) values. The results suggested that Dpf3 might not play a significant role in splicing; however, further investigations are required. In summary, within this thesis, I have developed and applied different computational methods for analyzing CNVs in small cohorts of patients, patterns of histone modifications and differential exon usage.
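To illustrate the PSI estimation mentioned above, the following is a minimal sketch using one common definition: the fraction of reads supporting inclusion of an exon among all reads supporting either inclusion or exclusion. The function names, the read-count inputs and the simple difference between knockout and wild type are assumptions made for illustration; they are not taken from the pipeline described in this thesis, which may normalize or filter counts differently.

```python
import math


def percent_spliced_in(inclusion_reads: float, exclusion_reads: float) -> float:
    """Return PSI as a fraction in [0, 1] for a single exon.

    Hypothetical helper: assumes read counts supporting exon inclusion
    and exon exclusion have already been obtained from the alignments.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # No informative reads: PSI is undefined for this exon.
        return math.nan
    return inclusion_reads / total


def delta_psi(knockout: tuple, wild_type: tuple) -> float:
    """Difference in PSI between two conditions (knockout minus wild type).

    Each argument is an (inclusion_reads, exclusion_reads) pair.
    """
    return percent_spliced_in(*knockout) - percent_spliced_in(*wild_type)


if __name__ == "__main__":
    # Made-up counts for a single exon, for demonstration only.
    print(percent_spliced_in(30, 10))
    print(delta_psi((30, 10), (10, 30)))
```

Under this definition, a positive difference indicates that the exon is included more often in the knockout than in the wild type, and a negative difference that it is more often excluded.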