Contributions to the detection of non-reference sequences in population-scale NGS data

Krannich, Thomas

Metadata

dc.contributor.author

Krannich, Thomas

dc.date.accessioned

2022-06-02T09:10:42Z

dc.date.available

2022-06-02T09:10:42Z

dc.date.issued

2022

dc.identifier.uri

https://refubium.fu-berlin.de/handle/fub188/35041

dc.identifier.uri

http://dx.doi.org/10.17169/refubium-34757

dc.description.abstract

Non-reference sequence (NRS) variants are a less frequently investigated class of genomic structural variants (SVs). They are DNA sequences found within an individual that are novel with respect to a given reference. NRS variants arise mainly because a linear reference genome that was derived from a single or a few individuals lacks biological diversity and ancestral sequence. Newly sequenced individuals can therefore yield genomic sequences that are absent from the reference genome. With the increasing throughput of sequencing technologies, SV detection has become possible across tens of thousands of individuals. With short-read data, the detection of NRS variants inevitably involves a de novo assembly, which is a complex computational problem and requires high-quality sequence data at high coverage. Previous studies have demonstrated how sequence data from multiple genomes can be combined for the reliable detection of NRS variants. However, the algorithms proposed in these studies have a limited capability to process large sets of genomes. This thesis introduces novel contributions for the discovery of NRS variants in many genomes, which scale to considerably larger numbers of genomes than previous methods. PopIns2, a practical software tool developed to apply the presented methods, is described in detail. The highlight among the new contributions is a procedure that merges contig assemblies of unaligned reads from many individuals into a single set of NRS by heuristically generating a weighted minimum path cover for a colored de Bruijn graph. Tests on simulated data show that PopIns2 ranks among the best approaches in terms of quality and reliability and that its approach yields the best precision as the number of genomes processed grows. Results on the Polaris Diversity Cohort and a set of 1000 Icelandic human genomes demonstrate unmatched scalability for application to population-scale datasets.

dc.format.extent

141 pages

dc.language

eng

dc.rights.uri

https://creativecommons.org/licenses/by-sa/4.0/

dc.subject

Bioinformatics

dc.subject

Algorithms

dc.subject

Genomics

dc.subject

Structural variants

dc.subject

de Bruijn Graph

dc.subject.ddc

000 Computer science, information, and general works::000 Computer Science, knowledge, systems::004 Data processing and Computer science

dc.subject.ddc

500 Natural sciences and mathematics::570 Life sciences::570 Life sciences

dc.title

Contributions to the detection of non-reference sequences in population-scale NGS data

dc.type

Dissertation

dcterms.format

Text

dc.contributor.gender

male

dc.contributor.firstReferee

Reinert, Knut

dc.contributor.furtherReferee

Kehr, Birte

dc.date.accepted

2022-04-29

dc.identifier.urn

urn:nbn:de:kobv:188-refubium-35041-7

refubium.affiliation

Mathematik und Informatik

dcterms.accessRights.dnb

free

dcterms.accessRights.openaire

open access

This Item appears in the following Collection(s)

Dissertationen FU

Files in This Item

PhD_Thesis.pdf

Description: PDF file of dissertation

Size: 12.13MB

Format: PDF

Checksum (MD5): dc84b8ad820c232e104e84d14e5ec7e4

Refubium - Freie Universität Berlin Repository
