Utilizing alignment-free methods to enable quantitative gene expression analysis of large collections of sequencing data

Darvish, Mitra Darja

Metadata

dc.contributor.author

Darvish, Mitra Darja

dc.date.accessioned

2024-03-07T10:43:27Z

dc.date.available

2024-03-07T10:43:27Z

dc.date.issued

2023

dc.identifier.uri

https://refubium.fu-berlin.de/handle/fub188/42334

dc.identifier.uri

http://dx.doi.org/10.17169/refubium-42059

dc.description.abstract

Due to advances in sequencing technologies, the amount of sequencing data is continuously increasing and has reached a scale that calls for new data management methods to actually make use of it. In recent years, a number of different indices have been developed to simplify the data, thereby reducing the space needed and enabling analyses of large collections of sequencing data. In this thesis, the index Needle is introduced, which allows (semi-)quantitative analyses on large data sets and outperforms other existing solutions with regard to both space and speed. Needle, like other indices, is based on alignment-free methods, because in this way the costly step of classical sequence analysis, the alignment, can be omitted. Alignment-free methods are based on short subsequences of the actual sequence data. There are multiple methods to determine these subsequences, and this thesis provides a detailed analysis and comparison to determine the best method for such indices. Moreover, the benchmarking application minions is introduced, which makes comparisons between these methods easier because new methods can be added with little effort. Needle is capable of utilizing large collections of sequencing data and determining their gene expression. Three analyses are performed as a proof of concept for how Needle can be applied to large collections of sequencing data: Needle is used to find cancer signatures, a newly annotated mouse transcript, and tissue-specific differentially expressed genes in different large data sets. In summary, indices like Needle are needed to actually take advantage of the wealth of data currently available in biological and medical research.

dc.format.extent

143 pages

dc.language

eng

dc.rights.uri

https://creativecommons.org/licenses/by-nc-sa/4.0/

dc.subject

Alignment-free

dc.subject

NGS

dc.subject

sequencing data

dc.subject.ddc

000 Computer science, information, general works::000 Computer science, knowledge, systems::005 Computer programming, programs, data

dc.title

Utilizing alignment-free methods to enable quantitative gene expression analysis of large collections of sequencing data

dc.type

Dissertation

dcterms.format

Text

dc.contributor.gender

female

dc.contributor.firstReferee

Reinert, Knut

dc.contributor.furtherReferee

Iqbal, Zamin

dc.date.accepted

2023-12-08

dc.identifier.urn

urn:nbn:de:kobv:188-refubium-42334-7

dc.title.translated

Nutzung alignment-freier Methoden für quantitative Geneexpressionsanalysen von großen Kollektionen von Sequenzdaten

ger

refubium.affiliation

Mathematik und Informatik

dcterms.accessRights.dnb

free

dcterms.accessRights.openaire

open access

dcterms.accessRights.proquest

This Item appears in the following Collection(s)

Dissertationen FU

Files in This Item

PhD_Mitra_Darvish.pdf

Size: 11.72MB

Format: PDF

Checksum (MD5): ff736276e8e227ce7a5ac2b43ecad1eb

Refubium - Freie Universität Berlin Repository
