dc.contributor.author
Darvish, Mitra Darja
dc.date.accessioned
2024-03-07T10:43:27Z
dc.date.available
2024-03-07T10:43:27Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/42334
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-42059
dc.description.abstract
Due to advances in sequencing technologies, the amount of sequencing data is continuously increasing and has reached a scale that calls for new data management
methods to actually utilize the sequencing data. In recent years, a number of
different indices have been developed to simplify the data, thereby reducing the
amount of space needed and enabling analysis on large collections of sequencing
data.
In this thesis, the index Needle is introduced, which allows (semi-)quantitative
analyses on large data sets and outperforms other existing solutions with regard to
both space and speed.
Needle, like other indices, is based on alignment-free methods because in this
way the costly step of classical sequence analyses, the alignment, can be omitted.
Alignment-free methods are based on short subsequences of the actual sequence data.
There are multiple different methods to determine these subsequences and this thesis
provides a detailed analysis and comparison to determine the best method for such
indices. Moreover, the benchmarking application minions is introduced, which
makes comparisons between these methods easier because new methods can be added
with little effort.
Needle is capable of utilizing large collections of sequencing data and determining
their gene expression. Three analyses are performed, which act as a proof of concept
for how Needle can be utilized for large collections of sequencing data. Specifically,
Needle is applied in this thesis to find cancer signatures, a newly annotated mouse
transcript, and tissue-specific differentially expressed genes in different large data
sets.
In summary, indices like Needle are needed to actually take advantage of the
wealth of data currently present in the biological and medical research field.
en
dc.format.extent
143 pages
dc.rights.uri
https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject
Alignment-free
en
dc.subject
sequencing data
en
dc.subject.ddc
000 Informatik, Informationswissenschaft, allgemeine Werke::000 Informatik, Wissen, Systeme::005 Computerprogrammierung, Programme, Daten
dc.title
Utilizing alignment-free methods to enable quantitative gene expression analysis of large collections of sequencing data
dc.contributor.gender
female
dc.contributor.firstReferee
Reinert, Knut
dc.contributor.furtherReferee
Iqbal, Zamin
dc.date.accepted
2023-12-08
dc.identifier.urn
urn:nbn:de:kobv:188-refubium-42334-7
dc.title.translated
Nutzung alignment-freier Methoden für quantitative Geneexpressionsanalysen von großen Kollektionen von Sequenzdaten
ger
refubium.affiliation
Mathematik und Informatik
dcterms.accessRights.dnb
free
dcterms.accessRights.openaire
open access
dcterms.accessRights.proquest
accept