Due to advances in sequencing technologies, the amount of sequencing data is continuously increasing and has reached a scale that calls for new data management methods if the data are to be used effectively. In recent years, a number of different indices have been developed to simplify the data, thereby reducing the space required and enabling analyses on large collections of sequencing data. This thesis introduces the index Needle, which allows (semi-)quantitative analyses on large data sets and outperforms other existing solutions with regard to both space and speed. Needle, like other indices, is based on alignment-free methods, which omit the alignment, the costly step of classical sequence analyses. Alignment-free methods are based on short subsequences of the actual sequence data. There are multiple methods to determine these subsequences, and this thesis provides a detailed analysis and comparison to determine the best method for such indices. Moreover, the benchmarking application minions is introduced, which simplifies comparisons between these methods because new methods can be added easily. Needle is capable of utilizing large collections of sequencing data and determining the gene expression they contain. Three analyses are performed as a proof of concept for how Needle can be applied to large collections of sequencing data: Needle is used to find cancer signatures, a newly annotated mouse transcript, and tissue-specific differentially expressed genes in different large data sets. In summary, indices like Needle are needed to actually take advantage of the wealth of data currently available in biological and medical research.