dc.contributor.author
Wu, Han
dc.date.accessioned
2021-02-08T13:47:43Z
dc.date.available
2021-02-08T13:47:43Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/29377
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-29123
dc.description.abstract
Streaming data is now flowing across various devices and applications around us. This type of data refers to any unbounded, ever-growing, infinite data set that is continuously generated by all kinds of sources. Examples include sensor data transmitted among different Internet of Things (IoT) devices, user activity records collected on websites, and payment requests sent from mobile devices. In many application scenarios, streaming data needs to be processed in real time because its value can diminish over time. A variety of stream processing systems have been developed in the last decade and are evolving to address rising challenges.
A typical stream processing system consists of multiple processing nodes arranged in the topology of a directed acyclic graph (DAG). To build real-time streaming data pipelines across those nodes, message middleware technology is widely applied. As a distributed messaging system with high durability and scalability, Apache Kafka has become very popular among modern companies. It ingests streaming data from upstream applications and stores the data in its distributed cluster, which provides a fault-tolerant data source for stream processors. Therefore, Kafka plays a critical role in ensuring the completeness, correctness, and timeliness of streaming data delivery.
However, it is impossible to meet all user requirements in real-time cases with a simple and fixed data delivery strategy. In this thesis, we address the challenge of choosing a proper configuration to guarantee both the performance and reliability of Kafka for complex streaming application scenarios. We investigate the features that have an impact on the performance and reliability metrics. We propose a queueing-based prediction model to predict the performance metrics of Kafka, including producer throughput and packet latency. We define two reliability metrics: the probability of message loss and the probability of message duplication. We create an artificial neural network (ANN) model to predict these metrics given unstable network conditions such as network delay and packet loss rate. To collect sufficient training data, we build a Docker-based Kafka testbed with a fault injection module. We use a new quality-of-service metric, timely throughput, to help choose a proper batch size in Kafka. Based on this metric, we propose a dynamic configuration method that reactively guarantees both the performance and reliability of Kafka under complex operating conditions.
en
dc.format.extent
ix, 123 Seiten
dc.rights.uri
https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject
Streaming data
en
dc.subject
Machine Learning
en
dc.subject.ddc
000 Computer science, information, and general works::000 Computer Science, knowledge, systems::000 Computer science, information, and general works
dc.title
Performance and Reliability Evaluation of Apache Kafka Messaging System
dc.contributor.gender
male
dc.contributor.firstReferee
Wolter, Katinka
dc.contributor.furtherReferee
van Moorsel, Aad
dc.date.accepted
2021-01-14
dc.identifier.urn
urn:nbn:de:kobv:188-refubium-29377-7
refubium.affiliation
Mathematik und Informatik
refubium.note.author
sponsored by CSC (China Scholarship Council)
dcterms.accessRights.dnb
free
dcterms.accessRights.openaire
open access
dcterms.accessRights.proquest
accept