Divisive Variational Bayesian Algorithms for the Clustering of large and
complex Networks: With Applications to Earthquake Networks

Willenbockel, Christian Tobias

Divisive Variational Bayesian Algorithms for the Clustering of large and complex Networks

Title:

Divisive Variational Bayesian Algorithms for the Clustering of large and complex Networks

Subtitle:

With Applications to Earthquake Networks

Translated Title(s):

Aufteilende Variational Bayessche Algorithmen für das Clustering von großen und komplexen Netzwerken

Author(s):

Willenbockel, Christian Tobias

Year of publication:

2017

Available Date:

2017-09-19T11:25:58.732Z

Abstract:

The aim of this thesis was the development of new algorithms for the analysis of large and complex networks with applications to earthquake networks. Earthquake networks are directed and weighted networks which model the temporal and spatial succession of earthquakes. The mapping of the interdependency between the vertices of the networks representing the location of the earthquake and the temporal succession and occurrence of earthquakes through edges between the vertices leads to a considerable complexity. Assigning (clustering) the vertices with similar edge connection profiles to the same class (cluster) opens the possibility of a better understanding of the generation and topology of the network. One model offering such a cluster assignment is the Stochastic Block Model with Poisson distributions (SBM). It is a special feature of the SBM to consider the possible interdependence of each vertex to all other vertices of the network and thus capturing the complexity of the network. This network interdependency is a major challenge for the inference of a specific SBM for a given network. The application of the Variational Bayesian EM (VBEM) algorithm was proposed as an option for the solution to this problem some years ago. The estimation of the SBM with the VBEM algorithm leads to a non-convex optimisation problem. The objective is to find the optimal number of clusters and assignments of the vertices to the clusters corresponding to a global optimum of the model selection criterion. In order to solve this optimisation problem existing VBEM algorithms often only allow the separate initialisation for different numbers of clusters. Using this batch approach, the optimisation has to be performed each time with respect to all vertices in the network and for each number of clusters. For earthquake networks or comparable large and complex networks this approach leads to unsatisfactory results with respect to quality and computation time. At this point the present thesis comes in. A new type of algorithms for VBEM inference, called the Blockloading algorithms, is proposed. These algorithms combine the search of the optimal cluster assignment of the vertices and the number of clusters according to the model selection criterion in a divisive algorithm. Starting with the assignment of all vertices to a single cluster the search for an optimum is performed by splitting and refining the existing clusters. As this optimisation only involves subsets, two new, especially adapted VBEM algorithms, the BlockVB and BlockVB++ algorithms, were developed to meet this objective. The design of the Blockloading algorithms follows the observation, that the successful search for a global optimum with means of a divisive algorithm can be performed by the identification of several favourable local optima. These local optima are similar in the sense that by correctly splitting the right clusters they can be transformed into a global optimum. The Blockloading, automatic Blockloading, no reset Blockloading and Blockloading++ algorithms were designed with regard to this observation which lead to an efficient algorithmic design which was optimised for finding favourable local optima. As large and complex networks often exhibit irregularly and sparsely connected vertices the Stochastic Block Model with irrelevant Vertices (SBMIV) was developed which explicitly models these irrelevant vertices. Building on our Blockloading algorithms, the Relevance Blockloading algorithm together with the Relevance BlockVB EM algorithm was introduced for the estimation of the SBMIV. Numerical tests of the newly developed methods were performed on synthetic networks, generated by the SBM, and an earthquake network. The newly developed Blockloading algorithms reached a previously unattained quality of the results and computation time for the clustering of the earthquake network and the synthetic networks when compared to comparable methods for the estimation of the SBM. It was demonstrated for an earthquake network that inference with the Relevance Blockloading algorithm can reliably identify irrelevant vertices. This estimation lead to an even better value of the model selection criterion than the inference of the comparable Poisson SBM which demonstrates the superiority of this approach.

Ziel der Arbeit war die Entwicklung neuer Algorithmen zur Analyse von großen und komplexen Netzwerken mit Anwendung auf Erdbebennetzwerke. Erdbebennetzwerke sind gerichtete und gewichtete Netzwerke, welche die zeitliche und räumliche Abfolge von Erdbeben abbilden. Die Abbildung der wechselseitigen Abhängigkeit zwischen den Knoten des Netzwerks, welche den Ort des Erdbebens beschreiben, und der zeitlichen Abfolge und dem Auftreten von Erdbeben durch Kanten zwischen den Knoten führt zu erheblicher Komplexität. Um die Entstehung und Topologie des Netzwerks besser zu verstehen, bietet sich das Zusammenfassen (Clustern) von Knoten mit ähnlichen Kantenverbindungsprofilen in derselben Klasse (Cluster) an. Ein Modell für solche Zusammenfassungen ist das Stochastische Block Modell mit Poisson Verteilungen (SBM). Das SBM hat die Besonderheit, dass es die mögliche wechselseitige Abbängigkeit von jedem Knoten zu allen anderen Knoten berücksichtigt, und so die Komplexität des Netzwerks erfasst. Diese wechselseitige Abhängigkeit ist gleichzeitig eine besondere Herausforderung für ein Lösungsverfahren zur Schätzung des konkreten SBMs für ein gegebenes Netzwerk. Vor einigen Jahren wurde der Variational Bayesian EM (VBEM) Algorithmus als Option zur Lösung dieses Problems vorgestellt. Die Schätzung eines SBM mit dem VBEM Algorithmus führt auf ein nicht-konvexes Optimierungsproblem. Dabei ist die optimale Anzahl der Cluster und Zuordnung der Knoten zu den Clustern zu finden, die einem globalen Optimum des Modellauswahlkriteriums entsprechen. Zur Lösung dieses Optimierungsproblems erlaubten bisherige VBEM Algorithmen allerdings häufig nur die getrennte Anwendung für verschiedene Anzahlen von Clustern. Dabei wird die Optimierung jedesmal in Abhängigkeit von allen Knoten des gesamten Netzwerks und für jede Clusterzahl neu durchgeführt. Dieses Vorgehen führt jedoch für Erdbebennetzwerke oder vergleichbar große und komplexe Netzwerke zu unzureichenden Ergebnissen in Bezug auf die Qualität und Rechengeschwindigkeit. Genau an dieser Stelle setzt die vorliegende Arbeit an. Es wird eine neue Art von Algorithmen zur Anwendung von VBEM Algorithmen, sogenannte Blockloading Algorithmen vorgestellt, welche die Suche nach der optimalen Clusterzuordnung der Knoten und der Anzahl der Cluster gemäß des Modellauswahlkriteriums in einem Unterteilungalgorithmus kombinieren. Ausgehend von der Zuordnung aller Knoten zu einem einzigen Cluster wird ein Optimum durch das Aufteilen und Verfeinern der bestehenden Cluster gesucht. Da in dieser Optimierung nur Teilmengen der Knoten berücksichtigt werden, wurden mit den BlockVB und BlockVB++ Algorithmen zwei neue, dafür speziell angepasste VBEM Algorithmen entwickelt. Das Design der Blockloading Algorithmen folgt der Beobachtung, dass die erfolgreiche Suche nach einem globalen Optimum mit einem Unterteilungsalgorithmus durch die Identifizierung mehrerer günstiger lokaler Optima erfolgen kann. Diese lokalen Optima sind sich in dem Sinne ähnlich, dass sie alle durch das richtige Aufteilen der richtigen Cluster in ein globales Optimum überführt werden können. Mit den Blockloading, automatic Blockloading, no reset Blockloading und Blockloading++ Algorithmen wird diese Beobachtung durch ein effizientes algorithmisches Design berücksichtigt, welches für das Auffinden geeigneter lokaler Optima optimiert wurde. Da komplexe und große Netzwerke häufig unregelmäßig und wenig verbundene Knoten aufweisen, wurde zusätzlich das Stochastische Block Modell mit irrelevanten Knoten (SBMIV) entwickelt, welches irrelevante Knoten gesondert zu berücksichtigen erlaubt. Aufbauend auf den Blockloading und BlockVB Algorithmen wurde der Relevance Blockloading Algorithmus mit dem Relevance BlockVB EM Algorithmus zur Schätzung des SBMIV vorgestellt. Anhand von synthetischen Netzwerken, die mit dem SBM erzeugt wurden, und einem Erdbebennetzwerk wurden numerische Tests der entwickelten Methoden durchgeführt. Im Vergleich zu vergleichbaren Berechnungsmethoden zur Schätzung des SBMs waren die neu entwickelten Blockloading Algorithmen in der Lage, sowohl das Erdbebennetzwerk als auch die synthetische Netzwerke in bisher nicht erreichter Qualität und Geschwindigkeit zu clustern. Anhand des Erdbebennetzwerks konnte zusätzlich gezeigt werden, dass der neue Relevance Blockloading Algorithmus bei der Schätzung eines SBMIV zuverlässig irrelevante Knoten identifizieren kann. Dies führte sogar zu einem besseren Wert des Modellauswahlkriteriums als die Schätzung des vergleichbaren Poisson-SBM, was die Überlegenheit dieses Ansatzes demonstriert.

Identifier:

https://refubium.fu-berlin.de/handle/fub188/4648
http://dx.doi.org/10.17169/refubium-8848
urn:nbn:de:kobv:188-fudissthesis000000105510-1

Language:

English

Keywords:

variational Bayesian inference
clustering
networks
stochastic block models
earthquake networks
irrelevant vertices

DDC-Classification:

510 Mathematik
519 Wahrscheinlichkeiten, angewandte Mathematik

Publication Type:

Dissertation

Department/institution:

Mathematik und Informatik