dc.contributor.author
Lemke, Oliver
dc.contributor.author
Keller, Bettina G.
dc.date.accessioned
2018-06-08T10:34:34Z
dc.date.available
2018-02-20T12:05:00.001Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/20682
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-23982
dc.description.abstract
Cluster analyses are often conducted with the goal of characterizing an
underlying probability density, for which the data-point density serves as an
estimate. Here, we test and benchmark the common
nearest neighbor (CNN) cluster algorithm. This algorithm assigns a spherical
neighborhood R to each data point and estimates the data-point density between
two data points as the number of data points N in the overlapping region of
their neighborhoods (step 1). The main principle in the CNN cluster algorithm
is cluster growing: clusters are grown by sequentially adding data points,
which effectively positions the border of the clusters along an isosurface
of the underlying probability density. This yields a strict
partitioning with outliers, in which the clusters represent peaks in the
underlying probability density, termed core sets (step 2). The removal of the
outliers on the basis of a threshold criterion is optional (step 3). The
benchmark datasets address a series of typical challenges, including datasets
with a very high dimensional state space and datasets in which the cluster
centroids are aligned along an underlying structure (Birch sets). The
performance of the CNN algorithm is evaluated with respect to these
challenges. The results indicate that the CNN cluster algorithm can be useful
in a wide range of settings. Cluster algorithms are particularly important for
the analysis of molecular dynamics (MD) simulations. We demonstrate how the
CNN cluster results can be used as a discretization of the molecular state
space for the construction of a core-set model of the MD, which improves the
accuracy compared to conventional full-partitioning models. The software for
the CNN clustering is available on GitHub.
en
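The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' GitHub implementation. The function name `cnn_cluster`, the parameter names `radius`, `min_common` and `min_size`, the exact linking rule (two points must lie in each other's neighborhood and share at least `min_common` further neighbors), and the use of a minimum cluster size as the outlier threshold are all assumptions made for illustration.

```python
from math import dist


def cnn_cluster(points, radius, min_common, min_size=1):
    """Label points by cluster growing; -1 marks outliers (illustrative sketch)."""
    n = len(points)
    # Step 1: neighborhood of radius R around each point (the point itself excluded).
    neighbors = [
        {j for j in range(n) if j != i and dist(points[i], points[j]) <= radius}
        for i in range(n)
    ]

    def linked(i, j):
        # Assumed criterion: i and j are mutual neighbors and share at least
        # `min_common` points in the overlap of their neighborhoods.
        return j in neighbors[i] and len(neighbors[i] & neighbors[j]) >= min_common

    labels = [-1] * n
    visited = [False] * n
    next_label = 0
    for seed in range(n):
        if visited[seed]:
            continue
        # Step 2: grow a cluster from the seed by sequentially adding linked points.
        visited[seed] = True
        stack, members = [seed], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in neighbors[i]:
                if not visited[j] and linked(i, j):
                    visited[j] = True
                    stack.append(j)
        # Step 3 (assumed threshold): clusters below `min_size` stay outliers.
        if len(members) >= min_size:
            for i in members:
                labels[i] = next_label
            next_label += 1
    return labels


# Two tight groups of five points each plus one isolated point.
blob = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
demo_points = blob + [(x + 5.0, y + 5.0) for x, y in blob] + [(20.0, 20.0)]
demo_labels = cnn_cluster(demo_points, radius=0.3, min_common=2, min_size=2)
print(demo_labels)
```

In this toy input, the two groups receive the labels 0 and 1, and the isolated point keeps the outlier label -1. The pairwise neighbor search is quadratic in the number of points, which is acceptable for a sketch but not for the large benchmark datasets the abstract mentions.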
dc.format.extent
21 pages
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/
dc.subject
density-based clustering
dc.subject
molecular dynamics simulations
dc.subject
Markov state models
dc.subject.ddc
000 Computer science, information and general works::000 Computer science, knowledge and systems::004 Data processing; computer science
dc.subject.ddc
500 Natural sciences and mathematics::540 Chemistry::541 Physical chemistry
dc.title
Common Nearest Neighbor Clustering - A Benchmark
dc.type
Scientific article
dcterms.bibliographicCitation
Algorithms 11 (2018), 2
dcterms.bibliographicCitation.doi
10.3390/a11020019
dcterms.bibliographicCitation.url
http://doi.org/10.3390/a11020019
refubium.affiliation
Biology, Chemistry, Pharmacy
en
refubium.affiliation.other
Institut für Chemie und Biochemie / Computational Chemistry and Theoretical Biophysics
refubium.funding
Institutional Participation
refubium.funding.id
MDPI
refubium.mycore.fudocsId
FUDOCS_document_000000029058
refubium.note.author
The publication was funded by Open Access publication funds of Freie
Universität Berlin and the DFG.
refubium.resourceType.isindependentpub
no
refubium.mycore.derivateId
FUDOCS_derivate_000000009442
dcterms.accessRights.openaire
open access
dcterms.isPartOf.issn
1999-4893