|
Clustering aims to partition a data set X of n elements into k clusters such that elements in the same cluster are similar to one another and elements in different clusters are dissimilar. Although this is a classical machine learning problem that has been studied since the 1950s, it still attracts many researchers today. The essence of clustering is to discover the hidden structure behind the data, detect relationships between elements, and identify anomalies.
Many clustering strategies have been reported over the years; they can be categorized as centroid-based (such as K-means, CLARA, and Fuzzy C-Means), graph-based (such as SNN and CHAMELEON), and density-based (such as DBSCAN and OPTICS). However, most of these algorithms are sensitive to their parameters, for example the density threshold, and appropriate values are difficult to choose for different datasets.
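To make the parameter-sensitivity point concrete, the following is a minimal toy sketch, not an implementation of DBSCAN or of any algorithm cited above. It assumes one-dimensional data and a hypothetical helper, `threshold_clusters`, that starts a new cluster whenever the gap between neighbouring points exceeds a distance threshold `eps`; the data values are invented for illustration only.

```python
def threshold_clusters(points, eps):
    """Toy 1-D grouping (hypothetical helper, for illustration only).

    Points are sorted, and a new cluster is started whenever the gap
    to the previous point exceeds the distance threshold eps.
    """
    clusters = []
    for p in sorted(points):
        if clusters and p - clusters[-1][-1] <= eps:
            clusters[-1].append(p)   # close enough: extend the current cluster
        else:
            clusters.append([p])     # gap too large: open a new cluster
    return clusters


# Invented sample data with three visually obvious groups.
data = [1.0, 1.2, 1.4, 3.0, 3.1, 3.3, 8.0, 8.2]

# Number of clusters found for several threshold values.
counts = {eps: len(threshold_clusters(data, eps)) for eps in (0.05, 0.5, 2.0, 5.0)}
print(counts)  # {0.05: 8, 0.5: 3, 2.0: 2, 5.0: 1}
```

On this toy data the same points yield anywhere from eight clusters to one depending on `eps` alone, which is the kind of dependence on a threshold that the paragraph above describes.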
In 2014, Rodriguez and Laio proposed a novel density-based clustering algorithm, clustering by fast search and find of density peaks (DPC). The algorithm has attracted considerable attention from researchers and has received about a thousand citations to date.
|