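Recently, the American journal Science published the paper "Clustering by fast search and find of density peaks" (original link: http://www.sciencemag.org/content/344/6191/1492.full.html). The paper rests on two main ideas:
For data clustering, cluster centers should lie in regions where the data density is high; in other words, clusters should be separated from one another by a zone where the data density is very low;
Even within regions of high data density, cluster centers should stay relatively far apart from one another.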
(The algorithm has its basis in the assumptions that cluster centers are surrounded by neighbors with lower local density and that they are at a relatively large distance from any points with a higher local density.)
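The two assumptions quoted above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the function name `density_peaks`, the parameters `d_c` and `n_centers`, and the shortcut of taking the points with the largest product of density and distance as centers are all assumptions made here for the example.

```python
import numpy as np

def density_peaks(X, d_c, n_centers):
    """Illustrative sketch: local density rho, distance delta, candidate centers."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Full pairwise Euclidean distance matrix (fine for small data sets).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # rho_i: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit points from highest to lowest density; ties are broken by index
    # (an assumption of this sketch).
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    # The densest point has no denser neighbor, so it gets its largest distance.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        # delta_i: distance to the nearest point that ranks higher in density.
        delta[i] = dist[i, order[:rank]].min()
    # Centers should have both high rho and high delta; here we simply take
    # the n_centers largest products rho * delta.
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    return rho, delta, centers
```

Calling `density_peaks(X, d_c=0.15, n_centers=2)` on two well-separated groups of points should return one index from each group in `centers`; in practice the cutoff `d_c` has to be tuned to the scale of the data.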
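As it happens, my own paper from before the New Year, "Adaptive Initialization Method Based on Spatial Local Information for K-Means Algorithm" (link: http://www.hindawi.com/journals/mpe/2014/761468/), is built on a similar idea:
In k-means, cluster centers should lie in regions where the data density is relatively high;
A preliminary estimate of the data density is made with k-NN or the ε-neighborhood, and a threshold is set to select the data points in dense regions as candidate initial centers;
The final initial centers are then chosen from the candidates as data points that are relatively far apart from one another.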
(if we choose initial centers from regions with high local density of data distribution, which is a kind of spatial local information of data points, outliers will be prevented from being chosen. And, by keeping them away with certain distance, we will get the suitable initial centers for k-means algorithm)
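The three steps above can be sketched as follows. This is a rough illustration under stated assumptions rather than the method exactly as published: the density estimate (inverse of the mean distance to the `k` nearest neighbors), the quantile-based threshold `density_quantile`, the farthest-first selection rule, and the function name `init_centers` are all choices made here for the example.

```python
import numpy as np

def init_centers(X, n_clusters, k=5, density_quantile=0.5):
    """Illustrative sketch: pick k-means initial centers from dense, far-apart points."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # k-NN density estimate (assumed form): inverse of the mean distance to the
    # k nearest neighbors. Column 0 of each sorted row is the self-distance 0.
    knn_mean = np.sort(dist, axis=1)[:, 1:k + 1].mean(axis=1)
    density = 1.0 / (knn_mean + 1e-12)
    # Threshold step: keep only points in dense regions, which excludes outliers.
    candidates = np.flatnonzero(density >= np.quantile(density, density_quantile))
    if n_clusters > len(candidates):
        raise ValueError("not enough dense candidates for the requested clusters")
    # Start from the densest candidate, then repeatedly add the candidate that
    # is farthest from all centers chosen so far (farthest-first selection).
    chosen = [candidates[np.argmax(density[candidates])]]
    while len(chosen) < n_clusters:
        min_d = dist[np.ix_(candidates, chosen)].min(axis=1)
        chosen.append(candidates[np.argmax(min_d)])
    return X[chosen]
```

The returned array can be passed to a k-means implementation as its starting centers. Because the outlier filter runs before the distance-based selection, an isolated point far from everything else is not picked merely for being far away.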
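Regrettably, I never managed to step outside the framework of the k-means algorithm and see further. Vision determines how high you can reach!
Still, even though it was someone else's paper that appeared in Science, the ideas are after all quite similar, which is a small comfort to my self-confidence.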