File size: 863 KB
Upload date: 2019-08-24
Description: The 5 Clustering Algorithms Data Scientists Need to Know
Clustering is a Machine Learning technique that involves the grouping of data points. Given a set of data points, we can use a clustering algorithm to classify each data point into a specific group. In theory, data points that are in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features. Clustering is a method of unsupervised learning and is a common technique for statistical data analysis used in many fields.

On the other hand, K-Means has a couple of disadvantages. Firstly, you
have to select how many groups/classes there are. This isn't always
trivial, and ideally with a clustering algorithm we'd want it to figure
those out for us, because the point of it is to gain some insight from the
data. K-Means also starts with a random choice of cluster centers, and
therefore it may yield different clustering results on different runs of
the algorithm. Thus, the results may not be repeatable and may lack
consistency. Other cluster methods are more consistent.
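The dependence on the starting centers can be seen on a tiny example. The following is a minimal sketch, not a reference implementation: the function names `kmeans` and `random_centers`, the four-point dataset, and the fixed iteration cap are all assumptions made for illustration. The initial centers are passed in explicitly so that the two outcomes are reproducible; `random_centers` shows how a random start would normally be drawn.

```python
import random


def random_centers(points, k, seed):
    """Draw k distinct data points as initial centers (hypothetical helper)."""
    return random.Random(seed).sample(points, k)


def kmeans(points, centers, iterations=20):
    """Plain K-Means on 2-D tuples, starting from the given centers."""
    k = len(centers)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = []
        for i, g in enumerate(groups):
            if g:
                new_centers.append(
                    (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
                )
            else:
                new_centers.append(centers[i])  # leave an empty group's center alone
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return sorted(centers)


# Four corners of a wide rectangle (illustrative data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Two starts on the same short side converge to a top/bottom split ...
print(kmeans(points, [(0, 0), (0, 1)]))
# ... while two starts on opposite sides converge to a left/right split.
print(kmeans(points, [(0, 0), (4, 0)]))
# A random start, as K-Means normally uses, lands in one or the other.
print(kmeans(points, random_centers(points, 2, seed=0)))
```

Both results are stable under further iterations, so which one is returned depends only on where the centers started.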
K-Medians is another clustering algorithm related to K-Means, except
instead of recomputing the group center points using the mean we use
the median vector of the group. This method is less sensitive to outliers
(because of using the median) but is much slower for larger datasets, as
sorting is required on each iteration when computing the median
vector.
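To illustrate the difference in the center update, here is a small sketch comparing the two on a group that contains one outlier. It assumes that "median vector" means the component-wise median (each coordinate taken independently), which is a common reading but is not spelled out in the text; the function names and the sample group are made up for the example.

```python
from statistics import mean, median


def mean_center(group):
    """K-Means style update: average each coordinate."""
    return tuple(mean(coord) for coord in zip(*group))


def median_center(group):
    """K-Medians style update (assumed component-wise): median of each coordinate.

    statistics.median sorts the values, which is the per-iteration
    sorting cost mentioned above.
    """
    return tuple(median(coord) for coord in zip(*group))


# Four points close together plus one far-away outlier (illustrative data).
group = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (100.0, 100.0)]

print(mean_center(group))    # pulled far towards the outlier
print(median_center(group))  # stays among the four nearby points
```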
Mean-Shift Clustering
Mean shift clustering is a sliding-window-based algorithm that
attempts to find dense areas of data points. It is a centroid-based
algorithm, meaning that the goal is to locate the center points of each
group/class, which works by updating candidates for center points to
be the mean of the points within the sliding-window. These candidate
windows are then filtered in a post-processing stage to eliminate near-
duplicates, forming the final set of center points and their
corresponding groups. Check out the graphic below for an illustration.
[Figure: Mean-Shift Clustering for a single sliding window]
1. To explain mean-shift we will consider a set of points in two-
dimensional space like the above illustration. We begin with a
circular sliding window centered at a point g (randomly selected)
and having radius r as the kernel. Mean shift is a hill climbing
algorithm which involves shifting this kernel iteratively to a higher
density region on each step until convergence.
2. At every iteration the sliding window is shifted towards regions of
higher density by shifting the center point to the mean of the
points within the window (hence the name). The density within
the sliding window is proportional to the number of points inside
it. Naturally, by shifting to the mean of the points in the window it
will gradually move towards areas of higher point density.
3. We continue shifting the sliding window according to the mean
until there is no direction at which a shift can accommodate more
points inside the kernel. Check out the graphic above; we keep
moving the circle until we are no longer increasing the density (i.e.,
the number of points in the window).
4. This process of steps 1 to 3 is done with many sliding windows
until all points lie within a window. When multiple sliding
windows overlap, the window containing the most points is
preserved. The data points are then clustered according to the
sliding window in which they reside.
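The four steps above can be sketched in a few lines of Python. This is a simplified illustration under several assumptions that the text does not fix: a window is started at every data point rather than at randomly selected locations, two converged windows count as overlapping when their centers are within one radius of each other, and each point is labeled by its nearest surviving center. The names `shift_window` and `mean_shift`, the tolerance, and the sample data are hypothetical.

```python
import math


def shift_window(center, points, radius, tol=1e-6, max_iter=100):
    """Steps 1-3: move one circular window to the mean of the points
    inside it until the center stops moving."""
    for _ in range(max_iter):
        inside = [p for p in points if math.dist(p, center) <= radius]
        if not inside:
            break  # an empty window has no mean to move to
        mean = (
            sum(x for x, _ in inside) / len(inside),
            sum(y for _, y in inside) / len(inside),
        )
        moved = math.dist(mean, center)
        center = mean
        if moved < tol:
            break  # converged: shifting no longer changes the window
    return center


def mean_shift(points, radius):
    """Step 4: run many windows, keep the densest of any overlapping
    ones, then label each point by its nearest surviving center."""
    # Assumption: one window starts at each data point.
    converged = [shift_window(p, points, radius) for p in points]

    def density(c):
        # Number of points inside the window centered at c.
        return sum(1 for p in points if math.dist(p, c) <= radius)

    centers = []
    # Visit windows from most to fewest points, dropping near-duplicates.
    for c in sorted(converged, key=density, reverse=True):
        if all(math.dist(c, kept) > radius for kept in centers):
            centers.append(c)

    labels = [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]
    return centers, labels


# Two well-separated groups of four points each (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels = mean_shift(points, radius=2.0)
print(centers)
print(labels)
```

Note that the number of clusters is never passed in; it falls out of the radius, which is the parameter that has to be chosen instead.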
An illustration of the entire process from end-to-end with all of the
sliding windows is shown below. Each black dot represents the centroid
of a sliding window and each gray dot is a data point.