File name: 已知的5大聚类算法.pdf (The 5 Known Clustering Algorithms)
  Category: Machine Learning
  Development tool:
  File size: 863kb
  Downloads: 0
  Upload date: 2019-08-24
  Uploader: home****
 Description: The 5 Clustering Algorithms Data Scientists Need to Know

Clustering is a Machine Learning technique that involves the grouping of data points. Given a set of data points, we can use a clustering algorithm to classify each data point into a specific group. In theory, data points that are in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features. Clustering is a method of unsupervised learning and is a common technique for statistical data analysis used in many fields.

On the other hand, K-Means has a couple of disadvantages. Firstly, you have to select how many groups/classes there are. This isn't always trivial, and ideally we'd want a clustering algorithm to figure that out for us, because the point of it is to gain some insight from the data. K-Means also starts with a random choice of cluster centers, and therefore it may yield different clustering results on different runs of the algorithm. Thus, the results may not be repeatable and may lack consistency. Other cluster methods are more consistent.

K-Medians is another clustering algorithm related to K-Means, except that instead of recomputing the group center points using the mean, we use the median vector of the group. This method is less sensitive to outliers (because of using the median) but is much slower for larger datasets, as sorting is required on each iteration when computing the median vector.

Mean-Shift Clustering

Mean shift clustering is a sliding-window-based algorithm that attempts to find dense areas of data points. It is a centroid-based algorithm, meaning that the goal is to locate the center points of each group/class. It works by updating candidates for center points to be the mean of the points within the sliding window. These candidate windows are then filtered in a post-processing stage to eliminate near-duplicates, forming the final set of center points and their corresponding groups.
Check out the graphic below for an illustration.

[Figure: Mean-Shift Clustering for a single sliding window]

1. To explain mean-shift we will consider a set of points in two-dimensional space like the above illustration. We begin with a circular sliding window centered at a point C (randomly selected) and having radius r as the kernel. Mean shift is a hill-climbing algorithm which involves shifting this kernel iteratively to a higher-density region on each step until convergence.
2. At every iteration the sliding window is shifted towards regions of higher density by shifting the center point to the mean of the points within the window (hence the name). The density within the sliding window is proportional to the number of points inside it. Naturally, by shifting to the mean of the points in the window, it will gradually move towards areas of higher point density.
3. We continue shifting the sliding window according to the mean until there is no direction in which a shift can accommodate more points inside the kernel. Check out the graphic above; we keep moving the circle until we are no longer increasing the density (i.e., the number of points in the window).
4. This process of steps 1 to 3 is done with many sliding windows until all points lie within a window. When multiple sliding windows overlap, the window containing the most points is preserved. The data points are then clustered according to the sliding window in which they reside.

An illustration of the entire process from end to end with all of the sliding windows is shown below. Each black dot represents the centroid of a sliding window and each gray dot is a data point.

[Figure: mean-shift with all sliding windows, shown at iteration 8]
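
Steps 1 to 4 above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the article's own implementation: the names `shift_window`, `mean_shift`, `tol`, `max_iter`, and `merge_dist` are hypothetical; one window is started at every data point instead of at randomly selected points, so that the result is repeatable; and each point is assigned to the nearest surviving window center, which stands in for "the sliding window in which they reside".

```python
import math


def shift_window(points, center, radius, tol=1e-6, max_iter=100):
    """Steps 1-3: move one circular window to the mean of the points
    inside it, repeating until the center stops moving."""
    for _ in range(max_iter):
        inside = [p for p in points if math.dist(p, center) <= radius]
        if not inside:
            return center  # empty window: nothing to shift towards
        new_center = tuple(sum(axis) / len(inside) for axis in zip(*inside))
        if math.dist(new_center, center) < tol:
            return new_center  # converged
        center = new_center
    return center


def mean_shift(points, radius, merge_dist=None):
    """Step 4: run many windows, keep the densest of any overlapping
    group, then label every point by its nearest kept center."""
    # Assumption: windows closer than merge_dist count as overlapping.
    merge_dist = radius / 2 if merge_dist is None else merge_dist

    # Assumption: seed one window per data point (the text seeds randomly).
    converged = [shift_window(points, p, radius) for p in points]

    def density(c):
        # Number of points inside the window centered at c.
        return sum(math.dist(p, c) <= radius for p in points)

    # Visit windows from densest to sparsest so the densest one wins.
    centers = []
    for c in sorted(converged, key=density, reverse=True):
        if all(math.dist(c, kept) > merge_dist for kept in centers):
            centers.append(c)

    labels = [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]
    return centers, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    centers, labels = mean_shift(pts, radius=1.0)
    print(centers, labels)
```

Note that the number of clusters is not passed in; it falls out of how many distinct windows survive the filtering step, and the radius r is the only parameter that has to be chosen.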