已知的5大聚类算法.pdf: The 5 Clustering Algorithms Data Scientists Need to Know

File name: 已知的5大聚类算法.pdf

Category: Machine Learning

Development tool:

File size: 863 KB

Downloads: 0

Upload date: 2019-08-24

Provider: home****


Description: The 5 Clustering Algorithms Data Scientists Need to Know

Clustering is a Machine Learning technique that involves the grouping of data points. Given a set of data points, we can use a clustering algorithm to classify each data point into a specific group. In theory, data points that are in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features. Clustering is a method of unsupervised learning and is a common technique for statistical data analysis used in many fields.

On the other hand, K-Means has a couple of disadvantages. Firstly, you have to select how many groups/classes there are. This isn't always trivial, and ideally with a clustering algorithm we'd want it to figure those out for us, because the point of it is to gain some insight from the data. K-Means also starts with a random choice of cluster centers, and therefore it may yield different clustering results on different runs of the algorithm. Thus, the results may not be repeatable and may lack consistency. Other cluster methods are more consistent.

K-Medians is another clustering algorithm related to K-Means, except that instead of recomputing the group center points using the mean, we use the median vector of the group. This method is less sensitive to outliers (because of using the median) but is much slower for larger datasets, as sorting is required on each iteration when computing the median vector.

Mean-Shift Clustering

Mean shift clustering is a sliding-window-based algorithm that attempts to find dense areas of data points. It is a centroid-based algorithm, meaning that the goal is to locate the center points of each group/class; it works by updating candidates for center points to be the mean of the points within the sliding window. These candidate windows are then filtered in a post-processing stage to eliminate near-duplicates, forming the final set of center points and their corresponding groups.
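As a brief aside on the K-Medians paragraph above, the difference between the two center updates can be sketched in a few lines of Python. This is only an illustration: it assumes that "median vector" means the coordinate-wise median, and the helper names `mean_center` and `median_center` and the sample `group` are hypothetical, not taken from the article.

```python
from statistics import fmean, median


def mean_center(points):
    """Coordinate-wise mean of a group of points (the K-Means style update)."""
    return tuple(fmean(axis) for axis in zip(*points))


def median_center(points):
    """Coordinate-wise median of a group (assumed K-Medians style update)."""
    return tuple(median(axis) for axis in zip(*points))


# Hypothetical group: four tightly packed points plus one far-away outlier.
group = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (50.0, 50.0)]

print("mean center:  ", mean_center(group))    # dragged toward the outlier
print("median center:", median_center(group))  # stays with the bulk of the points
```

Computing each median needs the values along that coordinate in sorted order, which is where the extra per-iteration cost mentioned above comes from.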
Check out the graphic below for an illustration.

[Figure: Mean-Shift Clustering for a single sliding window]

1. To explain mean-shift we will consider a set of points in two-dimensional space like the above illustration. We begin with a circular sliding window centered at a point g (randomly selected) and having radius r as the kernel. Mean shift is a hill-climbing algorithm which involves shifting this kernel iteratively to a higher-density region on each step until convergence.

2. At every iteration the sliding window is shifted towards regions of higher density by shifting the center point to the mean of the points within the window (hence the name). The density within the sliding window is proportional to the number of points inside it. Naturally, by shifting to the mean of the points in the window it will gradually move towards areas of higher point density.

3. We continue shifting the sliding window according to the mean until there is no direction in which a shift can accommodate more points inside the kernel. Check out the graphic above; we keep moving the circle until we are no longer increasing the density (i.e., the number of points in the window).

4. This process of steps 1 to 3 is done with many sliding windows until all points lie within a window. When multiple sliding windows overlap, the window containing the most points is preserved. The data points are then clustered according to the sliding window in which they reside.

An illustration of the entire process from end to end with all of the sliding windows is shown below. Each black dot represents the centroid of a sliding window and each gray dot is a data point.

[Figure: the entire mean-shift process with all of the sliding windows]
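Steps 1 to 3 for a single sliding window can be sketched roughly as follows. This is a minimal illustration rather than the article's implementation: it assumes a flat circular kernel, Euclidean distance, and a small tolerance as the convergence test, and the names `shift_window`, `count_inside`, `tol`, and `max_iter`, as well as the sample points, are hypothetical. Step 4 (running many windows and merging overlapping ones) is left out.

```python
import math


def count_inside(points, center, radius):
    """Number of points that fall within the circular window."""
    return sum(1 for p in points if math.dist(p, center) <= radius)


def shift_window(points, center, radius, tol=1e-6, max_iter=100):
    """Slide one circular window toward a denser region until it stops moving."""
    for _ in range(max_iter):
        inside = [p for p in points if math.dist(p, center) <= radius]
        if not inside:
            return center  # empty window: nothing to shift toward
        # Step 2: move the center to the mean of the points in the window.
        new_center = tuple(sum(axis) / len(inside) for axis in zip(*inside))
        # Step 3: stop once the shift is negligible.
        if math.dist(new_center, center) < tol:
            return new_center
        center = new_center
    return center


# Hypothetical 2-D data: one dense group near the origin, one near (5, 5).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]

start = (1.2, 1.2)  # step 1: an arbitrary starting center
final = shift_window(points, start, radius=1.5)

print("points in window at start:", count_inside(points, start, 1.5))
print("points in window at end:  ", count_inside(points, final, 1.5))
print("final center:", final)
```

The radius plays the role of r in step 1; with a window this small, the sketch settles on the group near the origin and never sees the group near (5, 5), which is why step 4 starts many windows.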
