Development tools:
File size: 957 KB
Downloads: 0
Uploaded: 2019-07-07
Description: This book covers the algorithms commonly used for anomaly detection. PDF downloadable from http://rd.springer.com/book/10.1007/978-3-319-47578-3
To my wife, my daughter Sayani
and my late parents Dr. Prem Sarup and Mrs. Pushplata Aggarwal
Contents
1 An Introduction to Outlier Analysis
1.1 Introduction
1.2 The Data Model is Everything
1.2.1 Connections with Supervised Models
1.3 The Basic Outlier Detection Models
1.3.1 Feature Selection in Outlier Detection
1.3.2 Extreme-Value Analysis
1.3.3 Probabilistic and Statistical Models
1.3.4 Linear Models
1.3.4.1 Spectral Models
1.3.5 Proximity-Based Models
1.3.6 Information-Theoretic Models
1.3.7 High-Dimensional Outlier Detection
1.4 Outlier Ensembles
1.4.1 Sequential Ensembles
1.4.2 Independent Ensembles
1.5 The Basic Data Types for Analysis
1.5.1 Categorical, Text, and Mixed Attributes
1.5.2 When the Data Values have Dependencies
1.5.2.1 Time-Series Data and Data Streams
1.5.2.2 Discrete Sequences
1.5.2.3 Spatial Data
1.5.2.4 Network and Graph Data
1.6 Supervised Outlier Detection
1.7 Outlier Evaluation Techniques
1.7.1 Interpreting the ROC AUC
1.7.2 Common Mistakes in Benchmarking
1.8 Conclusions and Summary
1.9 Bibliographic Survey
1.10 Exercises
2 Probabilistic Models for Outlier Detection
2.1 Introduction
2.2 Statistical Methods for Extreme-Value Analysis
2.2.1 Probabilistic Tail Inequalities
2.2.1.1 Sum of Bounded Random Variables
2.2.2 Statistical-Tail Confidence Tests
2.2.2.1 t-Value Test
2.2.2.2 Sum of Squares of Deviations
2.2.2.3 Visualizing Extreme Values with Box Plots
2.3 Extreme-Value Analysis in Multivariate Data
2.3.1 Depth-Based Methods
2.3.2 Deviation-Based Methods
2.3.3 Angle-Based Outlier Detection
2.3.4 Distance Distribution-based Techniques: The Mahalanobis Method
2.3.4.1 Strengths of the Mahalanobis Method
2.4 Probabilistic Mixture Modeling for Outlier Analysis
2.4.1 Relationship with Clustering Methods
2.4.2 The Special Case of a Single Mixture Component
2.4.3 Other Ways of Leveraging the EM Model
2.4.4 An Application of EM for Converting Scores to Probabilities
2.5 Limitations of Probabilistic Modeling
2.6 Conclusions and Summary
2.7 Bibliographic Survey
2.8 Exercises
3 Linear Models for Outlier Detection
3.1 Introduction
3.2 Linear Regression Models
3.2.1 Modeling with Dependent Variables
3.2.1.1 Applications of Dependent Variable Modeling
3.2.2 Linear Modeling with Mean-Squared Projection Error
3.3 Principal Component Analysis
3.3.1 Connections with the Mahalanobis Method
3.3.2 Hard PCA versus Soft PCA
3.3.3 Sensitivity to Noise
3.3.4 Normalization Issues
3.3.5 Regularization Issues
3.3.6 Applications to Noise Correction
3.3.7 How Many Eigenvectors?
3.3.8 Extension to Nonlinear Data Distributions
3.3.8.1 Choice of Similarity Matrix
3.3.8.2 Practical Issues
3.3.8.3 Application to Arbitrary Data Types
3.4 One-Class Support Vector Machines
3.4.1 Solving the Dual Optimization Problem
3.4.2 Practical Issues
3.4.3 Connections to Support Vector Data Description and Other Kernel Models
3.5 A Matrix Factorization View of Linear Models
3.5.1 Outlier Detection in Incomplete Data
3.5.1.1 Computing the Outlier Scores
3.6 Neural Networks: From Linear Models to Deep Learning
3.6.1 Generalization to Nonlinear Models
3.6.2 Replicator Neural Networks and Deep Autoencoders
3.6.3 Practical Issues
3.6.4 The Broad Potential of Neural Networks
3.7 Limitations of Linear Modeling
3.8 Conclusions and Summary
3.9 Bibliographic Survey
3.10 Exercises
4 Proximity-Based Outlier Detection
4.1 Introduction
4.2 Clusters and Outliers: The Complementary Relationship
4.2.1 Extensions to Arbitrarily Shaped Clusters
4.2.1.1 Application to Arbitrary Data Types
4.2.2 Advantages and Disadvantages of Clustering Methods
4.3 Distance-Based Outlier Analysis
4.3.1 Scoring Outputs for Distance-Based Methods
4.3.2 Binary Outputs for Distance-Based Methods
4.3.2.1 Cell-Based Pruning
4.3.2.2 Sampling-Based Pruning
4.3.2.3 Index-Based Pruning
4.3.3 Data-Dependent Similarity Measures
4.3.4 ODIN: A Reverse Nearest Neighbor Approach
4.3.5 Intensional Knowledge of Distance-Based Outliers
4.3.6 Discussion of Distance-Based Methods
4.4 Density-Based Outliers
4.4.1 LOF: Local Outlier Factor
4.4.1.1 Handling Duplicate Points and Stability Issues
4.4.2 LOCI: Local Correlation Integral
4.4.2.1 LOCI Plot
4.4.3 Histogram-Based Techniques
4.4.4 Kernel Density Estimation
4.4.4.1 Connection with Harmonic k-Nearest Neighbor Detector
4.4.4.2 Local Variations of Kernel Methods
4.4.5 Ensemble-Based Implementations of Histograms and Kernel Methods
4.5 Limitations of Proximity-Based Detection
4.6 Conclusions and Summary
4.7 Bibliographic Survey
4.8 Exercises
5 High-Dimensional Outlier Detection
5.1 Introduction
5.2 Axis-Parallel Subspaces
5.2.1 Genetic Algorithms for Outlier Detection
5.2.1.1 Defining Abnormal Lower-Dimensional Projections
5.2.1.2 Defining Genetic Operators for Subspace Search
5.2.2 Finding Distance-Based Outlying Subspaces
5.2.3 Feature Bagging: A Subspace Sampling Perspective
5.2.4 Projected Clustering Ensembles
5.2.5 Subspace Histograms in Linear Time
5.2.6 Isolation Forests
5.2.6.1 Further Enhancements for Subspace Selection
5.2.6.2 Early Termination
5.2.6.3 Relationship to Clustering Ensembles and Histograms
5.2.7 Selecting High-Contrast Subspaces
5.2.8 Local Selection of Subspace Projections
5.2.9 Distance-Based Reference Sets
5.3 Generalized Subspaces
5.3.1 Generalized Projected Clustering Approach
5.3.2 Leveraging Instance-Specific Reference Sets
5.3.3 Rotated Subspace Sampling
5.3.4 Nonlinear Subspaces
5.3.5 Regression Modeling Techniques
5.4 Discussion of Subspace Analysis
5.5 Conclusions and Summary
5.6 Bibliographic Survey
5.7 Exercises
6 Outlier Ensembles
6.1 Introduction
6.2 Categorization and Design of Ensemble Methods
6.2.1 Basic Score Normalization and Combination Methods
6.3 Theoretical Foundations of Outlier Ensembles
6.3.1 What is the Expectation Computed Over?
6.3.2 Relationship of Ensemble Analysis to Bias-Variance Trade-Off
6.4 Variance Reduction Methods
6.4.1 Parametric Ensembles
6.4.2 Randomized Detector Averaging
6.4.3 Feature Bagging: An Ensemble-Centric Perspective
6.4.3.1 Connections to Representational Bias
6.4.3.2 Weaknesses of Feature Bagging
6.4.4 Rotated Bagging
6.4.5 Isolation Forests: An Ensemble-Centric View
6.4.6 Data-Centric Variance Reduction with Sampling
6.4.6.1 Bagging
6.4.6.2 Subsampling
6.4.6.3 Variable Subsampling
6.4.6.4 Variable Subsampling with Rotated Bagging (VR)
6.4.7 Other Variance Reduction Methods
6.5 Flying Blind with Bias Reduction
6.5.1 Bias Reduction by Data-Centric Pruning
6.5.2 Bias Reduction by Model-Centric Pruning
6.5.3 Combining Bias and Variance Reduction
6.6 Model Combination for Outlier Ensembles
6.6.1 Combining Scoring Methods with Ranks
6.6.2 Combining Bias and Variance Reduction
6.7 Conclusions and Summary
6.8 Bibliographic Survey
6.9 Exercises
7 Supervised Outlier Detection
7.1 Introduction
7.2 Full Supervision: Rare Class Detection
7.2.1 Cost-Sensitive Learning
7.2.1.1 MetaCost: A Relabeling Approach
7.2.1.2 Weighting Methods
7.2.2 Adaptive Re-sampling
7.2.2.1 Relationship between Weighting and Sampling
7.2.2.2 Synthetic Over-sampling: SMOTE
7.2.3 Boosting Methods
7.3 Semi-Supervision: Positive and Unlabeled Data
7.4 Semi-Supervision: Partially Observed Classes
7.4.1 One-Class Learning with Anomalous Examples
7.4.2 One-Class Learning with Normal Examples
7.4.3 Learning with a Subset of Labeled Classes
7.5 Unsupervised Feature Engineering in Supervised Methods
7.6 Active Learning
7.7 Supervised Models for Unsupervised Outlier Detection
7.7.1 Connections with PCA-Based Methods
7.7.2 Group-wise Predictions for High-Dimensional Data
7.7.3 Applicability to Mixed-Attribute Data Sets
7.7.4 Incorporating Column-wise Knowledge
7.7.5 Other Classification Methods with Synthetic Outliers
7.8 Conclusions and Summary
7.9 Bibliographic Survey
7.10 Exercises
8 Categorical, Text, and Mixed Attribute Data
8.1 Introduction
8.2 Extending Probabilistic Models to Categorical Data
8.2.1 Modeling Mixed Data
8.3 Extending Linear Models to Categorical and Mixed Data
8.3.1 Leveraging Supervised Regression Models
8.4 Extending Proximity Models to Categorical Data
8.4.1 Aggregate Statistical Similarity
8.4.2 Contextual Similarity
8.4.2.1 Connections to Linear Models
8.4.3 Issues with Mixed Data
8.4.4 Density-Based Methods
8.4.5 Clustering Methods
8.5 Outlier Detection in Binary and Transaction Data
8.5.1 Subspace Methods
8.5.2 Novelties in Temporal Transactions
8.6 Outlier Detection in Text Data
8.6.1 Probabilistic Models
8.6.2 Linear Models: Latent Semantic Analysis
8.6.2.1 Probabilistic Latent Semantic Analysis (PLSA)
8.6.3 Proximity-Based Models
8.6.3.1 First Story Detection
8.7 Conclusions and Summary
8.8 Bibliographic Survey
8.9 Exercises
9 Time Series and Streaming Outlier Detection
9.1 Introduction
9.2 Predictive Outlier Detection in Streaming Time-Series
9.2.1 Autoregressive Models
9.2.2 Multiple Time Series Regression Models
9.2.2.1 Direct Generalization of Autoregressive Models
9.2.2.2 Time-Series Selection Methods
9.2.2.3 Principal Component Analysis and Hidden Variable-Based Models
9.2.3 Relationship between Unsupervised Outlier Detection and Prediction
9.2.4 Supervised Point Outlier Detection in Time Series
9.3 Time-Series of Unusual Shapes
9.3.1 Transformation to Other Representations
9.3.1.1 Numeric Multidimensional Transformations
9.3.1.2 Discrete Sequence Transformations
9.3.1.3 Leveraging Trajectory Representations of Time Series
9.3.2 Distance-Based Methods
9.3.2.1 Single Series versus Multiple Series
9.3.3 Probabilistic Models
9.3.4 Linear Models
9.3.4.1 Univariate Series
9.3.4.2 Multivariate Series
9.3.4.3 Incorporating Arbitrary Similarity Functions
9.3.4.4 Leveraging Kernel Methods with Linear Models
9.3.5 Supervised Methods for Finding Unusual Time-Series Shapes
9.4 Multidimensional Streaming Outlier Detection
9.4.1 Individual Data Points as Outliers
9.4.1.1 Proximity-Based Algorithms
9.4.1.2 Probabilistic Algorithms
9.4.1.3 High-Dimensional Scenario
9.4.2 Aggregate Change Points as Outliers
9.4.2.1 Velocity Density Estimation Method
9.4.2.2 Statistically Significant Changes in Aggregate Distributions
9.4.3 Rare and Novel Class Detection in Multidimensional Data Streams
9.4.3.1 Detecting Rare Classes
9.4.3.2 Detecting Novel Classes
9.4.3.3 Detecting Infrequently Recurring Classes
9.5 Conclusions and Summary
9.6 Bibliographic Survey
9.7 Exercises