File name:
Mastering-Predictive-Analytics-with-Python.pdf.pdf
Development tool:
File size: 6 MB
Downloads: 0
Upload date: 2019-09-13
Description: Mastering-Predictive-Analytics-with-Python.pdf
Contents
Preface
Chapter 1: From Data to Decisions - Getting Started with Analytic Applications
Designing an advanced analytic solution
Data layer: warehouses, lakes, and streams
Modeling layer
Deployment layer
Reporting layer
Case study: sentiment analysis of social media feeds
Data input and transformation
Sanity checking
Model development
Scoring
Visualization and reporting
Case study: targeted e-mail campaigns
Data input and transformation
Sanity checking
Model development
Scoring
Visualization and reporting
Summary
Chapter 2: Exploratory Data Analysis and Visualization in Python
Exploring categorical and numerical data in IPython
Installing IPython notebook
The notebook interface
Loading and inspecting data
Basic manipulations - grouping, filtering, mapping, and pivoting
Charting with Matplotlib
Time series analysis
Cleaning and converting
Time series diagnostics
Joining signals and correlation
Working with geospatial data
Loading geospatial data
Working in the cloud
Introduction to PySpark
Creating the SparkContext
Creating an RDD
Creating a Spark DataFrame
Summary
Chapter 3: Finding Patterns in the Noise - Clustering and Unsupervised Learning
Similarity and distance metrics
Numerical distance metrics
Correlation similarity metrics and time series
Similarity metrics for categorical data
K-means clustering
Affinity propagation - automatically choosing cluster numbers
k-medoids
Agglomerative clustering
Where agglomerative clustering fails
Streaming clustering in Spark
Summary
Chapter 4: Connecting the Dots with Models - Regression Methods
Linear regression
Data preparation
Model fitting and evaluation
Statistical significance of regression outputs
Generalized estimating equations
Mixed effects models
Time series data
Generalized linear models
Applying regularization to linear models
Tree methods
Decision trees
Random forest
Scaling out with PySpark - predicting year of song release
Summary
Chapter 5: Putting Data in its Place - Classification Methods and Analysis
Logistic regression
Multiclass logistic classifiers: multinomial regression
Formatting a dataset for classification problems
Learning pointwise updates with stochastic gradient descent
Jointly optimizing all parameters with second-order methods
Fitting the model
Evaluating classification models
Strategies for improving classification models
Separating nonlinear boundaries with support vector machines
Fitting an SVM to the census data
Boosting - combining small models to improve accuracy
Gradient boosted decision trees
Comparing classification methods
Case study: fitting classifier models in PySpark
Summary
Chapter 6: Words and Pixels - Working with Unstructured Data
Working with textual data
Cleaning textual data
Extracting features from textual data
Using dimensionality reduction to simplify datasets
Principal component analysis
Latent Dirichlet Allocation
Using dimensionality reduction in predictive modeling
Images
Cleaning image data
Thresholding images to highlight objects
Dimensionality reduction for image analysis
Case study: training a recommender system in PySpark
Summary
Chapter 7: Learning from the Bottom Up - Deep Networks and Unsupervised Features
Learning patterns with neural networks
A network of one - the perceptron
Combining perceptrons - a single-layer neural network
Parameter fitting with back-propagation
Discriminative versus generative models
Vanishing gradients and explaining away
Pretraining belief networks
Using dropout to regularize networks
Convolutional networks and rectified units
Compressing data with autoencoder networks
Optimizing the learning rate
The TensorFlow library and digit recognition
The MNIST data
Constructing the network
Summary
Chapter 8: Sharing Models with Prediction Services
The architecture of a prediction service
Clients and making requests
The GET request
The POST request
The HEAD request
The PUT request
The DELETE request
Server - the web traffic controller
Application - the engine of the predictive services
Persisting information with database systems
Case study - logistic regression service
Setting up the database
The web server
The web application
The flow of a prediction service - training a model
On-demand and bulk prediction
Summary
Chapter 9: Reporting and Testing - Iterating on Analytic Systems
Checking the health of models with diagnostics
Evaluating changes in model performance
Changes in feature importance
Changes in unsupervised model performance
Iterating on models through A/B testing
Experimental allocation - assigning customers to experiments
Deciding a sample size
Multiple hypothesis testing
Guidelines for communication
Translate terms to business values
Visualizing results
Case study: building a reporting service
The report server
The report application
The visualization layer
Summary
Index
Preface
In Mastering Predictive Analytics with Python, you will work through a step-by-step
process to turn raw data into powerful insights. Power-packed with case studies and
code examples using popular open source Python libraries, this volume illustrates
the complete development process for analytic applications. The detailed examples
illustrate robust and scalable applications for common use cases. You will learn to
quickly apply these methods to your own data.
What this book covers
Chapter 1, From Data to Decisions - Getting Started with Analytic Applications, teaches
you to describe the core components of an analytic pipeline and the ways in which
they interact. We also examine the differences between batch and streaming
processes, and some use cases for which each type of application is well suited. We
walk through examples of basic applications using both paradigms and the
design decisions needed at each step.
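
As a rough illustration of the batch versus streaming distinction described for Chapter 1, the sketch below computes the same statistic both ways. It is not taken from the book: the function names and the sentiment scores are made up for this example, and only the Python standard library is assumed.

```python
from typing import Iterable, Iterator, Sequence


def batch_mean(values: Sequence[float]) -> float:
    """Batch style: the whole dataset is available before we compute."""
    return sum(values) / len(values)


def streaming_mean(stream: Iterable[float]) -> Iterator[float]:
    """Streaming style: refresh the estimate as each record arrives."""
    count = 0
    mean = 0.0
    for x in stream:
        count += 1
        mean += (x - mean) / count  # incremental update, no history stored
        yield mean


# Hypothetical sentiment scores, invented for illustration.
scores = [0.2, 0.8, 0.5, 0.9]

batch_result = batch_mean(scores)
running = list(streaming_mean(scores))

print(batch_result)  # one answer after all data is read
print(running)       # an updated answer after every record
```

The streaming version never holds more than the current count and mean, which is the property that makes it usable on unbounded feeds; the batch version is simpler but needs the full dataset up front.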
Chapter 2, Exploratory Data Analysis and Visualization in Python, examines many of the
tasks needed to start building analytical applications. Using the IPython notebook,
we'll cover how to load data from a file into a data frame in pandas, rename columns
in the dataset, filter unwanted rows, convert types, and create new columns. In
addition, we'll join data from different sources and perform some basic statistical
analyses using aggregations and pivots.
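
The pandas tasks listed for Chapter 2 can be sketched in a few lines. This is a minimal example, not the book's code: the CSV contents, column names, and the `customers` table are all hypothetical, and an in-memory string stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a file on disk.
orders_csv = io.StringIO(
    "id,cust,amt,day\n"
    "1,a,10.5,2016-01-01\n"
    "2,b,-1,2016-01-01\n"
    "3,a,4.5,2016-01-02\n"
)
# Hypothetical second data source to join against.
customers = pd.DataFrame({"customer": ["a", "b"], "region": ["east", "west"]})

orders = pd.read_csv(orders_csv)                                # load
orders = orders.rename(columns={"cust": "customer", "amt": "amount"})
orders = orders[orders["amount"] > 0].copy()                    # filter rows
orders["day"] = pd.to_datetime(orders["day"])                   # convert types
orders["amount_cents"] = (orders["amount"] * 100).round().astype(int)  # new column

joined = orders.merge(customers, on="customer", how="left")     # join sources
totals = joined.groupby("region")["amount"].sum()               # aggregation
pivot = joined.pivot_table(
    index="day", columns="region", values="amount", aggfunc="sum"
)                                                               # pivot

print(totals)
print(pivot)
```

In a notebook, `orders.head()` and `orders.tail()` are a quick way to inspect the first and last rows after each step.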
Chapter 3, Finding Patterns in the Noise - Clustering and Unsupervised Learning, shows
you how to identify groups of similar items in a dataset. It's an exploratory analysis
that we might frequently use as a first step in deciphering new datasets. We explore
different ways of calculating the similarity between data points and describe what
kinds of data these metrics might best apply to. We examine both divisive clustering
algorithms, which split the data into smaller components starting from a single
group, and agglomerative methods, where every data point starts as its own cluster.
Using a number of datasets, we show examples where these algorithms will perform
better or worse, and some ways to optimize them. We also see our first (small) data
pipeline, a clustering application in PySpark using streaming data.
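
To make the agglomerative idea concrete (every point starts as its own cluster, and the closest clusters are merged step by step), here is a small pure-Python sketch. It assumes Euclidean distance and single linkage, which are choices made for this example rather than anything the preface specifies, and the points are invented.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(c1: Sequence[Point], c2: Sequence[Point]) -> float:
    """Cluster distance = distance between their closest pair of points."""
    return min(euclidean(a, b) for a in c1 for b in c2)


def agglomerate(points: Sequence[Point], n_clusters: int) -> List[List[Point]]:
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up data: three points near the origin, two near (5, 5).
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1)]
clusters = agglomerate(points, n_clusters=2)
print(clusters)
```

This brute-force version recomputes all pairwise cluster distances on every merge, so it is only suitable for tiny inputs; it is meant to show the merge logic, not to be efficient.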
Chapter 4, Connecting the Dots with Models - Regression Methods, examines the fitting
of several regression models, including transforming input variables to the correct
scale and accounting for categorical features correctly. We fit and evaluate a linear
regression, as well as regularized regression models. We also examine the use of
tree-based regression models, and how to optimize parameter choices in fitting them.
Finally, we will look at a sample of random forest modeling using PySpark, which
can be applied to larger datasets.
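
The preparation steps mentioned for Chapter 4 (scaling numeric inputs, encoding categorical features, then fitting plain and regularized linear models) can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions: the helper names, the toy data, and the use of closed-form ridge regression are choices made here, not the book's implementation.

```python
import numpy as np


def standardize(x: np.ndarray) -> np.ndarray:
    """Rescale a numeric column to zero mean and unit variance."""
    return (x - x.mean()) / x.std()


def one_hot(labels) -> np.ndarray:
    """Encode categories as 0/1 columns, dropping the first level as baseline."""
    levels = sorted(set(labels))
    return np.array(
        [[1.0 if lab == lev else 0.0 for lev in levels[1:]] for lab in labels]
    )


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge fit; alpha=0 gives ordinary least squares.

    Returns [intercept, coefficients...]; the intercept is not penalized.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


# Made-up data: price depends on size and on a two-level city feature.
size = np.array([50.0, 80.0, 120.0, 65.0, 100.0, 90.0])
city = ["a", "b", "a", "b", "a", "b"]
is_b = np.array([c == "b" for c in city], dtype=float)
price = 5.0 + 2.0 * size + 10.0 * is_b

X = np.column_stack([standardize(size), one_hot(city)])
ols = fit_ridge(X, price, alpha=0.0)
ridge = fit_ridge(X, price, alpha=5.0)

print("OLS coefficients:  ", ols)
print("ridge coefficients:", ridge)
```

Standardizing before applying the penalty matters because ridge shrinks all coefficients by the same rule, so features on larger scales would otherwise be penalized unevenly.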
Chapter 5, Putting Data in its Place - Classification Methods and Analysis, explains
how to use classification models and some of the strategies for improving model
performance. In addition to transforming categorical features, we look at the
interpretation of logistic regression accuracy using the ROC curve. In an attempt
to improve model performance, we demonstrate the use of SVMs. Finally, we will
achieve good performance on the test set through Gradient-Boosted Decision Trees.
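
The ROC curve mentioned for Chapter 5 can be built by sweeping a decision threshold over a classifier's scores. The sketch below is a simplified, standard-library version that assumes all scores are distinct; the labels and scores are invented and do not come from the book.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int],
               scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs.

    The threshold is lowered one example at a time, from the highest
    score to the lowest. Assumes distinct scores and both classes present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoid rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels (1 = positive) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
area = auc(curve)
print(curve)
print(area)
```

An area of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives, which is why the area is a threshold-independent summary of a classifier.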
Chapter 6, Words and Pixels - Working with Unstructured Data, examines complex,
unstructured data. We cover dimensionality reduction techniques such as
the HashingVectorizer; matrix decompositions such as PCA, CUR, and NMF;
and probabilistic models such as LDA. We also examine image data, including
normalization and thresholding operations, and see how we can use dimensionality
reduction techniques to find common patterns among images.
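
The idea behind a hashing vectorizer is to map each token to one of a fixed number of columns with a hash function, so no vocabulary has to be stored. The sketch below illustrates that idea with the standard library only. It is a simplification for illustration and does not reproduce scikit-learn's `HashingVectorizer`, which uses a different hash function and signed counts; the function name, tokenizer pattern, and feature count here are assumptions.

```python
import hashlib
import re
from typing import List


def hash_vectorize(text: str, n_features: int = 8) -> List[int]:
    """Count tokens into a fixed-length vector using the hashing trick."""
    vec = [0] * n_features
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        # md5 is used only as a deterministic hash, not for security.
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % n_features] += 1
    return vec


doc = "The quick brown fox jumps over the lazy dog"
vector = hash_vectorize(doc)
print(vector)
```

Because different tokens can land in the same column, the representation is lossy; a larger `n_features` lowers the collision rate at the cost of a wider vector.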
Chapter 7, Learning from the Bottom Up - Deep Networks and Unsupervised Features,
introduces deep neural networks as a way to generate models for complex data
types where features are difficult to engineer. We'll examine how neural networks
are trained through back-propagation, and why additional layers make this
optimization intractable.
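
One reason extra layers make back-propagation harder is that gradients shrink as they pass through each sigmoid layer. The toy sketch below shows this with a chain of one-unit sigmoid layers. The network shape, the weights, and the function names are assumptions made for illustration, not the book's code.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: float, weights: Sequence[float]) -> float:
    """Output of a chain of one-unit sigmoid layers, one weight per layer."""
    a = x
    for w in weights:
        a = sigmoid(w * a)
    return a


def input_gradient(x: float, weights: Sequence[float]) -> float:
    """d(output)/d(input) by the chain rule, as back-propagation computes it."""
    grad = 1.0
    a = x
    for w in weights:
        a = sigmoid(w * a)
        # Derivative of sigmoid(w * a_prev) with respect to a_prev.
        grad *= w * a * (1.0 - a)
    return grad


# With unit weights each layer scales the gradient by at most 0.25,
# so the gradient decays geometrically with depth.
for depth in (1, 2, 5, 10):
    print(depth, input_gradient(0.5, [1.0] * depth))
```

Since the sigmoid's derivative never exceeds 0.25, a stack of such layers with moderate weights passes back only a tiny signal to the earliest layers, which is the vanishing gradient problem.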
Chapter 8, Sharing Models with Prediction Services, describes the three components of
a basic prediction service, and discusses how this design will allow us to share the
results of predictive modeling with other users or software systems.
Chapter 9, Reporting and Testing - Iterating on Analytic Systems, teaches several
strategies for monitoring the performance of predictive models following initial
design, and we look at a number of scenarios where the performance or components
of the model change over time.
What you need for this book
You'll need the latest versions of Python and PySpark installed, along with the
Jupyter notebook.
Who this book is for
This book is designed for business analysts, BI analysts, data scientists, or junior-level
data analysts who are ready to move from a conceptual understanding of advanced
analytics to expertise in designing and building advanced analytics solutions using
Python. You're expected to have basic development experience with Python.
Conventions
In this book, you will find a number of text styles that distinguish between different
kinds of information. Here are some examples of these styles and an explanation of
their meaning.
Code words in text, database table names, folder names, filenames, file extensions,
pathnames, dummy URLs, user input, and Twitter handles are shown as follows:
"Let's start by peeking at the beginning and end of the data using head() and tail()."
Any command-line input or output is written as follows:
rdd_data.coalesce(2).getNumPartitions()
New terms and important words are shown in bold. Words that you see on
the screen, for example, in menus or dialog boxes, appear in the text like this:
"Returning to the Files tab, you will notice two options in the top right-hand corner."
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.