File name: Mastering-Predictive-Analytics-with-Python.pdf.pdf
  Category: Other
  Development tool:
  File size: 6 MB
  Downloads: 0
  Upload date: 2019-09-13
  Provided by: weixin_********
 Description: Mastering-Predictive-Analytics-with-Python.pdf

Contents

Preface

Chapter 1: From Data to Decisions - Getting Started with Analytic Applications
  • Designing an advanced analytic solution
  • Data layer: warehouses, lakes, and streams
  • Modeling layer
  • Deployment layer
  • Reporting layer
  • Case study: sentiment analysis of social media feeds
  • Data input and transformation
  • Sanity checking
  • Model development
  • Scoring
  • Visualization and reporting
  • Case study: targeted e-mail campaigns
  • Data input and transformation
  • Sanity checking
  • Model development
  • Scoring
  • Visualization and reporting
  • Summary

Chapter 2: Exploratory Data Analysis and Visualization in Python
  • Exploring categorical and numerical data in IPython
  • Installing IPython notebook
  • The notebook interface
  • Loading and inspecting data
  • Basic manipulations - grouping, filtering, mapping, and pivoting
  • Charting with Matplotlib
  • Time series analysis
  • Cleaning and converting
  • Time series diagnostics
  • Joining signals and correlation
  • Working with geospatial data
  • Loading geospatial data
  • Working in the cloud
  • Introduction to PySpark
  • Creating the SparkContext
  • Creating an RDD
  • Creating a Spark DataFrame
  • Summary

Chapter 3: Finding Patterns in the Noise - Clustering and Unsupervised Learning
  • Similarity and distance metrics
  • Numerical distance metrics
  • Correlation similarity metrics and time series
  • Similarity metrics for categorical data
  • K-means clustering
  • Affinity propagation - automatically choosing cluster numbers
  • k-medoids
  • Agglomerative clustering
  • Where agglomerative clustering fails
  • Streaming clustering in Spark
  • Summary

Chapter 4: Connecting the Dots with Models - Regression Methods
  • Linear regression
  • Data preparation
  • Model fitting and evaluation
  • Statistical significance of regression outputs
  • Generalized estimating equations
  • Mixed effects models
  • Time series data
  • Generalized linear models
  • Applying regularization to linear models
  • Tree methods
  • Decision trees
  • Random forest
  • Scaling out with PySpark - predicting year of song release
  • Summary

Chapter 5: Putting Data in its Place - Classification Methods and Analysis
  • Logistic regression
  • Multiclass logistic classifiers: multinomial regression
  • Formatting a dataset for classification problems
  • Learning pointwise updates with stochastic gradient descent
  • Jointly optimizing all parameters with second-order methods
  • Fitting the model
  • Evaluating classification models
  • Strategies for improving classification models
  • Separating nonlinear boundaries with support vector machines
  • Fitting an SVM to the census data
  • Boosting - combining small models to improve accuracy
  • Gradient boosted decision trees
  • Comparing classification methods
  • Case study: fitting classifier models in PySpark
  • Summary

Chapter 6: Words and Pixels - Working with Unstructured Data
  • Working with textual data
  • Cleaning textual data
  • Extracting features from textual data
  • Using dimensionality reduction to simplify datasets
  • Principal component analysis
  • Latent Dirichlet Allocation
  • Using dimensionality reduction in predictive modeling
  • Images
  • Cleaning image data
  • Thresholding images to highlight objects
  • Dimensionality reduction for image analysis
  • Case study: training a recommender system in PySpark
  • Summary

Chapter 7: Learning from the Bottom Up - Deep Networks and Unsupervised Features
  • Learning patterns with neural networks
  • A network of one - the perceptron
  • Combining perceptrons - a single-layer neural network
  • Parameter fitting with back-propagation
  • Discriminative versus generative models
  • Vanishing gradients and explaining away
  • Pretraining belief networks
  • Using dropout to regularize networks
  • Convolutional networks and rectified units
  • Compressing data with autoencoder networks
  • Optimizing the learning rate
  • The TensorFlow library and digit recognition
  • The MNIST data
  • Constructing the network
  • Summary

Chapter 8: Sharing Models with Prediction Services
  • The architecture of a prediction service
  • Clients and making requests
  • The GET requests
  • The POST request
  • The HEAD request
  • The PUT request
  • The DELETE request
  • Server - the web traffic controller
  • Application - the engine of the predictive services
  • Persisting information with database systems
  • Case study - logistic regression service
  • Setting up the database
  • The web server
  • The web application
  • The flow of a prediction service - training a model
  • On-demand and bulk prediction
  • Summary

Chapter 9: Reporting and Testing - Iterating on Analytic Systems
  • Checking the health of models with diagnostics
  • Evaluating changes in model performance
  • Changes in feature importance
  • Changes in unsupervised model performance
  • Iterating on models through A/B testing
  • Experimental allocation - assigning customers to experiments
  • Deciding a sample size
  • Multiple hypothesis testing
  • Guidelines for communication
  • Translate terms to business values
  • Visualizing results
  • Case study: building a reporting service
  • The report server
  • The report application
  • The visualization layer
  • Summary

Index

Preface

In Mastering Predictive Analytics with Python, you will work through a step-by-step process to turn raw data into powerful insights. Power-packed with case studies and code examples using popular open source Python libraries, this volume illustrates the complete development process for analytic applications. The detailed examples illustrate robust and scalable applications for common use cases.
You will learn to quickly apply these methods to your own data.

What this book covers

Chapter 1, From Data to Decisions - Getting Started with Analytic Applications, teaches you to describe the core components of an analytic pipeline and the ways in which they interact. We also examine the differences between batch and streaming processes, and some use cases for which each type of application is well suited. We walk through examples of basic applications using both paradigms and the design decisions needed at each step.

Chapter 2, Exploratory Data Analysis and Visualization in Python, examines many of the tasks needed to start building analytical applications. Using the IPython notebook, we'll cover how to load data from a file into a data frame in pandas, rename columns in the dataset, filter unwanted rows, convert types, and create new columns. In addition, we'll join data from different sources and perform some basic statistical analyses using aggregations and pivots.

Chapter 3, Finding Patterns in the Noise - Clustering and Unsupervised Learning, shows you how to identify groups of similar items in a dataset. It's an exploratory analysis that we might frequently use as a first step in deciphering new datasets. We explore different ways of calculating the similarity between data points and describe what kinds of data these metrics might best apply to. We examine both divisive clustering algorithms, which split the data into smaller components starting from a single group, and agglomerative methods, where every data point starts as its own cluster. Using a number of datasets, we show examples where these algorithms will perform better or worse, and some ways to optimize them.
We also see our first (small) data pipeline, a clustering application in PySpark using streaming data.

Chapter 4, Connecting the Dots with Models - Regression Methods, examines the fitting of several regression models, including transforming input variables to the correct scale and accounting for categorical features correctly. We fit and evaluate a linear regression, as well as regularized regression models. We also examine the use of tree-based regression models, and how to optimize parameter choices in fitting them. Finally, we will look at a sample of random forest modeling using PySpark, which can be applied to larger datasets.

Chapter 5, Putting Data in its Place - Classification Methods and Analysis, explains how to use classification models and some of the strategies for improving model performance. In addition to transforming categorical features, we look at the interpretation of logistic regression accuracy using the ROC curve. In an attempt to improve model performance, we demonstrate the use of SVMs. Finally, we will achieve good performance on the test set through Gradient-Boosted Decision Trees.

Chapter 6, Words and Pixels - Working with Unstructured Data, examines complex, unstructured data. Then we cover dimensionality reduction techniques such as the Hashing Vectorizer; matrix decompositions such as PCA, CUR, and NMF; and probabilistic models such as LDA. We also examine image data, including normalization and thresholding operations, and see how we can use dimensionality reduction techniques to find common patterns among images.

Chapter 7, Learning from the Bottom Up - Deep Networks and Unsupervised Features, introduces deep neural networks as a way to generate models for complex data types where features are difficult to engineer.
We'll examine how neural networks are trained through back-propagation, and why additional layers make this optimization intractable.

Chapter 8, Sharing Models with Prediction Services, describes the three components of a basic prediction service, and discusses how this design will allow us to share the results of predictive modeling with other users or software systems.

Chapter 9, Reporting and Testing - Iterating on Analytic Systems, teaches several strategies for monitoring the performance of predictive models following initial design, and we look at a number of scenarios where the performance or components of the model change over time.

What you need for this book

You'll need the latest Python version and PySpark version installed, along with the Jupyter notebook.

Who this book is for

This book is designed for business analysts, BI analysts, data scientists, or junior-level data analysts who are ready to move from a conceptual understanding of advanced analytics to an expertise in designing and building advanced analytics solutions using Python. You're expected to have basic development experience with Python.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Let's start by peeking at the beginning and end of the data using head() and tail()."

Any command-line input or output is written as follows:

rdd_data.coalesce(2).getNumPartitions()

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Returning to the Files tab, you will notice two options in the top right-hand corner."

Warnings or important notes appear in a box like this.

Tips and tricks appear like this.
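
The Chapter 2 summary and the head()/tail() convention example above describe a typical pandas workflow: load, rename, filter, convert, derive, aggregate, pivot. The following is only a minimal sketch of that workflow and is not taken from the book. The CSV contents, the column names, and the variable names (raw, df, totals, units_by_region) are all invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents, invented for this sketch (not from the book).
raw = io.StringIO(
    "Region,Units Sold,Price\n"
    "north,10,2.5\n"
    "south,,3.0\n"
    "north,4,2.0\n"
    "south,7,3.5\n"
)

# Load the data into a data frame and rename the columns.
df = pd.read_csv(raw)
df = df.rename(columns={"Region": "region", "Units Sold": "units", "Price": "price"})

# Filter out unwanted rows (here: rows with a missing unit count).
df = df.dropna(subset=["units"])

# Convert types: the missing value forced a float column, so cast back to int.
df["units"] = df["units"].astype(int)

# Create a new column from existing ones.
df["revenue"] = df["units"] * df["price"]

# Peek at the beginning and end of the data.
print(df.head(2))
print(df.tail(2))

# Basic aggregation and pivot by region.
totals = df.groupby("region")["revenue"].sum()
units_by_region = pd.pivot_table(df, index="region", values="units", aggfunc="sum")
print(totals)
print(units_by_region)
```

Reading from an in-memory io.StringIO buffer keeps the sketch self-contained; with a real dataset, a file path would be passed to pd.read_csv instead.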
(Automatically generated by the system; you can preview the contents before downloading.)
