The_Horvitz_Thompson_Estimator.pdf: the Horvitz-Thompson estimator

File name: The_Horvitz_Thompson_Estimator.pdf

Category: Lecture notes

Development tool:

File size: 88kb

Downloads: 0

Upload date: 2019-07-21

Provider: qq_16******


Description: The Horvitz-Thompson estimator, used to estimate a population total under unequal-probability sampling without replacement.

Example: The H-T Estimator for SRS without replacement

Consider taking a simple random sample (SRS), without replacement, of size $n$ from a population of size $N$. The inclusion and joint inclusion probabilities are

$$\pi_i = \frac{n}{N}, \qquad \pi_{ij} = \frac{n(n-1)}{N(N-1)}.$$

Note that the Horvitz-Thompson estimator for SRS without replacement becomes

$$\hat{\tau}_\pi = \sum_{i=1}^{n} \frac{y_i}{\pi_i} = \frac{N}{n} \sum_{i=1}^{n} y_i = N\bar{y},$$

the usual estimator of the population total $\tau$ for an SRS derived earlier. Do you think the Horvitz-Thompson variance will be $\mathrm{Var}(\hat{\tau}_\pi) = N(N-n)\sigma^2/n$, as it was before for SRS?

Example: The H-T Estimator for sampling with replacement

Reconsider the scenario for the Hansen-Hurwitz estimator, where we sample with replacement from a population such that the probabilities of selection on any given draw are unequal. These probabilities were denoted $p_1, \ldots, p_N$ for a population of size $N$. The inclusion and joint inclusion probabilities are

$$\pi_i = 1 - (1 - p_i)^n, \qquad \pi_{ij} = \pi_i + \pi_j - \left[1 - (1 - p_i - p_j)^n\right].$$

With these, then, the Horvitz-Thompson estimator is

$$\hat{\tau}_\pi = \sum_{i=1}^{\nu} \frac{y_i}{1 - (1 - p_i)^n},$$

where the sum runs over the $\nu$ distinct units in the sample, and the H-T variance is a horrendous mess.

For sampling with replacement, it is generally easier to use the Hansen-Hurwitz estimator. Although the Horvitz-Thompson estimator can be used for any probability sampling plan, there is often a simpler way to derive the estimator and its variance than through inclusion probabilities.

Comparison of H-H and H-T Estimators for the Farm Example

Consider taking a random sample of size $n = 5$ (with replacement) from the $N = 625$ pixels in the map of the farms given in class, and estimating the total number of workers on all the farms. This was done earlier for the Hansen-Hurwitz estimator, and will be repeated here for the Horvitz-Thompson estimator. Ten samples of size $n = 5$ will be taken, where the individual farms will be selected according to the "probability proportional to size" (PPS) sampling described earlier.
Specifically, a pair of integers between 1 and 25 will be chosen at random, and the farm at the corresponding coordinates on the map will be selected.

First, consider the Horvitz-Thompson estimator for the single sample of size 5 given in class to estimate the total number of workers using the Hansen-Hurwitz estimator. The sample is repeated in the table below, along with the relevant components for the H-T estimator.

    Coordinates   Farm   y_i   p_i               pi_i = 1 - (1 - p_i)^5
    [illegible]   D      2      5/625 = .0080    .0394
    19,25         C      8     28/625 = .0448    .2048
    21,21         B      4     12/625 = .0192    .0924
    15,4          A      8     14/625 = .0224    .1071
    7,20          A      3     13/625 = .0208    .0998

As the sampled units here were distinct, the Horvitz-Thompson estimate of the total number of workers is

$$\hat{\tau}_\pi = \frac{2}{.0394} + \frac{8}{.2048} + \frac{4}{.0924} + \frac{8}{.1071} + \frac{3}{.0998} = 237.94 \text{ workers}.$$

Recall that the estimated number of workers for the Hansen-Hurwitz estimator computed earlier was 227.66 workers. Since the true total number of workers was $\tau = 247$, does this make the Horvitz-Thompson estimator better?

To compute the estimated variance of this estimated total number of workers, we first need to compute the joint inclusion probabilities for each pair of units in the sample. Using the formula derived in class,

$$\pi_{ij} = \pi_i + \pi_j - \left[1 - (1 - p_i - p_j)^n\right],$$

the table below gives the ten $\pi_{ij}$ values corresponding to the ten pairs of units.

                    Unit number
    Unit      2       3       4       5
    1       .0066   .0029   .0034   .0032
    2               .0156   .0181   .0169
    3                       .0081   .0075
    4                               .0087

The estimated variance is then computed as

$$\widehat{\mathrm{Var}}(\hat{\tau}_\pi) = \sum_{i=1}^{\nu} \left(\frac{1 - \pi_i}{\pi_i^2}\right) y_i^2 + \sum_{i=1}^{\nu} \sum_{j \ne i} \left(\frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}\right) \frac{y_i y_j}{\pi_{ij}}$$

$$= \left[\frac{1 - .0394}{.0394^2}(2)^2 + \cdots + \frac{1 - .0998}{.0998^2}(3)^2\right] + 2\left[\frac{.0066 - (.0394)(.2048)}{(.0394)(.2048)} \cdot \frac{(2)(8)}{.0066} + \cdots + \frac{.0087 - (.1071)(.0998)}{(.1071)(.0998)} \cdot \frac{(8)(3)}{.0087}\right]$$

$$= 11191.15 - 2(4922.037) = 1347.077,$$

giving a standard error of $SE(\hat{\tau}_\pi) = \sqrt{1347.077} = 36.70$ workers. This is essentially the same as that found (36.75) with the Hansen-Hurwitz estimator.

This task of sampling 5 pixels from the farm area was repeated 10 times, producing the following H-H and H-T estimators.
R code to compute these estimates is given below the table.

              Hansen-Hurwitz             Horvitz-Thompson
    Sample    Estimate      SE           Estimate      SE
    1         227.66        36.75        237.94        36.70
    2         [illegible]   46.76        [illegible]   47.01
    3         160.90        20.24        171.35        20.66
    4         206.55        44.93        212.70        [illegible]
    5         411.01        34.56        417.47        34.62
    6         264.88        47.75        276.05        48.04
    7         117.87        19.61        125.74        19.48
    8         236.55        24.44        242.64        24.14
    9         248.21        69.83        256.83        69.82
    10        247.92        57.94        251.95        57.80
    Average   233.27                     241.33

Anything interesting about this table?

R Code

    n <- 5                               # Sets the sample size
    y <- c(2, 8, 4, 8, 3)                # Sets the vector of y-values

    # Compute H-H estimate and SE
    p <- (1/625)*c(5, 28, 12, 14, 13)    # Computes the vector of selection probs
    tau.p <- (1/n)*sum(y/p)              # Computes the H-H estimate
    print(tau.p)                         # Prints the H-H estimate
    var.tau.p <- var(y/p)/n              # Computes the variance of the H-H estimate
    print(sqrt(var.tau.p))               # Prints SE of H-H estimate

    # Compute H-T estimate and SE
    pi <- 1-(1-p)^n                      # Computes the vector of inclusion probs
    tau.pi <- sum(y/pi)                  # Computes the H-T estimate
    print(tau.pi)                        # Prints the H-T estimate

    # Compute the estimated variance of the H-T estimate by computing the
    # two terms (the single sum and the double sum) separately
    var1 <- sum(y^2*(1-pi)/pi^2)         # First term of variance
    # Second term of variance: the multiplier 2 below is because
    # the pair i,j is the same as j,i
    var2 <- 0
    for (i in 1:(n-1)) {
      for (j in (i+1):n) {
        piij <- pi[i] + pi[j] - (1-(1-p[i]-p[j])^n)   # joint inclusion probability
        var2 <- var2 + 2*(y[i]*y[j]/piij)*(piij - pi[i]*pi[j])/(pi[i]*pi[j])
      }
    }
    var.tau.pi <- var1 + var2            # Computes the H-T estimated var
    sd.tau.pi <- sqrt(var.tau.pi)        # Computes the standard dev'n
    print(sd.tau.pi)                     # Prints the H-T SD

Note: the above program is inefficient in that it uses loops, and it can be done more efficiently, but less transparently, with clever use of matrix commands. However, the above program still runs in just a few seconds for n = 500.
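The closing note says the loop can be replaced by matrix commands. A minimal sketch of one way to do that is given here, using the farm sample from the text. The names `incl`, `joint`, `w`, `tau_ht`, `var_ht` and `se_ht` are illustrative choices, not part of the original notes, and `incl` is used instead of `pi` only to avoid masking R's built-in constant.

```r
# Farm example data, taken from the text.
n <- 5                              # number of draws (with replacement)
y <- c(2, 8, 4, 8, 3)               # workers on each distinct sampled farm
p <- c(5, 28, 12, 14, 13) / 625     # per-draw selection probabilities

# First-order inclusion probabilities: pi_i = 1 - (1 - p_i)^n.
incl <- 1 - (1 - p)^n

# Matrix of joint inclusion probabilities:
# pi_ij = pi_i + pi_j - (1 - (1 - p_i - p_j)^n).
joint <- outer(incl, incl, "+") - (1 - (1 - outer(p, p, "+"))^n)
diag(joint) <- incl                 # convention: pi_ii = pi_i

# H-T estimate of the population total.
tau_ht <- sum(y / incl)

# Weights for the double sum over i != j:
# (pi_ij - pi_i * pi_j) / (pi_i * pi_j * pi_ij).
prod_incl <- outer(incl, incl)
w <- (joint - prod_incl) / (prod_incl * joint)
diag(w) <- 0                        # drop the i == j terms

# Estimated variance: single sum plus the quadratic form y' W y.
var_ht <- sum(y^2 * (1 - incl) / incl^2) + as.numeric(t(y) %*% w %*% y)
se_ht <- sqrt(var_ht)

print(c(estimate = tau_ht, se = se_ht))
```

Because `w` is symmetric with a zero diagonal, the quadratic form counts each pair twice, which plays the role of the multiplier 2 in the loop version. Run on this sample, the sketch should agree with the 237.94 and 36.70 reported in the text, up to rounding.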

(Generated automatically by the system; the content can be previewed before downloading.)