R: cv.glm() function (Chinese help)
2017-01-01 22:46:21
cv.glm {boot}    

[Please credit 胡桃木屋 (mathapply.cn) and the "R Chinese Help" studio when reposting; translated by Yan Lingyuan]


Cross-validation for Generalized Linear Models



Description
     This function calculates the estimated K-fold cross-validation prediction error for generalized linear models.
Usage
cv.glm(data, glmfit, cost, K)
Arguments


data    
     A matrix or data frame containing the data. The rows should be cases and the columns correspond to variables, one of which is the response.


glmfit    
     An object of class "glm" containing the results of a generalized linear model fitted to data.

 

cost    
     A function of two vector arguments specifying the cost function for the cross-validation. The first argument to cost should correspond to the observed responses and the second argument should correspond to the predicted or fitted responses from the generalized linear model. cost must return a non-negative scalar value. The default is the average squared error function.
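Two cost functions that satisfy this contract can be sketched as follows (the names mse_cost and class_cost are illustrative, not part of the boot package; class_cost mirrors the cost used in the nodal example below):

```r
# The default cost used by cv.glm: average squared error between
# observed and fitted responses. Returns a non-negative scalar.
mse_cost <- function(y, yhat) mean((y - yhat)^2)

# A cost suitable for a binary response: the misclassification rate
# when fitted probabilities are thresholded at 0.5.
class_cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

mse_cost(c(1, 2, 3), c(1.5, 2, 2.5))      # (0.25 + 0 + 0.25) / 3 = 1/6
class_cost(c(0, 1, 1), c(0.2, 0.8, 0.3))  # one of three misclassified = 1/3
```

Any such function can be passed as the cost argument, as long as its first argument takes the observed responses and its second the fitted or predicted ones.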

 

K    
     The number of groups into which the data should be split to estimate the cross-validation prediction error. The value of K must be such that all groups are of approximately equal size. If the supplied value of K does not satisfy this criterion then it will be set to the closest integer which does, and a warning is generated specifying the value of K used. The default is to set K equal to the number of observations in data, which gives the usual leave-one-out cross-validation.
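"Approximately equal size" means the group sizes differ by at most one, which a recycled group index achieves directly (an illustrative sketch of the splitting criterion, not cv.glm's internal code):

```r
# Split n cases into K groups of approximately equal size.
n <- 62
K <- 6
sizes <- table(rep(1:K, length.out = n))
sizes  # two groups of 11 and four of 10; sizes differ by at most one
```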

Details

     The data is divided randomly into K groups. For each group the generalized linear model is fitted to the data omitting that group, then the function cost is applied to the observed responses in the group that was omitted from the fit and the predictions made by the fitted model for those observations. When K is the number of observations, leave-one-out cross-validation is used and all the possible splits of the data are used. When K is less than the number of observations, the K splits to be used are found by randomly partitioning the data into K groups of approximately equal size. In this latter case a certain amount of bias is introduced. This can be reduced by using a simple adjustment (see equation 6.48 in Davison and Hinkley, 1997). The second value returned in delta is the estimate adjusted by this method.
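The fit-omit-predict loop described above can be sketched in base R without the boot package (a simplified illustration under invented names — kfold_cv and the simulated data are not part of boot; the real cv.glm also weights groups by size, records .Random.seed, and computes the adjusted estimate):

```r
# A simplified K-fold cross-validation loop for a glm.
kfold_cv <- function(data, formula, K,
                     cost = function(y, yhat) mean((y - yhat)^2)) {
  n <- nrow(data)
  # Randomly assign each row to one of K approximately equal-size groups.
  groups <- sample(rep(1:K, length.out = n))
  errs <- numeric(K)
  for (k in 1:K) {
    train <- data[groups != k, , drop = FALSE]
    test  <- data[groups == k, , drop = FALSE]
    fit  <- glm(formula, data = train)      # refit omitting group k
    pred <- predict(fit, newdata = test)    # predict the held-out group
    errs[k] <- cost(test[[all.vars(formula)[1]]], pred)
  }
  # cv.glm weights each group's cost by its size; this sketch uses a plain mean.
  mean(errs)
}

set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- 2 * d$x + rnorm(50)
kfold_cv(d, y ~ x, K = 5)  # estimate of prediction error, near the noise variance
```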

Value
     The returned value is a list with the following components.


call    
     The original call to cv.glm.
K    
     The value of K used for the K-fold cross-validation.
delta    
     A vector of length two. The first component is the raw cross-validation estimate of prediction error. The second component is the adjusted cross-validation estimate. The adjustment is designed to compensate for the bias introduced by not using leave-one-out cross-validation.
seed    
     The value of .Random.seed when cv.glm was called.

Side Effects
     The value of .Random.seed is updated.

References
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984) Classification and Regression Trees. Wadsworth.
Burman, P. (1989) A comparative study of ordinary cross-validation, v-fold cross-validation and repeated learning-testing methods. Biometrika, 76, 503–514.
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. (1986) How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association, 81, 461–470.
Stone, M. (1974) Cross-validation choice and assessment of statistical predictions (with Discussion). Journal of the Royal Statistical Society, B, 36, 111–147.

See Also
     glm, glm.diag, predict

Examples
# leave-one-out and 6-fold cross-validation prediction error for
# the mammals data set.
data(mammals, package="MASS")
mammals.glm <- glm(log(brain) ~ log(body), data = mammals)
(cv.err <- cv.glm(mammals, mammals.glm)$delta)
(cv.err.6 <- cv.glm(mammals, mammals.glm, K = 6)$delta)
    
# As this is a linear model we could calculate the leave-one-out
# cross-validation estimate without any extra model-fitting.
muhat <- fitted(mammals.glm)
mammals.diag <- glm.diag(mammals.glm)
(cv.err <- mean((mammals.glm$y - muhat)^2/(1 - mammals.diag$h)^2))

# leave-one-out and 11-fold cross-validation prediction error for
# the nodal data set.  Since the response is a binary variable an
# appropriate cost function is
cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)

nodal.glm <- glm(r ~ stage+xray+acid, binomial, data = nodal)
(cv.err <- cv.glm(nodal, nodal.glm, cost, K = nrow(nodal))$delta)

(cv.11.err <- cv.glm(nodal, nodal.glm, cost, K = 11)$delta)


[Package boot version 1.3-18 Index]