Building a regression tree with the rpart package and pruning it with the prune function

The body fat data ship with the TH.data package.

library(TH.data)

library(rpart)

data("bodyfat", package = "TH.data")

help("bodyfat",package="TH.data")

## starting httpd help server ... done

# head(bodyfat)

Use the rpart package to “grow” a regression tree. The response variable and covariates are defined by a model formula in the same way as for lm(). We start by growing a large initial tree.

bodyfat_rpart<-rpart(DEXfat~age+waistcirc+hipcirc+elbowbreadth+kneebreadth,data=bodyfat,

                     # use the control argument to require at least 10 observations in a node before a split is attempted:

                     control=rpart.control(minsplit=10))
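To see what the minsplit setting changes, the following minimal sketch (not part of the original analysis) fits the same formula once with rpart's default of minsplit = 20 and once with minsplit = 10, then counts the terminal nodes of each tree. The helper n_leaves and the object names are introduced here for illustration only.

```r
library(rpart)
data("bodyfat", package = "TH.data")

f <- DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth

# rpart's default is minsplit = 20; the analysis here lowers it to 10
fit_default <- rpart(f, data = bodyfat)
fit_small <- rpart(f, data = bodyfat, control = rpart.control(minsplit = 10))

# n_leaves is a helper defined for this sketch: it counts terminal nodes
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

leaf_counts <- c(default = n_leaves(fit_default),
                 minsplit10 = n_leaves(fit_small))
print(leaf_counts)
```

A lower minsplit lets rpart keep splitting smaller nodes, which is why the initial tree is described as "large" and is pruned afterwards.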

Plot the tree with partykit.

Observations that satisfy the condition shown at a node go to the left branch; those that do not go to the right.

library(partykit)

## Loading required package: grid

plot(as.party(bodyfat_rpart),

     tp_args=list(id=FALSE))

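The cptable element of the rpart object tells us whether the tree should be “pruned”.

Look at the xerror column: the tree with the smallest cross-validated error is the one to keep. Because xerror comes from random cross-validation, it changes from run to run; in the output below the minimum (0.2799) is in row 7, which has 6 splits: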

print(bodyfat_rpart$cptable)

##        CP nsplit rel error xerror    xstd

## 1 0.66290      0    1.0000 1.0360 0.17147

## 2 0.09376      1    0.3371 0.4870 0.09825

## 3 0.07704      2    0.2433 0.4651 0.08414

## 4 0.04508      3    0.1663 0.4090 0.06790

## 5 0.01845      4    0.1212 0.3622 0.06585

## 6 0.01819      5    0.1028 0.3049 0.06312

## 7 0.01000      6    0.0846 0.2799 0.06086
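The xerror column is estimated by cross-validation, so it depends on the random number generator, while CP, nsplit and rel error do not. The sketch below illustrates this under the assumption that fixing the seed with set.seed() before calling rpart is enough to make the table reproducible; fit_with_seed and the seed values are hypothetical names and choices made for this example.

```r
library(rpart)
data("bodyfat", package = "TH.data")

f <- DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth
ctrl <- rpart.control(minsplit = 10)

# fit_with_seed is a helper defined for this sketch:
# it fixes the seed, fits the tree and returns its cptable
fit_with_seed <- function(seed) {
  set.seed(seed)
  rpart(f, data = bodyfat, control = ctrl)$cptable
}

cp_a <- fit_with_seed(1)
cp_b <- fit_with_seed(1)
cp_c <- fit_with_seed(2)

# same seed: the whole table, xerror included, is reproduced
print(identical(cp_a, cp_b))

# different seed: the tree (CP column) is unchanged, only xerror/xstd may move
print(all.equal(cp_a[, "CP"], cp_c[, "CP"]))
print(cbind(seed1 = cp_a[, "xerror"], seed2 = cp_c[, "xerror"]))
```

Setting a seed before fitting is a simple way to make the row chosen by which.min() repeatable between runs.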

Store the row index of the minimum xerror in opt:

opt<-which.min(bodyfat_rpart$cptable[,"xerror"])

Now prune the large initial tree back, using the CP value from that row:

cp<-bodyfat_rpart$cptable[opt,"CP"]

bodyfat_prune<-prune(bodyfat_rpart,cp=cp)
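One way to check what prune() did is to count the splits in the pruned tree and compare that number with the nsplit value of the selected cptable row. This is only a sketch: the seed is arbitrary, and count_splits and the other object names are introduced here for illustration.

```r
library(rpart)
data("bodyfat", package = "TH.data")

set.seed(1)  # arbitrary seed, only to make xerror reproducible
fit <- rpart(DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth,
             data = bodyfat, control = rpart.control(minsplit = 10))

# select the row with the smallest cross-validated error and prune at its CP
opt <- which.min(fit$cptable[, "xerror"])
pruned <- prune(fit, cp = fit$cptable[opt, "CP"])

# count_splits is a helper defined for this sketch:
# every non-leaf row of the frame is one split
count_splits <- function(tree) sum(tree$frame$var != "<leaf>")

splits_full <- count_splits(fit)
splits_pruned <- count_splits(pruned)
splits_expected <- unname(fit$cptable[opt, "nsplit"])

print(c(full = splits_full, pruned = splits_pruned, cptable = splits_expected))
```

If the selected row is the last one in the table, the pruned tree is the same as the initial tree and nothing is removed.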

Then plot the resulting pruned tree:

plot(as.party(bodyfat_prune),

     tp_args=list(id=FALSE))

Based on this model, one can predict the (unknown) body fat content from covariate values. Here we do just that with the data we already have:

DEXfat_pred<-predict(bodyfat_prune,

                     newdata=bodyfat)

xlim<-range(bodyfat$DEXfat)

plot(DEXfat_pred~DEXfat,

     data=bodyfat,xlab="Observed",

     ylab="Predicted",

     ylim=xlim,

     xlim=xlim)

abline(a=0,b=1)
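The scatter plot can be complemented with a couple of numbers. The sketch below, which is an addition and not part of the original analysis, computes the root mean squared error and the correlation between predicted and observed values, and counts the distinct predicted values (a regression tree predicts one constant per terminal node). Since the predictions are made on the training data, these figures are likely optimistic. The seed and the object names are choices made for this example.

```r
library(rpart)
data("bodyfat", package = "TH.data")

set.seed(1)  # arbitrary seed, only to make the pruning step reproducible
fit <- rpart(DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth,
             data = bodyfat, control = rpart.control(minsplit = 10))
opt <- which.min(fit$cptable[, "xerror"])
pruned <- prune(fit, cp = fit$cptable[opt, "CP"])

# in-sample predictions, as in the plot
pred <- predict(pruned, newdata = bodyfat)

# root mean squared error and correlation with the observed response
rmse <- sqrt(mean((bodyfat$DEXfat - pred)^2))
r <- cor(pred, bodyfat$DEXfat)

# number of distinct fitted values versus number of terminal nodes
n_values <- length(unique(pred))
n_leaves <- sum(pruned$frame$var == "<leaf>")

print(c(rmse = rmse, correlation = r, distinct_predictions = n_values))
```

The small number of distinct predictions explains why the points in the plot line up in horizontal bands.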

Another approach to recursive partitioning

An alternative approach is implemented in the 'party' package.

At each node of these trees, a test of independence between each of the covariates and the response is carried out, and a split is made only when the p-value is small.

Advantage: there is no need to prune back a large initial tree, because the stopping criterion is statistically motivated.

The result is called a “Conditional Inference Tree”.

We fit one to the body fat data:

library(party)

## Loading required package: zoo

## 

## Attaching package: 'zoo'

## 

## The following objects are masked from 'package:base':

## 

##     as.Date, as.Date.numeric

## 

## Loading required package: sandwich

## Loading required package: strucchange

## Loading required package: modeltools

## Loading required package: stats4

## 

## Attaching package: 'party'

## 

## The following objects are masked from 'package:partykit':

## 

##     ctree, ctree_control, edge_simple, mob, mob_control,

##     node_barplot, node_bivplot, node_boxplot, node_inner,

##     node_surv, node_terminal

bodyfat_ctree<-ctree(DEXfat~age+waistcirc+hipcirc+elbowbreadth+kneebreadth,

                     data=bodyfat)

plot(bodyfat_ctree)
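How small the p-value must be before a split is made can be adjusted. As a sketch, assuming the party version of ctree (whose argument is named controls) and its ctree_control(mincriterion = ...) setting, where mincriterion is the value that 1 minus the p-value must exceed, the code below compares the default of 0.95 with a stricter 0.99. The helper n_terminal and the object names are introduced here for illustration.

```r
library(party)
data("bodyfat", package = "TH.data")

f <- DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth

# default: split only if 1 - p-value exceeds 0.95
ct_default <- party::ctree(f, data = bodyfat)

# stricter: require 1 - p-value to exceed 0.99
ct_strict <- party::ctree(f, data = bodyfat,
                          controls = party::ctree_control(mincriterion = 0.99))

# n_terminal is a helper defined for this sketch: where() returns the
# terminal node id of each observation, so we count the distinct ids
n_terminal <- function(fit) length(unique(where(fit)))

node_counts <- c(default = n_terminal(ct_default),
                 strict = n_terminal(ct_strict))
print(node_counts)
```

A stricter criterion can only stop the tree earlier, so it acts as the tuning knob that pruning provides for rpart trees.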

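This article was written by 谢佳标 and is licensed under the Creative Commons Attribution-ShareAlike 3.0 China Mainland license.
Contact the author before reposting or quoting, credit the author, and cite the source.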