ISLR :: 5.3 Lab: CV & BS :: (4) Bootstrap


5.3.1 Validation Set Approach
5.3.2 Leave-One-Out CV
5.3.3 k-Fold CV 

5.3.4 Bootstrap

Estimating the accuracy of a linear regression model

5.3  Lab: Cross-Validation and the Bootstrap

5.3.4 The Bootstrap

Estimating the Accuracy of a Linear Regression Model

When fitting a linear model, we want to assess the variability of the regression coefficient estimates.
Using the Auto data, let's compare the results from the lm() function and from the bootstrap for assessing the variability of the intercept and slope of the linear regression of Y (mpg, fuel efficiency) on X (horsepower).

Data

### DATA Loading
Auto <- read.table("http://www-bcf.usc.edu/~gareth/ISL/Auto.csv",
                   header=T, sep=",", quote="\"", na.strings=c("NA","-","?"))
Auto.omit.na <- Auto[complete.cases(Auto[,4]),]
#str(Auto.omit.na); head(Auto.omit.na)
Df <- Auto.omit.na

The lm() function

coef(lm(mpg~horsepower, data=Df))

lm.fit.linear <- lm(mpg~horsepower, data = Df)
summary(lm.fit.linear)$coefficients
              Estimate  Std. Error   t value      Pr(>|t|)
(Intercept) 39.9358610 0.717498656  55.65984 1.220362e-187
horsepower  -0.1578447 0.006445501 -24.48914  7.031989e-81

bootstrap

library(boot)

alpha.fn <- function(data, index){
    # return the intercept and slope estimates for the given set of observations
    return( coef(lm(mpg~horsepower, data=data, subset=index)) )
}
alpha.fn(Df, 1:392)

set.seed(1)
alpha.fn(Df, sample(x=392, size=392, replace=T))
boot(Df, alpha.fn, 1000)
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = Df, statistic = alpha.fn, R = 1000)
Bootstrap Statistics :
      original        bias    std. error
t1* 39.9358610  0.0296667441 0.860440524
t2* -0.1578447 -0.0003113047 0.007411218

lm() formula-based SEs: SE(intercept) = 0.7174, SE(horsepower) = 0.0064
bootstrap SEs:          SE(intercept) = 0.8604, SE(horsepower) = 0.0074

The two sets of estimates do not agree exactly.
The standard-error formulas used by lm() rely on certain assumptions (and in fact this data follows a nonlinear rather than a linear pattern).
The bootstrap does not require those assumptions, so in this example its standard errors may actually be the more accurate ones.
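To line the two sets of standard errors up directly, a small side-by-side check like the sketch below can be used. It assumes the objects defined above (Df, alpha.fn, lm.fit.linear) are already in the workspace; boot.out is just a name introduced here for the saved boot() result.

library(boot)

set.seed(1)
boot.out <- boot(Df, alpha.fn, R=1000)       # rerun the bootstrap and keep the result as an object

se.formula <- summary(lm.fit.linear)$coefficients[, "Std. Error"]   # formula-based SEs from lm()
se.boot    <- apply(boot.out$t, 2, sd)                              # bootstrap SEs: SD of the 1000 replicates

cbind(formula=se.formula, bootstrap=se.boot)                        # side-by-side comparison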

library(ggplot2)

ggplot(Df, aes(x=horsepower, y=mpg)) + theme_bw() +
    geom_point(shape=1) +
    geom_smooth(method="lm", se=FALSE, color="steelblue") +   # linear fit
    geom_smooth(se=FALSE, color="coral")                      # default loess smoother

 

Full script

#############################################################################
### DATA Loading
Auto <- read.table("http://www-bcf.usc.edu/~gareth/ISL/Auto.csv",
                   header=T, sep=",", quote="\"", na.strings=c("NA","-","?"))
Auto.omit.na <- Auto[complete.cases(Auto[,4]),]
#str(Auto.omit.na); head(Auto.omit.na)
Df <- Auto.omit.na

#############################################################################
library(boot)

alpha.fn <- function(data, index){
    # return the intercept and slope estimates for the given set of observations
    return( coef(lm(mpg~horsepower, data=data, subset=index)) )
}
alpha.fn(Df, 1:392)

set.seed(1)
alpha.fn(Df, sample(x=392, size=392, replace=T))
boot(Df, alpha.fn, 1000)
#############################################################################
coef(lm(mpg~horsepower, data=Df))

lm.fit.linear <- lm(mpg~horsepower, data = Df)
summary(lm.fit.linear)$coefficients

#############################################################################
library(ggplot2)

ggplot(Df, aes(x=horsepower, y=mpg)) + theme_bw() +
    geom_point(shape=1) +
    geom_smooth(method="lm", se=FALSE, color="steelblue") +   # linear fit
    geom_smooth(se=FALSE, color="coral")                      # default loess smoother

 

Adding a quadratic term gives a better fit. Because the quadratic model fits the data better, the bootstrap standard errors and the formula-based ones from lm() correspond more closely.

#############################################################################
alpha.fn <- function(data, index){
    # return the coefficient estimates of the quadratic model for the given set of observations
    return( coef(lm(mpg~horsepower+I(horsepower^2), data=data, subset=index)) )
}
alpha.fn(Df, 1:392)

set.seed(1)
alpha.fn(Df, sample(x=392, size=392, replace=T))
boot(Df, alpha.fn, 1000)
#############################################################################
coef(lm(mpg~horsepower+I(horsepower^2), data=Df))

lm.fit.quad <- lm(mpg~horsepower+I(horsepower^2), data = Df)
summary(lm.fit.quad)$coefficients
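
As a rough visual check of that claim, the quadratic fit can be drawn over the same scatterplot. This is just a sketch reusing the earlier ggplot2 call, with the quadratic formula passed to geom_smooth(); only the variables already in Df are used.

library(ggplot2)

ggplot(Df, aes(x=horsepower, y=mpg)) + theme_bw() +
    geom_point(shape=1) +
    geom_smooth(method="lm", se=FALSE, color="steelblue") +                      # straight-line fit
    geom_smooth(method="lm", formula=y ~ x + I(x^2), se=FALSE, color="coral")    # quadratic fit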

### Alternative: load the data with data.table
library(data.table)
auto <- fread("http://www-bcf.usc.edu/~gareth/ISL/Auto.csv", na.strings=c("NA","-","?"))
dd <- auto[!is.na(horsepower),]