ISLR :: 5.3 Lab: CV & BS :: (4) Bootstrap
5.3.1 Validation Set Approach
5.3.2 Leave-One-Out CV
5.3.3 k-Fold CV
5.3.4 Bootstrap
Estimating the accuracy of a linear regression model
5.3 Lab: Cross-Validation and the Bootstrap
5.3.4 The Bootstrap
Estimating the Accuracy of a Linear Regression Model
While fitting a linear model, we assess the variability of the regression coefficient estimates.
Using the Auto data, let us compare the variability estimates for the intercept and slope of the linear regression of Y (mpg, fuel economy) on X (horsepower)
obtained from the lm() function versus the bootstrap.
Data
### DATA Loading
Auto <- read.table("http://www-bcf.usc.edu/~gareth/ISL/Auto.csv",
                   header=T, sep=",", quote="\"",
                   na.strings=c("NA","-","?"))
Auto.omit.na <- Auto[complete.cases(Auto[,4]),]
#str(Auto.omit.na); head(Auto.omit.na)
Df <- Auto.omit.na
The lm() function
coef(lm(mpg~horsepower, data=Df))
lm.fit.linear <- lm(mpg~horsepower, data=Df)
summary(lm.fit.linear)$coefficients
              Estimate   Std. Error   t value      Pr(>|t|)
(Intercept) 39.9358610  0.717498656  55.65984 1.220362e-187
horsepower  -0.1578447  0.006445501 -24.48914  7.031989e-81
The bootstrap
library(boot)   # for boot()

alpha.fn <- function(data, index){
  return( coef(lm(mpg~horsepower, data=data, subset=index)) )  # coefficient estimates for the resampled rows
}
alpha.fn(Df, 1:392)
set.seed(1)
alpha.fn(Df, sample(x=392, size=392, replace=T))
boot(Df, alpha.fn, 1000)
ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Df, statistic = alpha.fn, R = 1000)

Bootstrap Statistics :
      original        bias    std. error
t1* 39.9358610  0.0296667441 0.860440524
t2* -0.1578447 -0.0003113047 0.007411218
lm():       SE(β̂0) = 0.7174,  SE(β̂1) = 0.0064
bootstrap:  SE(β̂0) = 0.8604,  SE(β̂1) = 0.0074
The two sets of estimates do not agree exactly.
The SE formulas used by lm() rely on certain model assumptions (and in fact the relationship in these data is closer to nonlinear than linear).
The bootstrap requires no such assumptions, so in this example its estimates may actually be the more accurate ones.
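To see why the bootstrap needs no formula at all, note that the "std. error" column reported by boot() is simply the standard deviation of each coefficient across resampled fits. Below is a minimal sketch of that computation done by hand; it uses the built-in mtcars data (mpg ~ hp) rather than the Auto data purely so the example is self-contained and does not depend on the download above.

```r
# Minimal sketch: bootstrap SE = sd of each coefficient over resampled fits.
# Uses built-in mtcars (mpg ~ hp) for self-containment; Auto works the same way.
set.seed(1)
n <- nrow(mtcars)
boot.coefs <- replicate(1000, {
  idx <- sample(n, n, replace = TRUE)              # resample rows with replacement
  coef(lm(mpg ~ hp, data = mtcars, subset = idx))  # refit on the bootstrap sample
})
boot.se <- apply(boot.coefs, 1, sd)                # sd across the 1000 replicates
boot.se                                            # bootstrap SEs of intercept and slope
```

Compare boot.se against summary(lm(mpg ~ hp, data = mtcars))$coefficients[, "Std. Error"]: the two will be close but not identical, exactly as with the Auto data above.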
library(ggplot2)

ggplot(Df, aes(x=horsepower, y=mpg)) + theme_bw() +
  geom_point(shape=1) +
  geom_smooth(method=lm, se=F, color="steelblue") +
  geom_smooth(color="coral", se=F)
Full script
#############################################################################
### DATA Loading
Auto <- read.table("http://www-bcf.usc.edu/~gareth/ISL/Auto.csv",
                   header=T, sep=",", quote="\"",
                   na.strings=c("NA","-","?"))
Auto.omit.na <- Auto[complete.cases(Auto[,4]),]
#str(Auto.omit.na); head(Auto.omit.na)
Df <- Auto.omit.na
#############################################################################
library(boot)   # for boot()

alpha.fn <- function(data, index){
  return( coef(lm(mpg~horsepower, data=data, subset=index)) )  # coefficient estimates for the resampled rows
}
alpha.fn(Df, 1:392)
set.seed(1)
alpha.fn(Df, sample(x=392, size=392, replace=T))
boot(Df, alpha.fn, 1000)
#############################################################################
coef(lm(mpg~horsepower, data=Df))
lm.fit.linear <- lm(mpg~horsepower, data=Df)
summary(lm.fit.linear)$coefficients
#############################################################################
library(ggplot2)

ggplot(Df, aes(x=horsepower, y=mpg)) + theme_bw() +
  geom_point(shape=1) +
  geom_smooth(method=lm, se=F, color="steelblue") +
  geom_smooth(color="coral", se=F)
Adding a quadratic term makes the model more accurate, and the two sets of SE estimates agree more closely.
#############################################################################
library(boot)

alpha.fn <- function(data, index){
  return( coef(lm(mpg~horsepower+I(horsepower^2), data=data, subset=index)) )  # coefficient estimates
}
alpha.fn(Df, 1:392)
set.seed(1)
alpha.fn(Df, sample(x=392, size=392, replace=T))
boot(Df, alpha.fn, 1000)
#############################################################################
coef(lm(mpg~horsepower+I(horsepower^2), data=Df))
lm.fit.linear <- lm(mpg~horsepower+I(horsepower^2), data=Df)
summary(lm.fit.linear)$coefficients
### Alternative data loading with data.table
library(data.table)

auto <- fread("http://www-bcf.usc.edu/~gareth/ISL/Auto.csv", na.strings=c("NA","-","?"))
dd <- auto[!is.na(horsepower),]