ISLR :: 5.2 The Bootstrap

Published by onesixx on

 

5.2 The Bootstrap

The bootstrap is a powerful and widely used statistical tool for quantifying the uncertainty associated with a given estimator or learning method.

As a simple example, the bootstrap can be used to estimate the standard errors of the coefficients from a linear regression fit.
(For linear regression this is not especially useful, because R already reports standard errors and similar results.
The strength of the bootstrap is that it applies broadly to learning methods for which R provides no such output or for which variability is otherwise hard to measure.)

Instead of repeatedly drawing independent data sets from the population,
we obtain distinct data sets by repeatedly sampling observations from the original data set.

The figure above applies the bootstrap to a simple data set Z (Original Data) containing only three observations (n=3).
To create a bootstrap data set ($Z^{*1}$), randomly select n observations from data set Z.
: Sampling from the original data set is done with replacement.

In this example, the bootstrap data set contains the third observation twice, the first observation once, and the second observation not at all.
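
The n=3 case is small enough to reproduce by hand. The sketch below assumes a hypothetical three-row data set (the values are made up, and the names `Z`, `idx`, and `Z_star` are illustrative); it builds the bootstrap data set described above from its row indices and then draws a random one with `sample()`.

```r
# Hypothetical three-row data set; the values are made up for illustration.
Z <- data.frame(obs = 1:3,
                X = c(4.3, 2.1, 5.3),
                Y = c(2.4, 1.1, 2.8))

# Row indices matching the bootstrap data set described in the text:
# observation 3 twice, observation 1 once, observation 2 never.
idx <- c(3, 1, 3)
Z_star <- Z[idx, ]
rownames(Z_star) <- NULL
Z_star

# In practice the indices are drawn at random, with replacement.
set.seed(1)
random_idx <- sample(nrow(Z), size = nrow(Z), replace = TRUE)
table(factor(random_idx, levels = 1:3))  # how often each observation was drawn
```

Because sampling is with replacement, a bootstrap data set always has n rows, but some observations can repeat while others are left out.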
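The bootstrap data set $Z^{*1}$ is used to compute a new bootstrap estimate of the parameter of interest (written b in the example below), denoted $\hat{b}^{*1}$.
Choosing some large number B and repeating this procedure B times
yields B different bootstrap data sets $Z^{*1}, \dots, Z^{*B}$ and the corresponding bootstrap estimates $\hat{b}^{*1}, \dots, \hat{b}^{*B}$.
The standard error of these bootstrap estimates is computed with the following formula:

$$\mathrm{SE}_B(\hat{b}) = \sqrt{\frac{1}{B-1} \sum_{r=1}^{B} \left( \hat{b}^{*r} - \frac{1}{B} \sum_{r'=1}^{B} \hat{b}^{*r'} \right)^2}$$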
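
The resampling loop and the standard error of the B estimates can be sketched in base R. This is a minimal illustration and not the `boot` package's implementation; `boot_se`, `stat_fn`, and the stand-in data `toy` are hypothetical, and the sample median serves only as an example statistic.

```r
# Sketch of a bootstrap standard error computed by hand.
# `boot_se`, `stat_fn`, and `toy` are hypothetical names for illustration.
boot_se <- function(data, stat_fn, B = 1000) {
  n <- nrow(data)
  estimates <- numeric(B)
  for (r in seq_len(B)) {
    idx <- sample(n, size = n, replace = TRUE)          # indices of bootstrap data set r
    estimates[r] <- stat_fn(data[idx, , drop = FALSE])  # bootstrap estimate r
  }
  # sd() divides by B - 1, as in the usual bootstrap standard error formula
  list(estimates = estimates, se = sd(estimates))
}

set.seed(1)
toy <- data.frame(X = rnorm(100, mean = 5, sd = 2))  # stand-in data (assumption)
res <- boot_se(toy, function(d) median(d$X), B = 200)
res$se  # bootstrap standard error of the sample median
```

Any function that maps a data frame to a single number can be passed as `stat_fn`, which is what makes the approach so broadly applicable.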

====================================

Example> Determining the best investment allocation
 – We want to split a fixed sum of money between two financial assets that yield returns X and Y (random quantities).
 – A fraction b of the money is invested in X, and the remaining 1−b is invested in Y.

Because the returns on the two assets are variable, we want the b that minimizes the total risk (variance) of the investment.
That is, we want to minimize Var(bX + (1−b)Y). The value that minimizes the risk is

$$b = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}$$

and replacing the variances and covariance, which are not available in practice, with their estimates gives

$$\hat{b} = \frac{\hat{\sigma}_Y^2 - \hat{\sigma}_{XY}}{\hat{\sigma}_X^2 + \hat{\sigma}_Y^2 - 2\hat{\sigma}_{XY}}$$

where $\sigma_X^2 = \mathrm{Var}(X)$, $\sigma_Y^2 = \mathrm{Var}(Y)$, and $\sigma_{XY} = \mathrm{Cov}(X, Y)$.

$\sigma_X^2$, $\sigma_Y^2$, and $\sigma_{XY}$ are unknown, so they are estimated from past data on X and Y.
The goal is to plug these estimates into the formula above and thereby estimate the b that minimizes the variance of the investment.
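
Why this choice of b minimizes the variance can be seen from a short derivation. The sketch below uses only the standard variance rule for a weighted sum, writing $\sigma_X^2$, $\sigma_Y^2$, and $\sigma_{XY}$ for Var(X), Var(Y), and Cov(X, Y).

```latex
\begin{align*}
f(b) &= \mathrm{Var}\bigl(bX + (1-b)Y\bigr)
      = b^2\sigma_X^2 + (1-b)^2\sigma_Y^2 + 2b(1-b)\sigma_{XY} \\
f'(b) &= 2b\sigma_X^2 - 2(1-b)\sigma_Y^2 + (2 - 4b)\sigma_{XY} = 0 \\
b\,(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}) &= \sigma_Y^2 - \sigma_{XY} \\
b &= \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}
\end{align*}
```

The second derivative is $2(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}) = 2\,\mathrm{Var}(X - Y) \ge 0$, so the stationary point is a minimum whenever X and Y are not perfectly identical in their fluctuations.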

The approach for estimating b (written α in ISLR) is illustrated on simulated data sets.
Each panel shows simulated returns for 100 pairs of X and Y, and these simulated data were used to estimate $\sigma_X^2$, $\sigma_Y^2$, and $\sigma_{XY}$.
Plugging the estimates into the formula above gives an estimate of b. The $\hat{b}$ obtained from each data set ranges from 0.532 to 0.657.

The estimates $\hat{b}$ are, from left to right and top to bottom, 0.576, 0.532, 0.657, and 0.651.

So how accurate is $\hat{b}$ as an estimate of b?
To estimate the standard deviation of $\hat{b}$, which quantifies that accuracy,
we repeat the following 1,000 times: generate 100 paired observations of X and Y and compute $\hat{b}$ from the formula above.
A histogram of the resulting 1,000 estimates ($\hat{b}_1, \hat{b}_2, \dots, \hat{b}_{1000}$) is shown in the left-hand panel of the figure below.

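For this simulation the parameters were set to $\sigma_X^2 = 1$, $\sigma_Y^2 = 1.25$, and $\sigma_{XY} = 0.5$, so

$$b = \frac{1.25 - 0.5}{1 + 1.25 - 2(0.5)} = \frac{0.75}{1.25} = 0.6$$

that is, the true value of b is known to be 0.6.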

The true b is marked on the histogram with a solid line,
and the mean of the 1,000 estimates of b, $\bar{b} = \frac{1}{1000} \sum_{r=1}^{1000} \hat{b}_r = 0.5996$, is very close to the true value b = 0.6.

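The standard error of the estimates is $\sqrt{\frac{1}{1000-1} \sum_{r=1}^{1000} (\hat{b}_r - \bar{b})^2} = 0.083$. This value, SE($\hat{b}$), can be read as a measure of the accuracy of $\hat{b}$.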
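
The simulation described above can be approximated in a few lines of base R. This is a sketch under stated assumptions: the text gives only the variances and the covariance, so the bivariate normal distribution, the zero means, the seed, and the helper names `estimate_b` and `simulate_pairs` are all choices made here for illustration. The numbers it produces will therefore not match 0.5996 and 0.083 exactly.

```r
# Sketch only: the bivariate normal distribution, zero means, and the seed are
# assumptions; the source states only the variances and the covariance.
set.seed(1)

Sigma <- matrix(c(1,   0.5,
                  0.5, 1.25), nrow = 2, byrow = TRUE)
true_b <- (Sigma[2, 2] - Sigma[1, 2]) /
  (Sigma[1, 1] + Sigma[2, 2] - 2 * Sigma[1, 2])

# Plug-in estimate of b from paired returns x and y
estimate_b <- function(x, y) {
  (var(y) - cov(x, y)) / (var(x) + var(y) - 2 * cov(x, y))
}

# Draw n pairs (X, Y) with covariance matrix Sigma via its Cholesky factor
simulate_pairs <- function(n) {
  z <- matrix(rnorm(2 * n), ncol = 2)
  z %*% chol(Sigma)
}

# 1,000 simulated data sets of 100 pairs each, one estimate per data set
b_hats <- replicate(1000, {
  xy <- simulate_pairs(100)
  estimate_b(xy[, 1], xy[, 2])
})

true_b        # true value implied by Sigma
mean(b_hats)  # average of the 1,000 estimates
sd(b_hats)    # simulation-based standard error of the estimate
```

This only works because the population is known. With real data there is no `simulate_pairs()` to call, which is the gap the bootstrap fills.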

5.3.4 The Bootstrap

 

library(ISLR)

### DATA Loading
data("Portfolio")
str(Portfolio); head(Portfolio)
'data.frame':	100 obs. of  2 variables:
 $ X: num  -0.895 -1.562 -0.417 1.044 -0.316 ...
 $ Y: num  -0.235 -0.885 0.272 -0.734 0.842 ...
           X          Y
1 -0.8952509 -0.2349235
2 -1.5624543 -0.8851760
3 -0.4170899  0.2718880
4  1.0443557 -0.7341975
5 -0.3155684  0.8419834
6 -1.7371238 -2.0371910

Estimating the Accuracy of a Statistic

The greatest strength of the bootstrap approach is that it can be applied in almost any situation without complicated mathematical calculations.

Running a bootstrap analysis in R takes two steps.
1. Create a function that computes the statistic
2. Use boot() to perform the bootstrap by repeatedly sampling observations with replacement

1. Create a function that computes the statistic – b.fn()

b.fn() takes data and index as input and returns the estimate of b as output.
– input data: the (X, Y) data
– input index: a vector indicating which observations are used to estimate b
– output: the estimate of b based on the selected observations

b.fn <- function(data,index){
    X <- data$X[index]
    Y <- data$Y[index]
    return ( (var(Y) - cov(X,Y)) / (var(X)+var(Y)-2*cov(X,Y)) )  # estimate b
}

b.fn(Portfolio, 1:100) estimates b using all 100 observations.

b.fn(Portfolio, 1:100)
[1] 0.5758321

Set a seed, then use sample() to randomly draw 100 observations from 1 to 100 with replacement.
=> This is equivalent to constructing a new bootstrap data set and recomputing the estimate of b.
(The exact value printed depends on the seed and on the sampling algorithm of the R version in use.)

library(boot)
set.seed(1)

b.fn(Portfolio, sample(x=100,size=100,replace=TRUE,prob=NULL))
[1] 0.5963833

2. Use boot() to perform the bootstrap by repeatedly sampling observations with replacement

boot() automates this by running the step above many times, collecting all the resulting estimates of b, and computing their standard deviation.
Bootstrap estimates of b with R=1000:

The output shows that, using the original data, $\hat{b}$ = 0.5758321,
and the bootstrap estimate of SE($\hat{b}$) is 0.08861826.

boot(data=Portfolio, statistic=b.fn, R=1000)
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = Portfolio, statistic = b.fn, R = 1000)
Bootstrap Statistics :
     original        bias    std. error
t1* 0.5758321 -7.315422e-05  0.08861826
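
The three columns of this summary can be recomputed from the object that boot() returns, using its `t0` (estimate on the original data) and `t` (matrix of bootstrap estimates) components. The sketch below assumes simulated stand-in data named `sim` so that it runs without the ISLR package; with `Portfolio` in its place the same lines apply.

```r
library(boot)

# Stand-in data (assumption): simulated so the block runs without ISLR.
set.seed(1)
x <- rnorm(100)
sim <- data.frame(X = x, Y = 0.5 * x + rnorm(100))

# Same statistic as b.fn() in the text
b.fn <- function(data, index) {
  X <- data$X[index]
  Y <- data$Y[index]
  (var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y))
}

res <- boot(data = sim, statistic = b.fn, R = 1000)

res$t0                     # "original": estimate from the full data set
mean(res$t[, 1]) - res$t0  # "bias": mean bootstrap estimate minus original
sd(res$t[, 1])             # "std. error": SD of the R bootstrap estimates
```

Keeping the result in a variable instead of only printing it also makes the individual bootstrap estimates available for plotting.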

Roughly speaking, for a random sample from the population, we would expect $\hat{b}$ to differ from b by about 0.08 on average.
In practice, however, the simulation-based procedure used earlier to estimate SE($\hat{b}$) cannot be applied,
because with real data we cannot generate new samples from the original population.

The bootstrap approach instead uses the computer to emulate the process of obtaining new sample sets,
so the variability of $\hat{b}$ can be estimated without generating additional samples.

The bootstrap estimate of SE($\hat{b}$) is 0.0886, very close to the estimate of 0.083 obtained from the 1,000 simulated data sets.

<ALL>

#05.03.R
library(ISLR)
### DATA Loading
data("Portfolio")
str(Portfolio); head(Portfolio)

#############################################################################
b.fn <- function(data,index){
    X <- data$X[index]
    Y <- data$Y[index]
    return ( (var(Y) - cov(X,Y)) / (var(X)+var(Y)-2*cov(X,Y)) )  # estimate b
}
b.fn(Portfolio, 1:100)

#install.packages("boot")
library(boot)
set.seed(1)
b.fn(Portfolio, sample(x=100,size=100,replace=TRUE,prob=NULL))
boot(Portfolio, b.fn, R=1000)

===================================================================

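library(ggplot2)  # needed for ggplot(); not loaded earlier in the script

boot.result <- boot(data=Portfolio, statistic=b.fn, R=1000)

head(boot.result$t)  # first few bootstrap estimates of b

# histogram of the bootstrap estimates
ggplot(data.frame(b=as.vector(boot.result$t)), aes(x=b)) + ggtitle("estimates of b") +
    geom_histogram(binwidth=0.04, fill="steelblue", color="black") +
    geom_vline(xintercept=boot.result$t0, colour="pink", linetype="longdash") +  # estimate from the original data
    geom_vline(xintercept=0.60, colour="red") +  # true b
    xlab("b")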

This shows that the bootstrap approach can be used to effectively estimate the variability associated with $\hat{b}$.

 

 

 

 
