Concatenating a list of data frames

Published by onesixx on

https://www.r-bloggers.com/concatenating-a-list-of-data-frames/ – translation

Data

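# Build a list of N one-row data frames to concatenate later.
N <- 10000
data <- vector("list", N)   # preallocate instead of growing the list
for (i in seq_len(N)) {
    data[[i]] <- data.frame(index = i,
                            char = sample(letters, 1),
                            z = runif(1))
}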
> head(data,3)
[[1]]
  index char         z
1     1    n 0.8099211

[[2]]
  index char         z
1     2    r 0.3929528

[[3]]
  index char         z
1     3    y 0.2688812
> str(data)
List of 10000
 $ :'data.frame':	1 obs. of  3 variables:
  ..$ index: int 1
  ..$ char : Factor w/ 1 level "d": 1
  ..$ z    : num 0.0531
 $ :'data.frame':	1 obs. of  3 variables:
  ..$ index: int 2
  ..$ char : Factor w/ 1 level "d": 1
  ..$ z    : num 0.636
 $ :'data.frame':	1 obs. of  3 variables:
  ..$ index: int 3
  ..$ char : Factor w/ 1 level "h": 1
  ..$ z    : num 0.749
....

Results

library(rbenchmark)
library(plyr); library(data.table)

benchmark(
    do.call(rbind, data),  #1. Naive Solution
    ldply(data, rbind),    #2. plyr :: ldply
    rbind.fill(data),      #3. plyr :: rbind.fill
    rbindlist(data)        #4. data.table :: rbindlist
)

 

Thoughts on Performance

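Option 1, the base function rbind.data.frame(), checks the column names and rearranges the columns every time a row is added, so it is the slowest.
Option 4, rbindlist, is written in C and assumes that the same columns sit in the same positions, so it binds them directly and is the fastest.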
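
The difference between matching by name and binding by position can be seen on a tiny example. This is a minimal sketch, not part of the original benchmark: the data frames a and b are made up for illustration, and the data.table part only runs if that package is installed.

```r
# Two hypothetical data frames with the same columns in a different order.
a <- data.frame(x = 1, y = 10)
b <- data.frame(y = 20, x = 2)

# Base rbind() on data frames matches columns by name,
# so b is realigned to the column order of a before binding.
by_name <- rbind(a, b)
print(by_name)

# data.table is optional here; skip this part if it is not available.
if (requireNamespace("data.table", quietly = TRUE)) {
    # use.names = FALSE binds strictly by position: x ends up as c(1, 20).
    by_pos <- data.table::rbindlist(list(a, b), use.names = FALSE)
    print(by_pos)

    # use.names = TRUE asks rbindlist to match columns by name instead.
    matched <- data.table::rbindlist(list(a, b), use.names = TRUE)
    print(matched)
}
```

When the data frames in the list may have columns in different orders, passing use.names = TRUE to rbindlist is the safer choice. The list in this post is built with identical column layouts, so binding by position is enough.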

 

Categories: Reshaping
