data.table reshape (dcast, melt)

Published by onesixx

RStudio data import cheatsheet: https://github.com/rstudio/cheatsheets/blob/master/data-import.pdf

melt ==> gather, dcast ==> spread (the tidyr equivalents)

melt, dcast

dcast sets a key on its result: the key columns are the LHS variables of the formula.
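A minimal sketch of that behaviour, assuming the data.table package is installed. The table `dt` and its columns `id`, `var`, `val` are hypothetical; the rows are deliberately unsorted by `id` so the effect of the key is visible.

```r
library(data.table)

# hypothetical long table: two ids, two variables, rows not sorted by id
dt <- data.table(id  = c(2, 1, 2, 1),
                 var = c("a", "a", "b", "b"),
                 val = 1:4)

# cast to wide: one row per id, one column per value of var
wide <- dcast(dt, id ~ var, value.var = "val")

key(wide)  # the LHS column(s) of the formula
wide       # rows come back ordered by the key
```

Because the result is already keyed on `id`, a lookup such as `wide[.(1)]` can use the key directly without a separate `setkey()` call.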

Time series to data.table
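A minimal sketch, assuming a regular monthly `ts` object and the data.table package. It does not rely on any dedicated `ts` method in data.table; it builds the columns from the base accessors `time()`, `cycle()` and `frequency()`. The series `x` and the column names `year`, `month`, `value` are hypothetical.

```r
library(data.table)

# hypothetical monthly series: four observations starting January 2020
x <- ts(c(112, 118, 132, 129), start = c(2020, 1), frequency = 12)

dt <- data.table(
  # year: subtract the within-year offset from time(x) and round,
  # which guards against floating-point error at year boundaries
  year  = as.integer(round(as.numeric(time(x)) - (as.integer(cycle(x)) - 1) / frequency(x))),
  month = as.integer(cycle(x)),   # position within the year (1 = January)
  value = as.numeric(x)           # drop the ts attributes, keep the numbers
)

# optional: reshape to one row per year, one column per month
wide <- dcast(dt, year ~ month, value.var = "value")
```

The long form (`dt`) is usually the more convenient shape for grouping with `by =`; the `dcast` step is only needed when a year-by-month grid is wanted.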

melt.data.table() & dcast.data.table()

dcast & melt: versions of the reshape2 package's functions adapted for data.table

Convert DT to long form, where each dob (date of birth) is a separate observation.
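A small self-contained sketch of that conversion, assuming the data.table package. The table `DT` below is hypothetical (one row per family, one `dob_childN` column per child), and the names `child` and `dob` for the new columns are chosen here for illustration.

```r
library(data.table)

# hypothetical wide table: one row per family, one dob column per child
DT <- data.table(
  family_id  = 1:2,
  age_mother = c(30L, 27L),
  dob_child1 = c("1998-11-26", "1996-06-22"),
  dob_child2 = c("2000-01-29", NA),
  dob_child3 = c("2004-04-05", NA_character_)
)

# long form: id columns are repeated, the three dob columns are stacked
DT_long <- melt(DT,
                id.vars       = c("family_id", "age_mother"),
                measure.vars  = c("dob_child1", "dob_child2", "dob_child3"),
                variable.name = "child",   # which dob column a row came from
                value.name    = "dob")     # the stacked dob values
DT_long
```

Every family gets one row per dob column, so missing children show up as `NA`; passing `na.rm = TRUE` to `melt` drops those rows instead.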

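# long -> wide: one row per Step, one column per level of variable
rs <- dcast.data.table(rs, Step ~ variable)

# wide -> long: A, B, C stay as identifiers; all other columns are stacked
data_melt <- melt.data.table(data, id.vars = c("A", "B", "C"))
# long -> wide: one row per id, one column per level of Param
data_cast <- dcast.data.table(result, id ~ Param)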
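A runnable round trip in the same spirit, assuming the data.table package. The table `data` with id columns `A`, `B`, `C` and measure columns `p1`, `p2` is hypothetical. On a data.table, the generics `melt()` and `dcast()` dispatch to `melt.data.table()` and `dcast.data.table()`.

```r
library(data.table)

# hypothetical wide table: A, B, C identify a row; p1, p2 are measurements
data <- data.table(A  = c("x", "y"),
                   B  = 1:2,
                   C  = c(TRUE, FALSE),
                   p1 = c(0.1, 0.2),
                   p2 = c(1.5, 2.5))

# wide -> long: measure.vars defaults to every column not in id.vars
data_melt <- melt(data, id.vars = c("A", "B", "C"))

# long -> wide again: the id columns go on the LHS, the melted names on the RHS
data_back <- dcast(data_melt, A + B + C ~ variable, value.var = "value")
data_back
```

`data_back` has the same columns and values as `data`; the only difference is that it is keyed on `A`, `B`, `C`.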

Examples

Q1. Calculate the total number of rows by month, then sort in descending order

mydata[, .N, by = month][order(-N)]

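.N is a special symbol holding the number of rows in the current group, so by = month gives the number of flights per month; the chained [order(-N)] then sorts those counts in descending order.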
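A self-contained version of Q1 on a tiny hypothetical `mydata` (only a `month` column, assumed to mirror the flights data used in these examples), assuming the data.table package is installed:

```r
library(data.table)

# hypothetical stand-in for the flights data: one row per flight
mydata <- data.table(month = c(1L, 1L, 2L, 3L, 3L, 3L))

# count rows per month, then sort the grouped result by that count
counts <- mydata[, .N, by = month][order(-N)]
counts
```

The second `[...]` operates on the grouped result, which is why `N` (the column created by `.N`) is available to `order()`.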

Q2. Find the top 3 months with the highest mean arrival delay

mydata[, .(mean_arr_delay = mean(arr_delay, na.rm = TRUE)), by = month][order(-mean_arr_delay)][1:3]
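A sketch of Q2 on hypothetical toy data, assuming the data.table package; the columns `month` and `arr_delay` are assumed to match the flights data. One month contains an `NA` delay so that `na.rm = TRUE` matters.

```r
library(data.table)

# hypothetical flights: month and arrival delay in minutes (one NA)
mydata <- data.table(
  month     = c(1L, 1L, 2L, 2L, 3L, 4L, 4L),
  arr_delay = c(10, NA, 5, 13, 30, -5, 1)
)

top3 <- mydata[, .(mean_arr_delay = mean(arr_delay, na.rm = TRUE)), by = month][
  order(-mean_arr_delay)][1:3]   # sort descending, keep the first three rows
top3
```

If there were fewer than three months, `[1:3]` would pad the result with `NA` rows; `head(3)` in place of `[1:3]` returns only the rows that exist.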

Q3. Find the origins whose average total delay (mean arrival delay plus mean departure delay) is greater than 20 minutes

mydata[, lapply(.SD, mean, na.rm = TRUE), .SDcols = c("arr_delay", "dep_delay"), by = origin][(arr_delay + dep_delay) > 20]
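A self-contained sketch of Q3, assuming the data.table package; the toy `mydata` and its values are hypothetical. `.SDcols` restricts `.SD` to the two delay columns, `lapply` averages each one per origin, and the chained `[...]` filters on the sum of the two means.

```r
library(data.table)

# hypothetical flights: origin airport with arrival/departure delays in minutes
mydata <- data.table(
  origin    = c("EWR", "EWR", "JFK", "JFK", "LGA"),
  arr_delay = c(20, 10, 5, NA, 30),
  dep_delay = c(10, 6, 2, 4, 1)
)

res <- mydata[, lapply(.SD, mean, na.rm = TRUE),
              .SDcols = c("arr_delay", "dep_delay"),
              by = origin][(arr_delay + dep_delay) > 20]
res
```

Note that each mean drops its own `NA`s independently, so an origin's "total" is the sum of two means that may be based on different numbers of flights.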

Q4. Extract the average arrival and departure delays for carrier == ‘DL’, grouped by ‘origin’ and ‘dest’

mydata[carrier == "DL",
        lapply(.SD, mean, na.rm = TRUE),
        by = .(origin, dest),
        .SDcols = c("arr_delay", "dep_delay")]
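A sketch of Q4 on hypothetical toy data, assuming the data.table package. The `i` argument (`carrier == "DL"`) filters rows before grouping, so the `AA` flight never enters the averages. One group has only a missing arrival delay, to show what `na.rm = TRUE` does in that case.

```r
library(data.table)

# hypothetical flights with carrier, route and delays in minutes
mydata <- data.table(
  carrier   = c("DL", "DL", "DL", "AA"),
  origin    = c("JFK", "JFK", "LGA", "JFK"),
  dest      = c("ATL", "ATL", "ATL", "ATL"),
  arr_delay = c(10, 20, NA, 100),
  dep_delay = c(4, 6, 3, 50)
)

res <- mydata[carrier == "DL",
              lapply(.SD, mean, na.rm = TRUE),
              by = .(origin, dest),
              .SDcols = c("arr_delay", "dep_delay")]
res
```

The LGA-ATL group has no non-missing arrival delay, so its average comes back missing (base `mean()` returns `NaN` for an all-`NA` input with `na.rm = TRUE`).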

Q5. Take the first ‘air_time’ value for each ‘origin’, then sum those first values that are greater than 300

mydata[, .SD[1], .SDcols = "air_time", by = origin][air_time > 300, sum(air_time)]
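A self-contained sketch of Q5, assuming the data.table package; the toy `mydata` is hypothetical. The first step keeps one row per origin, and the chained step filters and sums in a single `[i, j]` call.

```r
library(data.table)

# hypothetical flights: origin and air time in minutes
mydata <- data.table(
  origin   = c("EWR", "JFK", "EWR", "LGA", "JFK"),
  air_time = c(350, 200, 100, 400, 500)
)

# first air_time per origin, in the table's current row order
first_per_origin <- mydata[, .SD[1], .SDcols = "air_time", by = origin]

# keep the first values above 300 and add them up (EWR 350 + LGA 400)
total <- first_per_origin[air_time > 300, sum(air_time)]
total
```

"First" here means first in the current row order; if it should mean the earliest flight, sort the table (for example with `setorder()`) before grouping.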

Categories: Reshaping
