data.table reshape (dcast,melt)
https://github.com/rstudio/cheatsheets/blob/master/data-import.pdf
melt, dcast (data.table) correspond to gather, spread (tidyr)
melt, dcast
dcast sets a key on its result.
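A minimal sketch of the key behaviour noted above, using a small made-up table (`long`, `id`, `variable`, `value` are illustrative names, not from any real dataset). In the data.table versions this was checked against, the result of dcast is keyed by the left-hand-side columns of the formula.

```r
library(data.table)

# Hypothetical long-format table: one row per (id, variable) pair
long <- data.table(
  id       = c(2L, 2L, 1L, 1L),
  variable = c("x", "y", "x", "y"),
  value    = c(20, 21, 10, 11)
)

# Long -> wide: one row per id, one column per level of `variable`
wide <- dcast(long, id ~ variable, value.var = "value")

key(long)  # NULL: the input has no key
key(wide)  # the LHS column(s) of the formula
print(wide)
```

Because the result is keyed, it also comes back sorted by id, regardless of the row order of the input.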
Time series to data.table
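One possible way to do this, sketched with the built-in monthly series AirPassengers (chosen here only as an example; the column names year, month, value are assumptions). The year is derived with integer arithmetic from start() and frequency() instead of floor(time(x)), to stay clear of floating-point rounding.

```r
library(data.table)

x <- AirPassengers            # monthly ts object from the datasets package
f <- frequency(x)             # periods per year (12 for monthly data)
s <- start(x)                 # c(first year, first period)

# Zero-based period index counted from period 1 of the start year
idx <- (s[2] - 1) + seq_along(x) - 1

ap <- data.table(
  year  = as.integer(s[1] + idx %/% f),
  month = as.integer(cycle(x)),
  value = as.numeric(x)       # as.numeric() drops the ts attributes
)

# Long -> wide: one row per year, one column per month
ap_wide <- dcast(ap, year ~ month, value.var = "value")
print(head(ap_wide, 3))
```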
melt.data.table() & dcast.data.table()
dcast & melt: the reshape2 package functions, adapted for data.table
Convert DT to long form where each dob is a separate observation.
rs <- dcast.data.table(rs, "Step ~ variable")
data_melt <- melt.data.table(data, id.vars = c("A", "B", "C"))
data_cast <- dcast.data.table(result, id ~ Param)
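A small round trip may make the two calls above easier to follow. The table below is hypothetical (columns A, B, C as ids, m1 and m2 as measurements); melt() and dcast() are the generics that dispatch to melt.data.table() and dcast.data.table() when given a data.table.

```r
library(data.table)

# Hypothetical wide table: A, B, C identify a row; m1 and m2 are measurements
data <- data.table(
  A  = c("a1", "a2"),
  B  = c("b1", "b2"),
  C  = c(1L, 2L),
  m1 = c(0.5, 1.5),
  m2 = c(10, 20)
)

# Wide -> long: every non-id column becomes a (variable, value) pair
data_melt <- melt(data, id.vars = c("A", "B", "C"))
print(data_melt)

# Long -> wide: one column per level of `variable`
data_cast <- dcast(data_melt, A + B + C ~ variable, value.var = "value")
print(data_cast)
```

Here data_cast has the same columns and values as data, which is a quick way to check that a melt/dcast pair is specified consistently.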
Examples
Q1. Calculate the total number of rows by month, then sort in descending order
mydata[, .N, by = month][order(-N)]
.N is a special symbol that holds the number of rows in the current group.
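To try .N without the full flights data, here is a sketch on a made-up stand-in for mydata that has only a month column.

```r
library(data.table)

# Hypothetical stand-in for mydata with only a month column
mydata <- data.table(month = c(1L, 1L, 2L, 3L, 3L, 3L))

# Without `by`, .N is the total number of rows
total <- mydata[, .N]

# With `by`, .N is evaluated once per group; the count column is named N
counts <- mydata[, .N, by = month][order(-N)]
print(counts)
```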
Q2. Find the top 3 months with the highest mean arrival delay
mydata[, .(mean_arr_delay = mean(arr_delay, na.rm = TRUE)), by = month][order(-mean_arr_delay)][1:3]
Q3. Find the origins whose average total delay (arrival + departure) is greater than 20 minutes
mydata[, lapply(.SD, mean, na.rm = TRUE), .SDcols = c("arr_delay", "dep_delay"), by = origin][(arr_delay + dep_delay) > 20]
Q4. Extract the average arrival and departure delays for carrier == "DL", grouped by the origin and dest variables
mydata[carrier == "DL",
       lapply(.SD, mean, na.rm = TRUE),
       by = .(origin, dest),
       .SDcols = c("arr_delay", "dep_delay")]
Q5. Pull the first value of air_time for each origin, then sum the returned values that are greater than 300
mydata[, .SD[1], .SDcols = "air_time", by = origin][air_time > 300, sum(air_time)]
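The queries above assume a flights-style table named mydata. The miniature below is invented purely for experimentation (the values are not real flight data) and runs Q3 to Q5 so the intermediate steps can be inspected.

```r
library(data.table)

# Hypothetical miniature of the flights table assumed by Q1-Q5
mydata <- data.table(
  month     = c(1L, 1L, 2L, 2L, 3L, 3L),
  carrier   = c("DL", "AA", "DL", "DL", "AA", "DL"),
  origin    = c("JFK", "JFK", "JFK", "LGA", "LGA", "EWR"),
  dest      = c("LAX", "LAX", "LAX", "ATL", "ORD", "ATL"),
  arr_delay = c(30, NA, 10, -5, 40, 0),
  dep_delay = c(20, 5, NA, 0, 35, 2),
  air_time  = c(340, 330, 350, 110, 120, 310)
)

# Q3: mean delays per origin, then keep origins whose combined mean exceeds 20
q3 <- mydata[, lapply(.SD, mean, na.rm = TRUE),
             .SDcols = c("arr_delay", "dep_delay"), by = origin
             ][(arr_delay + dep_delay) > 20]

# Q4: same per-group means, restricted to carrier "DL" and grouped by route
q4 <- mydata[carrier == "DL",
             lapply(.SD, mean, na.rm = TRUE),
             by = .(origin, dest),
             .SDcols = c("arr_delay", "dep_delay")]

# Q5: first air_time per origin, then sum the values above 300
q5 <- mydata[, .SD[1], .SDcols = "air_time", by = origin
             ][air_time > 300, sum(air_time)]

print(q3); print(q4); print(q5)
```

.SD holds the subset of data for the current group, limited to the columns listed in .SDcols, which is why lapply(.SD, mean, ...) yields one mean per listed column per group.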