r - Using lapply to output values between date ranges within different factor levels
I have two data frames: one representing the daily sales figures of different stores (df1) and one representing when each store has been audited (df2). I need to create a new data frame displaying the sales information for each site taken in the week before each audit (i.e. the information in df2). Example data, firstly the daily sales figures for different stores across a period:
    dates <- as.data.frame(seq(as.Date("2015/12/30"), as.Date("2016/4/7"), "day"))
    sales <- as.data.frame(matrix(sample(0:50, 30*10, replace=TRUE), ncol=3))
    df1 <- cbind(dates, sales)
    colnames(df1) <- c("dates","site.a","site.b","site.c")
And the dates of each audit across the different stores:
    store <- c("store.a","store.a","store.b","store.c","store.c")
    audit_dates <- as.data.frame(as.POSIXct(c("2016/1/4","2016/3/1","2016/2/1","2016/2/1","2016/3/1")))
    df2 <- as.data.frame(cbind(store, audit_dates))
    colnames(df2) <- c("store","audit_dates")
Of note, there is an uneven number of dates within each output (i.e. there may not be a full week's worth of information prior to some store audits). I have previously asked a question addressing a similar problem: creating a dataframe with the lapply function with different numbers of rows. Below is an answer that works for this example if I consider the information for only one store:
    library(lubridate)

    ##data input
    store.a_dates <- as.data.frame(seq(as.Date("2015/12/30"), as.Date("2016/4/7"), "day"))
    store.a_sales <- as.data.frame(matrix(sample(0:50, 10*10, replace=TRUE), ncol=1))
    store.a_df1 <- cbind(store.a_dates, store.a_sales)
    colnames(store.a_df1) <- c("store.a_dates","store.a_sales")
    store.a_df2 <- as.Date(c("2016/1/3","2016/3/1"))

    ##output
    store.a_output <- lapply(store.a_df2, function(x) {
      store.a_df1[difftime(store.a_df1[,1], x - days(7)) >= 0 &
                  difftime(store.a_df1[,1], x) <= 0, ]
    })
    n1 <- max(sapply(store.a_output, nrow))
    output <- data.frame(lapply(store.a_output, function(x) x[seq_len(n1),]))
But I don't know how to do this for multiple sites.
Try this:
    # renamed vars for convenience...
    colnames(df1) <- c("t","store.a","store.b","store.c")
    colnames(df2) <- c("store","t")

    library(tidyr)
    library(dplyr)

    # gather df1 so that df1 and df2 have the same format:
    df1 = gather(df1, store, sales, -t)
    head(df1)
               t   store sales
    1 2015-12-30 store.a    16
    2 2015-12-31 store.a    24
    3 2016-01-01 store.a     8
    4 2016-01-02 store.a    42
    5 2016-01-03 store.a     7
    6 2016-01-04 store.a    46

    # the lapply call does not iterate over the actual values but over the indexes,
    # which allows you to subset the data comfortably:
    r <- lapply(1:nrow(df2), function(i) {
      audit.t = df2[i, "t"]                        #time of audit
      audit.s = df1[, "store"] == df2[i, "store"]  #store audited
      df = df1[audit.s, ]                          #data for the audited store
      df[, "audited"] = audit.t                    #add a column with the audit date
      week_before = difftime(df[, "t"], audit.t - (7*24*3600)) >= 0
      week_audit = difftime(df[, "t"], audit.t) <= 0
      df[week_before & week_audit, ]
    })
Does this give the proper subsets?
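One way to check the window logic is to run it on a tiny data set where the answer is obvious. The following is only a minimal sketch on made-up data: the toy `sales` frame and the `audit` date are assumptions for illustration, not the df1/df2 above. It keeps both sides of the comparison as `Date`, which avoids mixing `Date` and `POSIXct` values in the same test.

```r
# Hypothetical toy data (not the original df1/df2): ten days of sales for one store
sales <- data.frame(
  t     = seq(as.Date("2016-01-01"), as.Date("2016-01-10"), by = "day"),
  store = "store.a",
  sales = 1:10
)

# Hypothetical audit date, stored as Date so it compares directly with sales$t
audit <- as.Date("2016-01-04")

# Keep rows from 7 days before the audit up to and including the audit day
in_window <- sales$t >= audit - 7 & sales$t <= audit
subset_a  <- sales[in_window, ]

print(subset_a)
```

Here the data only start on 2016-01-01, so the subset has four rows rather than a full week, which is the uneven-length case mentioned in the question.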
Also, to summarise the results:
    r = do.call("rbind", r) %>%
      group_by(audited, store) %>%
      summarise(sales = sum(sales))
    r
         audited   store sales
          <time>   <chr> <int>
    1 2016-01-04 store.a    97
    2 2016-02-01 store.b   156
    3 2016-02-01 store.c   226
    4 2016-03-01 store.a   115
    5 2016-03-01 store.c   187
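If you would rather avoid the explicit loop, the same per-audit totals can probably be obtained by joining the audits onto the long sales table and filtering. This is a different technique from the lapply approach, sketched in base R with `merge` and `aggregate` on made-up data: `sales_long`, `audits` and their values are assumptions, and the audit dates are stored as `Date` rather than `POSIXct`.

```r
# Hypothetical long-format sales: two stores, ten days each
sales_long <- data.frame(
  t     = rep(seq(as.Date("2016-01-01"), as.Date("2016-01-10"), by = "day"), times = 2),
  store = rep(c("store.a", "store.b"), each = 10),
  sales = c(1:10, 11:20)
)

# Hypothetical audits, one row per (store, audit date)
audits <- data.frame(
  store   = c("store.a", "store.a", "store.b"),
  audited = as.Date(c("2016-01-04", "2016-01-10", "2016-01-09"))
)

# Pair every sales row with every audit of the same store
joined <- merge(sales_long, audits, by = "store")

# Keep only the rows that fall in the week up to and including the audit day
in_window <- joined$t >= joined$audited - 7 & joined$t <= joined$audited

# Sum the sales per audit and store
weekly <- aggregate(sales ~ audited + store, data = joined[in_window, ], FUN = sum)
print(weekly)
```

The join creates one row per sales day per audit of that store, so it uses more memory than the lapply version when there are many audits, but it keeps everything in a single data frame.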