R - Using lapply to output values between date ranges within different factor levels


I have two data frames: one representing the daily sales figures of different stores (df1), and one representing when each store has been audited (df2). I need to create a new data frame displaying the sales information for each site taken one week before each audit (i.e. the information in df2). Some example data, firstly the daily sales figures of the different stores across a period:

    dates <- as.data.frame(seq(as.Date("2015/12/30"), as.Date("2016/4/7"), "day"))
    sales <- as.data.frame(matrix(sample(0:50, 30*10, replace = TRUE), ncol = 3))
    df1 <- cbind(dates, sales)
    colnames(df1) <- c("dates", "site.a", "site.b", "site.c")

And the dates of each audit across the different stores:

    store <- c("store.a", "store.a", "store.b", "store.c", "store.c")
    audit_dates <- as.data.frame(as.POSIXct(c("2016/1/4", "2016/3/1", "2016/2/1", "2016/2/1", "2016/3/1")))
    df2 <- as.data.frame(cbind(store, audit_dates))
    colnames(df2) <- c("store", "audit_dates")

Of note, there will be an uneven number of dates within each output (i.e. there may not be a full week's worth of information prior to some store audits). I have already asked a question addressing a similar problem: creating a data frame from an lapply function with different numbers of rows. Below is an answer that would work for this example if I only consider the information from one store:

    library(lubridate)

    ## data input
    store.a_dates <- as.data.frame(seq(as.Date("2015/12/30"), as.Date("2016/4/7"), "day"))
    store.a_sales <- as.data.frame(matrix(sample(0:50, 10*10, replace = TRUE), ncol = 1))
    store.a_df1 <- cbind(store.a_dates, store.a_sales)
    colnames(store.a_df1) <- c("store.a_dates", "store.a_sales")
    store.a_df2 <- as.Date(c("2016/1/3", "2016/3/1"))

    ## output
    store.a_output <- lapply(store.a_df2, function(x) {
      store.a_df1[difftime(store.a_df1[, 1], x - days(7)) >= 0 &
                  difftime(store.a_df1[, 1], x) <= 0, ]
    })
    n1 <- max(sapply(store.a_output, nrow))
    output <- data.frame(lapply(store.a_output, function(x) x[seq_len(n1), ]))
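The padding step at the end of that answer relies on a base R behaviour: indexing a data frame with row numbers past its last row returns rows filled with NA. A minimal sketch of just that step, using made-up values (`short` and `padded` are illustrative names, not part of the original code):

```r
# A two-row data frame standing in for a subset with fewer rows than the others
short <- data.frame(day = 1:2, sales = c(10, 20))

# Asking for rows 1..4 keeps the two real rows and appends two all-NA rows,
# so every subset can be brought to the same number of rows before combining
padded <- short[seq_len(4), ]

print(padded)
```

This is why subsets of unequal length can be placed side by side in one data frame: the shorter ones are extended with NA rows up to the length of the longest.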

But I don't know how to do this for multiple sites.

Try this:

    # Renamed vars for convenience...
    colnames(df1) <- c("t", "store.a", "store.b", "store.c")
    colnames(df2) <- c("store", "t")

    library(tidyr)
    library(dplyr)

    # Gather df1 so that df1 and df2 have the same format:
    df1 = gather(df1, store, sales, -t)
    head(df1)
    #>            t   store sales
    #> 1 2015-12-30 store.a    16
    #> 2 2015-12-31 store.a    24
    #> 3 2016-01-01 store.a     8
    #> 4 2016-01-02 store.a    42
    #> 5 2016-01-03 store.a     7
    #> 6 2016-01-04 store.a    46

    # The lapply call does not iterate over the actual values, just the indexes,
    # which allows you to subset the data comfortably:
    r <- lapply(1:nrow(df2), function(i) {
       audit.t = df2[i, "t"]                          # time of audit
       audit.s = df1[, "store"] == df2[i, "store"]    # rows of the audited store
       df = df1[audit.s, ]                            # data for the audited store
       df[, "audited"] = audit.t                      # add a column with the audit date

       # audit.t is POSIXct, so one week is subtracted in seconds
       week_before = difftime(df[, "t"], audit.t - (7*24*3600)) >= 0
       week_audit  = difftime(df[, "t"], audit.t) <= 0

       df[week_before & week_audit, ]
    })

Does this give you the proper subsets?

Also, to summarise the results:

    r = do.call("rbind", r) %>%
       group_by(audited, store) %>%
       summarise(sales = sum(sales))

    r
    #>      audited   store sales
    #>       <time>   <chr> <int>
    #> 1 2016-01-04 store.a    97
    #> 2 2016-02-01 store.b   156
    #> 3 2016-02-01 store.c   226
    #> 4 2016-03-01 store.a   115
    #> 5 2016-03-01 store.c   187
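As an alternative to looping over the audits, the same week-before-audit window can probably be expressed as a join followed by a filter. The following is only a sketch in base R, not the approach from the answer. It assumes the audit dates are stored as Date rather than POSIXct (which sidesteps time zone effects when comparing against the daily dates), uses a fixed seed so the run is reproducible, and all object names (`sales_long`, `audits`, `paired`, `week_before`, `totals`) are made up for illustration.

```r
set.seed(1)  # assumption: fixed seed so the random sales are reproducible

# Daily sales in long format: one row per store per day
days   <- seq(as.Date("2015-12-30"), as.Date("2016-04-07"), by = "day")
stores <- c("store.a", "store.b", "store.c")
sales_long <- data.frame(
  t     = rep(days, times = length(stores)),
  store = rep(stores, each = length(days)),
  sales = sample(0:50, length(days) * length(stores), replace = TRUE),
  stringsAsFactors = FALSE
)

# Audit dates kept as Date (assumption; the question uses POSIXct)
audits <- data.frame(
  store   = c("store.a", "store.a", "store.b", "store.c", "store.c"),
  audited = as.Date(c("2016-01-04", "2016-03-01", "2016-02-01",
                      "2016-02-01", "2016-03-01")),
  stringsAsFactors = FALSE
)

# Pair every sales row with every audit of the same store
paired <- merge(sales_long, audits, by = "store")

# Keep rows from 7 days before the audit up to and including the audit day,
# the same inclusive window as the two difftime() conditions
in_window   <- paired$t >= paired$audited - 7 & paired$t <= paired$audited
week_before <- paired[in_window, ]

# Total sales per store and audit
totals <- aggregate(sales ~ store + audited, data = week_before, FUN = sum)
print(totals)
```

With this data the first store.a audit has only six days available, because the sales record starts on 2015-12-30, while the other audits each get the full eight-day inclusive window. The join creates one row per sales day per audit, so for long histories or many audits the lapply version may use less memory.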
