r script to reshape and count columns within dataset -


hi have dataset of freshwater fish in range of sites, repeat monthly visits on many years. each row has species found, total , status (i.e. positive or negative test result).

sample_id   site    coll_date   species tot inf_status 382870  site 1  27/10/2007  species b   1   positive 382872  site 2  27/10/2007  species d   1   positive 487405  site 3  28/10/2007  species   1   positive 487405  site 3  28/10/2007  species   1   positive 382899  site 4  03/11/2007  species   1   positive 382900  site 5  03/11/2007  species   1   positive 382901  site 5  03/11/2007  species   1   positive 382902  site 6  03/11/2007  species   1   positive 382903  site 7  09/12/2007  species b   1   positive 382904  site 8  05/02/2008  species c   9   negative 382905  site 8  05/02/2008  species   13  negative 382906  site 9  14/02/2008  species   1   positive 382907  site 9  14/02/2008  species   1   positive 

i need reformat data there 1 row per site visit (i.e. in given site name , date combo) columns total found species , fish status (i.e. speciesa_pos, speciesa_neg, sp_b_pos.. etc).

site    coll_date   sp_a_pos    sp_a_neg    sp_b_pos    sp_b_neg    sp_c_pos      sp_c_neg  sp_d_pos    sp_d_neg site 1  27/10/2007  0   0   1   0   0   0   0   0 site 2  27/10/2007  0   0   0   0   0   0   1   0 site 3  28/10/2007  3   0   0   0   0   0   0   0 site 4  03/11/2007  1   0   0   0   0   0   0   0 site 5  03/11/2007  2   0   0   0   0   0   0   0 site 6  03/11/2007  1   0   0   0   0   0   0   0 site 7  09/12/2007  0   0   1   0   0   0   0   0 site 8  05/02/2008  0   13  0   0   0   9   0   0 site 9  14/02/2008  2   0   0   0   0   0   0   0 

figured use reshape function still need sum within site visits reshape take first row. thoughts use split/apply/aggregate/for loops etc tried various combinations , not getting anywhere. apologies i'm not familiar r. comments appreciated!

with tidyr/dplyr, can make new variable represents combination of species , status, sum total each site/date/species-status, , spread species-status columns, filling sum.

library(tidyr) library(dplyr)  dat %>%     unite(sp_status, species, inf_status) %>%     group_by(site, coll_date, sp_status) %>%     summarise(tot = sum(tot)) %>%     spread(key = sp_status, value = tot, fill = 0) 

the same thing can done in reshape2 dcast, taking advantage of ability of dcast aggregate , reshape wide format simultaneously.

library(reshape2) dcast(dat, site + coll_date ~ species + inf_status, value.var = "tot", fun.aggregate = sum) 

Comments

Popular posts from this blog

java - Jasper subreport showing only one entry from the JSON data source when embedded in the Title band -

mapreduce - Resource manager does not transit to active state from standby -

serialization - Convert Any type in scala to Array[Byte] and back -