r script to reshape and count columns within dataset -
hi have dataset of freshwater fish in range of sites, repeat monthly visits on many years. each row has species found, total , status (i.e. positive or negative test result).
sample_id site coll_date species tot inf_status 382870 site 1 27/10/2007 species b 1 positive 382872 site 2 27/10/2007 species d 1 positive 487405 site 3 28/10/2007 species 1 positive 487405 site 3 28/10/2007 species 1 positive 382899 site 4 03/11/2007 species 1 positive 382900 site 5 03/11/2007 species 1 positive 382901 site 5 03/11/2007 species 1 positive 382902 site 6 03/11/2007 species 1 positive 382903 site 7 09/12/2007 species b 1 positive 382904 site 8 05/02/2008 species c 9 negative 382905 site 8 05/02/2008 species 13 negative 382906 site 9 14/02/2008 species 1 positive 382907 site 9 14/02/2008 species 1 positive
i need reformat data there 1 row per site visit (i.e. in given site name , date combo) columns total found species , fish status (i.e. speciesa_pos, speciesa_neg, sp_b_pos.. etc).
site coll_date sp_a_pos sp_a_neg sp_b_pos sp_b_neg sp_c_pos sp_c_neg sp_d_pos sp_d_neg site 1 27/10/2007 0 0 1 0 0 0 0 0 site 2 27/10/2007 0 0 0 0 0 0 1 0 site 3 28/10/2007 3 0 0 0 0 0 0 0 site 4 03/11/2007 1 0 0 0 0 0 0 0 site 5 03/11/2007 2 0 0 0 0 0 0 0 site 6 03/11/2007 1 0 0 0 0 0 0 0 site 7 09/12/2007 0 0 1 0 0 0 0 0 site 8 05/02/2008 0 13 0 0 0 9 0 0 site 9 14/02/2008 2 0 0 0 0 0 0 0
figured use reshape function still need sum within site visits reshape take first row. thoughts use split/apply/aggregate/for loops etc tried various combinations , not getting anywhere. apologies i'm not familiar r. comments appreciated!
with tidyr/dplyr, can make new variable represents combination of species , status, sum total each site/date/species-status, , spread
species-status columns, filling sum.
library(tidyr) library(dplyr) dat %>% unite(sp_status, species, inf_status) %>% group_by(site, coll_date, sp_status) %>% summarise(tot = sum(tot)) %>% spread(key = sp_status, value = tot, fill = 0)
the same thing can done in reshape2 dcast
, taking advantage of ability of dcast
aggregate , reshape wide format simultaneously.
library(reshape2) dcast(dat, site + coll_date ~ species + inf_status, value.var = "tot", fun.aggregate = sum)
Comments
Post a Comment