R - Count the number of occurrences of a vector in a list
I have a list of vectors of variable length, for example:
q <- list(c(1,3,5), c(2,4), c(1,3,5), c(2,5), c(7), c(2,5))
I need to count the number of occurrences of each of the vectors in the list, for example (any other suitable data structure is acceptable):
list(list(c(1,3,5), 2), list(c(2,4), 1), list(c(2,5), 2), list(c(7), 1))
Is there an efficient way to do this? The actual list has tens of thousands of items, so quadratic behaviour is not feasible.
match and unique accept and handle "list"s (?match warns about being slow on "list"s). So, with:
match(q, unique(q))
#[1] 1 2 1 3 4 3
each element is mapped to a single integer. Then:
tabulate(match(q, unique(q)))
#[1] 2 1 2 1
and, to find a structure in which to present the results:
as.data.frame(cbind(vec = unique(q), n = tabulate(match(q, unique(q)))))
#      vec n
#1 1, 3, 5 2
#2    2, 4 1
#3    2, 5 2
#4       7 1
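If the exact list(vector, count) structure from the question is wanted, a small sketch along these lines (using Map; this part is my own addition, not from the original answer) should produce it:

counts = Map(list, unique(q), tabulate(match(q, unique(q))))
# counts[[1]] is list(c(1, 3, 5), 2), counts[[2]] is list(c(2, 4), 1), and so on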
Alternatively to the match(x, unique(x)) approach, we could map each element to a single value by deparse-ing them:
table(sapply(q, deparse))
#
#         7 c(1, 3, 5)    c(2, 4)    c(2, 5)
#         1          2          1          2
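The deparsed strings can be turned back into vectors when needed; as a rough sketch of my own (eval/parse is slow, so this is only meant for final presentation):

tab = table(sapply(q, deparse))
lapply(names(tab), function(s) list(eval(parse(text = s)), as.integer(tab[s])))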
Also, since this is a case where the elements are unique integers and, assuming they fall in a small range, we could map each element to a single integer after transforming each element to its binary representation:
n = max(unlist(q))
pow2 = 2 ^ (0:(n - 1))
sapply(q, function(x) tabulate(x, nbins = n))  # 'binary' form
sapply(q, function(x) sum(tabulate(x, nbins = n) * pow2))
#[1] 21 10 21 18 64 18
and tabulate as before.
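Spelling out the "tabulate as before" step on the integer codes (a quick sketch to connect the two snippets):

v = sapply(q, function(x) sum(tabulate(x, nbins = n) * pow2))
tabulate(match(v, unique(v)))
#[1] 2 1 2 1

Note that, e.g., c(1, 3, 5) sets bits 1, 3 and 5, so its code is 2^0 + 2^2 + 2^4 = 21, matching the output above.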
And to compare the above alternatives:
f1 = function(x)
{
    ux = unique(x)
    i = match(x, ux)
    cbind(vec = ux, n = tabulate(i))
}

f2 = function(x)
{
    xc = sapply(x, deparse)
    i = match(xc, unique(xc))
    cbind(vec = x[!duplicated(i)], n = tabulate(i))
}

f3 = function(x)
{
    n = max(unlist(x))
    pow2 = 2 ^ (0:(n - 1))
    v = sapply(x, function(x) sum(tabulate(x, nbins = n) * pow2))
    i = match(v, unique(v))
    cbind(vec = x[!duplicated(v)], n = tabulate(i))
}

q2 = rep_len(q, 1e3)
all.equal(f1(q2), f2(q2))
#[1] TRUE
all.equal(f2(q2), f3(q2))
#[1] TRUE
microbenchmark::microbenchmark(f1(q2), f2(q2), f3(q2))
#Unit: milliseconds
#   expr       min        lq      mean    median        uq       max neval cld
# f1(q2)  7.980041  8.161524 10.525946  8.291678  8.848133 178.96333   100   b
# f2(q2) 24.407143 24.964991 27.311056 25.514834 27.538643  45.25388   100   c
# f3(q2)  3.951567  4.127482  4.688778  4.261985  4.518463  10.25980   100
Another interesting alternative is based on ordering. R (>= 3.3.0) has a grouping function, built off data.table's ordering, which, along with the ordering itself, provides attributes for further manipulation:

Make all elements of equal length and "transpose" them (probably the slowest operation in this case, though I'm not sure how else to feed grouping):
n = max(lengths(q))
qq = .mapply(c, lapply(q, "[", seq_len(n)), NULL)
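For illustration (my own check of the intermediate state): qq now holds one vector per position across all elements of q, padded with NA:

str(qq)
#List of 3
# $ : num [1:6] 1 2 1 2 7 2
# $ : num [1:6] 3 4 3 5 NA 5
# $ : num [1:6] 5 NA 5 NA NA NA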
Use the ordering to group similar elements, mapped to integers:
gr = do.call(grouping, qq)
e = attr(gr, "ends")
i = rep(seq_along(e), c(e[1], diff(e)))[order(gr)]
#[1] 1 2 1 3 4 3
and then, tabulate as before. To continue the comparisons:
f4 = function(x)
{
    n = max(lengths(x))
    x2 = .mapply(c, lapply(x, "[", seq_len(n)), NULL)
    gr = do.call(grouping, x2)
    e = attr(gr, "ends")
    i = rep(seq_along(e), c(e[1], diff(e)))[order(gr)]
    cbind(vec = x[!duplicated(i)], n = tabulate(i))
}

all.equal(f3(q2), f4(q2))
#[1] TRUE
microbenchmark::microbenchmark(f1(q2), f2(q2), f3(q2), f4(q2))
#Unit: milliseconds
#   expr       min        lq      mean    median        uq        max neval cld
# f1(q2)  7.956377  8.048250  8.792181  8.131771  8.270101  21.944331   100   b
# f2(q2) 24.228966 24.618728 28.043548 25.031807 26.188219 195.456203   100   c
# f3(q2)  3.963746  4.103295  4.801138  4.179508  4.360991  35.105431   100
# f4(q2)  2.874151  2.985512  3.219568  3.066248  3.186657   7.763236   100
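For completeness, a hedged extra alternative (f5 is my own sketch, not part of the original comparison): hash each element with paste() instead of deparse(); this is usually cheaper than deparse, and assumes the string forms of the values cannot themselves contain the separator:

f5 = function(x)
{
    key = sapply(x, paste, collapse = "\r")  # one string key per element
    i = match(key, unique(key))
    cbind(vec = x[!duplicated(i)], n = tabulate(i))
}
all.equal(f1(q2), f5(q2))  # should be TRUE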
In this comparison, q's elements are of small length to accommodate f3; both f3 (because of the large exponentiation) and f4 (because of .mapply) would suffer in performance if "list"s with larger elements were used.
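To make the "large exponentiation" caveat concrete (an added note of my own): the 2^k weights in f3 exceed the exact-integer range of doubles once the largest value passes about 53, at which point distinct elements can silently collide:

2^53 == 2^53 + 1
#[1] TRUE

so f3 is only a safe choice while max(unlist(q)) stays small.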