R - Count the number of occurrences of a vector in a list
I have a list of vectors of variable length, for example:
q <- list(c(1,3,5), c(2,4), c(1,3,5), c(2,5), c(7), c(2,5))
I need to count the number of occurrences of each of the vectors in the list, for example (any other suitable data structure is acceptable):
list(list(c(1,3,5), 2), list(c(2,4), 1), list(c(2,5), 2), list(c(7), 1))
Is there an efficient way to do this? The actual list has tens of thousands of items, so quadratic behaviour is not feasible.
match and unique accept and handle "list"s (?match warns about being slow on "list"s). So, with:
match(q, unique(q))
#[1] 1 2 1 3 4 3
each element is mapped to a single integer. Then:
tabulate(match(q, unique(q)))
#[1] 2 1 2 1
and, to find a structure in which to present the results:
as.data.frame(cbind(vec = unique(q), n = tabulate(match(q, unique(q)))))
#      vec n
#1 1, 3, 5 2
#2    2, 4 1
#3    2, 5 2
#4       7 1
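If the exact list(vector, count) structure from the question is wanted, a small sketch along these lines (using Map; this part is my own addition, not from the original answer) should produce it:

counts = Map(list, unique(q), tabulate(match(q, unique(q))))
# counts[[1]] is list(c(1, 3, 5), 2), counts[[2]] is list(c(2, 4), 1), and so on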
Alternatively to the match(x, unique(x)) approach, we could map each element to a single value by deparse-ing them:
table(sapply(q, deparse))
#
#         7 c(1, 3, 5)    c(2, 4)    c(2, 5)
#         1          2          1          2
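The deparsed strings can be turned back into vectors when needed; as a rough sketch of my own (eval/parse is slow, so this is only meant for final presentation):

tab = table(sapply(q, deparse))
lapply(names(tab), function(s) list(eval(parse(text = s)), as.integer(tab[s])))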
Also, since this is a case where the elements are unique integers and, assuming they fall in a small range, we could map each element to a single integer after transforming each element to its binary representation:
n = max(unlist(q))
pow2 = 2 ^ (0:(n - 1))
sapply(q, function(x) tabulate(x, nbins = n))  # 'binary' form
sapply(q, function(x) sum(tabulate(x, nbins = n) * pow2))
#[1] 21 10 21 18 64 18
and tabulate as before.
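Spelling out the "tabulate as before" step on the integer codes (a quick sketch to connect the two snippets):

v = sapply(q, function(x) sum(tabulate(x, nbins = n) * pow2))
tabulate(match(v, unique(v)))
#[1] 2 1 2 1

Note that, e.g., c(1, 3, 5) sets bits 1, 3 and 5, so its code is 2^0 + 2^2 + 2^4 = 21, matching the output above.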
And to compare the above alternatives:
f1 = function(x)
{
    ux = unique(x)
    i = match(x, ux)
    cbind(vec = ux, n = tabulate(i))
}

f2 = function(x)
{
    xc = sapply(x, deparse)
    i = match(xc, unique(xc))
    cbind(vec = x[!duplicated(i)], n = tabulate(i))
}

f3 = function(x)
{
    n = max(unlist(x))
    pow2 = 2 ^ (0:(n - 1))
    v = sapply(x, function(x) sum(tabulate(x, nbins = n) * pow2))
    i = match(v, unique(v))
    cbind(vec = x[!duplicated(v)], n = tabulate(i))
}

q2 = rep_len(q, 1e3)
all.equal(f1(q2), f2(q2))
#[1] TRUE
all.equal(f2(q2), f3(q2))
#[1] TRUE
microbenchmark::microbenchmark(f1(q2), f2(q2), f3(q2))
#Unit: milliseconds
#   expr       min        lq      mean    median        uq       max neval cld
# f1(q2)  7.980041  8.161524 10.525946  8.291678  8.848133 178.96333   100   b
# f2(q2) 24.407143 24.964991 27.311056 25.514834 27.538643  45.25388   100   c
# f3(q2)  3.951567  4.127482  4.688778  4.261985  4.518463  10.25980   100
Another interesting alternative is based on ordering. R (>= 3.3.0) has a grouping function, built off data.table's ordering, which, along with the ordering itself, provides attributes for further manipulation:

Make all elements of equal length and "transpose" them (probably the slowest operation in this case, though I'm not sure how else to feed grouping):
n = max(lengths(q))
qq = .mapply(c, lapply(q, "[", seq_len(n)), NULL)
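For illustration (my own check of the intermediate state): qq now holds one vector per position across all elements of q, padded with NA:

str(qq)
#List of 3
# $ : num [1:6] 1 2 1 2 7 2
# $ : num [1:6] 3 4 3 5 NA 5
# $ : num [1:6] 5 NA 5 NA NA NA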
Use the ordering to group similar elements, mapped to integers:
gr = do.call(grouping, qq)
e = attr(gr, "ends")
i = rep(seq_along(e), c(e[1], diff(e)))[order(gr)]
#[1] 1 2 1 3 4 3
and then, tabulate as before. To continue the comparisons:
f4 = function(x)
{
    n = max(lengths(x))
    x2 = .mapply(c, lapply(x, "[", seq_len(n)), NULL)
    gr = do.call(grouping, x2)
    e = attr(gr, "ends")
    i = rep(seq_along(e), c(e[1], diff(e)))[order(gr)]
    cbind(vec = x[!duplicated(i)], n = tabulate(i))
}

all.equal(f3(q2), f4(q2))
#[1] TRUE
microbenchmark::microbenchmark(f1(q2), f2(q2), f3(q2), f4(q2))
#Unit: milliseconds
#   expr       min        lq      mean    median        uq        max neval cld
# f1(q2)  7.956377  8.048250  8.792181  8.131771  8.270101  21.944331   100   b
# f2(q2) 24.228966 24.618728 28.043548 25.031807 26.188219 195.456203   100   c
# f3(q2)  3.963746  4.103295  4.801138  4.179508  4.360991  35.105431   100
# f4(q2)  2.874151  2.985512  3.219568  3.066248  3.186657   7.763236   100
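For completeness, a hedged extra alternative (f5 is my own sketch, not part of the original comparison): hash each element with paste() instead of deparse(); this is usually cheaper than deparse, and assumes the string forms of the values cannot themselves contain the separator:

f5 = function(x)
{
    key = sapply(x, paste, collapse = "\r")  # one string key per element
    i = match(key, unique(key))
    cbind(vec = x[!duplicated(i)], n = tabulate(i))
}
all.equal(f1(q2), f5(q2))  # should be TRUE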
In this comparison, q's elements are of small length to accommodate f3; both f3 (because of the large exponentiation) and f4 (because of .mapply) would suffer in performance if "list"s with larger elements were used.
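To make the "large exponentiation" caveat concrete (an added note of my own): the 2^k weights in f3 exceed the exact-integer range of doubles once the largest value passes about 53, at which point distinct elements can silently collide:

2^53 == 2^53 + 1
#[1] TRUE

so f3 is only a safe choice while max(unlist(q)) stays small.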