hadoop - aggregate ordered rows in hive table -
i have table in hive 4 columns this:
row_id| user_id|product_id| duration 1 1 product1 3 2 1 product1 1 3 1 product2 6 4 1 product3 2 5 1 product1 4 6 1 product4 3 7 1 product4 5 8 1 product4 7 9 2 product4 3 10 2 product4 6
i want aggregate rows of same product each user, sum duration , count clicks if consequent in order
row_id| user_id|product_id |duration_per_product |clicks_per_product 1 1 product1 4 2 2 1 product2 6 1 3 1 product3 2 1 4 1 product1 4 1 5 1 product4 15 3 6 2 product4 9 2
any ideas how in hive 1.1.0?
group doesn't work because don't want group products if consequent , have tried case,lag , lead didn't work!
thanks!
first off, want in loop, hive not suitable these kind of problems.
that being said, here approach should work:
suppose our dataset
1 1 product1 3 2 1 product1 1 3 1 product2 6 4 1 product1 4
identify starter rows: 1,3,4
can done doing left join on id=id+1 , seeing whether user , product match.
join onto these starters user , product:
1 1 1 2 1 4 3 3 4 1 4 2 4 4
filter out things in wrong order (starter after row), remaining are:
1 1 1 2 1 4 3 3 4 4
group find maximum valid starter each row, remaining be:
1 1 1 2 3 3 4 4
now join reattach relevant dimensions
1 1 3 1 2 1 3 3 6 4 4 4
now can results grouping on starter id.
1 4 3 6 4 4
of course can choose use join attach name of product.
1 product1 4 3 product2 6 4 product1 4
and all!
Comments
Post a Comment