hadoop - aggregate ordered rows in hive table -
i have table in hive 4 columns this:
row_id| user_id|product_id| duration 1 1 product1 3 2 1 product1 1 3 1 product2 6 4 1 product3 2 5 1 product1 4 6 1 product4 3 7 1 product4 5 8 1 product4 7 9 2 product4 3 10 2 product4 6 i want aggregate rows of same product each user, sum duration , count clicks if consequent in order
row_id| user_id|product_id |duration_per_product |clicks_per_product 1 1 product1 4 2 2 1 product2 6 1 3 1 product3 2 1 4 1 product1 4 1 5 1 product4 15 3 6 2 product4 9 2 any ideas how in hive 1.1.0?
group doesn't work because don't want group products if consequent , have tried case,lag , lead didn't work!
thanks!
first off, want in loop, hive not suitable these kind of problems.
that being said, here approach should work:
suppose our dataset
1 1 product1 3 2 1 product1 1 3 1 product2 6 4 1 product1 4 identify starter rows: 1,3,4 can done doing left join on id=id+1 , seeing whether user , product match.
join onto these starters user , product:
1 1 1 2 1 4 3 3 4 1 4 2 4 4 filter out things in wrong order (starter after row), remaining are:
1 1 1 2 1 4 3 3 4 4 group find maximum valid starter each row, remaining be:
1 1 1 2 3 3 4 4 now join reattach relevant dimensions
1 1 3 1 2 1 3 3 6 4 4 4 now can results grouping on starter id.
1 4 3 6 4 4 of course can choose use join attach name of product.
1 product1 4 3 product2 6 4 product1 4 and all!
Comments
Post a Comment