hadoop - Apache Drill reading gz and snappy performance
I'm using Apache Drill 1.8. For test purposes I made two Parquet files from a .csv file. The CSV is 4 GB, the Parquet file with the gz codec is 120 MB, and the second Parquet file with the snappy codec is 250 MB.
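As a quick sanity check on these sizes, the compression ratios can be computed directly. This sketch assumes the snappy file is 250 MB (a 250 GB Parquet file could not come from a 4 GB CSV) and uses decimal units (1 GB = 1000 MB); the names are illustrative only.

```python
# File sizes from the post, in megabytes (decimal units assumed).
# ASSUMPTION: the snappy file is 250 MB, not 250 GB.
CSV_MB = 4000
SIZES_MB = {"gzip": 120, "snappy": 250}


def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """How many times smaller the compressed file is than the source CSV."""
    return original_mb / compressed_mb


for codec, size in SIZES_MB.items():
    print(f"{codec}: {compression_ratio(CSV_MB, size):.1f}x smaller than the CSV")
```

With these numbers gzip shrinks the CSV roughly 33x and snappy roughly 16x, so the snappy file is about twice as large on disk.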
Since Spark uses snappy as its default codec, and snappy should perform faster, I'm facing one problem.
These are the files' block sizes etc. on Hadoop:
When I query the snappy-compressed Parquet file in Drill (which uses snappy as its default codec), the query takes around 18 seconds. The same query against the gz-compressed Parquet file takes around 8 seconds.
(It's a simple query: select 5 columns, order by 1, and limit to 1 row.)
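One way to look at these timings is to divide the bytes on disk by the query time. The sketch below does that with the numbers from the post, again assuming the snappy file is 250 MB. It only relates file size to wall-clock time; it does not measure where Drill actually spends the time (disk, network, decompression, sorting).

```python
# Sizes (MB) and query times (s) from the post.
# ASSUMPTION: the snappy file is 250 MB, not 250 GB.
SIZES_MB = {"gzip": 120, "snappy": 250}
QUERY_SECONDS = {"gzip": 8, "snappy": 18}


def effective_throughput(size_mb: float, seconds: float) -> float:
    """Megabytes of compressed file scanned per second of query time."""
    return size_mb / seconds


for codec in ("gzip", "snappy"):
    rate = effective_throughput(SIZES_MB[codec], QUERY_SECONDS[codec])
    print(f"{codec}: {rate:.1f} MB/s")

# How much slower the snappy query is vs. how much bigger the snappy file is.
slowdown = QUERY_SECONDS["snappy"] / QUERY_SECONDS["gzip"]
size_ratio = SIZES_MB["snappy"] / SIZES_MB["gzip"]
print(f"time ratio {slowdown:.2f}, size ratio {size_ratio:.2f}")
```

Both files are scanned at a similar rate (about 15 MB/s vs. 13.9 MB/s), and the time ratio (2.25) is close to the size ratio (about 2.08). That is consistent with, but does not prove, the query being dominated by reading bytes rather than by decompressing them.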
I'm a little confused now. Isn't snappy supposed to be more efficient for I/O? Am I making a mistake somewhere, or is this just how it works? If you could explain it to me I would be super grateful, because I couldn't find anything useful on the net. Thank you once more!