hadoop - Apache Drill reading gz and snappy performance


I'm using Apache Drill 1.8. For testing purposes I made two Parquet files from a .csv file. The CSV is 4 GB, the Parquet file written with the gz codec is 120 MB, and the second Parquet file, written with the snappy codec, is 250 GB.

Since Spark uses snappy as its default codec, and snappy is supposed to perform faster, I'm facing one problem.

These are the files' block sizes, etc. on Hadoop:

  1. With the snappy codec: [screenshot not available]

  2. With the gz codec: [screenshot not available]

When I query the snappy-codec Parquet files in Drill (which uses snappy as its default codec), the query takes around 18 seconds. When I run the same query in Drill against the gz-codec Parquet files, it takes around 8 seconds.

(It's a simple query: selecting 5 columns, ordering by one, and limiting to one.)
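To make the comparison concrete, here is a minimal sketch of the query shape described above, built once per file so that only the path differs. The column names and file paths are hypothetical (the real ones are not given here), and it assumes "limiting" means `LIMIT 1` and that the files are reached through Drill's `dfs` storage plugin. The `time_call` helper is just one possible way to measure wall-clock time.

```python
import time

# Hypothetical names: the real column names and file paths are not given.
COLUMNS = ["col_a", "col_b", "col_c", "col_d", "col_e"]
SNAPPY_PATH = "/data/test_snappy.parquet"
GZ_PATH = "/data/test_gz.parquet"


def build_query(path, columns=COLUMNS, order_by=None, limit=1):
    """Build a Drill-style SELECT ... ORDER BY ... LIMIT statement as a string."""
    order_by = order_by or columns[0]  # assumption: order by the first column
    col_list = ", ".join(columns)
    return (
        f"SELECT {col_list} FROM dfs.`{path}` "
        f"ORDER BY {order_by} LIMIT {limit}"
    )


def time_call(run, query):
    """Return the wall-clock seconds that run(query) takes."""
    start = time.perf_counter()
    run(query)
    return time.perf_counter() - start


# Same query text for both files; only the path changes.
snappy_query = build_query(SNAPPY_PATH)
gz_query = build_query(GZ_PATH)
```

Here `run` stands for whatever function actually submits the query to Drill; passing the same function for both queries keeps the two timings comparable.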

I'm a little confused now. Isn't snappy supposed to be more efficient for I/O? Am I making a mistake somewhere, or is this just how it works? If someone could explain this to me I'd be super grateful, because I couldn't find anything useful on the net. Thanks once more!

