hadoop - Apache Drill reading gz and snappy performance

I'm using Apache Drill 1.8. For test purposes I made a .csv file and two Parquet files from it. The CSV is 4 GB, the Parquet file with the gz codec is 120 MB, and the second Parquet file with the Snappy codec is 250 MB.

As far as I know, Spark uses Snappy as its default codec, and Snappy should be faster, but I've run into a problem.
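The trade-off behind "should be faster" can be framed with a back-of-the-envelope model: scan time is roughly the time to read the compressed bytes from disk plus the time to decompress them. This is a minimal sketch, not how Drill actually schedules work. The 120 MB and 250 MB sizes come from the files described above (reading the Snappy size as 250 MB). Every throughput and the uncompressed size are **hypothetical** placeholders, and the model ignores Parquet column pruning, HDFS caching and parallel readers.

```python
from dataclasses import dataclass


@dataclass
class CodecProfile:
    name: str
    compressed_mb: float        # on-disk size that has to be read
    decompress_mb_per_s: float  # HYPOTHETICAL decompression speed (output MB/s)


def estimated_scan_seconds(profile: CodecProfile,
                           uncompressed_mb: float,
                           disk_mb_per_s: float) -> float:
    """Rough scan cost = read compressed bytes + decompress to full size."""
    read_s = profile.compressed_mb / disk_mb_per_s
    decompress_s = uncompressed_mb / profile.decompress_mb_per_s
    return read_s + decompress_s


# Sizes from the question; speeds are illustrative guesses only.
gz = CodecProfile("gzip", compressed_mb=120, decompress_mb_per_s=300)
snappy = CodecProfile("snappy", compressed_mb=250, decompress_mb_per_s=1500)

for disk in (20, 500):  # HYPOTHETICAL slow vs. fast storage, MB/s
    for codec in (gz, snappy):
        secs = estimated_scan_seconds(codec, uncompressed_mb=1000, disk_mb_per_s=disk)
        print(f"disk={disk} MB/s {codec.name}: {secs:.2f} s")
```

Under these made-up numbers gzip wins on the slow disk (its file is half the size) and Snappy wins on the fast disk (its decompression is cheaper), which is the usual intuition: Snappy saves CPU, gzip saves I/O.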

These are the files' block sizes etc. on Hadoop:

With the Snappy codec:
With the gz codec:

When I query the Snappy-compressed Parquet files in Drill (whose default Parquet codec is Snappy), the query takes around 18 seconds. The same query against the gz-compressed Parquet files takes around 8 seconds.

(It's a simple query: select 5 columns, order by one column, and limit to one row.)
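A single timing of each query can be skewed by cold caches, JVM warm-up or other load on the cluster. Below is a minimal sketch of a fairer comparison: a few warm-up runs, then the median of several timed runs. `run_query` is a **hypothetical** callable you would supply yourself, for example a wrapper that submits the same SQL to Drill and waits for the result. No Drill client API is assumed here.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_query: Callable[[], object],
              warmup: int = 1,
              repeats: int = 5) -> Dict[str, float]:
    """Time run_query after warm-up runs; return min/median/max in seconds."""
    # Warm-up runs are executed but not timed (caches, JIT, connections).
    for _ in range(warmup):
        run_query()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)

    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Usage would be something like `benchmark(lambda: run_snappy_query())` and `benchmark(lambda: run_gz_query())` (both hypothetical wrappers), comparing the medians rather than one-off numbers.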

I'm a little confused now. Isn't Snappy more efficient for I/O? Am I making a mistake somewhere, or is this just how it works? If anyone could explain, I'd be super grateful, because I couldn't find anything useful on the net. Thanks once more!
