Number of Partitions of a Spark DataFrame -


Can anyone explain how the number of partitions is determined when a Spark DataFrame is created?

I know that for an RDD we can specify the number of partitions while creating it, as below:

val rdd1 = sc.textFile("path", 6)

But when creating a Spark DataFrame, there does not seem to be an option to specify the number of partitions the way there is for an RDD.

The only possibility I can think of is to call the repartition API after the DataFrame has been created:

df.repartition(4) 
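For reference, the partition count of a DataFrame can be inspected through its underlying RDD. A minimal sketch, assuming a local SparkSession (the session setup here is only for illustration):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session, just to make the example self-contained
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("partition-check")
  .getOrCreate()

val df = spark.range(100).toDF("id")

// repartition returns a new DataFrame with exactly the requested
// number of partitions (it triggers a full shuffle)
val df4 = df.repartition(4)
println(df4.rdd.getNumPartitions)  // prints 4

spark.stop()
```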

So could you please let me know whether it is possible to specify the number of partitions while creating a DataFrame?

You cannot, or at least not in the general case; DataFrames are not that different from RDDs here. For example, the textFile example code you've provided only sets a limit on the minimum number of partitions.
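To illustrate that the second argument of textFile is only a lower bound, here is a sketch that reads a tiny temporary file (the file and session are assumptions made so the example is self-contained):

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumed local session for illustration
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("min-partitions")
  .getOrCreate()
val sc = spark.sparkContext

// Write a tiny sample file so the example does not depend on external data
val path = Files.createTempFile("sample", ".txt")
Files.write(path, "a\nb\nc\nd\ne\nf\n".getBytes)

// The 6 here is minPartitions: Spark may compute more input splits,
// but never fewer than this for a splittable text file
val rdd = sc.textFile(path.toString, 6)
println(rdd.getNumPartitions)  // at least 6

spark.stop()
```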

In general:

  • Datasets generated locally, using methods such as range or toDF on a local collection, use spark.default.parallelism.
  • Datasets created from an RDD inherit the number of partitions from the parent RDD.
  • Datasets created using the Data Source API depend on the source. Some data sources provide additional options that give more control over partitioning; for example, the JDBC source allows you to set a partitioning column, a range of values, and the desired number of partitions.
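The JDBC case from the last bullet can be sketched as follows. The connection URL, credentials, table, and column names are all hypothetical, and the snippet will not run without a reachable database; it only shows the shape of the call:

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("jdbc-partitions")
  .getOrCreate()

val props = new Properties()
props.setProperty("user", "reader")      // hypothetical credentials
props.setProperty("password", "secret")

// Hypothetical database and table; "id" is the partitioning column.
// Spark splits the range [0, 1000000) across 10 non-overlapping
// queries, so the resulting DataFrame has 10 partitions.
val df = spark.read.jdbc(
  url = "jdbc:postgresql://localhost:5432/shop",
  table = "orders",
  columnName = "id",
  lowerBound = 0L,
  upperBound = 1000000L,
  numPartitions = 10,
  connectionProperties = props
)
```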
