Number of Partitions of a Spark DataFrame
Can someone explain how the number of partitions is determined for a Spark DataFrame?

I know that for an RDD, the number of partitions can be specified at creation time, as below:

val rdd1 = sc.textFile("path", 6)

But a Spark DataFrame does not appear to have an option to specify the number of partitions at creation the way an RDD does.

The only possibility I can think of is to call the repartition API after creating the DataFrame:

df.repartition(4)

So could someone please let me know whether the number of partitions can be specified while creating a DataFrame?
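For reference, the current partition count of a DataFrame can always be inspected through its underlying RDD. A minimal sketch, assuming an active `SparkSession` named `spark` (as in the spark-shell):

```scala
// A simple DataFrame for illustration
val df = spark.range(0, 100)

// Inspect the number of partitions of the underlying RDD
println(df.rdd.getNumPartitions)

// repartition returns a NEW DataFrame; the original is unchanged
val df4 = df.repartition(4)
println(df4.rdd.getNumPartitions)   // 4
```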
You cannot, or at least not in the general case, but it is not that different compared to an RDD. For example, the textFile code you've provided only sets a limit on the minimum number of partitions.
in general:
- Datasets generated locally using methods like range, or toDF on a local collection, use spark.default.parallelism.
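A sketch of the locally-generated case, assuming an active `SparkSession` named `spark`:

```scala
// Datasets created locally pick up spark.default.parallelism
val byRange = spark.range(0, 1000)
println(byRange.rdd.getNumPartitions)

import spark.implicits._
val byToDF = Seq(1, 2, 3, 4).toDF("n")
println(byToDF.rdd.getNumPartitions)
```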
- Datasets created from an RDD inherit the number of partitions from the parent RDD.
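A sketch of partition inheritance from a parent RDD (again assuming `spark` and its `SparkContext` are available):

```scala
import spark.implicits._

// The RDD is created with 6 partitions...
val rdd = spark.sparkContext.parallelize(1 to 100, 6)

// ...and a Dataset built from it inherits that number
val ds = rdd.toDS()
println(ds.rdd.getNumPartitions)   // 6
```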
- Datasets created using the Data Source API:
  - In Spark 1.x it typically depends on the Hadoop configuration (min / max split size).
  - In Spark 2.x there is Spark SQL specific configuration in use.
  - Some data sources may provide additional options which give more control over partitioning. For example, the JDBC source allows you to set a partitioning column, a range of values, and the desired number of partitions.
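A sketch of the JDBC case using the standard reader options partitionColumn, lowerBound, upperBound, and numPartitions (the URL, table, and column names below are placeholders):

```scala
// JDBC source: partitioning is controlled explicitly via reader options.
// The connection details here are placeholders, not a real database.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db")
  .option("dbtable", "schema.table")
  .option("partitionColumn", "id")   // must be a numeric, date, or timestamp column
  .option("lowerBound", "1")
  .option("upperBound", "100000")
  .option("numPartitions", "8")      // 8 parallel reads, hence 8 partitions
  .load()
```

Each of the 8 partitions then reads its own slice of the id range in parallel.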