scala - Laziness of Spark pipeline -


I have a question about the laziness of Apache Spark pipelines. I understand the difference between Spark transformations and actions.

Let's take an example. The following transformation returns an RDD of the filenames in a folder:

val filenames = sc.wholeTextFiles("pathToDirectory").map { case (filename, content) => filename }

If I then execute an action:

filenames.count 

Spark executes the defined transformations, loading both 'filename' and 'content' into memory, even though 'content' is never used.
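As far as I can tell, writing the projection differently, e.g. with keys, does not change this, since wholeTextFiles still has to read each file to produce the (filename, content) pairs in the first place:

// Equivalent projection; wholeTextFiles still reads the file contents
// to build the (filename, content) pairs before keys discards them.
val filenames = sc.wholeTextFiles("pathToDirectory").keys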

Is there a better way to write Spark transformations so that unused data (like the 'content' values in the previous example) is not evaluated when executing actions?
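For instance, one idea might be to list the files on the driver with the Hadoop FileSystem API and parallelize only the names, so that file contents are never read at all. A sketch, assuming "pathToDirectory" lives on a Hadoop-compatible filesystem:

import org.apache.hadoop.fs.{FileSystem, Path}

// List the file names on the driver without reading any file contents,
// then parallelize just the names into an RDD.
val fs = FileSystem.get(sc.hadoopConfiguration)
val names = fs.listStatus(new Path("pathToDirectory")).map(_.getPath.toString)

val filenames = sc.parallelize(names)
filenames.count   // no file contents are read at any point

But I'm not sure whether this is idiomatic, or whether Spark can avoid the unused work on its own.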

Thanks for your feedback.

