scala - Laziness of Spark pipelines -


I have a question about the laziness of Apache Spark pipelines. I understand Spark transformations and actions.

Let's take an example:

The following transformation returns an RDD of the file names in a folder.

val filenames = sc.wholeTextFiles("pathtodirectory").map { case (filename, content) => filename }

If I execute an action:

filenames.count 

Spark executes the defined transformations, which includes loading both the 'filename' and the 'content' parts of the RDD into memory, even though 'content' is never used.
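For example, putting a side effect inside the map makes the laziness itself visible; in this sketch (reusing the same placeholder path as above) nothing prints until the action runs:

// Sketch: the map only records lineage; the println fires per record
// once an action forces evaluation (visible in the console in local mode).
val tracedFilenames = sc.wholeTextFiles("pathtodirectory")
  .map { case (filename, content) =>
    println(s"touching $filename") // executes only during the action
    filename
  }

// No files have been read yet. The next line triggers the whole pipeline,
// and wholeTextFiles materializes each file's content at this point.
tracedFilenames.count()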

Is there a better way to write Spark transformations so that unused data (like the 'content' values in the previous example) is not evaluated while executing actions?
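One workaround I can think of is to list the names directly through the Hadoop FileSystem API instead of reading the files at all (a sketch, assuming the directory is reachable via the SparkContext's Hadoop configuration, and reusing the same placeholder path), though I'm not sure it's idiomatic:

import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch: collect the file names without ever opening the files.
val fs = FileSystem.get(sc.hadoopConfiguration)
val names = fs.listStatus(new Path("pathtodirectory")).map(_.getPath.toString)

// Parallelize the names if an RDD is still needed downstream.
val namesRdd = sc.parallelize(names.toSeq)
namesRdd.count()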

Thanks for your feedback.

