scala - Laziness of Spark pipelines -


I have a question about the laziness of Apache Spark pipelines. I understand Spark transformations and actions.

Let's take an example:

The following snippet is a transformation that returns an RDD of the filenames in a folder.

val filenames = sc.wholeTextFiles("pathToDirectory").map { case (filename, content) => filename }

If I execute an action:

filenames.count 

Spark executes the defined transformations, including loading the 'filename' and 'content' pairs into memory, even though 'content' is never used.
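For what it's worth, my understanding (just how I read the API, not something stated officially) is that Spark's laziness works at the level of whole records, not individual fields: wholeTextFiles produces (path, content) pairs, so materializing any record means reading the file's bytes, and the projection in map only happens afterwards. Even the shorter .keys form is no cheaper:

val filenames = sc.wholeTextFiles("pathToDirectory").keys  // same (path, content) pairs underneath
filenames.count  // still opens and reads every file to build the records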

Is there a better way to write Spark transformations so that unused data (like the 'content' in the previous example) is not evaluated while executing actions?
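One workaround I can think of (a sketch only, not tested here; pathToDirectory is a placeholder) is to list the file names with the Hadoop FileSystem API on the driver, so Spark never opens the files at all:

import org.apache.hadoop.fs.{FileSystem, Path}

val dir = new Path("pathToDirectory")               // placeholder directory
val fs = dir.getFileSystem(sc.hadoopConfiguration)  // resolve the filesystem for this path
val names = fs.listStatus(dir)                      // metadata only, no file contents read
  .filter(_.isFile)
  .map(_.getPath.toString)

val filenames = sc.parallelize(names)               // RDD of names for further transformations
filenames.count

Though this moves the listing onto the driver rather than keeping it inside the Spark pipeline, so I am not sure it is idiomatic.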

Thanks for your feedback.

