Scala: Laziness of Spark pipelines
I have a question about the laziness of Apache Spark pipelines. I understand Spark's transformations and actions.
Let's take an example:
The following snippet is a transformation that returns an RDD of the filenames in a folder.
val filenames = sc.wholeTextFiles("pathtodirectory").map { case (filename, content) => filename }
If I execute an action:
filenames.count
Spark executes the defined transformations, loading both the 'filename' and 'content' data into memory, even though 'content' is never used.
Is there a better way to write Spark transformations so that unused data (like the 'content' in the previous example) is not evaluated when executing actions?
Thanks for your feedback.
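To illustrate the behavior being asked about, here is a minimal pure-Scala sketch (no Spark required) of why laziness alone does not skip the content read: a lazy pipeline defers work, but forcing it still pulls every pair through the source. The `files` iterator below is a hypothetical stand-in for `wholeTextFiles`, with a counter standing in for disk I/O.

```scala
object LazyDemo {
  // Returns (content reads before the action, count, content reads after).
  def run(): (Int, Int, Int) = {
    var contentReads = 0

    // Hypothetical stand-in for wholeTextFiles: a lazy source of
    // (filename, content) pairs that "reads" content only when pulled.
    def files: Iterator[(String, String)] =
      Iterator("a.txt", "b.txt", "c.txt").map { name =>
        contentReads += 1                 // stands in for reading the file
        (name, s"contents of $name")
      }

    // Build the pipeline: keep only the filenames.
    val filenames = files.map { case (name, _) => name }
    val readsBeforeAction = contentReads  // still 0: nothing forced yet

    // Forcing the pipeline (the "action") pulls every pair through the
    // source, so content is produced even though it is discarded.
    val count = filenames.size
    (readsBeforeAction, count, contentReads)
  }

  def main(args: Array[String]): Unit =
    println(run())                        // prints (0,3,3)
}
```

The same shape applies to the Spark snippet above: because `wholeTextFiles` reads each whole file to build the `(filename, content)` pair, dropping the content in a later `map` does not avoid the read. If only the names are needed, listing them directly through the filesystem API (rather than through `wholeTextFiles`) would sidestep reading the contents at all.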