scala - How to divide a nested array using an RDD in Spark


I am trying to divide a nested array using an RDD in Spark. For example, there is a text file that contains 4 sentences, like this:

"he is good", "she is good", "i am good", "we are good"

I used the command val arr = sc.textFile("filename").map(_.split(" ")) and got this:

Array[Array[String]] = Array(Array(he, is, good),
                             Array(she, is, good),
                             ... )

I want to use each of the array's elements (i.e. Array(he, is, good)), but I don't know how to divide this. How can I divide it?

It is unclear what you mean by 'divided', but typically in functional programming languages, when you want to do something with each element of a collection (or 'iterable'), you can use the map function. map converts each element based on the function passed to it. For example, in a worksheet you can do this:

val sentences = Array(Array("he", "is", "good"),
                      Array("she", "is", "very", "good"))

// Reverses the word order and, as a side effect, prints the result.
def yodaize(sentence: Array[String]): Array[String] = {
  val reversed = sentence.reverse
  println("yoda says, '%s'".format(reversed.mkString(" ")))
  reversed
}

yodaize(Array("i", "am", "small"))

val yodasentences = sentences.map(yodaize)

The function yodaize does 2 things: it reverses the sentence passed to it and, as a side effect, prints out the reversed sentence. The worksheet output of the above is:

sentences: Array[Array[String]] = [[Ljava.lang.String;@faffecf

yodaize: yodaize[](val sentence: Array[String]) => Array[String]

yoda says, 'small am i'
res0: Array[String] = [Ljava.lang.String;@4bf1c779

yoda says, 'good is he'
yoda says, 'good very is she'
yodasentences: Array[Array[String]] = [[Ljava.lang.String;@40a19a85

It's hard to see directly here, but yodasentences is the original array with each sub-array reversed:

Array(Array("good", "is", "he"),
      Array("good", "very", "is", "she"))
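The worksheet shows arrays as a JVM type tag plus a hash code rather than their contents. As a minimal sketch of one way to make a nested array readable, the helper below builds a string with mkString. The names ShowNested and show are made up for this illustration and are not part of any library.

```scala
object ShowNested {
  // Render a nested array as text, since the default toString of a JVM
  // array only shows a type tag and a hash code.
  def show(nested: Array[Array[String]]): String =
    nested.map(_.mkString("Array(", ", ", ")")).mkString("Array(", ", ", ")")

  def main(args: Array[String]): Unit = {
    val yodasentences = Array(Array("good", "is", "he"),
                              Array("good", "very", "is", "she"))
    // prints: Array(Array(good, is, he), Array(good, very, is, she))
    println(show(yodasentences))
  }
}
```

Converting with .map(_.toList).toList is another option when a quick look in a worksheet is all that is needed, because lists print their elements.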

With map you can pass in any function. It can directly convert the element or have a side effect. In this manner, functional languages can deal with each element without ever needing to 'divide' the collection. Note that other functions, such as flatMap, foldLeft, and filter, can be used to perform other sorts of transformations on a collection.
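To illustrate those other functions, here is a small sketch on a plain Scala nested array (not an RDD). The object name NestedOps and the value names are assumptions chosen for the example. Spark's RDD offers map, flatMap, and filter with the same meaning; it has no foldLeft, and fold or aggregate would be the closest counterparts there.

```scala
object NestedOps {
  val sentences: Array[Array[String]] =
    Array(Array("he", "is", "good"),
          Array("she", "is", "very", "good"))

  // flatMap: apply a function to each sentence and flatten the results,
  // here producing one flat array of all words.
  val words: Array[String] = sentences.flatMap(sentence => sentence.toList)

  // filter: keep only the sentences that satisfy a predicate,
  // here those with more than three words.
  val longSentences: Array[Array[String]] = sentences.filter(_.length > 3)

  // foldLeft: combine all elements into a single value,
  // here the total number of words across all sentences.
  val totalWords: Int =
    sentences.foldLeft(0)((count, sentence) => count + sentence.length)

  def main(args: Array[String]): Unit = {
    println(words.mkString(", "))                       // he, is, good, she, is, very, good
    println(longSentences.map(_.mkString(" ")).toList)  // List(she is very good)
    println(totalWords)                                 // 7
  }
}
```

In each case the function is handed one inner array at a time, so the nested structure never has to be taken apart by hand.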

