Spark Streaming with MongoDB backend


I have a requirement to read CSV files, pumped by telemetry equipment to a location on the cloud, and to store the relevant data in a MongoDB store. I am using Spark Streaming to read the new files (they arrive every minute, sometimes more frequently) and the mongodb-spark connector. The problem is that the data is not being loaded into MongoDB. I have added the DataFrame's show() steps in the code and they display correctly at the console, which means the streaming application is reading and processing the files as expected. It is only the final step of saving to MongoDB that is not happening. The code looks as follows:

    reqData.foreachRDD { eData =>
      import sqlContext.implicits._
      val loadData = eData.map(w => EnergyData(w(0).toString, w(1).toString, w(2).toString)).toDF()
      loadData.show()
      loadData.printSchema()
      MongoSpark.save(loadData.write.option("uri", "mongodb://127.0.0.1:27017/storedata.energydata").mode("overwrite"))
    }

    ssc.start()
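For context, reqData, EnergyData, sqlContext and ssc come from a setup roughly like the one below (a minimal sketch; the directory path, batch interval and case-class field names are simplified stand-ins, not my exact code):

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import com.mongodb.spark.MongoSpark

    // Simple container for the three CSV fields that are kept (field names are placeholders)
    case class EnergyData(equipmentId: String, timestamp: String, reading: String)

    val conf = new SparkConf().setAppName("TelemetryToMongo")
    val ssc = new StreamingContext(conf, Seconds(60)) // new files arrive roughly every minute
    val sqlContext = SQLContext.getOrCreate(ssc.sparkContext)

    // Watch the drop directory for new CSV files and split each line into its fields
    val reqData = ssc.textFileStream("/data/telemetry/incoming").map(_.split(","))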

The loadData.show() call displays the data fine.

I checked the MongoDB logs and found a few strange lines like these:

"2016-09-07t08:12:30.109-0700 network [initandlisten] connection accepted 127.0.0.1:55694 #212 (3 connections open) 2016-09-07t08:12:30.111-0700 command [conn212] cmd: drop storedata.energydata"

Now, I don't understand why Mongo would drop the collection at all. Any help would be highly appreciated.

I solved it myself by changing the save mode to append. mode("overwrite") makes the connector drop the target collection before writing, so every micro-batch wiped out whatever the previous one had stored; that is exactly the CMD: drop showing up in the log above. With append the new rows are simply inserted:

    MongoSpark.save(loadData.write.option("uri", "mongodb://127.0.0.1:27017/storedata.energydata").mode("append"))
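To double-check that the documents really persist across batches, the collection can be read back through the same connector (a quick verification sketch, assuming the same URI):

    val saved = sqlContext.read
      .format("com.mongodb.spark.sql")
      .option("uri", "mongodb://127.0.0.1:27017/storedata.energydata")
      .load()
    saved.show() // the row count should now grow with each micro-batch instead of resetting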
