Spark Streaming with MongoDB backend
I have a requirement to read CSV files that are pumped by telemetry equipment to a location on the cloud, and to store the relevant data in a MongoDB store. I am using Spark Streaming to read the new files (they arrive every minute, sometimes more frequently) and the mongodb-spark connector. The problem is that the data is not being loaded into MongoDB. I have added the DataFrame's show() steps in the code, and they display the data at the console correctly, which means the streaming application is reading and processing the files as expected. However, the final step of saving to MongoDB is not happening. My code looks as follows:
reqdata.foreachRDD { edata =>
  import sqlContext.implicits._
  val loaddata = edata.map(w => EnergyData(w(0).toString, w(1).toString, w(2).toString)).toDF()
  loaddata.show()
  loaddata.printSchema()
  MongoSpark.save(loaddata.write.option("uri", "mongodb://127.0.0.1:27017/storedata.energydata").mode("overwrite"))
}
ssc.start()
The loaddata.show() call displays the data fine.
I have checked the MongoDB logs and found a few strange lines, like:
"2016-09-07t08:12:30.109-0700 network [initandlisten] connection accepted 127.0.0.1:55694 #212 (3 connections open) 2016-09-07t08:12:30.111-0700 command [conn212] cmd: drop storedata.energydata"
Now, I don't understand why Mongo would drop the collection at all. Any help would be highly appreciated.
I solved it myself by changing the save mode to append. With mode("overwrite"), the MongoDB Spark connector drops the target collection before writing, so every micro-batch wiped out the previous one (which is exactly the CMD: drop visible in the logs):

MongoSpark.save(loaddata.write.option("uri", "mongodb://127.0.0.1:27017/storedata.energydata").mode("append"))
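For completeness, here is a minimal, self-contained sketch of how the whole streaming job fits together with the append mode. The EnergyData field names, the input directory path, and the 60-second batch interval are assumptions for illustration, not taken from the original setup:

import com.mongodb.spark.MongoSpark
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical case class matching the three CSV columns; adjust to the real schema.
case class EnergyData(meterId: String, timestamp: String, reading: String)

object EnergyStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("EnergyStream")
    val ssc = new StreamingContext(conf, Seconds(60))
    val sqlContext = SQLContext.getOrCreate(ssc.sparkContext)

    // Watch a (hypothetical) landing directory for newly arrived CSV files.
    val lines = ssc.textFileStream("hdfs:///telemetry/incoming")
    val reqdata = lines.map(_.split(","))

    reqdata.foreachRDD { edata =>
      import sqlContext.implicits._
      val loaddata = edata.map(w => EnergyData(w(0), w(1), w(2))).toDF()
      // "append" inserts each micro-batch instead of dropping the collection first.
      MongoSpark.save(loaddata.write
        .option("uri", "mongodb://127.0.0.1:27017/storedata.energydata")
        .mode("append"))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

With append, each micro-batch simply inserts new documents into the collection rather than replacing it, which is the behaviour you want when files keep arriving every minute.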