Read JSON files from Spark streaming into H2O


I've got a cluster on AWS where I've installed H2O, Sparkling Water, and H2O Flow for machine learning purposes on lots of data.

Now, these files come in JSON format from a streaming job. Let's say they are placed in S3 in a folder called streamed-data.

From Spark, using a SparkSession, I read them in one go to create a DataFrame (this is Python, but that's not important):

spark = SparkSession.builder.getOrCreate()
df = spark.read.json('path/streamed-data')

This reads them all and creates a DataFrame for me, which is handy.
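One detail worth knowing: by default Spark's JSON reader expects JSON Lines (one complete JSON object per line), not a single pretty-printed array; multi-line documents need the multiLine option. A small plain-Python sketch of the format (the field names are made up for illustration):

```python
import json

# Hypothetical records standing in for the streamed events in streamed-data
records = [
    {"id": 1, "value": 3.14},
    {"id": 2, "value": 2.71},
]

# JSON Lines: one JSON object per line, which is what spark.read.json
# parses out of the box
lines = "\n".join(json.dumps(r) for r in records)
print(lines)

# Round-trip check: each line parses back to the original dict
parsed = [json.loads(line) for line in lines.splitlines()]
assert parsed == records
```

If the streaming job emits files in this shape, they can be read directly; otherwise they would need the multiLine option or a preprocessing step.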

Now, I'd like to leverage the capabilities of H2O, hence I've installed it on the cluster, along with the other software mentioned.

Looking at H2O Flow, the problem is the lack of a JSON parser, so I'm wondering whether I can import the files into H2O in the first place, or if there's a way around the problem.

When running Sparkling Water you can convert an RDD/DataFrame/Dataset into an H2O frame quite easily. In Scala (Python is similar), something like this should work:

val dataDF = spark.read.json("path/streamed-data")
val h2oContext = H2OContext.getOrCreate(spark)
import h2oContext.implicits._
val h2oFrame = h2oContext.asH2OFrame(dataDF, "my-frame-name")

From then on you can use the frame at code level and/or in the Flow UI.
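Since the question is in Python, a rough PySparkling equivalent might look like the sketch below. It is untested here and assumes Sparkling Water's Python package (pysparkling) is installed on the cluster; depending on the Sparkling Water version, H2OContext.getOrCreate may also take the Spark session as an argument.

```python
from pyspark.sql import SparkSession
from pysparkling import H2OContext

# Assumes a cluster with Sparkling Water (pysparkling) available
spark = SparkSession.builder.getOrCreate()
hc = H2OContext.getOrCreate()

# Read the streamed JSON files into a DataFrame, then hand it to H2O;
# the resulting frame is visible from H2O Flow as well
data_df = spark.read.json("path/streamed-data")
h2o_frame = hc.asH2OFrame(data_df, "my-frame-name")
```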

You can find more examples here for Python and here for Scala.

