apache kafka - Scalable invocation of a Spark MLlib 1.6 predictive model with a single data record


I have a predictive model (logistic regression) built in Spark 1.6 that has been saved to disk for later reuse with new data records. I want to invoke it from multiple clients, with each client passing in a single data record. Using a Spark job to run single records through seems to have way too much overhead and would not be scalable (each invocation only passes in a single set of 18 values). The MLlib API to load a saved model requires a SparkContext, though, so I am looking for suggestions on how to do this in a scalable way. Spark Streaming with Kafka input comes to mind (each client request would be written to a Kafka topic). Any thoughts on this idea, or alternative suggestions?

Non-distributed models (in practice the majority) in o.a.s.mllib don't require an active SparkContext for single-item predictions. If you check the API docs you'll see that LogisticRegressionModel provides a predict method with signature Vector => Double. It means you can serialize the model using standard Java tools, read it back later, and perform predictions on a local o.a.s.mllib.Vector object.
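To sketch the pattern (in Python rather than on the JVM, using pickle as an analogy for standard Java serialization, and with a hypothetical LocalLogisticModel class standing in for the serialized MLlib model): the fitted weights are serialized once, and any lightweight client process can then load them and score single records with no SparkContext at all.

```python
import math
import pickle

class LocalLogisticModel:
    """Hypothetical stand-in for a serialized LogisticRegressionModel:
    just intercept + coefficients, no Spark dependency."""
    def __init__(self, coefficients, intercept):
        self.coefficients = coefficients
        self.intercept = intercept

    def predict(self, features):
        # The equivalent of Vector => Double: dot product, sigmoid, 0.5 threshold.
        margin = self.intercept + sum(
            w * x for w, x in zip(self.coefficients, features))
        probability = 1.0 / (1.0 + math.exp(-margin))
        return 1.0 if probability > 0.5 else 0.0

# Serialize once (e.g. after extracting the weights from the Spark model)...
model = LocalLogisticModel(coefficients=[0.5, -1.2], intercept=0.1)
blob = pickle.dumps(model)

# ...then later, in a client process that has no SparkContext:
restored = pickle.loads(blob)
print(restored.predict([2.0, 0.3]))  # prints 1.0
```

The weights here are made-up illustration values; in the real setup they would come from the saved Spark 1.6 model.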

Spark also provides limited PMML support (though not for logistic regression), so you can share models with any other library that supports that format.

Finally, non-distributed models are not that complex. For linear models all you need is the intercept, the coefficients, some basic math functions, and a linear algebra library (if you want decent performance).
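To make that concrete, here is a minimal sketch of everything a standalone logistic regression scorer needs (the weights are made-up illustration values, not from any real model; 0.5 is the usual default decision threshold):

```python
import math

def score(intercept, coefficients, features, threshold=0.5):
    """Standalone logistic regression scoring: intercept, coefficients,
    a dot product, and a sigmoid -- no Spark required."""
    margin = intercept + sum(w * x for w, x in zip(coefficients, features))
    probability = 1.0 / (1.0 + math.exp(-margin))  # sigmoid of the margin
    label = 1.0 if probability > threshold else 0.0
    return probability, label

# Scoring one record with made-up weights:
prob, label = score(0.1, [0.5, -1.2], [2.0, 0.3])
```

For an 18-feature record, as in the question, the only change is longer coefficient and feature lists.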

o.a.s.ml models are a bit harder to handle, but there are external tools that try to address that. You can check the related discussion on the developers list (Deploying ML Pipeline Model) for details.

For distributed models there is no workaround. You'll have to start a full job on a distributed dataset one way or another.

