apache kafka - Scalable invocation of a Spark MLlib 1.6 predictive model with a single data record
I have a predictive model (logistic regression) built in Spark 1.6 that has been saved to disk for later reuse with new data records. I want to invoke it from multiple clients, with each client passing in a single data record. Using a Spark job to run single records through seems to have far too much overhead and does not look scalable (each invocation passes in a single set of 18 values). The MLlib API to load a saved model requires a SparkContext, though, so I am looking for suggestions on how to do this in a scalable way. Spark Streaming with Kafka input comes to mind (each client request would be written to a Kafka topic). Any thoughts on this idea, or alternative suggestions?
Non-distributed models (in practice the majority of o.a.s.mllib models) don't require an active SparkContext for single-item predictions. If you check the API docs you'll see that LogisticRegressionModel provides a predict method with the signature Vector => Double. It means you can serialize the model using standard Java tools, read it back later, and perform predictions on a local o.a.s.mllib.Vector object.
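A minimal sketch of that approach, assuming a trained o.a.s.mllib LogisticRegressionModel (the file path and the 18-feature record are illustrative). Only the save step needs a SparkContext; the serving side deserializes the model with plain java.io and calls predict locally:

```scala
import java.io._
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors

// Train time (a SparkContext exists here): serialize the fitted model once
// with standard Java serialization. `model` is assumed already trained.
def saveModel(model: LogisticRegressionModel, path: String): Unit = {
  val oos = new ObjectOutputStream(new FileOutputStream(path))
  try oos.writeObject(model) finally oos.close()
}

// Serving time (no SparkContext needed): read the model back from disk.
def loadModel(path: String): LogisticRegressionModel = {
  val ois = new ObjectInputStream(new FileInputStream(path))
  try ois.readObject().asInstanceOf[LogisticRegressionModel]
  finally ois.close()
}

// Score one incoming record entirely locally: Vector => Double.
val model  = loadModel("/models/lr-model.ser")        // hypothetical path
val record = Vectors.dense(Array.fill(18)(0.0))       // the client's 18 values
val prediction: Double = model.predict(record)
```

Each client request can then be handled by an ordinary web service holding the deserialized model in memory, with no Spark job per invocation.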
Spark also provides limited PMML support (not for logistic regression), so you can share models with any other library that supports this format.
Finally, non-distributed models are usually not that complex. For linear models all you need is the intercept, the coefficients, and some basic math functions, plus a linear algebra library if you want decent performance.
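To make that concrete, here is a plain-Scala sketch (no Spark dependency at all) of what binary logistic regression prediction boils down to. The weights, intercept, and threshold are values you would copy out of the trained model; the class name is my own:

```scala
// Self-contained re-implementation of binary logistic regression scoring:
// margin = w . x + intercept, probability = sigmoid(margin).
case class LocalLogisticModel(weights: Array[Double],
                              intercept: Double,
                              threshold: Double = 0.5) {

  private def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Probability of the positive class for one feature vector.
  def score(features: Array[Double]): Double = {
    require(features.length == weights.length, "dimension mismatch")
    val margin = features.iterator.zip(weights.iterator)
      .map { case (x, w) => x * w }
      .sum + intercept
    sigmoid(margin)
  }

  // Hard 0.0 / 1.0 label, mirroring predict's Vector => Double contract.
  def predict(features: Array[Double]): Double =
    if (score(features) > threshold) 1.0 else 0.0
}
```

With something like this, the serving layer needs nothing from Spark at all; only training stays in the cluster.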
o.a.s.ml models are harder to handle, but there are external tools that try to address that. You can check the related discussion on the developers list, (deploying ML pipeline model), for details.
For distributed models there is no workaround. You'll have to start a full job on a distributed dataset one way or another.