Google Compute Engine - How to handle load spikes and queue the requests?
Is there a configuration in Kubernetes where I can specify a minimum number of requests to queue before a new instance gets spawned?
Here is the context: we have powerful, high-CPU machines set up for our use case, and every request puts a high amount of load on the server. This works perfectly until we reach a certain number of requests, say 300 with a ramp-up time of 100 milliseconds. At that point we receive connection-refused errors for a while, and the server only starts handling requests again once a new machine has spawned. What is the best way to handle such load spikes? I am looking for something like the "pending latency" setting in App Engine. The application is deployed on Google Compute Engine and orchestrated with Kubernetes.
You can use a readinessProbe
(see Container probes) to indicate when a container is ready to serve requests, and use a HorizontalPodAutoscaler
to automatically scale your app up and down based on observed CPU utilization. Hope this helps.
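A minimal sketch of the two pieces together, assuming a hypothetical Deployment named `web` exposing an HTTP health endpoint at `/healthz` on port 8080 (the names, image, ports, and thresholds are all placeholders to adapt to your app):

```yaml
# Deployment snippet: the readinessProbe keeps a pod out of the
# Service's endpoints until it can actually accept traffic, so new
# pods don't receive requests before they are warmed up.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: gcr.io/my-project/web:latest   # placeholder image
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"     # a CPU request is required for CPU-based autoscaling
        readinessProbe:
          httpGet:
            path: /healthz  # assumed health endpoint
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
          failureThreshold: 3
---
# HorizontalPodAutoscaler: adds replicas when the average CPU
# utilization across pods exceeds 60% of the requested CPU.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60
```

Note that this scales reactively on CPU rather than on queued requests, so for very sharp spikes you may still want headroom via a higher `minReplicas` or a lower utilization target, so that scaling kicks in before connections start being refused.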