google compute engine - How to handle load spikes and queue the requests?


Is there a configuration in Kubernetes where I can specify the minimum number of requests queued before a new instance gets spawned?

Here is the context: we have powerful high-CPU machines set up for our use case, and every request puts a high amount of load on the server. This works perfectly until we reach a specific number of requests, say 300 with a ramp-up time of 100 milliseconds. At that point we start receiving connection-refused errors for a while, and the server only starts handling requests again once a new machine has spawned. What is the best way to handle such load spikes? I am looking for something like the "pending latency" setting in App Engine. The application is deployed on Google Compute Engine and orchestrated with Kubernetes.

You can use a readinessProbe (see container probes) to indicate that a container is ready to service requests, and use a HorizontalPodAutoscaler to automatically scale your app up/down based on observed CPU utilization. Hope this helps.
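To make that concrete, here is a minimal sketch of the two pieces. The names (`web`, the `/healthz` path, the image, and the CPU target) are hypothetical placeholders, not from the question; adjust them to your application. The readinessProbe keeps a pod out of the Service's endpoints until it responds on its health endpoint, so traffic is not routed to instances that are still starting up:

```yaml
# Excerpt from a Deployment's pod template (names are illustrative).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: gcr.io/my-project/web:latest   # placeholder image
        resources:
          requests:
            cpu: "500m"    # HPA's utilization target is relative to this request
        readinessProbe:
          httpGet:
            path: /healthz   # assumed health endpoint
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
```

And a HorizontalPodAutoscaler that scales the Deployment on observed CPU utilization, so capacity is added before the existing machines are saturated:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60   # scale out well before 100% to absorb spikes
```

Note that the HPA reacts to sustained load rather than instantaneous spikes, so setting a conservative utilization target and a non-trivial `minReplicas` is the usual way to leave headroom for sudden bursts.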
