Multi-Process TurboGears deployments
Cliff Wells wrote a blog post a while back about load balancing a TG 1 app using nginx. This deployment scenario has worked out really well for me.
Apache is everywhere, but nginx is much less memory- and CPU-intensive, particularly under very high loads.
I think mod_wsgi and nginx deployment scenarios are both vitally important to the future of TurboGears (for different reasons), and we ought to have good recipes for both on the TurboGears Docs wiki, but that's a blog post for another day.
There was a bit of confusing FUD about TurboGears on the mailing lists last week. Its author once again seems to have gotten confused, and was telling people that TurboGears' multi-threaded deployment model is inherently limiting compared with a single-threaded, multi-process server model.
Here's my basic response: just because you can have more than one thread per process doesn't mean you can't also have more than one process. And that's exactly what high-volume TurboGears sites do: they run multiple instances behind a load-balancing proxy server.
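To make the "multiple instances behind a proxy" idea concrete, here is a minimal Python sketch of round-robin dispatch, which is nginx's default way of spreading requests across the servers in an upstream group. The backend addresses (127.0.0.1, ports 8080 to 8083) are hypothetical, and the functions `round_robin` and `assign` are illustrative names, not part of nginx or TurboGears. A real proxy also handles health checks, weights and failover, which this sketch ignores.

```python
from itertools import cycle
from typing import Iterator, List, Tuple

# Hypothetical backends: four TurboGears/CherryPy instances on one host,
# each listening on its own local port (ports are illustrative only).
BACKENDS: List[str] = [f"127.0.0.1:{port}" for port in range(8080, 8084)]


def round_robin(backends: List[str]) -> Iterator[str]:
    """Return an endless iterator that hands out backends in turn, wrapping around."""
    if not backends:
        raise ValueError("need at least one backend")
    return cycle(backends)


def assign(requests: List[str], backends: List[str]) -> List[Tuple[str, str]]:
    """Pair each incoming request with the backend that would serve it."""
    picker = round_robin(backends)
    return [(request, next(picker)) for request in requests]


if __name__ == "__main__":
    incoming = [f"GET /page/{i}" for i in range(6)]
    for request, backend in assign(incoming, BACKENDS):
        print(f"{request} -> {backend}")
```

Running it sends requests 0 to 3 to ports 8080 to 8083 in order, then request 4 wraps back around to port 8080. Each of those backends can itself be a multi-threaded process, which is the point: threads and processes combine rather than compete.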
In general, threads are hard to get right, but in the context of a web server with a thread-per-request model, things aren't actually that hard. And threads are nice because a thread waiting on network or database I/O doesn't block the whole process, and each thread takes up a lot less memory than a new process. So if you have 50 processes with 20 threads each, you can handle 1,000 concurrent requests with a lot less memory than if you ran a separate process for each of those 1,000 requests.
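The claim that a thread blocked on I/O doesn't stall its siblings is easy to check for yourself. This sketch uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in for a thread-per-request server's worker pool, and `time.sleep` as a stand-in for a database or network call. The 20-thread pool mirrors the "20 threads each" example above; the 0.1-second delay is an arbitrary assumption, and `handle_request` and `serve` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

THREADS_PER_PROCESS = 20    # mirrors the 20-threads-per-process example
SIMULATED_IO_SECONDS = 0.1  # assumed latency of one DB/network call


def handle_request(request_id: int) -> int:
    """Pretend request handler: waits on simulated I/O, then returns its id."""
    time.sleep(SIMULATED_IO_SECONDS)  # other threads keep running meanwhile
    return request_id


def serve(n_requests: int, n_threads: int) -> float:
    """Handle n_requests on a pool of n_threads; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(handle_request, range(n_requests)))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = serve(THREADS_PER_PROCESS, THREADS_PER_PROCESS)
    # Handled one at a time this would take about 20 * 0.1 = 2 s.
    print(f"{THREADS_PER_PROCESS} requests handled in {elapsed:.2f} s")
```

With 20 threads the 20 simulated requests finish in roughly the time of one, because CPython releases the GIL while a thread sleeps or waits on I/O. CPU-bound handlers would not overlap this way.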
As for how many processes to use on your production server, that all depends on your app, but I have noticed a pattern across several different load tests on several different TurboGears apps, written by several different clients: in general, about 4 processes per CPU seems to be the sweet spot.
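The rule of thumb can be turned into a quick sizing calculation. Treat this as a sketch under stated assumptions: the 4-processes-per-CPU default comes from the load tests described above and may not fit another app, `os.cpu_count()` reports logical CPUs (which on hyper-threaded machines can be double the physical cores), and the 20-threads-per-process default is borrowed from the earlier 50 × 20 = 1,000 example. `DeploymentPlan` and `plan_deployment` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeploymentPlan:
    cpus: int
    processes: int
    threads_per_process: int

    @property
    def max_concurrent_requests(self) -> int:
        # Thread-per-request model: each thread serves one request at a time.
        return self.processes * self.threads_per_process


def plan_deployment(cpus: Optional[int] = None,
                    processes_per_cpu: int = 4,
                    threads_per_process: int = 20) -> DeploymentPlan:
    """Size a multi-process deployment from a processes-per-CPU rule of thumb."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # cpu_count() may return None
    return DeploymentPlan(cpus, cpus * processes_per_cpu, threads_per_process)


if __name__ == "__main__":
    plan = plan_deployment()
    print(plan, "->", plan.max_concurrent_requests, "concurrent requests")
```

For example, a 2-CPU box gives 8 processes and 160 concurrent requests with the defaults. The number is only a starting point for load testing, not a substitute for it.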
But the nice thing is that the CherryPy web server is fast enough that a lot of sites just don’t need to run multiple processes to handle their traffic.
Hi Mark,
Thanks for your blog post. I am not sure I understand why you would need several processes on each CPU (unless it is a multi-core CPU). I would naively expect the optimal setting to be about one process (with multiple threads) per core. Why is this not the case?
Best regards,
Jesper
Jesper,
I don't know the exact reason either, and it could be specific to the applications I happened to test. All I can say is that 4 processes per processor worked out best for us.
Perhaps 1 process with 40 threads would have worked just as well: a recent test of a TG2 app showed 1 process per processor with 50 threads working well, and I don't think we spent much time tuning the CherryPy thread count in those earlier tests.