Threads, Processes, Rails, TurboGears, and Scalability

Threads may not be the best way, or the only way, to scale out your code. Multi-process solutions seem more and more attractive to me.

Unfortunately, multi-process and the JVM are currently two tastes that don’t taste great together. You can do it, but it’s not the kind of thing you want to do too much. So the JRuby guys had a problem — Rails’ scalability story is multi-process only (Rails core is NOT thread safe), and Java’s not so good at that.

Solution: running “multiple isolated execution environments” in a single Java process.

I think that’s a neat hack. The JRuby team is to be congratulated for making this work. It lets Rails mix multi-process concurrency with multi-threaded concurrency, if only on the JVM. But it’s likely to incur some memory bloat, so it’s probably not as good as it would be if Rails itself were thread safe.

I’m not sure the Jython folks have done anything like this, and I’m not sure they should. It’s a solution Python folks don’t really need. Django used to have some thread-safety issues, but those have been worked out on some level. While the Django people aren’t promising anything about thread safety, enough people seem to be using it in multi-threaded environments to notice if anything weren’t working right.

At the same time, TurboGears has been thread safe from the beginning, as have Pylons, Zope, and many other Python web development tools. The point is, you have good web-framework options without resorting to multiple Python environments in one JVM.

Why you actually want multi-threaded execution…

In TurboGears we’ve found that the combination of multi-threaded and multi-process concurrency works significantly better than either would alone. Threads let us maximize the throughput of a single process up to the point where Python’s global interpreter lock (GIL) becomes the bottleneck, and multiple processes let us scale beyond that point while also providing additional system redundancy.
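A minimal sketch of that hybrid model in modern Python — illustrative only, not TurboGears code; `handle_request`, the request data, and the pool sizes are all invented for the example:

```python
# Hybrid concurrency sketch: a few OS processes (to scale past the GIL)
# each running a pool of threads (to maximize per-process throughput).
import multiprocessing
from concurrent.futures import ThreadPoolExecutor

def handle_request(n):
    # Stand-in for real request handling; I/O-bound work releases the GIL,
    # which is what makes threads worthwhile within one process.
    return n * n

def worker_process(requests):
    # Each process serves its share of requests with its own thread pool.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(handle_request, requests))

if __name__ == "__main__":
    requests = list(range(8))
    # Two processes, each multi-threaded: concurrency across cores comes
    # from processes, concurrency within a process from threads.
    with multiprocessing.Pool(processes=2) as procs:
        halves = procs.map(worker_process, [requests[:4], requests[4:]])
    print(halves[0] + halves[1])  # [0, 1, 4, 9, 16, 25, 36, 49]
```

If one process dies, the others keep serving — that is the redundancy the multi-process half of the model buys you.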

A multi-threaded system is particularly important for people who use Windows, which makes multi-process computing much more memory intensive than it needs to be. As my Grandma always said, Windows “can’t fork worth a damn.” ;)

But given how hard multi-threaded computing can be to get right, TurboGears and related projects work hard to keep threads isolated and to avoid manipulating shared resources across threads. So really it’s kind of like shared-memory-optimized micro-processes running inside larger OS-level processes, which makes multi-threaded applications a lot more reasonable to wrap your brain around. Once you start down the path of lock management, the non-deterministic character of the system can quickly overwhelm your brain.
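The “isolated micro-process” style can be approximated with Python’s `threading.local`, which gives each thread its own private view of a variable so no locks are needed. A toy sketch — `handle` and `request_id` are invented for illustration:

```python
# Each thread gets a private copy of `state`, so threads never contend
# for it and no mutexes are required.
import threading

state = threading.local()

def handle(request_id, results):
    state.request_id = request_id  # visible only to this thread
    # ...more work could happen here; no other thread can clobber it...
    results[request_id] = state.request_id

results = {}
threads = [threading.Thread(target=handle, args=(i, results)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # {0: 0, 1: 1, 2: 2, 3: 3}
```

The only shared object is `results`, and each thread writes a distinct key — which is the shared-nothing discipline described above, in miniature.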

As far as I can see, the same would be true for a Ruby web server on Ruby 1.9, where there is both OS-level thread support and an interpreter lock.

I’m well aware that Stackless, Twisted, and Nginx have proved there are other (asynchronous) approaches that can easily outperform the multi-threaded + multi-process model in throughput and concurrency per unit of server hardware. The async model requires thinking about the problem space quite differently, so it’s not a drop-in replacement, but for some problems async is definitely the way to go.
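For contrast, here is a toy sketch of the async model using Python’s `asyncio` (which postdates this post — Twisted was the Python option at the time): one thread and one event loop overlap many waits, instead of dedicating a thread or process to each connection.

```python
# Async concurrency sketch: 100 in-flight "requests" share one thread,
# overlapping while each waits on (simulated) I/O.
import asyncio

async def handle(n):
    await asyncio.sleep(0.01)  # stand-in for a slow socket or DB call
    return n * 2

async def main():
    # All 100 handlers run concurrently, so the total wall time is close
    # to one sleep (~0.01s), not 100 sleeps.
    return await asyncio.gather(*(handle(n) for n in range(100)))

results = asyncio.run(main())
print(results[:5])  # [0, 2, 4, 6, 8]
```

The trade-off is exactly the one noted above: every handler must be written in the non-blocking style, so it isn’t a drop-in replacement for threaded code.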

Anyway, hats off to the JRuby team, and here’s hoping that Rails itself becomes thread safe at some point in the future.

11 Responses to “Threads, Processes, Rails, TurboGears, and Scalability”

  1. Interestingly the .NET Dynamic Language Runtime (and hence IronPython and IronRuby) allows you to instantiate multiple language engines within the same process.

    This is really useful even within individual applications (Resolver One uses the IronPython API to create multiple engines – effectively execution environments – all from pure Python code). It is one advantage that IronPython has over CPython, that this is not only possible – but easy!

  2. Great post, Mark! When I wrote about computational parallelism to help scale web apps written in Python, I assumed my readers would already understand all the concepts you’ve outlined above. When I realized most readers didn’t, writing the post you’ve so elegantly written here seemed too daunting a task. I wish this post had existed when I first published my concurrency post. I’ll be post-publish linking to this =)

  3. Great post! I did some experiments with async and Pylons; I would love to see whether TurboGears could also be run on an event-based web server, the way I got Pylons to work.

  4. Bill English

    Do you actually have any evidence for the claims behind your Windows bashing? … Forking a process is more expensive on Windows than on Unix, but is the cost of IPC/sync that far from the Linux model (shm/pipes/sockets)? Also, when combined with the overhead of forking a whole interpreter, is it really that much higher?

  5. Bill,

    I have moved a heavily multi-process Python app from Linux to Windows, and we did notice increased latency (due to longer process startup times) and increased memory use. We ended up needing a hybrid model, with several multi-threaded processes, in order to get reasonable performance out of the Windows deployment.

    Some of this could have been ameliorated if we had not been so free about forking off a new process for each relatively quick job. (We often started several very short-lived processes which ran for 1–2 seconds, did some work, and shut down.)

    The fact that Windows does not have copy-on-write semantics for forking means that each of those processes made a full copy of our 150 MB or so environment. On Linux we saw at most a couple of megabytes of memory increase for the life of each process.
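The difference described here can be sketched with Python’s `multiprocessing` start methods — this is an illustration, not the commenter’s code; `big_environment` and `child` are invented names:

```python
# "fork" (POSIX only) creates the child via copy-on-write, so it sees the
# parent's memory almost for free; "spawn" (the only option on Windows)
# starts a fresh interpreter that must re-import and rebuild everything.
import multiprocessing

big_environment = list(range(1000))  # stand-in for a large in-memory setup

def child(queue):
    # Under fork this reads the parent's list via shared (COW) pages;
    # under spawn the module is re-imported and the list rebuilt.
    queue.put(len(big_environment))

if __name__ == "__main__":
    methods = multiprocessing.get_all_start_methods()
    # e.g. ['fork', 'spawn', 'forkserver'] on Linux, ['spawn'] on Windows
    ctx = multiprocessing.get_context("fork" if "fork" in methods else "spawn")
    q = ctx.Queue()
    p = ctx.Process(target=child, args=(q,))
    p.start()
    print(q.get())  # 1000
    p.join()
```

Under spawn, every child pays the full cost of rebuilding that environment — which is the per-process memory and startup penalty the comment describes hitting on Windows.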

    I don’t think I’ve tested 2003 server, or anything later than that, so it could be that Microsoft has changed things and current tests would fare better.

    My intention was not to bash Microsoft, but to point out one limitation, and to mention that the TurboGears model works better in that environment than other frameworks that rely only on multi-processing for concurrency.

  6. Nice post. I’m with Lateef up there--wondering whether asynchrony would also help throughput. I know there’s been some work in the Ruby world to get asynchronous DB adapters going [ lists some] which will hopefully help with the runtime speed. Maybe this will help. We can only hope.

  7. Note that Rails 2.2 will be thread safe, so the JRuby guys will be able to share memory across threads more easily and get concurrency across all available cores.

  8. Roger,

    I’m very excited to hear that, though I’m a bit skeptical that all of the thread-safety issues in Rails will be worked out in time for 2.2. That kind of thing is hard to bolt onto a framework after the fact. You need to have thought through the issues up front and come up with a sane way to handle threads, or you end up in a nightmare of mutexes, race conditions, and bugs that are very hard to reproduce.

