Archive for the 'Ruby' Category

Premature optimization

We all know it’s bad. But, programming for performance in reasonable ways is good. So, what’s the difference?

Sometimes we think we know that a piece of code is important so we spend some time optimizing it. And in the end it’s less clear, and less maintainable, and it turns out that our bottlenecks are all elsewhere.

But, sometimes we do know where bottlenecks are going to be, we’ve learned from experience, and we know what needs to be done.

We know that architecture determines performance, and architecture isn’t easily bolted on at the end of the project.

So we have a conundrum. We shouldn’t optimize yet because we don’t know where the bottlenecks will be. We shouldn’t wait to optimize because we can’t easily retrofit a good architecture on a complex system.

Some of the conundrum is only apparent — there’s a difference between architectural problems that need to be set up front, and the kind of low level micro-optimization that obscures more than it helps. But, sometimes these conflicts are real — how do I know if I need a multi-process multi-consumer queue system for PDF generation before we build the system and benchmark it? If you don’t need it, that kind of extra architectural complexity just obscures the bit of code that actually solves the problem.

Solving the problem by going meta

Perhaps the problem really is that we’re dumb and optimize the wrong things at the wrong time. The solution to that problem is to get less dumb. Which means that we ought to spend time optimizing “learning”, both within our project processes, and across projects.

Codifying this learning is what the Patterns of Enterprise Application Architecture book was all about.

And I think it’s great as far as it goes, and if you haven’t read it you should buy it now.

But there are a lot of patterns that I can identify from my last half dozen projects that aren’t covered in PoEAA, so it would be great to see a next generation of books and blog posts that cover the modern architectural trade-offs that you have to make, something that covers some of the paterns of the web.

Scalability via in HTTP, etags, caching, and load balancing (the whole RESTful services argument), networked async processing patterns, etc. Scaling to the public web levels requires a whole different set of architectural principles than scaling to the old “enterprise” levels did, and that knowledge seems very much in flux.

It would be great if it also provided some advice for those of us who’ve moved into what Neil Ford has called the world of the Polyglot Programmer, patterns for coordinating activities across language barriers in a sensible way. That’s part of the nature of modern web systems too.

Working at SourceForge

I’ve been at SourceForge for a couple of months now, it’s been great, the work is surprisingly fun and rewarding. There’s a local office, and so I actually get to g and hang out with smart people whenever I want. I can still work from home, but having someplace to go in to has been a refreshing change.

I haven’t gotten to know many people outside the engineering team in Dexter, but they are great guys.

There’s lots of good stuff happening here, support for bazar, mercurial, git, trac, and other options on SourceForge itself, improved feeds, and other API’s for getting at SF data, etc. But I’m only peripherally aware of all that at the moment because I was hired to work on “totally new stuff” which is written in Python.

What I’m working on

Our first new project is a site called FossFor.Us, and it was the vision for this site, and the team that is working on this and other new stuff, that sold me on the coming to work for Sourceforge. It’s written in Django, and it’s been my first really large Django project, and while the experience has been pretty positive, there have been a number of things that have renewed my commitment to TurboGears development — but that’s a blog post for another day.

The backstory to the FossFor.Us site is that open source project hosting providers (Sourceforge and it’s recent competitors) have traditionally been pulled in two very different directions by two very different sets of users:

  • developers of open source software
  • and people who just want to.

And that tension has held us back in the past, we have to serve everybody with the same portal, and it ends up not serving either community as well as it should. But since developers are the most vocal users, it’s been the second class of user that’s been most neglected.


These people are just looking to get things done, and don’t care about the “project” part of open source software, they are, at least at first, only interested in the “product.” In many ways the Free and Open Source Software community has not served these people well. is in it’s first incarnation an attempt to create a window on the free software world, that’s just about finding and using software. But in a larger sense it’s an attempt to help us as a community to connect with potential users better.

I think connecting FOSS geeks and users is actually important

It’s important because people aren’t aware that there are free options, and are paying for software they can’t afford. There’s a prototypical user (based on a real person) that we talk about a lot, who’s a single mom, has an old laptop, and struggles week to week to pay her bills, but who bought Photoshop, because “that’s how you edit photos.” Her family could have used that money to more productive ends, but because she needed to edit photos, and didn’t know about the free alternatives all those opportunities are just lost.

Of course the same thing is true of small business owners, who could use free software to reduce their “overhead” costs, and actually spend money on creating things people love. Free software has the potential to lubricate the wheels of the economy, encourage entrepreneurial activity, and enrich people’s lives.

All of this is to say I think is a way to serve the world by making the product of all the open source developer’s labor more easily available and more accessible to real people. And when my mom actually used it to find some software a couple weeks ago, I knew we’d done something right.

Just say no to “software engineering”

I was reading Jacob Kaplan-Moss’s blog article on “syntactic sugar” and I realized that there’s something sitting just below the surface of what he’s saying, something important, something counter to the standard “software development is an engineering discipline” view of our work.

Jacob rails against those who say the differences between modern computer languages amount to nothing more than “syntactic sugar.” Ultimately he argues that: Syntactic sugar matters.

Sure it makes no “technical” difference but it does make a huge difference — because it changes the way we think about writing software.

I’ll loop back to that in a second, but first a quick detour through another recent blog post:

Eventually you come to realize that in order to truly succeed, you have to write programs that can be understood by both the computer and your fellow programmers.

Of all the cruel tricks in software engineering, this has to be the cruelest…. Even when you’re writing code explicitly intended for the machine, you’re still writing. For other people. Fallible, flawed, distracted human beings just like you. And that’s the truly difficult part.

Jeff Attwod

The thread that ties both of these together is that they highlight the way that people and by people I mean software developers, tend to forget is that code is always two things:

  • A series of instructions or declarations processed by a computer.
  • A series of instructions or declarations processed by one or more human beings.

Code is a machine construct, but it’s also a social construct. Software engineering is a strange name for our discipline, since the hard work of programming isn’t just getting code that machines can run, but in creating abstractions that allow human beings to learn, understand, and evolve the code over time. We didn’t invent Structured Programming, or Object Oriented Programming because they help the computer understand what we mean — we created them because they provide us with tools to help us as human beings to be able to understand the code we write.

Non-software Engineers aren’t concerned primarily with the practice of communicating complex thoughts and ideas to others. This is a vast oversimplification, but I think it’s fair to say that they are interested in constructing mathematical models of how things things behave, so that they can build stuff that works.

But that’s not what we are, we are creative writers, we invent new ways of thinking about the world, and we try to communicate them to each other every day.

Software Engineers do create incredibly complex systems, and operate under constraints that other writers do not — what we write has two audiences, one human, and one non-human. Writing for the non-human audience alone will result in code that’s incomprehensible to other programmers, but the opposite is also true. Computer programming is hard — and it’s hard for a very specific reason — because it requires thinking like other people, and not thinking like people at all.

Fortunately, the problem is simplified a bit by the fact that other programmers have learned at least somewhat to think like silicon and metal, so you can lean a little bit in that direction and still be understood. But still you have a very exacting, very alien audience, and a very exacting, very human one — and programming languages must be designed to balance the needs of both.

So, perhaps we ought to stop calling ourselves software developers, software architects, or software engineers, and start calling ourselves software writers.

REST is a design for the long run

Found this quote in a recent discussion or REST.

REST is software design on the scale of decades: every detail is intended to promote software longevity and independent evolution. Many of the constraints are directly opposed to short-term efficiency.

Threads, Processes, Rails, TurboGears, and Scalability

Threads may not be be best way, or the only way, to scale out your code. Multi-process solutions seem more and more attractive to me.

Unfortunately multi-process and the JVM are currently two tastes that don’t taste great together. You can do it, but it’s not the kind of thing you want to do too much. So, the Jruby guys had a problem — Rail’s scalability story is only multi-process (rails core is NOT thread safe), and Java’s not so good that that….

Solution: Running “multiple isolated execution environments” in a single java process.

I think that’s a neat hack. The JRuby team is to be congratulated in making this work. It lets Rails mix multi-process concurrency with multi-threaded concurrency, if only on the JVM. But it’s likely to incur some memory bloat, so it’s probably not as good as it would be if Rails itself were to become threadsafe.

I’m not sure that the Jython folks have done anything like this. And I’m not sure they should. It’s a solution python folks don’t really have. Django used to have some thread-safety issues, but those have been worked out on some level. While the Django people aren’t promising anything about thread safety, it seems that there are enough people using it in a multi-threaded environment to notice if anything’s not working right.

At the same time, TurboGears has been threadsafe, from the beginning, as has Pylons, Zope, and many other python web dev tools. The point is, you have good web-framework options, without resorting to multiple python environments in one JVM.

Why you actually want multi-threaded execution…

In TurboGears we’ve found that the combination of both multi-threaded and multi-process concurrency works significantly better than either one would alone. This allows us to use threads to maximize the throughput of one process up to the point where python’s interpreter lock becomes the bottleneck, and use multi-processing to scale beyond that point, and to provide additional system redundancy.

A multi threaded system is particularly important for people who use Windows, which makes multi-process computing much more memory intensive than it needs to be. As my Grandma always said Windows “can’t fork worth a damn.” ;)

But, given how hard multi-threaded computing can be to get right TurboGears and related projects work hard to keep our threads isolated and not manipulate any shared resources across threads. So, really it’s kinda like shared-memory optimized micro-processes running inside larger OS level processes, and that makes multi-threaded applications a lot more reasonable to wrap your brain around. Once you start down the path of lock managment the non-deterministic character of the system can quickly overwhelm your brain.

As far as i can see, the same would be true for a Ruby web server in Ruby 1.9, where there is both OS level thread support and an interpreter lock.

I’m well aware of the fact that stackless, twisted, and Nginx have proved that there are other (asynchronous) methods that can easily outperform the multi-threaded+multi-process model throughput/concurrency per unit of server hardware. The async model requires thinking about the problem space pretty differently, so it’s not a drop in replacement, but for some problems async is definitely the way to go.

Anyway, hats off to the Jruby team, and here’s hoping that Rails itself becomes threadsafe at some point in the future.