The tech of the new SourceForge

Last week I blogged about the new SourceForge.net and one of the first questions I got was when are we going to “lift the covers” and show off our new tech.

There’s definitely more to come in terms of releases and code, but I thought it’d be worthwhile to start with a quick run through of the tech stack and a bit of a description of what we’re doing.

Our first rule for libraries and tools on the new forge was that we needed to use open source everywhere. Partly this is just because having the freedom to look at the code, and modify it where we need fixes, is the easiest and best way to develop software. Partly it’s because we’re an open source code hosting platform, and we want to use what we promote. But perhaps most importantly, it means that we’re not prevented from sharing our work with others, or from inviting others to work with us in the future.

At the same time we had a company-wide decision to standardize on the technology stack that we’d used in the “consume” project last year. So, we’re using:

  • Python,
  • TurboGears,
  • MongoDB,
  • and AMQP (RabbitMQ).

The combination of these means that we have:

  • a huge number of libraries available to us,
  • a web framework that we can turn into a plugin framework for projects and the tools they want,
  • a schema-free database that lets us easily version documents to keep history on wiki pages, tickets, and other “artifacts” within the new forge,
  • a scalable system for handling asynchronous tasks and propagating update notifications.
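
To make the point about versioned documents a little more concrete, here is a minimal sketch of how a schema-free store can keep history on an artifact by saving a snapshot on every change. It is not the forge's actual code: a plain in-memory Python class stands in for MongoDB, and the names VersionedStore, save, and get are made up for illustration.

```python
import copy
from datetime import datetime, timezone


class VersionedStore:
    """In-memory stand-in (an assumption) for a schema-free collection with history."""

    def __init__(self):
        self.current = {}   # artifact_id -> latest document (a plain dict)
        self.history = []   # append-only list of snapshot documents

    def save(self, artifact_id, **fields):
        """Merge fields into the artifact, bump its version, and keep a snapshot."""
        doc = copy.deepcopy(
            self.current.get(artifact_id, {"_id": artifact_id, "version": 0})
        )
        doc.update(fields)            # no schema: any fields are accepted
        doc["version"] += 1
        doc["updated"] = datetime.now(timezone.utc).isoformat()
        self.current[artifact_id] = doc
        self.history.append(copy.deepcopy(doc))
        return doc["version"]

    def get(self, artifact_id, version=None):
        """Return the latest document, or a specific historical version."""
        if version is None:
            return copy.deepcopy(self.current[artifact_id])
        for snapshot in self.history:
            if snapshot["_id"] == artifact_id and snapshot["version"] == version:
                return copy.deepcopy(snapshot)
        raise KeyError((artifact_id, version))


store = VersionedStore()
store.save("wiki/Home", text="First draft")
store.save("wiki/Home", text="Second draft", labels=["intro"])
```

The second save adds a labels field that the first version never had, which is the part a fixed schema would make awkward. Old versions stay readable through store.get("wiki/Home", version=1).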

The choice to use Python has been particularly valuable, since there are (literally) dozens of libraries that we were able to use to help us with everything from encrypted cookie sessions and MongoDB drivers to markdown text processing and syntax highlighting.

We’re still in the early days and have a lot more to do, but the goal is an open, extensible system that supports open source projects, and ultimately encourages more people to download and use a wider variety of open source applications.

A peek at a new SourceForge.net

So, I’ve been working on sf.net in various ways for about a year now.
http://sourceforge.net/p/. It’s written in Python using modern open source tools, from RabbitMQ and MongoDB to Git and Mercurial. And we are committed to making this the most open forge possible. We’re committed to open processes, open code, and perhaps most importantly, open data.

The first thing we did was create some new pages for downloads. Recently we released a new service designed just for open source project leaders who want to use sf.net as a directory and downloads service.

But, we’re also aware that one of the most important services we provide is project hosting. For the last several months a small group of us have been trying to bring sourceforge.net’s tools into 2010. And now we’re releasing an early preview of those new developer/community tools:

We have a long way still to go, but every long journey begins with a single step, and today’s step is allowing you to try the new forge, to create new projects at:

https://sourceforge.net/register

There you can get a new project with our new tracker, wiki, git, svn, and other tools. Projects can have subprojects and links to other tools hosted off site, along with the many features that sf.net brings (free web hosting, hosted apps, etc).

But, why do all this?

In 1999 SourceForge was cool.

It provided all the tools that an open source project needed to get going, from cvs hosting, to bug tracking, and e-mail list support.

It pioneered free hosting for free software projects, and helped to transform the software development culture from one which barely knew about free software or open source to one where nearly everybody I know uses open license software. Oh sure, some of them might not know it, but they have it on their phones, in their TVs, in their wireless routers, not to mention all the websites they use every day that run on open source.

But, time passed.

More alternatives came out, more projects (including my own) started self hosting, and the landscape of open source software development changed. SourceForge.net took a long time coming out with support for new tools like svn, and then git.

Still, SourceForge has a special place in my heart. Partly it’s nostalgia, I suppose, but I still think:

  • the core mission is still right
  • and there is still a real need

We (Open Source developers) still need tools like git, mercurial, and svn hosting. We still need bug trackers and mailing lists. And in a meeting of other open source project leaders last fall, nearly every single one of them identified the time wasted integrating and administering these tools as one of their most important frustrations.

Not enough…

But for many of them, sourceforge.net and other free project hosting services were just not good enough: they weren’t scriptable, they weren’t extensible, and their data wasn’t portable, so they felt like they had to take on that cost.

And I fundamentally believe that open source projects live and die by communication, and that sourceforge.net can do something new by integrating the various kinds of “conversations” that happen around the project. We can integrate mailing lists and forums, we can integrate SCM and ticket trackers, etc.

New and improved

So, a couple of us have been quietly working on something new. The new forge is designed around a few core ideas:

  • that data should be portable (every project gets their own database, which they can take with them if they want),
  • that the open source community ought to be able to extend and enhance the tools they need,
  • that integrating and cross linking the various kinds of conversations that open source projects need to have ought to be easier.
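
The first idea is the easiest to picture in code. As a rough sketch, assuming each project's data lives in its own isolated store (a dict of collections here, standing in for a real database), export and import become one-step operations. The names export_project and import_project are hypothetical, not part of the forge.

```python
import json


def export_project(databases, project):
    """Serialize every collection in one project's database to a JSON string."""
    return json.dumps(databases[project], sort_keys=True, indent=2)


def import_project(databases, project, dump):
    """Load a dump produced by export_project into another set of databases."""
    databases[project] = json.loads(dump)


# One isolated "database" per project, so nothing needs to be untangled on export.
forge = {
    "myproject": {
        "tickets": [{"num": 1, "summary": "Fix the build"}],
        "wiki": [{"title": "Home", "text": "Welcome"}],
    }
}

dump = export_project(forge, "myproject")

elsewhere = {}
import_project(elsewhere, "myproject", dump)
```

Because nothing in the dump refers to any other project, the copy in elsewhere is complete on its own.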

So, what we’re announcing today is more of a commitment to getting there on all these things, and a commitment to the “release early, release often” project management strategy.

So, expect us to take your feedback and make things better. Expect us to release lots of small fixes, and expect a few places where things are broken/incomplete because we value feedback more than polish at this point.

People VS Process?

Lean Manufacturing people go around saying “it’s always a process problem.”

Meanwhile Jerry Weinberg, who wrote several books that I love and gives lots of great advice, including some of the best advice I’ve ever read about how to give advice, says “every problem is a people problem.”

So, which is it?

Are bad things that happen the result of bad processes, or are they the result of things people do?

I’ve been party to a bit of discussion about this in the last month or two, and in the end it’s all pretty silly.

Processes are created by people, implemented by people, and are designed to accomplish the goals of people.

People run processes!

So, whenever something is broken, it’s people who will need to find the problem and fix it.

People can and do think of ways to improve processes every day, but I’ll eat my shoe if you find a process that thinks of a way to improve people.

But there’s still a HUGE problem.

Modern companies seem to have a persistent failing — they look for people to blame when something goes wrong — and ignore the context in which those problems happened.

When something goes wrong, they fire some people and replace them with new people who make the same mistakes all over again.

Sometimes you “get lucky”.

The company might get lucky and find a person who’s able to raise awareness, reveal the larger contextual problems, and succeed in spite of the fact that everything’s stacked against her.

More often than not though, the poor new guy doesn’t see the systematic pressures that caused everything to fall apart, at least not until it’s too late.

Sometimes replacing what’s broken isn’t enough.

Sometimes it’s the equivalent of a mechanic replacing your car’s engine several times in a row, because it keeps burning up — without ever checking to make sure oil is flowing normally, and the cooling system is working.

The easy way out.

It’s often easier to blame people because companies don’t “control” them the way they do the context. This blame game is as old as the hills, but definitely not as pretty.

Help people fix processes

The solution is to ask people to look for the systematic pressures, give them the tools to find them, and to empower them to change the way work gets done.

In the end, people will improve the processes, if they believe they are allowed.

Sometimes a design isn’t working because you think you can’t change the one element that needs to be changed.

Ryan (via svn)

The same thing is true when you are designing the processes by which work gets done.

Premature optimization

We all know it’s bad. But, programming for performance in reasonable ways is good. So, what’s the difference?

Sometimes we think we know that a piece of code is important so we spend some time optimizing it. And in the end it’s less clear, and less maintainable, and it turns out that our bottlenecks are all elsewhere.

But, sometimes we do know where bottlenecks are going to be, we’ve learned from experience, and we know what needs to be done.

We know that architecture determines performance, and architecture isn’t easily bolted on at the end of the project.

So we have a conundrum. We shouldn’t optimize yet because we don’t know where the bottlenecks will be. We shouldn’t wait to optimize because we can’t easily retrofit a good architecture on a complex system.

Some of the conundrum is only apparent — there’s a difference between architectural problems that need to be set up front, and the kind of low level micro-optimization that obscures more than it helps. But, sometimes these conflicts are real — how do I know if I need a multi-process multi-consumer queue system for PDF generation before we build the system and benchmark it? If you don’t need it, that kind of extra architectural complexity just obscures the bit of code that actually solves the problem.
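
To see what that extra complexity looks like even in miniature, here is a sketch of the same job done both ways. Everything in it is an assumption for illustration: render_pdf is a hypothetical stand-in for the real work, and threads from the standard library stand in for the separate consumer processes a real queue system would use.

```python
import queue
import threading


def render_pdf(doc_id):
    """Hypothetical stand-in for the bit of code that actually solves the problem."""
    return f"{doc_id}.pdf"


def render_all_simple(doc_ids):
    """The simple version: one call after another, nothing extra to operate."""
    return [render_pdf(doc_id) for doc_id in doc_ids]


def render_all_queued(doc_ids, consumers=4):
    """The same work pushed through a queue with several consumers."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def consume():
        while True:
            doc_id = jobs.get()
            if doc_id is None:        # sentinel: no more work for this consumer
                break
            pdf = render_pdf(doc_id)
            with lock:                # results is shared between consumers
                results[doc_id] = pdf

    workers = [threading.Thread(target=consume) for _ in range(consumers)]
    for worker in workers:
        worker.start()
    for doc_id in doc_ids:
        jobs.put(doc_id)
    for _ in workers:
        jobs.put(None)                # one sentinel per consumer
    for worker in workers:
        worker.join()
    return [results[doc_id] for doc_id in doc_ids]
```

Both functions return the same list. The queued version only earns its extra moving parts if a benchmark of the simple one says it is too slow.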

Solving the problem by going meta

Perhaps the problem really is that we’re dumb and optimize the wrong things at the wrong time. The solution to that problem is to get less dumb. Which means that we ought to spend time optimizing “learning”, both within our project processes, and across projects.

Codifying this learning is what the Patterns of Enterprise Application Architecture book was all about.

And I think it’s great as far as it goes, and if you haven’t read it you should buy it now.

But there are a lot of patterns that I can identify from my last half dozen projects that aren’t covered in PoEAA, so it would be great to see a next generation of books and blog posts that cover the modern architectural trade-offs that you have to make, something that covers some of the patterns of the web.

Things like scalability via HTTP, ETags, caching, and load balancing (the whole RESTful services argument), networked async processing patterns, etc. Scaling to public web levels requires a whole different set of architectural principles than scaling to the old “enterprise” levels did, and that knowledge seems very much in flux.
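
As one small example of this kind of pattern, here is a sketch of ETag-based revalidation: the server hashes the response body, and if the client sends that same tag back in If-None-Match it answers 304 with no body. The function names make_etag and respond are hypothetical, and this is a simplified illustration rather than a complete HTTP implementation.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (ETag values are quoted strings)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match is not None:
        candidates = []
        for tag in if_none_match.split(","):
            tag = tag.strip()
            if tag.startswith("W/"):  # If-None-Match compares weakly, so drop W/
                tag = tag[2:]
            candidates.append(tag)
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # client's cached copy is still good
    return 200, headers, body


# First request: full response. Second request: client replays the ETag.
status, headers, body = respond(b"<h1>Project page</h1>")
revalidated = respond(b"<h1>Project page</h1>", if_none_match=headers["ETag"])
```

The saving is that an unchanged page costs a hash and a few headers on the wire instead of the whole body, which is the sort of trade-off that matters at public web scale.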

It would be great if it also provided some advice for those of us who’ve moved into what Neal Ford has called the world of the Polyglot Programmer: patterns for coordinating activities across language barriers in a sensible way. That’s part of the nature of modern web systems too.

How do we expand Open Source?

So, one thing which keeps coming up in a bunch of different areas of my life is how we can expand the ethic of Open Source development.

People want TurboGears to do more than it does, they want other open source projects to grow, they want new open source projects in specific areas, and they want Open Source-like activity in other professions like nursing or construction.

I definitely don’t have the answers. But I’ve had this conversation with a lot of folks over the last couple of months, and some of them had some great ideas.

So, in the spirit of opening up a larger conversation about these issues, here are a couple of thoughts distilled from all those conversations.

Institutionalizing Open Source Values

It is of course possible to create cultural institutions around which money can be channeled into Open Source development.

And all the legal mechanisms needed to structure those institutions in the right way are available today.

But the trick, it seems, is to create the institutions in such a way that money is delivered in small enough amounts that individuals remain in control. Money is powerfully persuasive, but one of the keys to the current success of open source is that collective action is always purely voluntary.

But at the same time the money needs to come in large enough amounts to make a difference. People need to be able to support lives and families on the work they do advancing various projects. To the extent that this is reliable income, we can remove competing priorities, and developers will be able to devote themselves more fully to projects that advance the common good.

So, the key to making all of this work is going to be the “bureaucracies” we create to manage the flow of money. They need to be tuned properly to the nature of the work, stable enough to provide a level of personal security, and perhaps above all they need to be financially transparent.

Creating the right kinds of organizational structures will help us channel the right amounts of money to the right people, and creating the wrong kinds will create perverse incentives that pollute the whole system.

Most of what’s been happening so far in this direction are ecosystems of companies built around open source offerings. This has worked pretty well, but it’s clear that there can be conflicts of interest, and the nature of commercial ownership leaves even the best run companies vulnerable to sudden changes (acquisition of small open source companies by huge proprietary competitors is already a fact of life).

But, what seems more interesting to me at this point is the number of foundations that are being created for popular projects or groups of popular projects, etc.

These institutions will continue to grow, but they have the potential to change the way projects are run, so I expect a lot of fits and starts as we mature.

Open Source for other Professions

With that thought in mind perhaps lawyers, doctors, and other professions already have a form of the Open Source ethic, which has grown up around large institutions, and functions to spread knowledge and advance the state of the art of those groups. These institutions work to create new knowledge, train practitioners, and they seem to work pretty well.

If you haven’t caught on already I think it might be fair to say that this sub-section of these professions is called “academics.” ;)

Of course the university system isn’t perfect, and it’s taken hundreds of years to evolve to its current state, but I think it does provide some insight into how we might evolve larger institutional presences around open source, not in the next few years, but in the next few decades.