Archive for the 'Python' Category

The tech of the new SourceForge

Last week I blogged about the new SourceForge.net and one of the first questions I got was when are we going to “lift the covers” and show off our new tech.

There’s definitely more to come in terms of releases and code, but I thought it’d be worthwhile to start with a quick run through of the tech stack and a bit of a description of what we’re doing.

Our first rule for libraries and tools on the new forge was that we needed to use open source everywhere. Partly this is just because having the freedom to look at the code, and to modify it where we need fixes, is the easiest and best way to develop software. Partly it’s because we’re an open source code hosting platform, and we want to use what we promote. But perhaps most importantly, it means that we’re not prevented from sharing our work with others, or from inviting others to work with us in the future.

At the same time, we had a company-wide decision to standardize on the technology stack that we’d used in the “consume” project last year. So, we’re using:

  • Python,
  • TurboGears,
  • MongoDB,
  • and AMQP (RabbitMQ).

The combination of these means that we have:

  • a huge number of libraries available to us,
  • a web framework that we can turn into a plugin framework for projects and the tools they want,
  • a schema-free database that lets us easily version documents to keep history on wiki pages, tickets, and other “artifacts” within the new forge
  • a scalable system for handling asynchronous tasks, and propagating update notifications
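To make the "version documents to keep history" idea concrete, here is a minimal sketch. It is not the forge's actual code: `VersionedStore`, its field names, and the artifact ids are hypothetical, and plain in-memory dicts stand in for MongoDB collections so the example runs on its own. The point is only that with schema-free documents a snapshot is just a copy of whatever fields the artifact has at that moment.

```python
import copy
from datetime import datetime, timezone


class VersionedStore:
    """In-memory stand-in for a schema-free document collection (hypothetical)."""

    def __init__(self):
        self.current = {}   # artifact_id -> latest document
        self.history = []   # append-only list of snapshot documents

    def save(self, artifact_id, **fields):
        """Update an artifact and snapshot it so earlier versions survive."""
        doc = self.current.get(artifact_id, {"_id": artifact_id, "version": 0})
        doc = {**doc, **fields, "version": doc["version"] + 1}
        self.current[artifact_id] = doc
        # Documents need not share a schema, so wiki pages and tickets can
        # live in the same history without any migration step.
        self.history.append({
            "artifact_id": artifact_id,
            "version": doc["version"],
            "timestamp": datetime.now(timezone.utc),
            "data": copy.deepcopy(doc),
        })
        return doc

    def versions(self, artifact_id):
        """Return every stored snapshot of one artifact, oldest first."""
        return [s["data"] for s in self.history if s["artifact_id"] == artifact_id]


store = VersionedStore()
store.save("wiki/Home", text="First draft")
store.save("wiki/Home", text="Second draft", labels=["docs"])
store.save("ticket/1", summary="Broken link", status="open")
```

Calling `store.versions("wiki/Home")` returns both drafts, and the second one carries a `labels` field the first never had. With a real MongoDB driver the two containers would presumably become two collections, but that mapping is an assumption.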

The choice to use Python has been particularly valuable, since there are (literally) dozens of libraries that we were able to use to help us with everything from encrypted cookie sessions and MongoDB drivers to Markdown text processing and syntax highlighting.

We’re still in the early days and have a lot more to do, but the goal is an open, extensible system that supports open source projects and ultimately encourages more people to download and use a wider variety of open source applications.

A peek at a new Sourceforge.net

So, I’ve been working on sf.net in various ways for about a year now.
http://sourceforge.net/p/. It’s written in Python using modern open source tools, from RabbitMQ and MongoDB to Git and Mercurial. And we are committed to making this the most open forge possible. We’re committed to open processes, open code, and perhaps most importantly, open data.

The first thing we did was create some new pages for downloads. Recently we released a new service designed just for open source project leaders who want to use sf.net as a directory and downloads service.

But, we’re also aware that one of the most important services we provide is project hosting. For the last several months a small group of us have been trying to bring sourceforge.net’s tools into 2010. And now we’re releasing an early preview of those new developer/community tools:

We have a long way still to go, but every long journey begins with a single step, and today’s step is allowing you to try the new forge, to create new projects at:

https://sourceforge.net/register

Where you can go to get a new project, with our new tracker, wiki, git, svn, and other tools. Projects can have subprojects, and links to other tools hosted off site, along with the many features that sf.net brings (free web hosting, hosted apps, etc).

But, why do all this?

In 1999 SourceForge was cool.

It provided all the tools that an open source project needed to get going, from cvs hosting, to bug tracking, and e-mail list support.

They pioneered free hosting for free software projects, and helped to transform the software development culture from one which barely knew about free software or open source, to one where nearly everybody I know uses open license software. Oh sure, some of them might not know it, but they have it on their phones, in their TVs, and in their wireless routers, not to mention all the websites they use every day that run on open source.

But, time passed.

More alternatives came out, more projects (including my own) started self hosting, and the landscape of open source software development changed. SourceForge.net took a long time coming out with support for new tools like svn, and then git.

Still, SourceForge has a special place in my heart. Partly it’s nostalgia, I suppose, but I still think:

  • the core mission is still right
  • and there is still a real need

We (Open Source developers) still need tools like git, mercurial, and svn hosting. We still need bug trackers and mailing lists. And in a meeting of other open source project leaders last fall, nearly every single one of them identified the time wasted integrating and administering these tools as one of their most important frustrations.

Not enough…

But for many, sourceforge.net and other free project hosting services were just not good enough: they weren’t scriptable, they weren’t extensible, and their data wasn’t portable, so those project leaders felt like they had to take on that cost themselves.

And I fundamentally believe that open source projects live and die by communication, and that sourceforge.net can do something new by integrating the various kinds of “conversations” that happen around the project. We can integrate mailing lists and forums, we can integrate SCM and ticket trackers, etc.

New and improved

So, a couple of us have been quietly working on something new. The new forge is designed around a few core ideas:

  • that data should be portable (every project gets their own database, which they can take with them if they want),
  • that the open source community ought to be able to extend and enhance the tools they need,
  • that integrating and cross linking the various kinds of conversations that open source projects need to have ought to be easier.

So, what we’re announcing today is more of a commitment to getting there on all these things, and a commitment to the “release early, release often” project management strategy.

So, expect us to take your feedback and make things better. Expect us to release lots of small fixes, and expect a few places where things are broken/incomplete because we value feedback more than polish at this point.

Premature optimization

We all know it’s bad. But, programming for performance in reasonable ways is good. So, what’s the difference?

Sometimes we think we know that a piece of code is important so we spend some time optimizing it. And in the end it’s less clear, and less maintainable, and it turns out that our bottlenecks are all elsewhere.

But, sometimes we do know where bottlenecks are going to be, we’ve learned from experience, and we know what needs to be done.

We know that architecture determines performance, and architecture isn’t easily bolted on at the end of the project.

So we have a conundrum. We shouldn’t optimize yet because we don’t know where the bottlenecks will be. We shouldn’t wait to optimize because we can’t easily retrofit a good architecture on a complex system.

Some of the conundrum is only apparent — there’s a difference between architectural problems that need to be set up front, and the kind of low level micro-optimization that obscures more than it helps. But, sometimes these conflicts are real — how do I know if I need a multi-process multi-consumer queue system for PDF generation before we build the system and benchmark it? If you don’t need it, that kind of extra architectural complexity just obscures the bit of code that actually solves the problem.
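To see what that extra complexity looks like, here is a small sketch under stated assumptions: `render_pdf` is a hypothetical stand-in for the real rendering step, and threads with `queue.Queue` stand in for a true multi-process setup. Both functions produce the same result; only one of them makes the problem-solving line easy to find.

```python
import queue
import threading


def render_pdf(doc_id):
    """Hypothetical stand-in for the real, slow PDF rendering step."""
    return f"{doc_id}.pdf"


def render_all_simple(doc_ids):
    # The straightforward version: the code that actually solves the problem.
    return [render_pdf(d) for d in doc_ids]


def render_all_queued(doc_ids, workers=3):
    # The "architected" version: same result, many more moving parts.
    # Assumes doc_ids are unique, since results are keyed by id.
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def consumer():
        while True:
            doc_id = jobs.get()
            if doc_id is None:      # sentinel: no more work for this consumer
                break
            pdf = render_pdf(doc_id)
            with lock:
                results[doc_id] = pdf

    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()
    for d in doc_ids:
        jobs.put(d)
    for _ in threads:
        jobs.put(None)              # one sentinel per consumer
    for t in threads:
        t.join()
    return [results[d] for d in doc_ids]
```

If benchmarks later show rendering really is the bottleneck, the queued version earns its keep. Until then the one-line list comprehension is the whole program.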

Solving the problem by going meta

Perhaps the problem really is that we’re dumb and optimize the wrong things at the wrong time. The solution to that problem is to get less dumb. Which means that we ought to spend time optimizing “learning”, both within our project processes, and across projects.

Codifying this learning is what the Patterns of Enterprise Application Architecture book was all about.

And I think it’s great as far as it goes, and if you haven’t read it you should buy it now.

But there are a lot of patterns that I can identify from my last half dozen projects that aren’t covered in PoEAA, so it would be great to see a next generation of books and blog posts that cover the modern architectural trade-offs that you have to make, something that covers some of the patterns of the web.

Scalability via HTTP, ETags, caching, and load balancing (the whole RESTful services argument), networked async processing patterns, etc. Scaling to public web levels requires a whole different set of architectural principles than scaling to the old “enterprise” levels did, and that knowledge seems very much in flux.
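As one tiny example of the kind of pattern I mean, here is a sketch of ETag-based revalidation. The function names are hypothetical and it is framework-free on purpose. It also simplifies real HTTP: an actual `If-None-Match` header can carry a list of tags or `*`, which this ignores.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the response body (one common choice)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        # The client's cached copy is still good: send no body at all.
        return 304, headers, b""
    return 200, headers, body


# First request: full response. Second request: the client echoes the ETag.
status, headers, payload = respond(b"<h1>Project page</h1>")
status_again, _, payload_again = respond(b"<h1>Project page</h1>", headers["ETag"])
```

The second call returns a 304 with an empty payload, which is where the bandwidth savings come from, and it is also what lets caches and load balancers in front of the app do a lot of the work.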

It would be great if it also provided some advice for those of us who’ve moved into what Neal Ford has called the world of the Polyglot Programmer: patterns for coordinating activities across language barriers in a sensible way. That’s part of the nature of modern web systems too.

How do we expand Open Source?

So, one thing which keeps coming up in a bunch of different areas of my life is how we can expand the ethic of Open Source development.

People want TurboGears to do more than it does, they want other open source projects to grow, they want new open source projects in specific areas, and they want Open Source-like activity in other professions like nursing or construction.

I definitely don’t have the answers. But I’ve had this conversation with a lot of folks over the last couple of months, and some of them had some great ideas.

So, in the spirit of opening up a larger conversation about these issues, here are a couple of thoughts distilled from all those conversations.

Institutionalizing Open Source Values

It is of course possible to create cultural institutions around which money can be channeled into Open Source development.

And all the legal mechanisms needed to structure those institutions in the right way are available today.

But the trick, it seems, is to create the institutions in such a way that money is delivered in small enough amounts that individuals remain in control. Money is powerfully persuasive, but one of the keys to the current success of open source is that collective action is always purely voluntary.

But at the same time the money needs to come in large enough amounts to make a difference. People need to be able to support lives and families on the work they do advancing various projects. To the extent that this is reliable income, we can remove competing priorities, and developers will be able to devote themselves more fully to projects that advance the common good.

So, the key to making all of this work is going to be the “bureaucracies” we create to manage the flow of money. They need to be tuned properly to the nature of the work, stable enough to provide a level of personal security, and perhaps above all they need to be financially transparent.

Creating the right kinds of organizational structures will help us channel the right amounts of money to the right people, and creating the wrong kinds will create perverse incentives that pollute the whole system.

Most of what’s been happening so far in this direction are ecosystems of companies built around open source offerings. This has worked pretty well, but it’s clear that there can be conflicts of interest, and the nature of commercial ownership leaves even the best run companies vulnerable to sudden changes (acquisition of small open source companies by huge proprietary competitors is already a fact of life).

But what seems more interesting to me at this point is the number of foundations that are being created for popular projects or groups of popular projects, etc.

These institutions will continue to grow, but they have the potential to change the way projects are run, so I expect a lot of fits and starts as we mature.

Open Source for other Professions

With that thought in mind perhaps lawyers, doctors, and other professions already have a form of the Open Source ethic, which has grown up around large institutions, and functions to spread knowledge and advance the state of the art of those groups. These institutions work to create new knowledge, train practitioners, and they seem to work pretty well.

If you haven’t caught on already I think it might be fair to say that this sub-section of these professions is called “academics.” ;)

Of course the university system isn’t perfect, and it’s taken hundreds of years to evolve to its current state, but I think it does provide some insight into how we might evolve larger institutional presences around open source, not in the next few years, but in the next few decades.

Python Template languages (Part 1 — Django)

I’ve been thinking a lot about template engines in Python recently. Partly because sourceforge.net’s new Python code needed to choose a template language, and there were some questions about why we would choose one over the others.

But beyond that, in the past few weeks I’ve used Genshi, Mako, Jinja, Django Templates, and Cheetah, and have been looking at, but not yet trying out, chameleon.genshi.

I figure all this promiscuous template library usage means that I should put my thoughts down somewhere. There are advantages and disadvantages of all these libraries, but I think that the choices are pretty clear once you know your constraints.

I’m not going to commit to covering them all in depth, but I’m going to try to put my thoughts about them down over the next few days.

For today let’s talk about the pros and cons of Django Templates. This is another post that has been developed over the last year or so, where I typed stuff up while working on fossfor.us.

Django made making fossfor.us easy in lots of ways. Want threaded comments? Add the existing app in a couple hours — Done! Want OpenID? Again add an app — Done!

But it also had frustrations, and one of the biggest for me was the template language.
