Archive for September, 2008

PythonPackaging == hot_action

Ever since my friend Adrian posted up a slide at PyCon telling us that in Django Trunk == hot_action, I’ve been getting lots of e-mails from various folks telling me about the hot_action that I could get from various python packages. But today, the hot action isn’t the contents of the packages, so much as the packaging systems.

To start, there’s a very, very good background post by Kevin Teague that provides a great overview of the then-current Python packaging ecosystem.

But even though that post is very recent it’s not at all complete, because last week spawned two interesting new projects: pyinstall, from Ian Bicking, and distribute, a potential setuptools fork.

The distribute project didn’t last long, as it’s already been declared officially dead.

But I still think it’s interesting, and it’s definitely spawned a lot of good discussion on the distutils-sig.

When I talked to Tarek last week, his motivation behind starting distribute was always to help move setuptools forward, and his hope was that anything that happened there would be folded back into setuptools proper as time went on.

There were lots of people at PyCon this year who were both vocal about what was wrong with easy_install and motivated to help improve it. Beyond that some of them were people who do packaging all the time, and some of them had some previous experience writing packaging systems in python.

One of the most interesting tidbits from that discussion is this suggestion from Tarek and Guido’s response:

> My own inclination is that a scalable future for distutils means an improved
> sdist format, the end of as an command-line interface, and
> community-maintained platform-specific installation tools that process
> source or binary distributions. Most complaints about distutils (and
> setuptools, for that matter) are focused on installation policy&preference
> issues. Making it possible and practical for a variety of tools to flourish
> around a standardized format (ala WSGI) seems like the way to go.

Given the success of WSGI (which I use every day) this sounds like a
very good plan!

And like Guido I’m pretty enthusiastic about extending and standardizing the format while allowing a variety of tools to grow up around that new standard.

But I’m a bit skeptical about focusing only on sdist, since I think we do need some kind of way to distribute binary packages to Windows and OS X users, and while eggs work pretty well there, I don’t want that stuff to be left behind.

Which brings me back to pyinstall. For pure Python packages I really like pyinstall, because it has all of the key features that I think a Python package system should have (repeatable installs, single file installs, etc.), but its focus on source-only distribution makes penetration in the Windows market harder. And I guess the same could be said of OS X, where getting a compiler can require a 1 GB download.

Where do we go from here?

It’s fair to say that easy_install is not perfect, but it is much better than nothing, and last week’s news has made me excited about the future of Python packaging again. My thanks to Philip, Ian, Tarek, and all the other people who have put in effort to make this part of the Python world better; my hopes are with you.

If you are interested in getting involved, and helping to make all this stuff better, there’s also going to be a sprint this weekend.

DjangoCon talk (part 3)

The third part of my DjangoCon talk was about how Django can become a more innovative place, or alternatively how it can stifle innovation in the future. This very much builds on parts one and two, which I blogged about over the last few days.

The “big idea” of this talk is that competition, and continual refinement are the two real drivers of design excellence, and both Django and Python are well served by having web frameworks that compete and that are continually undergoing a process of internal refinement and improvement.

Again, this notion is not unique to me, the book Innovation Happens Elsewhere fleshes out the specifics of how these two factors foster good design:

“Competition and collaboration contribute
to innovation and creativity in two distinct ways.
Competition works through diversity and selection;
collaboration works through refinement and improving.”

This means that we are best served by a Python web ecosystem that’s both cooperative and competitive. Which is why I (a TurboGears guy) am willing to take time out of my schedule to try to make Django a better framework.

And it’s why I’m so stuck on the notion of components, and reuse, because good component architecture helps us to collaborate at the level of the library, not just the level of the framework. If the file system abstraction code, or the Django orm, or whatever other components of django were reusable libraries, they would be candidates for cooperation, and continual refinement.

And if newforms (now forms) were a separate component, they could compete head-to-head with similar tools like ToscaWidgets.

But since they are locked up inside Django, that competition has to happen at the level of the framework. Which is fine, except that you lose the granularity and specificity that come from competing at the individual component level. For example, where’s the detailed comparison of ToscaWidgets vs Django’s new forms library? Nobody has written such a thing because nobody (except my friend Max) outside of Django considers the Django forms library to be a reusable component.

And at the same time that competition happens at the level of the framework, cooperation can’t really happen well at all, because Django’s no outside dependency policy precludes it in some important ways. Django users can use non-django components, and Django itself has borrowed ideas and even code from other projects (simplejson, etc). But there is no two-way flow of code, and no active cooperation with outside library developers.

So, now for the specific recommendations, and here since I have a bit more time, I’ll offer a few more specifics than I did in the talk itself:

  1. Make the request object a proxy to the wsgi environ dict
  2. Make Django middleware and WSGI middleware more interchangeable
  3. Reconsider the no-outside dependency rule

Make the request object a proxy to the wsgi environ dict

In the past there has been some talk about making TurboGears, Pylons, and Django all share the same request/response object APIs. I’m not opposed to this, and the fact is that TG2 and Pylons both share WebOb as a request/response API now. But I think that WebOb has actually shown us that the critical thing is not that the request/response object API must always be the same. Instead, the key thing is that the WSGI environ dictionary be the canonical representation of the data, so that it can be passed along easily to anything WSGI compliant.

Since WebOb is already pretty close to the Django request/response object API, it’s probably easiest to subclass the webob object and provide a backwards compatible API with the current Django implementation. But I think we’d all be almost as happy if Django re-implemented this, the key is to always have an up-to-date version of the environ that you can grab.
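A minimal sketch of the idea, using only the standard library. The EnvironRequest class and its attribute names are hypothetical (this is neither Django’s nor WebOb’s actual implementation); the point is only that every attribute reads from, and writes to, the WSGI environ dict, so the environ never goes stale.

```python
from urllib.parse import parse_qs


class EnvironRequest:
    """Hypothetical request object that stores no request data of its own.

    Every attribute is computed from (or written back to) the WSGI
    environ dict, so the environ stays the canonical copy.
    """

    def __init__(self, environ):
        self.environ = environ

    @property
    def method(self):
        return self.environ.get("REQUEST_METHOD", "GET")

    @property
    def path_info(self):
        return self.environ.get("PATH_INFO", "")

    @path_info.setter
    def path_info(self, value):
        # Writes go straight to the environ; there is no private copy.
        self.environ["PATH_INFO"] = value

    @property
    def GET(self):
        # Parsed on every access, so changes to QUERY_STRING are reflected.
        return parse_qs(self.environ.get("QUERY_STRING", ""))


environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/blog/", "QUERY_STRING": "page=2"}
request = EnvironRequest(environ)
request.path_info = "/archive/"
```

After the assignment, environ["PATH_INFO"] is already "/archive/", so the same dict can be handed to any WSGI-compliant callable without a separate synchronization step.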

The big advantage of all this is that you can then use WSGI applications easily within a Django view. For example, Rum is a WSGI app that provides a Django-admin-like interface for SQLAlchemy, and if the environ were up-to-date and available there’s no reason that Rum couldn’t be used inside a Django app.
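Roughly, embedding a WSGI application in a view could look like the sketch below. call_wsgi_app and hello_app are hypothetical names, and hello_app merely stands in for a real application such as Rum; the only assumption is that the view can get hold of a current environ.

```python
def call_wsgi_app(app, environ):
    """Run a WSGI app to completion and return (status, headers, body_bytes)."""
    captured = {}
    written = []

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = list(headers)
        # WSGI requires start_response to return a write() callable.
        return written.append

    result = app(environ, start_response)
    try:
        chunks = list(result)
    finally:
        # Apps may return an iterable with a close() method; honor it.
        if hasattr(result, "close"):
            result.close()
    body = b"".join(written + chunks)
    return captured["status"], captured["headers"], body


def hello_app(environ, start_response):
    """Stand-in WSGI application used only for this example."""
    body = ("Hello from %s" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


status, headers, body = call_wsgi_app(
    hello_app, {"REQUEST_METHOD": "GET", "PATH_INFO": "/admin/"}
)
```

A view would then turn the returned status, headers, and body into whatever response object its framework expects.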

WSGI Middleware and Django Middleware

There’s a lot of innovation happening in the WSGI middleware space, from web-based interactive debuggers, to memory-leak tracing tools, and web-based profilers, to an advanced transaction management system that handles two-phase commit, cross-database transactions, etc.

At the same time a lot of effort is going into making Django middleware that does the same thing, which isn’t necessarily a valuable kind of effort. We should be trying to make better stuff than what’s there, not just duplicating existing stuff in a different framework.

And though I have not tested all this WSGI middleware with Django it should “just work.” So, in the short term anyone creating what might be widely reusable middleware should consider doing it as WSGI middleware rather than Django middleware because that will allow it to reach a wider audience.
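For anyone who hasn’t written one, WSGI middleware is just a callable that wraps another WSGI application. A minimal sketch follows; the TimingMiddleware name and the X-Elapsed-Seconds header are made up for illustration.

```python
import time


class TimingMiddleware:
    """Wrap any WSGI app and add a header with the time it took
    before the app started its response."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        started = time.time()

        def timed_start_response(status, headers, exc_info=None):
            elapsed = time.time() - started
            headers = list(headers) + [("X-Elapsed-Seconds", "%.6f" % elapsed)]
            return start_response(status, headers, exc_info)

        # Post-processing happens in the wrapped start_response above.
        return self.app(environ, timed_start_response)


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


wrapped = TimingMiddleware(demo_app)

# Drive the wrapped app by hand, the way a WSGI server would.
seen = {}


def record_start_response(status, headers, exc_info=None):
    seen["status"], seen["headers"] = status, headers


body = b"".join(wrapped({"REQUEST_METHOD": "GET"}, record_start_response))
```

Because the wrapper only relies on the WSGI calling convention, it works with any framework that exposes itself as a WSGI application.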

But since WSGI middleware and Django middleware are essentially doing the same thing (pre- and post-processing a request), there’s no reason that we couldn’t write an adaptor that lets everybody use everybody else’s stuff in a reasonable way. While this isn’t particularly glamorous work, it would significantly improve cooperation between Python web developers on another front.
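As a rough sketch of what such an adaptor might look like in one direction, the wrapper below exposes a framework-style middleware object (one with process_request and process_response hooks, the method names Django middleware uses) as WSGI middleware. Everything else is hypothetical: SimpleRequest, SimpleResponse, and FrameworkMiddlewareAdaptor are stand-ins invented for this example, and a real adaptor would need to build genuine Django request and response objects and deal with streaming, exceptions, and the view hooks.

```python
class SimpleRequest:
    """Stand-in request: a thin wrapper around the environ."""

    def __init__(self, environ):
        self.environ = environ


class SimpleResponse:
    """Stand-in response holding status, headers, and a bytes body."""

    def __init__(self, status, headers, body):
        self.status = status
        self.headers = list(headers)
        self.body = body


class FrameworkMiddlewareAdaptor:
    """Expose an object with process_request/process_response as WSGI middleware."""

    def __init__(self, app, middleware):
        self.app = app
        self.middleware = middleware

    def __call__(self, environ, start_response):
        request = SimpleRequest(environ)

        # Pre-processing hook: a non-None return value short-circuits the app.
        response = None
        if hasattr(self.middleware, "process_request"):
            response = self.middleware.process_request(request)
        if response is None:
            response = self._run_app(environ)

        # Post-processing hook: may modify or replace the response.
        if hasattr(self.middleware, "process_response"):
            response = self.middleware.process_response(request, response)

        start_response(response.status, response.headers)
        return [response.body]

    def _run_app(self, environ):
        """Call the wrapped WSGI app and buffer its output into a SimpleResponse."""
        captured = {}
        written = []

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, headers
            return written.append

        result = self.app(environ, capture)
        try:
            chunks = list(result)
        finally:
            if hasattr(result, "close"):
                result.close()
        return SimpleResponse(captured["status"], captured["headers"], b"".join(written + chunks))


class PoweredByMiddleware:
    """Framework-style middleware that tags every response."""

    def process_response(self, request, response):
        response.headers.append(("X-Powered-By", "adaptor-demo"))
        return response


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


wrapped = FrameworkMiddlewareAdaptor(demo_app, PoweredByMiddleware())

seen = {}


def record_start_response(status, headers, exc_info=None):
    seen["status"], seen["headers"] = status, headers


body = b"".join(wrapped({"PATH_INFO": "/"}, record_start_response))
```

The sketch buffers the whole response so the post-processing hook can see it, which is the simplest choice but gives up streaming; an adaptor in the other direction (WSGI middleware used as framework middleware) would face the mirror-image trade-off.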

The no-dependency rule

My understanding is that the no-outside dependency rule is designed to make installation easy, and reflects a realization that easy_install is not always so easy. But this is changing: virtualenv, basketweaver, eggbasket, and other tools have made working around the limitations of easy_install a lot easier. You can now create your own package indexes, make reliable install scripts, etc.

There are also a number of people working in this space to make distributing multi-package installs easier. Ian Bicking’s new pyinstall package has a way to package tarballs with all the source distribution packages you want to ship.
And Alberto Valverde released the EggFreezer, which packages up all the source and binary eggs necessary to do an install into a single python install script.

Ian’s package skips binary eggs entirely, and something similar can be done using Kevin Dangoor’s Paver project. Here the idea would be to use Paver to override the “python install” command, to install not just Django but all of its dependencies (which are conveniently packaged up in the same tarball as Django).

Roommate wanted for PloneConf in DC

After some last-minute wrangling of schedules and other commitments, I’m going to PloneConf in October in hopes of furthering the Python web development ecosystem’s cross-framework collaboration a bit more.

But lodging in DC is more than expected, so if anybody’s interested in sharing a room with me it would help make this less unaffordable for me. I’ve got a nice hotel booked through priceline, and would love to have a good roommate to help share the costs. If interested send me e-mail at my gmail address (mark dot ramm), or leave a comment here.

I’m really looking forward to connecting to some of the Plone folks, and to learning more about how things work in the Plone world of 2008.


Roommate found. Thanks everybody!

DjangoCon (Part 2)

As I mentioned in my last post, I did a talk at DjangoCon a couple weeks ago, which has proved to be a little bit controversial in some circles (though the biggest critics are people who admit they didn’t actually watch it).

To see the part of the talk I’m talking about in this post jump forward to about minute 19 of the talk.

My previous post talks about almost all of the stuff that I consider controversial, and is basically an argument from history that very-large monolithic code-bases can get into trouble, and cause community fragmentation by separating out those who use “monolith x” from those who don’t. This is particularly true when the monolith is not particularly modular internally. This was the case for Zope 2 and Django seems to be following them down that path to some extent.

But today, I’d like to shift focus from the past to the present, and from social lessons to a famous bumper sticker I’m pretty sure most of us have seen:

SHIT happens

But since I don’t want to be too negative, let’s rephrase that:

GOOD shit happens

And I’m actually not interested in how great the django core-dev team’s drinking abilities are, or how wonderful their social lives are.

I’m interested in the good shit that’s relevant to making developing web applications better, so perhaps we could use this bumper sticker:

Innovation Happens

Django’s dev team has done some great stuff, which has shown that there’s lots more that can be done to make web development easier for Python programmers. But at the same time not everybody uses Django, and lots of problems that Django is currently trying to solve have already been solved by other people.

This argument isn’t really mine; it’s an argument made in the book Innovation Happens Elsewhere by Ron Goldman & Richard P. Gabriel. In the book they argue for using open source software so that you can harness the innovation that’s happening elsewhere.

But even for Django, which is open source there’s no cornering the market on innovation, so we could say:

Innovation Happens Everywhere

It’s happening in Django, and outside of Django. It’s happening wherever there are smart people who want to make things better.

Here’s a quote from Goldman and Gabriel’s book:

“Silly as it sounds, this is the brutal truth:

Regardless of how smart, creative, and innovative
you believe your organization is,
there are more smart, creative, and innovative people
outside your organization than inside.”

Innovation Happens Elsewhere

But, Django has a policy that they do not depend on anything else. This means that they have to reproduce all the innovation that happens elsewhere in the python web community in order to keep up. Perhaps this is OK, I mean Django is big, and they are innovating themselves, at least that’s the argument that some are making.

I think it’s a bit short-sighted though. There’s a lot going on in TurboGears, repoze, SQLAlchemy, Pylons, Zope, etc. And all of those people are starting to work together more and more. They are recognizing that WSGI and reusable libraries make Python web development more flexible than ever before. Anybody who thinks web development 10 years from now will look much like it does today has their head in the sand.

In the talk I just mentioned a few limitations in Django that people mentioned in Saturday’s talks. And I took one smart person outside the django community, Mike Bayer, and showed what he’d already done to solve those problems.

  • Beaker: Myghty+Beaker solved the dog-piling problem with caching in Django’s cache layer
  • Beaker: encrypted cookie sessions (actually this one was done by Ben Bangert)
  • SQLAlchemy: Batched commits (the unit of work pattern)
  • SQLAlchemy: Multiple database support
  • SQLAlchemy: Horizontal partitioning (AKA, sharding)
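
For readers who haven’t met the dog-pile problem: when a popular cache entry expires, every request that misses tries to regenerate it at once. One common remedy is a per-key lock, so that only one thread does the regeneration while the others wait and then reuse its result. The sketch below illustrates that general technique with the standard library only; it is not Beaker’s actual implementation, and DogpileSafeCache and its methods are hypothetical names.

```python
import threading
import time


class DogpileSafeCache:
    def __init__(self):
        self._values = {}                     # key -> (expires_at, value)
        self._locks = {}                      # key -> lock guarding regeneration
        self._locks_guard = threading.Lock()  # protects the _locks dict itself

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh_entry(self, key):
        entry = self._values.get(key)
        if entry is not None and entry[0] > time.time():
            return entry
        return None

    def get_or_create(self, key, creator, ttl=60):
        entry = self._fresh_entry(key)
        if entry is not None:
            return entry[1]
        with self._lock_for(key):
            # Re-check: another thread may have regenerated while we waited.
            entry = self._fresh_entry(key)
            if entry is not None:
                return entry[1]
            value = creator()
            self._values[key] = (time.time() + ttl, value)
            return value


# Eight threads miss the cache at the same moment.
creator_calls = []


def slow_page():
    creator_calls.append(1)
    time.sleep(0.05)  # pretend this is an expensive query or render
    return "page"


cache = DogpileSafeCache()
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get_or_create("home", slow_page)))
    for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Without the per-key lock, each of the eight threads would have run slow_page itself; with it, the expensive call happens once and the rest pick up the cached value.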

Of course there are lots of others, who are also doing lots of innovative things that solve problems Django users have right now. And there’s no technical reason Django and the Django developers can’t use that stuff.

What this means for Django

My fundamental argument in this section of my talk is that Django users should start taking advantage of things that the rest of the python development community has been building. Just because Django has a no dependency policy doesn’t mean you should. And it would help the whole community to grow if Django components were easily decoupled from the framework as a whole.

I would love it if the Django ORM could be used outside of Django in an easier way. It’s a very powerful and flexible “Active Record” style ORM; I’d even go so far as to say that with its generative query syntax and the recent queryset refactoring work, it’s the best and most thoroughly documented Active Record type ORM in Python today. But installing it and using it requires installing Django, and dancing an interesting (though not particularly complicated) configuration dance.

More on the third part of my talk which focuses on the future of Django coming later this week.

DjangoCon and learning from Zope 2

DjangoCon was a lot of fun. I think Jacob, Adrian, Malcolm, and James are a lot of fun to hang out with, and I like them a lot, so it was good to get to see them all. I was a bit nerve-wracked about my talk, which was hard to write and hard to present, so I missed out on some good times by being too preoccupied.

A TurboGears Guy talks about what Django can learn from Zope

The talk itself was first thing in the morning after the official gathering at the bar the previous evening, and I was asked to talk a bit about how Django fits into the Python web ecosystem, and how things could be better.

My goal with the talk was to shake things up a bit, and hopefully to encourage lots more cross-talk and cross pollination between Django and the rest of the python web developer community.

There is a lot to be admired in Django, and a lot we can learn from them. And at the same time, Ben Bangert, Mike Bayer, Ian Bicking, Armin Ronacher, and many, many others have been doing great things outside of Django, and it would be awesome if Django users took those on.

I thought the talk was well received. Simon Willison filed several tickets in Django’s Trac during the talk, and I had lots of good conversations with folks after the talk was over.

But now that it’s out on YouTube, there seem to be a few people getting riled up about the talk.

I think discussion is good, and people have been saying that it’s hard to watch a 50 min. talk. So I’m going to quickly summarize a couple of points from the talk, and open up some discussion here if people want to do it.

Basically the talk is three 10-12 min talks followed by 10 minutes of Q&A. The first section is about Zope2 and learning from the mistakes of the past. The second is about how innovation works, and how Django fits into web innovation in Python. The third is about the future, with a couple of suggested technical changes to Django that could improve things for everybody.

So, feel free to just watch the portions that you want.

Zope 2, history and not repeating the mistakes of the past

The first section is basically an argument that monolithic frameworks can lead to community fragmentation by increasing the cost of switching and by creating an attitude of “uninformed superiority” on both sides of the divide. This is among the lessons that can be learned from examining the history of Zope2.

I have a brief digression in this section where I mention Zope’s z-shaped learning curve, which seems to have gathered some criticism. My point is that both through-the-web and the Django admin have made getting started very easy, but require you to go back and relearn some stuff when it comes time to customize it beyond a certain point. There is lots of complexity in Zope2 that is avoided by Django, but still I think it’s pretty clear that there is a bit of similarity there. And some danger in the future if Django continues to grow in the wrong way. So, this was intended as a warning about a possible future, not as a critique of the present.

Dependency charts and Django’s “loose coupling”

Another digression in this section seems to have been controversial. I put a couple of dependency diagrams into the talk the night before I gave it. I did this because I saw a couple things at the conference that made me think that Django developers didn’t really see what was happening in their framework.

I don’t want to be unfair to Django, but I very much think that it’s important that Django cast off some of the complacency that seems to have grown up in the Django community around the slogan “tightly integrated, but loosely coupled.” There’s a lot of great integration in Django, but I’m afraid some of the loose coupling seems to escape me. Take a look at this Django dependency diagram, and I think you’ll agree that’s not exactly the definition of “loosely coupled.”

The diagram shows that there’s a lot going on inside of the Django package, and a lot of things look like they depend on one another. Someone suggested that they wanted to use the filesystem abstraction code from Django outside Django, and Adrian suggested two possible approaches:

  1. just import it from django.
  2. fork it to create a new package.

The first is hampered by the fact that it’s not at all clear to an outsider what importing one module from Django will drag into your application. A quick look at the dependency diagram was enough to make me realize that it is not trivial to find out exactly what will happen.
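
One rough way to find out is to compare sys.modules before and after the import. The helper below is a sketch (modules_pulled_in_by is a made-up name), and it uses the standard library’s colorsys module only as a stand-in; pointing it at a Django module would require Django to be installed and is not shown here.

```python
import importlib
import sys


def modules_pulled_in_by(module_name):
    """Import module_name and return the names of every module
    that the import newly added to sys.modules."""
    before = set(sys.modules)
    importlib.import_module(module_name)
    return sorted(set(sys.modules) - before)


newly_loaded = modules_pulled_in_by("colorsys")
print(len(newly_loaded), "modules loaded:", newly_loaded)
```

The result only counts modules that were not already imported in the current process, so it is most informative when run in a fresh interpreter.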

Don’t get me wrong, I think there are good things about the tightly integrated nature of Django. I just want it to be clear that there is a trade-off, and there’s no way to magically get the benefits of both tight integration and loose coupling at the same time for free. And one of those trade-offs is that Django can begin to become a bit of a closed system: good stuff can flow in from the outside world, but nothing flows out.

This too encourages the creation of a divide between non-Django Python web programmers and Djangonauts.

OK, this has gone on plenty long enough. I’ll blog about parts 2 and 3 of my talk tomorrow.