Sure, there are lots of potential nitpicks that you can have with WSGI and WSGI middleware — mostly because the CGI spec on which WSGI is based is getting a bit long in the tooth — but I think those who get worked up about that kind of thing are missing the point.
WSGI provides a known, standard, working way to create reusable components of functionality which are coupled only to a published specification, not to your code. It’s the same reason that HTTP was so successful in the first place: you can build stuff without knowing exactly how it will be used later.
A perfect example of this is the new repoze.squeeze middleware that was released a couple of days ago.
It uses statistical analysis to decide how best to join and compress stylesheets and JavaScript resources based on actual usage.
Sure, you could do this same thing with Django middleware, but if you use WSGI then Zope, Django, TurboGears, Plone, and any other WSGI-compliant app can just use it without having to worry about this stuff inside the app itself.
BTW, they also released repoze.bitbit which automatically re-sizes images before sending them out. When coupled with a good caching proxy server this makes it very easy to let people link to any-sized version of any of the images in your library. While I don’t think it’s as widely useful as repoze.squeeze, it too provides an example of something that can just sit there transparently helping you, and it exemplifies a kind of very loose coupling that is a mark of “true middleware.”
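The reason any WSGI-compliant app can pick something like this up is that middleware is just a callable wrapped around another callable. Here is a minimal sketch in Python 3; hello_app, UppercaseMiddleware, and run_once are names I made up for illustration and have nothing to do with how repoze.squeeze is actually implemented. Only the callable signature and wsgiref’s setup_testing_defaults come from the spec and the standard library.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A tiny WSGI application; it knows nothing about any middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello world"]


class UppercaseMiddleware:
    """Hypothetical middleware: upper-cases whatever the wrapped app returns."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Delegate to the wrapped app, then post-process its body chunks.
        return [chunk.upper() for chunk in self.app(environ, start_response)]


def run_once(app):
    """Call a WSGI app with a fake request and collect its status and body."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required CGI-style keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Calling run_once(UppercaseMiddleware(hello_app)) wraps the app without touching its code, which is the whole point: the middleware and the app only agree on the WSGI calling convention.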
There’s another kind of middleware which does stuff that’s “required” for your app to work properly, and I agree that it’s probably best to call this something else, since it’s not transparent. I think it too has a wide variety of uses, but that’s a blog post for another day.
Ever since my friend Adrian posted up a slide at PyCon telling us that in Django Trunk == hot_action, I’ve been getting lots of e-mails from various folks telling me about the hot_action that I could get from various python packages. But today, the hot action isn’t the contents of the packages, so much as the packaging systems.
As a starter there’s a very, very good background post by Kevin Teague that provides a great overview of the then-current Python packaging ecosystem.
But even though that post is very recent it’s not at all complete, because last week spawned two interesting new projects: pyinstall from Ian Bicking, and distribute, a potential setuptools fork.
But I still think it’s interesting, and it’s definitely spawned a lot of good discussion on the distutils-sig.
When I talked to Tarek last week, his motivation behind starting distribute was always to help move setuptools forward, and his hope was that anything that happened there would be folded back into setuptools proper as time went on.
There were lots of people at PyCon this year who were both vocal about what was wrong with easy_install and motivated to help improve it. Beyond that some of them were people who do packaging all the time, and some of them had some previous experience writing packaging systems in python.
One of the most interesting tidbits from that discussion is this suggestion from Tarek and Guido’s response:
> My own inclination is that a scalable future for distutils means an improved
> sdist format, the end of setup.py as an command-line interface, and
> community-maintained platform-specific installation tools that process
> source or binary distributions. Most complaints about distutils (and
> setuptools, for that matter) are focused on installation policy&preference
> issues. Making it possible and practical for a variety of tools to flourish
> around a standardized format (ala WSGI) seems like the way to go.
Given the success of WSGI (which I use every day) this sounds like a
very good plan!
And like Guido I’m pretty enthusiastic about extending and standardizing the format while allowing a variety of tools to grow up around that new standard.
But I’m a bit skeptical about focusing only on sdist, since I think we do need some kind of way to distribute binary packages to Windows and OS X users, and while eggs work pretty well there, I don’t want that stuff to be left behind.
Which brings me back to pyinstall. For pure python packages I really like pyinstall, because it has all of the key features that I think a python package system should have (repeatable installs, single file installs, etc), but its focus on only source-level distribution makes penetration in the Windows market harder. And I guess the same could be said of OS X, where getting a compiler can require a 1 GB download.
Where do we go from here?
It’s fair to say that easy_install is not perfect, but it is much better than nothing, and last week’s news has made me excited about the future of python packaging again. My thanks to Philip, Ian, Tarek, and all the other people who have put in effort to make this part of python better; my hopes are with you.
If you are interested in getting involved, and helping to make all this stuff better, there’s also going to be a sprint this weekend.
The third part of my DjangoCon talk was about how Django can become a more innovative place, or alternatively how it can stifle innovation in the future. This very much builds on parts one and two, which I blogged about over the last few days.
The “big idea” of this talk is that competition, and continual refinement are the two real drivers of design excellence, and both Django and Python are well served by having web frameworks that compete and that are continually undergoing a process of internal refinement and improvement.
Again, this notion is not unique to me; the book Innovation Happens Elsewhere fleshes out the specifics of how these two factors foster good design:
“Competition and collaboration contribute
to innovation and creativity in two distinct ways.
Competition works through diversity and selection;
collaboration works through refinement and improving.”
This means that we are best served by a python web ecosystem that’s both cooperative and competitive. Which is why I (a TurboGears guy) am willing to take time out of my schedule to try to make Django a better framework.
And it’s why I’m so stuck on the notion of components, and reuse, because good component architecture helps us to collaborate at the level of the library, not just the level of the framework. If the file system abstraction code, or the Django orm, or whatever other components of django were reusable libraries, they would be candidates for cooperation, and continual refinement.
And if newforms (now forms) were a separate component, they could compete head-to-head with similar tools like ToscaWidgets.
But since they are locked up inside Django, that competition has to happen at the level of the framework. Which is fine, except that you lose the granularity and specificity that come from competing at the individual component level. For example, where’s the detailed comparison of ToscaWidgets vs Django’s new forms library? Nobody has written such a thing because nobody (except my friend Max) outside of Django considers the Django forms library to be a reusable component.
And at the same time that competition happens at the level of the framework, cooperation can’t really happen well at all, because Django’s no outside dependency policy precludes it in some important ways. Django users can use non-django components, and Django itself has borrowed ideas and even code from other projects (simplejson, etc). But there is no two-way flow of code, and no active cooperation with outside library developers.
So, now for the specific recommendations, and here since I have a bit more time, I’ll offer a few more specifics than I did in the talk itself:
Make the request object a proxy to the wsgi environ dict
Make django middleware and WSGI middleware more interchangeable
Reconsider the no-outside dependency rule
Make the request object a proxy to the wsgi environ dict
In the past there has been some talk about making TurboGears, Pylons, and Django all share the same request/response object APIs. I’m not opposed to this, and the fact is that TG2 and Pylons both share WebOb as a request/response API now. But I think that WebOb has actually shown us that the critical thing is not that the Request/Response object API must always be the same. Instead, the key thing is that the WSGI environ dictionary be the canonical representation of the data, so that it can be passed along easily to anything WSGI compliant.
Since WebOb is already pretty close to the Django request/response object API, it’s probably easiest to subclass the WebOb object and provide a backwards compatible API with the current Django implementation. But I think we’d all be almost as happy if Django re-implemented this; the key is to always have an up-to-date version of the environ that you can grab.
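To show what I mean by “proxy,” here is a rough sketch of a request object that keeps no state of its own and reads and writes everything through the environ. EnvironRequest is a made-up class for illustration, written in Python 3; it is not WebOb’s or Django’s actual implementation, and the attribute names are only loosely modeled on theirs.

```python
from urllib.parse import parse_qs


class EnvironRequest:
    """Hypothetical request object that stores nothing outside the environ."""

    def __init__(self, environ):
        self.environ = environ  # the canonical copy of the request data

    @property
    def method(self):
        return self.environ["REQUEST_METHOD"]

    @method.setter
    def method(self, value):
        # Writes go straight back into the environ, so it never goes stale.
        self.environ["REQUEST_METHOD"] = value

    @property
    def path(self):
        return self.environ.get("SCRIPT_NAME", "") + self.environ.get("PATH_INFO", "")

    @property
    def GET(self):
        # Parsed on demand from the environ rather than cached on the object.
        return parse_qs(self.environ.get("QUERY_STRING", ""))
```

Because every attribute is computed from the dict, request.environ can be handed to any other WSGI component at any point and it will see exactly what the view sees.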
The big advantage of all this is that you can then use WSGI applications easily within a django view. For example, Rum is a WSGI app that provides a django-admin-like interface for SQLAlchemy, and if the environ were up-to-date and available there’s no reason that Rum couldn’t be used inside a Django app.
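Roughly, calling a WSGI app from inside a view only takes a helper like the one below. This is a sketch under assumptions: call_wsgi_app and embedded_app are hypothetical names, embedded_app is a trivial stand-in rather than Rum, and turning the returned tuple into a framework response object is left out.

```python
def call_wsgi_app(app, environ):
    """Run a WSGI app against an environ and return (status, headers, body)."""
    captured = {}
    written = []

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = list(headers)
        return written.append  # the legacy write() callable the spec requires

    result = app(environ, start_response)
    try:
        chunks = list(result)  # consume the iterable before reading `written`
    finally:
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], b"".join(written + chunks)


def embedded_app(environ, start_response):
    """Stand-in for a real WSGI app mounted inside a view."""
    body = ("you asked for " + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

A view would pass its request’s environ to call_wsgi_app and copy the status, headers, and body into whatever response object the framework expects.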
Make django middleware and WSGI middleware more interchangeable
At the same time a lot of effort is going into making Django middleware that does the same thing, which isn’t necessarily a valuable kind of effort. We should be trying to make better stuff than what’s there, not just duplicating existing stuff in a different framework.
And though I have not tested all this WSGI middleware with Django it should “just work.” So, in the short term anyone creating what might be widely reusable middleware should consider doing it as WSGI middleware rather than Django middleware because that will allow it to reach a wider audience.
But since WSGI middleware and Django middleware are essentially doing the same thing (pre and post processing a request) there’s no reason that we couldn’t write an adaptor that lets everybody use everybody else’s stuff in a reasonable way, and while this isn’t particularly glamorous work it would significantly improve cooperation between python web developers on another front.
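As a very rough sketch of the shape such an adaptor could take, here is one direction: wrapping a pre/post hook object so it runs as WSGI middleware. Everything here is hypothetical. HookAdapter, TagHooks, and the before/after method names are invented for illustration; real Django middleware uses process_request and process_response and works on Django request and response objects, which a real adaptor would have to build from the environ.

```python
class TagHooks:
    """Hypothetical hook object standing in for framework-style middleware."""

    def before(self, environ):
        # Pre-processing: annotate the request before the app sees it.
        environ["demo.seen"] = True

    def after(self, environ, status, headers, body):
        # Post-processing: add a header, leave the body untouched.
        return status, headers + [("X-Handled-By", "hooks")], body


class HookAdapter:
    """Wrap a pre/post hook object so it behaves as WSGI middleware."""

    def __init__(self, app, hooks):
        self.app = app
        self.hooks = hooks

    def __call__(self, environ, start_response):
        self.hooks.before(environ)

        captured = {}
        written = []

        def capture(status, headers, exc_info=None):
            # Hold the response back until the post hook has seen it.
            captured["status"] = status
            captured["headers"] = list(headers)
            return written.append

        result = self.app(environ, capture)
        try:
            chunks = list(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        status, headers, body = self.hooks.after(
            environ, captured["status"], captured["headers"],
            b"".join(written + chunks))
        start_response(status, headers)
        return [body]
```

This version buffers the whole response so the post hook can see it, which is simple but gives up streaming; a serious adaptor would need to decide how to handle that trade-off.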
The no-dependency rule
My understanding is that the no-outside dependency rule is designed to make installation easy, and reflects a realization that easy_install is not always so easy. But this is changing: virtualenv, basketweaver, eggbasket, and other tools have made working around the limitations of easy_install a lot easier. You can now create your own package indexes, make reliable install scripts, etc.
There are also a number of people working in this space to make distributing multi-package installs easier. Ian Bicking’s new pyinstall package has a way to package tarballs with all the source distribution packages you want to ship.
And Alberto Valverde released the EggFreezer, which packages up all the source and binary eggs necessary to do an install into a single python install script.
Ian’s package skips binary eggs entirely, and something similar can be done using Kevin Dangoor’s Paver project. Here the idea would be to use paver to override the “python setup.py install” command, to install not just django but all of its dependencies (which are conveniently packaged up in the same tarball as Django).
After some last minute wrangling of schedule and other commitments, I’m going to PloneConf in October in hopes of furthering the python web development ecosystem’s cross-framework collaboration a bit more.
But lodging in DC is more than expected, so if anybody’s interested in sharing a room with me it would help make this less unaffordable for me. I’ve got a nice hotel booked through priceline, and would love to have a good roommate to help share the costs. If interested send me e-mail at my gmail address (mark dot ramm), or leave a comment here.
I’m really looking forward to connecting to some of the Plone folks, and to learning more about how things work in the Plone world of 2008.
As I mentioned in my last post, I did a talk at DjangoCon a couple weeks ago, which has proved to be a little bit controversial in some circles (though the biggest critics are people who admit they didn’t actually watch it).
To see the part of the talk I’m talking about in this post jump forward to about minute 19 of the talk.
My previous post talks about almost all of the stuff that I consider controversial, and is basically an argument from history that very-large monolithic code-bases can get into trouble, and cause community fragmentation by separating out those who use “monolith x” from those who don’t. This is particularly true when the monolith is not particularly modular internally. This was the case for Zope 2 and Django seems to be following them down that path to some extent.
But today, I’d like to shift focus from the past to the present, and from social lessons to a famous bumper sticker I’m pretty sure most of us have seen:
SHIT happens
But since I don’t want to be too negative, let’s rephrase that:
GOOD shit happens
And I’m actually not interested in how great the django core-dev team’s drinking abilities are, or how wonderful their social lives are.
I’m interested in the good shit that’s relevant to making developing web applications better, so perhaps we could use this bumper sticker:
Innovation Happens
Django’s dev team has done some great stuff that has shown that there’s lots more that can be done to make web development easier for python programmers. But at the same time not everybody uses Django, and lots of problems that Django is currently trying to solve have already been solved by other people.
This argument isn’t really mine, it’s an argument made in the book Innovation Happens Elsewhere by Ron Goldman & Richard P. Gabriel. In the book they argue for using open source software so that you can harness the innovation that’s happening elsewhere.
But even for Django, which is open source, there’s no cornering the market on innovation, so we could say:
Innovation Happens Everywhere
It’s happening in Django, and outside of Django. It’s happening wherever there are smart people who want to make things better.
Here’s a quote from Goldman and Gabriel’s book:
“Silly as it sounds, this is the brutal truth:
Regardless of how smart, creative, and innovative
you believe your organization is,
there are more smart, creative, and innovative people
outside your organization than inside.”
Innovation Happens Elsewhere
But, Django has a policy that they do not depend on anything else. This means that they have to reproduce all the innovation that happens elsewhere in the python web community in order to keep up. Perhaps this is OK, I mean Django is big, and they are innovating themselves, at least that’s the argument that some are making.
I think it’s a bit short-sighted though; there’s a lot going on in TurboGears, repoze, SQLAlchemy, Pylons, Zope, etc. And all of those people are starting to work together more and more. They are recognizing that WSGI and reusable libraries make python web development more flexible than ever before. Anybody who thinks web development 10 years from now will look much like it does today has their head in the sand.
In the talk I just mentioned a few limitations in Django that people mentioned in Saturday’s talks. And I took one smart person outside the django community, Mike Bayer, and showed what he’d already done to solve those problems.
Beaker: Myghty+Beaker solved the dog-piling problem with caching in django’s cache layer
Beaker: encrypted cookie sessions (actually this one was done by Ben Bangert)
SQLAlchemy: Batched commits (the unit of work pattern)
SQLAlchemy: Multiple database support
SQLAlchemy: Horizontal partitioning (AKA, sharding)
Of course there are lots of others, who are also doing lots of innovative things that solve problems Django users have right now. And there’s no technical reason Django and the Django developers can’t use that stuff.
What this means for Django
My fundamental argument in this section of my talk is that Django users should start taking advantage of things that the rest of the python development community has been building. Just because Django has a no dependency policy doesn’t mean you should. And it would help the whole community to grow if Django components were easily decoupled from the framework as a whole.
I would love it if the Django ORM could be used outside of Django in an easier way. It’s a very powerful and flexible “Active Record” style ORM, and I’d even go so far as to say that with its generative query syntax and the recent queryset refactoring work, it’s the best and most thoroughly documented Active Record type ORM in python today. But installing it and using it requires installing Django, and dancing an interesting (though not particularly complicated) configuration dance.
More on the third part of my talk which focuses on the future of Django coming later this week.