Archive for February, 2008

JVM as platform for dynamic languages?

Martin Fowler recently blogged about how to choose between the emerging JRuby and Groovy scripting languages on the JVM. He pretty much ignores both Rhino and Jython, which is strange: Jython is the granddaddy of dynamic languages on the JVM, and Rhino seems to have the most internal support from Sun and has been getting some serious press in the blogosphere. I think his failure to address Jython and Rhino reduces the value of the article.

But at the end of his article he does finally get around to a quick mention of Jython, in the context of asking a bigger question:

Will either matter to Java? After all Jython’s been around for a long time without making a huge impact on the JVM. Tool support is frankly pathetic for any of these languages when you compare it to what you have for Java at the moment.

This question is skating right around the critical issue, without ever touching it directly.

Why is there so little uptake for dynamic languages on the JVM? And what will it take to change that?

Another way to ask the same question would be: We have a lot of deep experience with C integration in the Perl/Python/Ruby communities, but there is no such deep experience with the JVM as a platform for scripting languages. Why?

To sharpen the question a bit further, we should note that the Jython/Java integration story is in many ways better than the Python/C story. In Python 2.5, ctypes makes using C libraries from Python easier than ever, but you don’t have to do anything special at all to use Java libraries from Jython. In spite of this, Jython and all the other new “dynamic languages” on the JVM have not become a popular way to work with Java libraries.

Why?

I’m going to address this from the perspective of a Jython user, because that’s the JVM/dynamic language experience that I have. But I’ve played with JRuby enough to know that the issues are similar.

One problem that seems to stop Python programmers from flowing over into Jython easily is the impedance mismatch between the way things are done in Python and the way they would be done in Java. As a Python developer you find yourself constantly switching mental gears to write anything complex in Jython. You’re thinking about the Python code you’d write, and things sort of flow until you run up against something in a Java library which requires you to look at the world in a fundamentally different way.

Of course, the same could be said of Python and C. Sure, you could argue that C and Python programmers share a more similar, pragmatic approach to library development, but I don’t think that’s the main difference. The main difference is that developers tend to “wrap” C libraries and provide “pythonic” bindings which reduce the impedance mismatch. Nobody seems to do this in Jython, partly because it’s just too easy to ignore the need for good pythonic APIs and just get something done. It seems that easy is sometimes the enemy of good, which is definitely not news to any experienced programmer.
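To make the "wrapping" idea concrete, here is a minimal sketch. It runs in plain CPython, so the Java side is simulated: `JavaStyleIterator` is a hypothetical stand-in for a Java-flavoured `hasNext()`/`next()` interface, and `pythonic` is an illustrative helper name, not part of Jython or any library.

```python
class JavaStyleIterator:
    """Hypothetical stand-in for a Java-style iterator (hasNext()/next())."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def hasNext(self):
        return self._pos < len(self._items)

    def next(self):
        item = self._items[self._pos]
        self._pos += 1
        return item


def pythonic(java_iter):
    """Wrap a hasNext()/next() object as an ordinary Python generator."""
    while java_iter.hasNext():
        yield java_iter.next()


# Once wrapped, the object works with for loops and comprehensions.
names = [name.upper() for name in pythonic(JavaStyleIterator(["lucene", "itext"]))]
```

The wrapper is only a few lines, which is probably why it so often gets skipped: calling `hasNext()` directly works well enough that nobody bothers to publish the nicer binding.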

But perhaps there is another reason: the JVM was written explicitly for Java, and is not particularly friendly to dynamic languages. Reflection is slow, and in general features used by dynamic languages haven’t been a priority for the JVM. Fortunately there’s some work happening already to make the JVM a much, much friendlier place for dynamic languages. But I don’t expect that Jython is going to be faster than CPython any time soon. So that means that the JVM as a platform needs to offer something more than just a good way to implement Python. I’ve heard some rumblings from the Jython team that this may not be a significant factor in the future, but it’s certainly colored people’s perceptions in the past.

This brings us back to the fundamental nature of the JVM as a viable platform. It’s one thing to have languages that run on the JVM, and call that the platform, but the language and the managed runtime environment are only part of the picture. In my experience the platform is defined as much by its libraries as anything else.

I think this is the main problem that the JVM will face as it evolves into a multi-language platform over the next few years: the Java standard library is complex in unexpected ways (how do you open a file and process it line by line? I always have to look it up), and that makes it difficult to use as the foundation for a vibrant scripting platform.
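For contrast, this is roughly what "open a file and process it line by line" looks like in plain Python. It is only a sketch: `count_nonblank_lines` is an illustrative name, and the temporary file is there just to make the example self-contained.

```python
import os
import tempfile


def count_nonblank_lines(path):
    """Iterate over the file object directly; each item is one line."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.strip():
                count += 1
    return count


# Write a small sample file so the example can run anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\n\nsecond\n")
    sample_path = tmp.name

result = count_nonblank_lines(sample_path)
os.remove(sample_path)
```

A scripting-friendly standard library makes the common case this short, and that is the bar a shared JVM library would have to clear.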

If, however, people aren’t working with Java libraries, you’re just running Jython on the JVM, and all you get is a slower, slightly more out-of-date version of Python with fewer libraries, and that’s not a very compelling case. Perhaps you need to run on the JVM for political reasons, or you need access to one of the Java libraries that don’t have an obvious Python equivalent, like iText or Lucene. But except for those relatively narrow use cases, why wouldn’t you stick with CPython?

The Microsoft threat

Microsoft has intentionally worked to create libraries that make sense for C# users, but which are also accessible from languages like VB.NET and IronPython. Thus they have created an alternative Python platform which is much more interesting than Jython. To be successful in turning the JVM into a multi-language platform, I think something similar needs to happen for Java.

At least right now, the way that Groovy, Jython, Scala and JRuby have been implemented makes it unclear how well you’ll be able to integrate libraries written in one of these languages with libraries from another. Effectively this places Java libraries at the center, and relegates dynamic languages to “second class” status. At this moment, writing libraries in Jython in an attempt to make them usable by JRuby and Groovy folks seems like a fool’s errand.

So, having a common core of reusable libraries written in Java seems like a key ingredient in making the JVM a better platform. These libraries would have to be built:

* with multi-language reuse in mind
* in ways that don’t require lots of boilerplate code
* to work well with dynamically typed languages

That kind of standard library would benefit all of these new players on the JVM. And that’s what I think Sun ought to invest in if they want the JVM to become a platform on which JRuby, Jython, Scala, Groovy, and whatever else comes next can thrive.

Another Microsoft Threat

At the same time, Microsoft has been widening the gap in another way, by introducing the DLR — a new set of tools for dynamic language implementors.

Much of the thinking behind the DLR comes from Jim Hugunin (the original implementor of Jython), and if it works the DLR will help to make dynamic languages first class citizens on the Microsoft CLR since you will be able to use them to write libraries that are cross-compatible, making IronPython libraries usable in IronRuby, etc.

The aforementioned project to make the JVM more friendly to dynamic languages is a good place for language implementors to work together. And it would be very interesting to see the Jython, JRuby, and other language implementors get together and provide a good solid answer to the DLR. I’ve been talking to some of the Jython guys this week, and they tell me that this is happening, and that a lot of the work is already done.

Given the politics of language communities, and the necessities of the Open Source development model, I’m sure this has been more difficult than it could have been, so I’m very excited to hear that the people involved have taken the time to make that happen. I haven’t seen a lot of talk about this on the internets, but it looks like people are quietly doing the right thing without running the hype machine in overdrive, which is refreshing.

After talking to the Jython guys this week, I’m actually pretty optimistic about the JVM. It’s growing in a direction that provides a solid memory-managed platform for the dynamic languages of the present and future, and it could become a platform for collaboration between language developers.

The CLR could do the same thing, but I want to see it happen on the JVM, because that would make it free in all the important senses of the word.

Profiling

I’ve got some to-do items centered around profiling TurboGears 2 applications, and it looks like they just got easier. Repoze just released some WSGI middleware that profiles everything inside of it. Simple, clean, and easy to use. Heck, it’s even got a simple web-based interface for looking at what’s going on inside your app.

New tools like this reinforce my conviction that WSGI and reusable tools are critical for the future of Python web development.

By the way, I don’t see any reason why you couldn’t use this new profiler with Django right now, and that’s the point. WSGI means that frameworks can reuse tools, which is a very, very good thing.
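To show why this kind of tool is framework neutral, here is a rough sketch of profiling middleware built on the standard library's `cProfile`. It is not the Repoze implementation or its API; `ProfilingMiddleware`, `last_report`, and `hello_app` are hypothetical names used only for illustration.

```python
import cProfile
import io
import pstats


class ProfilingMiddleware:
    """Hypothetical WSGI middleware that profiles each call to the wrapped app."""

    def __init__(self, app):
        self.app = app
        self.last_report = ""

    def __call__(self, environ, start_response):
        profiler = cProfile.Profile()
        # Materialise the body inside the profiled call so that lazily
        # generated responses are measured too.
        body = profiler.runcall(lambda: list(self.app(environ, start_response)))
        out = io.StringIO()
        pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
        self.last_report = out.getvalue()
        return body


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


profiled = ProfilingMiddleware(hello_app)
statuses = []
body = profiled({"REQUEST_METHOD": "GET"}, lambda status, headers: statuses.append(status))
```

Nothing in the middleware knows which framework produced `hello_app`. Any WSGI callable can be wrapped the same way.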

Migrations, Schema Evolution and SQLAlchemy

The SQLAlchemy migrate project has reawakened after a long slumber. And I couldn’t be happier. Evolutionary database development, or “migrations” as the Rails folks have re-labeled the process, is an important piece of the agile web development puzzle.

It’s possible to do without it, but I can’t think of a single project I’ve worked on in the last couple of years that didn’t need something. I’ve helped to hack things together for various groups, and I’ve never been particularly satisfied with our one-off solutions, particularly when everything I’ve worked on has needed a robust, standardized system for managing and versioning database schemas.

Today, thanks to Christian Simms and Jan Dittberner, SQLAlchemy Migrate works with SQLAlchemy 0.4, which means that I can now use it in almost every project I have. I expect that this will bring a new wave of users, and I know that it opens the door for a whole new set of features.

This is also a core feature that I think TurboGears 2 needs to have to really be a complete web development toolkit, and now we’ve got it. I don’t want to sound like a broken record here, but I’m very happy that this isn’t part of the TurboGears package. It’s better because it’s not part of the framework. Anybody who uses SQLAlchemy in GUIs, in command line tools, or wherever can (and probably should) use Migrate, and they likely wouldn’t want to install a web framework just to manage their database schemas.
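For readers who haven't seen schema versioning before, the core idea fits in a few lines. This is a toy sketch using the standard library's `sqlite3`, not SQLAlchemy Migrate's API; the `MIGRATIONS` list, the `schema_version` table, and the function names are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical ordered list of schema changes; position + 1 is the version.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]


def current_version(conn):
    """Return the highest applied version, or 0 for a fresh database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def upgrade(conn):
    """Apply every migration newer than the recorded version, in order."""
    start = current_version(conn)
    for version, statement in enumerate(MIGRATIONS[start:], start=start + 1):
        conn.execute(statement)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
    conn.commit()
    return current_version(conn)


conn = sqlite3.connect(":memory:")
final_version = upgrade(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```

Running `upgrade` a second time is a no-op, because the database itself records which changes it has already seen. A real tool adds downgrades, per-database dialects, and safer failure handling on top of this idea.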

Question for the masses:

On another note, there’s one Django project that is the only thing I’m currently working on that doesn’t use SQLAlchemy, and I must confess to being bewildered by the array of Django schema migration projects out there. So advice, recommendations, or tips from any of you Djangonauts on how to do schema evolution on that project would be very welcome.

Request/Response objects in TG2 and beyond

One of the things that slowed down the TG2 release recently has been the move to WebOb-based Request/Response objects. There was some extra work required to make this happen right away, but I think it was worth it. WebOb provides a new and interesting approach to request/response objects that I hope will catch on around the Python web community. And since a switch was likely to be inevitable, we bit the bullet now in order to avoid future API changes.

Why WebOb?

A WSGI server (see www.wsgi.org) itself doesn’t do anything to provide a full-featured web request/response object, beyond providing a simple CGI-style “environment dictionary” and a callable for setting response headers. This is almost certainly the “right” way to do things for WSGI itself, because it provides a usable base and avoids getting into framework-specific issues, but there’s certainly room to provide a nicer API for end users than that.
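Here is what that bare interface looks like in practice, as a small sketch. The application name `raw_app` and the `captured` dictionary are illustrative; `setup_testing_defaults` comes from the standard library's `wsgiref.util` and just fills in the environ keys a real server would supply.

```python
from wsgiref.util import setup_testing_defaults


def raw_app(environ, start_response):
    """A bare WSGI application: a CGI-style dict in, an iterable of bytes out."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "/")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"{method} {path}".encode("utf-8")]


environ = {"PATH_INFO": "/hello"}
setup_testing_defaults(environ)  # fills in the remaining required WSGI keys

captured = {}


def start_response(status, headers):
    # Stand-in for the callable a WSGI server would pass in.
    captured["status"] = status
    captured["headers"] = headers


body = b"".join(raw_app(environ, start_response))
```

Everything the application knows about the request comes from string keys in a dictionary, which is perfectly usable but not something you want to type all day.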

Which is why modern Python web frameworks, from Zope to Django, Pylons, and TurboGears 1/CherryPy, all provide users with more “user friendly” request/response objects. Unfortunately they all implement different request/response objects, and their implementation is part of the “framework” itself, so it isn’t easily reusable outside of the framework, and it certainly isn’t usable across frameworks. As I mentioned earlier, WebOb is an attempt by Ian Bicking to change that. The best thing about WebOb is that it is very much a friend of WSGI, not a competitor. It maintains the WSGI environment as the canonical source of data, and just wraps it in a nicer API.

WebOb makes fiddling with headers, cookies, and all that in your app easier. But it also makes writing WSGI middleware much easier. Several pieces of middleware used by TG2 have already been rewritten to use WebOb internally, and the authors report that WebOb has made their lives much easier. And it’s certainly made their code easier to read and understand.
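The "environ stays canonical" design can be illustrated with a toy wrapper. To be clear, this is not WebOb's code or API: `TinyRequest` and its properties are hypothetical, written only to show the shape of the idea with the standard library.

```python
from urllib.parse import parse_qs


class TinyRequest:
    """Hypothetical wrapper; every property reads straight from environ."""

    def __init__(self, environ):
        self.environ = environ

    @property
    def method(self):
        return self.environ["REQUEST_METHOD"]

    @property
    def path(self):
        return self.environ.get("SCRIPT_NAME", "") + self.environ.get("PATH_INFO", "")

    @property
    def params(self):
        # Parsed on every access, so the environ remains the single source of truth.
        query = parse_qs(self.environ.get("QUERY_STRING", ""))
        return {key: values[0] for key, values in query.items()}


environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/search", "QUERY_STRING": "q=jython"}
request = TinyRequest(environ)
before = request.params
environ["QUERY_STRING"] = "q=webob"  # some other component edits the environ
after = request.params
```

Because the wrapper holds no state of its own, middleware that only knows about the raw environ and code that uses the wrapper always agree on what the request says.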

WebOb WSGI and re-usable components:

Appropriately, given yesterday’s discussion of re-usable components, the WebOb documentation has an interesting example of a commenting middleware component, that shows off how you can make “pure WSGI” components that are entirely framework neutral.

WebOb and the future of Python Web Development:

And it looks a bit like it’s happening: SkunkWeb, Pylons, and TurboGears 2 are adopting WebOb, and there’s been at least some talk about doing something similar in Django, which I think would help make it easier to write libraries that “just work” in multiple frameworks. Of course, we may never settle on one ORM or one templating language, but there’s a whole set of libraries that pretty much just need to manipulate the request and response objects and don’t need to interact with the rest of the framework. So, if several of the major frameworks move to WebOb (and WSGI of course) that would make the dream of cross-framework libraries much more of a reality.

I keep saying it because I think it’s important: Python’s web framework diversity is only a hindrance if we 1) don’t work together, and 2) don’t learn from each other. And given the current crop of framework developers, I don’t see any chance of us not working together and learning from each other.

“Site Components” in Django and TG2

James Bennett responded to my recent post about Pylons, Django, and conceptual integrity.

James describes a core feature of Django (re-usable web-site components they call “apps”), and explains part of the trade-off that the Django community makes:

If you take a little bit less flexibility… in “swappability” of framework components, you can get a useful benefit in return: a set of common APIs that you can rely on, in exactly the way that my generic content tags rely on Django’s model-loading API, or that things like my user-registration and user-profile applications rely on the APIs and near-universality of django.contrib.auth.

And I think this trade-off is often missed by people critiquing Django. People have even gone so far as to say you “can’t use SQLAlchemy” with Django, which is just plain false on its face. But what they generally mean is that if you use SQLAlchemy you lose the django-admin, the contrib stuff, and many of the conveniences that Django offers. So, fundamentally this isn’t a trade-off that the Django developers forced on you, it’s one that Django users have to make for themselves. They can either take the conveniences offered by standardized components, or they can go out on their own and use pretty much any component they want. Most people don’t go off on their own though, as “the price is too high.” But “can’t” is definitely the wrong word to describe this.

Users might want to be able to switch out the ORM and still be able to use the Django admin, but I for one am not sure how the Django people could make that happen in a reasonable way, without significant additional complexity.

But I think James is right on target when he suggested (in a past discussion) that users care about reusability at this level a lot more than they care about how much code the framework authors were able to reuse. TG2 users don’t care if we had to write our own transaction middleware or not. But they do care whether or not they have to write their own user-registration system!

And from the perspective of a Pylons user, this is the very reason that TG2 is important. TG2, like Django, will define a set of tools that can be used in building re-usable web site components. TG2 users should be able to build powerful, reusable components with SQLAlchemy, Genshi, ToscaWidgets, and the whole TG2 toolchain, and this is one of the core reasons that Ben Bangert and I decided that TG2 should continue on top of Pylons rather than just merging the two projects outright.

There’s a lot of work to be done on the TurboGears 2 site-component front, but we’re committed to making it happen, and it will be one of the areas that we’ll be focusing on at the PyCon sprint in March. And DBSprockets is leading the way by showing how we can use the TG2 toolset to build something like the newforms based version of the Django Admin app.

And of course TG2 has another vector of re-usability via WSGI and WSGI middleware, and that’s an important part of the story too. If you can do authentication in middleware (like Kevin Horn has been doing) with a well defined interface between that middleware and the TG2 app, that enforces a level of orthogonality on your code, and provides for all kinds of interesting possibilities. But, that’s a post for another day.
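As a taste of what a well defined interface between authentication middleware and an app might look like, here is a minimal sketch. It is not Kevin Horn's code or any TG2 API: the class name, the `X-Auth-Token` header, and the token table are assumptions, and the only contract between the two pieces is the conventional `REMOTE_USER` environ key.

```python
class RemoteUserMiddleware:
    """Hypothetical middleware: authenticates, then publishes REMOTE_USER."""

    def __init__(self, app, tokens):
        self.app = app
        self.tokens = tokens  # assumed mapping of token -> user name

    def __call__(self, environ, start_response):
        user = self.tokens.get(environ.get("HTTP_X_AUTH_TOKEN", ""))
        if user is None:
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"authentication required"]
        environ["REMOTE_USER"] = user
        return self.app(environ, start_response)


def whoami_app(environ, start_response):
    # The app never sees tokens; it only relies on REMOTE_USER being set.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["REMOTE_USER"].encode("utf-8")]


app = RemoteUserMiddleware(whoami_app, {"secret": "alice"})


def call(environ):
    """Small helper that invokes the stack and returns (status, body)."""
    statuses = []
    body = b"".join(app(environ, lambda status, headers: statuses.append(status)))
    return statuses[0], body
```

Swapping in a different authentication scheme means replacing the middleware only, since the app depends on nothing but that one key.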