Archive for March, 2008

Documentation tools

Zed Shaw and I spent quite a bit of time talking about documentation tools, since we are both working on books for Prentice Hall and we both want an open-source, reusable tool-chain that helps us get tested code into our documentation easily and effectively.

Zed did a bunch of hacking and came up with this:

It has a very simple syntax for importing code into your plain-text documentation file. It’s a simple parser that we use to parse the document source and the export files, create parse trees, and do a bit of simple processing. One of the main advantages of this for book writers is that you can have the source code imported into your plain-text documentation automatically and repeatedly. That means you can look at the code while you’re writing. The final parsed output will let you convert the plain-text document into your final output format (HTML, LaTeX, or whatever), and it will use Pygments for code formatting and colorization and interleave the code into the rendered document.

I also learned that the tool being used to build the new Python 2.6/3.0 documentation has been cleaned up and released as a general-purpose documentation tool. It’s reST (reStructuredText) all the way, which seems to be a small price to pay for such attractive docs and such a wide feature set. It also relies on imports from specific line numbers, which is a lot more fragile than marking sections for import, but does not require you to be able to include delimiters in your source code files.

I haven’t had a chance to use it yet, but I’m very excited about it as a tool for the TurboGears online documentation.

Hooray for better doc tools!

Google Summer of Code

First, I want to congratulate Chris Arent and Chris Perkins, who put a lot of work into the Google Summer of Code application for TurboGears. The GSoC has been very good for Python, and good for TurboGears in the past, and it’s really nice to take this to the next step and become a GSoC mentoring organization this year.

With that said, there is one project being sponsored by the TG team that I want to highlight, because I really want to see it picked up by a talented and motivated student: the Genshi speedup work. It is going to require some Python profiling and optimization skills (obviously with lots of mentor help) and will likely require some C coding as well. But it will be a huge benefit not just to TurboGears, but also to Trac and all kinds of other Python projects that want to produce XML output for whatever reason. Recent benchmarks have shown that TG2 is very fast when you don’t use Genshi match templates, so making this stuff faster will have a big performance impact.

So if you’re a student, and interested in doing something challenging, interesting, and useful, please take a look at this, and the rest of our GSoC ideas on the wiki — and feel free to suggest new things on the mailing list.

Tutorials are hard.

Ben Bangert and I did a tutorial at PyCon this year, and it was hard. We’d just cut eggs for the TG2 and Pylons pre-release versions that morning, after months of trying to have releases done before PyCon. So I personally was a bit frazzled before the tutorial even got started, and then we had a huge number of technical issues with the PyCon wireless, which led to various egg-dependency problems.

Our backup strategy for sharing the eggs was to set up an ad-hoc wireless network and share them out directly. But the PyCon folks specifically asked that nobody set up ad-hoc networks. So we had to fall back to burning CDs, putting eggs on flash drives, and walking around helping people get installed.

Ben Bangert and Chris Perkins did a great job of helping people get stuff installed, while I walked the people in the room through a basic WSGI tutorial, so they would have the conceptual framework needed to understand how some of the TurboGears2 and Pylons code works.

We then went into the TG2 tutorial proper, but by this time we were running pretty late, and the need to provide both TG2 and Pylons examples became somewhat problematic in the tight timeline we had. I worked hard to make sure that there were materials that the tutorial participants could take away with them, and I think that paid off because we didn’t get to finish the TG2 tutorial. But hopefully people will be able to go through the tutorial on their own.

And I created a Google Group for tutorial members so that they can ask questions, get help, and otherwise continue to learn stuff even after the talk itself is done. If you were at the tutorial, and you didn’t get a chance to join the group — but still want to, send me an e-mail and I’ll get you added.

I learned a lot about how to handle crazy problems in the context of a large tutorial group, and I’ll be doing a number of things differently next time.

How to write a better book (or just better docs!)

A lot of people tell me that they want to write a technical book for one reason or another. And I think that’s a great goal that can really stretch you as a communicator, as a programmer, and as a human being — so go for it. But if you’re thinking about it, I’d suggest that you learn from a couple of my mistakes. ;)

People might tell you that writing technical books sucks because you don’t make much money. (Which is true, as far as it goes). Or they may tell you that writing books sucks because it’s hard work. Or they might tell you how much time you spend away from those you love. And those things are true. But I don’t regret any of those things about writing the TurboGears book.

I do, however, have a couple of process-related regrets, and I’ve felt for a long time that I needed to write an article to codify some of the things I’ve learned about writing, so that prospective book authors and open source framework/library documenters have a shot at avoiding some of my rookie mistakes.

The two most important things that I learned from writing the TurboGears book were:


  • Every single line of code needs to be tested, not just before it goes in the book, but every time you make changes. If you don’t do this, code will get broken in the process of last-minute reorganization, rewrites, and crazy insanity.
  • It’s better to take time to do it right, than to rush something out the door that’s not what people need.

The testing issue is the most critical thing about book writing, and it comes in two parts, both of which are far too easy to ignore. First, code needs to be tested to make sure that it runs, it does the right thing, and it makes sense. The first two tests are automatable, and really need to be automated. Refactoring and rewriting are fundamental to making good code and good books, and you can’t confidently refactor without tests. And since I think book authors should be testing the code to make sure it makes sense, by getting target-audience readers to read and understand it and making it shockingly easy for them to provide feedback, it’s likely that lots of refactoring opportunities will come up.

Unfortunately, though the Pragmatic Press people have one, as do many, many authors, I’m not aware of a single openly available tool that is designed to make testing book code easy. And I think this is a shame, because even if you’re not writing books, every open source library needs documentation, and most of them need tutorial-style documentation, which requires the same basic tools. So I’m hoping that some of us can join forces to get a tool like this started at the PyCon Sprints next week.

There have been two approaches to the problem:

  1. Suck code from external source-code into the document itself.
  2. Take code from the document, and mark it up with a list of external resources needed to test it.

Based on my unscientific results it looks like the first approach is more popular than the second. But the second approach has one very significant advantage — all of the code is visible while you’re writing the text and therefore you are less likely to have “refactoring” bugs that cross the text/code boundary (a method name is changed in the code, but not the text that describes it).
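For Python code, one existing way to try the second approach is the standard library’s doctest module, which can pull interactive examples out of a plain-text document and run them. What follows is only a minimal sketch, assuming the document’s code samples are written as `>>>` sessions; the sample text and the helper name `check_document` are made up for illustration and are not part of any tool mentioned here.

```python
import doctest

# Hypothetical document text: prose with an embedded interactive session.
DOCUMENT = """
Adding two numbers works the way you would expect:

>>> def add(a, b):
...     return a + b
>>> add(2, 3)
5
"""


def check_document(text, name="chapter"):
    """Run every >>> example found in text and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Each document gets a fresh, empty namespace (globs) to run in.
    test = parser.get_doctest(text, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


if __name__ == "__main__":
    failed, attempted = check_document(DOCUMENT)
    print("%d of %d examples failed" % (failed, attempted))
```

Because the examples stay inside the text, the prose and the code are always in front of you together, which is the advantage described above. The cost is that this sketch only understands Python sessions.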

With that said, there are a number of compelling advantages to the suck-in-code method. First, it’s relatively language independent. You just need to define what comments you’ll use to mark off code in the project (to add formatting, and mark the beginning and ending boundaries) and create a simple structure that runs the native language tests, and then builds the document. You may need to adjust things slightly for languages with different commenting conventions. And it certainly seems like multi-language support would be a lot harder to achieve when pulling code out of the document.
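To make the marker-comment idea concrete, here is a rough sketch of the extraction step. The marker format (`begin: NAME` and `end: NAME` after the comment prefix) and the function name `extract_section` are assumptions invented for this example; they are not the syntax of Zed’s tool or of any other existing tool.

```python
import re

# Hypothetical project source with one marked-off section.
SOURCE = """\
import math

# begin: area
def area(radius):
    return math.pi * radius ** 2
# end: area

print(area(2))
"""


def extract_section(source, name, comment="#"):
    """Return the text between '<comment> begin: name' and '<comment> end: name'."""
    prefix = r"^\s*" + re.escape(comment) + r"\s*"
    begin = re.compile(prefix + r"begin:\s*" + re.escape(name) + r"\s*$")
    end = re.compile(prefix + r"end:\s*" + re.escape(name) + r"\s*$")
    collected = []
    inside = False
    for line in source.splitlines():
        if not inside:
            # Ignore everything until the begin marker shows up.
            inside = bool(begin.match(line))
        elif end.match(line):
            return "\n".join(collected)
        else:
            collected.append(line)
    raise ValueError("section %r not found or not closed" % name)


if __name__ == "__main__":
    print(extract_section(SOURCE, "area"))
```

Only the `comment` argument changes for a language with different commenting conventions, for example `comment="//"`. The marked file is still an ordinary source file, so the project’s native tests can run against it before the document is built.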

Also, I’m very much a believer in the idea that both the source code and the document-text source should be in a plain text format that’s easy to keep in version control, easy to track, and easy to diff. I also want to be able to use the same editor for both my document source and my source code.

But in order to mitigate the possibility of the kind of “refactoring” problems I mentioned a minute ago, we ought to make it really easy to create rendered documents. I suppose you could work in two windows, with the source document in one and the rendered version in a second, but it would be even better if you could leave the “processing directives” that grab the code above the rendered source in a plain text document, and then mark the end of the code samples in the rendered document, so that a document could be safely edited (while looking at the source) and then re-rendered at will.
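One way the “leave the directive in place and mark the end” idea could work is sketched below. The directive and end-marker spellings (`.. import:` and `.. end-import`) and the `render` function are hypothetical, chosen only for this illustration, and the `SNIPPETS` dictionary stands in for whatever step pulls tested code out of the project.

```python
DIRECTIVE = ".. import:"   # hypothetical directive syntax
END = ".. end-import"      # hypothetical end-of-sample marker

# Hypothetical plain-text document that has not been rendered yet.
DOC = "\n".join([
    "Here is the greeting function:",
    ".. import: greet",
    ".. end-import",
    "It returns a string.",
])

# Stand-in for code pulled from the tested project.
SNIPPETS = {"greet": "def greet(name):\n    return 'Hello, ' + name"}


def render(document, snippets):
    """Refresh the code below each directive, keeping the directive itself.

    Everything between a directive and the next end marker is replaced,
    so running this on already rendered text is safe.
    """
    out = []
    skipping = False
    for line in document.splitlines():
        if skipping:
            # Drop the stale code sample, but keep its end marker.
            if line.strip() == END:
                out.append(line)
                skipping = False
            continue
        out.append(line)
        if line.startswith(DIRECTIVE):
            name = line[len(DIRECTIVE):].strip()
            out.extend(snippets[name].splitlines())
            skipping = True
    if skipping:
        raise ValueError("directive without a closing %r line" % END)
    return "\n".join(out)


if __name__ == "__main__":
    print(render(DOC, SNIPPETS))
```

Since the directive and the end marker both survive rendering, the rendered file can be edited with the code in view and then rendered again after the project code changes, without keeping a separate unrendered copy.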

If you’ve got an internal toolchain you think might be valuable as a reference for us, please let me know. And if you’ve got a couple of days and want to contribute to making Open Source documentation better, while making it easier to write good technical books, feel free to drop in (in person, or virtually) to the TurboGears sprint at PyCon next week and we’ll see what we can do.

Jython and Java part 2

Well, in the week since my post on Jython, some really good news has come to my attention.

Ted and Frank are now working on Jython and Python stuff for Sun Microsystems. I’ve heard from Frank, Jim, and Ted that the JVM is actually already a much more hospitable environment for dynamic languages than I thought. And all of them have pointed to the Da Vinci Machine project as being an active project with a community that’s very concerned with making the JVM into a really great platform for dynamic languages.

Ian Bicking suggested that the runtime environment of the JVM isn’t all that friendly to Open Source sensibilities (the VM takes a while to load, it’s got WAY too many options, and it is too complex to tweak properly for memory issues). And perhaps that’s true, but I think the least friendly bit of the JVM is what feels like open hostility towards C libraries. In my opinion this is the biggest single issue in terms of hostility to the Open Source way. So much of the Open Source community is steeped in a Unix+C ethos that’s very hard to shake. And for a significant number of problems, there are indisputable performance gains that you can get when you manage the way that your data structures are arranged in memory yourself. I’ve been told that this is getting better too, and that the horrible days of the Java Native Interface (JNI) are numbered.

On another front, I had hoped that the Jython and JRuby people could reach across their respective language community cultures and work together. And I’ve been told a good chunk of the JRuby team is coming to the PyCon sprints to help with the Jython sprint, which is awesome!

In other news:

Somebody suggested that perhaps the Tamarin virtual machine might be a viable alternative to the JVM, with the IronMonkey project porting IronPython to that VM. I honestly don’t think that will happen, both because the IronMonkey project seems stalled, and because the IronMonkey project was just an IL to Tamarin VM “bytecode” cross compiler, so it would likely always be a step behind the .NET guys in terms of performance.