How to write a better book (or just better docs!)

A lot of people tell me that they want to write a technical book for one reason or another. And I think that’s a great goal that can really stretch you as a communicator, as a programmer, and as a human being — so go for it. But if you’re thinking about it, I’d suggest that you learn from a couple of my mistakes. ;)

People might tell you that writing technical books sucks because you don’t make much money. (Which is true, as far as it goes.) Or they may tell you that writing books sucks because it’s hard work. Or they might tell you how much time you spend away from those you love. And those things are true. But I don’t regret any of those things about writing the TurboGears book.

I do, however, have a couple of process-related regrets, and I’ve felt for a long time that I needed to write an article to codify some of the things I’ve learned about writing, so that prospective book authors and open source framework/library documenters have a shot at avoiding some of my rookie mistakes.

The two most important things that I learned from writing the TurboGears book were:


  • Every single line of code needs to be tested, not just before it goes in the book, but every time you make changes. If you don’t do this, code will get broken in the process of last-minute reorganization, rewrites, and crazy insanity.
  • It’s better to take time to do it right than to rush something out the door that’s not what people need.

The testing issue is the most critical thing about book writing, and it comes in two parts, both of which are far too easy to ignore. Code needs to be tested to make sure that it runs, that it does the right thing, and that it makes sense. The first two tests are automatable, and really need to be automated. Refactoring and rewriting are fundamental to making good code and good books, and you can’t confidently refactor without tests. The third test needs people: I think book authors should test that the code makes sense by getting target-audience readers to read and understand it, and by making it shockingly easy for them to provide feedback. Do that, and it’s likely that lots of refactoring opportunities will come up.
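To make the two automatable checks concrete, here is a minimal sketch in Python. It assumes each example lives in its own script and that the book states the output the reader should see; the names check_example and demo are made up for illustration and are not part of any existing tool.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def check_example(script, expected_output):
    """Run one example script and compare its stdout to what the book claims."""
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        timeout=30,
    )
    # "It runs": the exit status is 0.
    # "It does the right thing": the printed output matches the text.
    runs = result.returncode == 0
    right_thing = result.stdout.strip() == expected_output.strip()
    return runs and right_thing


def demo():
    """Check one working and one broken example (both hypothetical)."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.py"
        good.write_text("print(sum(range(5)))\n")
        broken = Path(tmp) / "broken.py"
        broken.write_text("print(undefined_name)\n")
        return check_example(good, "10"), check_example(broken, "10")
```

Running a check like this over every listing after every edit is the part that catches last-minute breakage. Whether the code makes sense still needs human readers.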

Unfortunately, though the Pragmatic Press people have one, as do many, many authors, I’m not aware of a single openly available tool that is designed to make testing book code easy. I think this is a shame because, even if you’re not writing books, every open source library needs documentation, and most of them need tutorial-style documentation, which requires the same basic tools. So I’m hoping that some of us can join forces to get a tool like this started at the PyCon Sprints next week.

There have been two approaches to the problem:

  1. Suck code from external source-code into the document itself.
  2. Take code from the document, and mark it up with a list of external resources needed to test it.

Based on my unscientific results it looks like the first approach is more popular than the second. But the second approach has one very significant advantage — all of the code is visible while you’re writing the text and therefore you are less likely to have “refactoring” bugs that cross the text/code boundary (a method name is changed in the code, but not the text that describes it).
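As a rough illustration of the second approach, the sketch below pulls code out of a document and executes it. The [code] and [/code] markers are an invented markup, not an existing standard, and running every block in one shared namespace is a simplifying assumption; a real tool would also need the list of external resources each block depends on.

```python
import textwrap


def extract_blocks(document):
    """Return the text of every [code] ... [/code] block, in document order."""
    blocks, current = [], None
    for line in document.splitlines():
        stripped = line.strip()
        if stripped == "[code]":
            current = []
        elif stripped == "[/code]" and current is not None:
            blocks.append(textwrap.dedent("\n".join(current)))
            current = None
        elif current is not None:
            current.append(line)
    return blocks


def run_blocks(document):
    """Execute the blocks in order; return (block index, error) for each failure."""
    namespace = {}  # shared, so later blocks can use names from earlier ones
    failures = []
    for index, source in enumerate(extract_blocks(document)):
        try:
            exec(compile(source, f"<block {index}>", "exec"), namespace)
        except Exception as exc:
            failures.append((index, f"{type(exc).__name__}: {exc}"))
    return failures


SAMPLE = """\
First we define a helper.

[code]
def greet(name):
    return "Hello, " + name
[/code]

Then we call it. The function was renamed in one place but not the other.

[code]
assert welcome("PyCon") == "Hello, PyCon"
[/code]
"""
```

Calling run_blocks(SAMPLE) reports a NameError for the second block, which is the kind of rename mistake that is easy to make during a rewrite.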

With that said, there are a number of compelling advantages to the suck-in code method. First, it’s relatively language independent. You just need to define what comments you’ll use to mark off code in the project (to add formatting, and mark the beginning and ending boundaries) and create a simple structure that runs the native language tests, and then builds the document. You may need to adjust things slightly for languages with different commenting conventions. And it certainly seems like multi-language support would be a lot harder to achieve when pulling code out of the document.
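Here is one way the comment-marker idea might look, as a sketch only. The START: and END: marker names are an assumption chosen for this example, and the comment prefix is a parameter so the same function can handle languages with different commenting conventions.

```python
import re


def extract_snippets(source, comment_prefix="#"):
    """Collect named snippets marked off by START:/END: comments."""
    prefix = re.escape(comment_prefix)
    start = re.compile(rf"^\s*{prefix}\s*START:\s*(\w+)\s*$")
    end = re.compile(rf"^\s*{prefix}\s*END:\s*(\w+)\s*$")
    snippets = {}
    open_snippets = {}  # snippet name -> lines collected so far
    for line in source.splitlines():
        match = start.match(line)
        if match:
            open_snippets[match.group(1)] = []
            continue
        match = end.match(line)
        if match:
            name = match.group(1)
            if name not in open_snippets:
                raise ValueError(f"END without START: {name}")
            snippets[name] = "\n".join(open_snippets.pop(name))
            continue
        # Ordinary code line: it belongs to every snippet currently open.
        for lines in open_snippets.values():
            lines.append(line)
    if open_snippets:
        raise ValueError(f"unclosed snippets: {sorted(open_snippets)}")
    return snippets


PYTHON_SOURCE = """\
import math
# START: area
def area(r):
    return math.pi * r * r
# END: area
print(area(2))
"""

JS_SOURCE = """\
// START: add
function add(a, b) { return a + b; }
// END: add
"""
```

The project files stay ordinary, testable source code. The document build only needs extract_snippets(PYTHON_SOURCE) or extract_snippets(JS_SOURCE, "//") to find the pieces to include.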

Also, I’m very much a believer in the idea that both the source code and the document-text source should be in a plain text format that’s easy to keep in version control, easy to track and easy to diff. I also want to be able to use the same editor for both my document source and my source code.

But in order to mitigate the possibility of the kind of “refactoring” problems I mentioned a minute ago, we ought to make it really easy to create rendered documents. I suppose you could work in two windows, with the source document in one and the rendered version in a second. But it would be even better if the rendered document were itself plain text that kept each “processing directive” just above the code it pulled in and marked where each code sample ends. Then a document could be safely edited (while looking at the source) and re-rendered at will.
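A sketch of what that re-renderable document could look like follows. The ".. include-code:" directive and ".. end-code" marker are invented for this example, and the snippets are passed in as a plain dictionary; the sketch also assumes every directive is followed by an end marker, even before the first render.

```python
DIRECTIVE = ".. include-code:"
END = ".. end-code"


def render(document, snippets):
    """Refresh the code under each directive, leaving the prose untouched."""
    out = []
    skipping = False  # True while passing over previously rendered code
    for line in document.splitlines():
        stripped = line.strip()
        if stripped.startswith(DIRECTIVE):
            name = stripped[len(DIRECTIVE):].strip()
            out.append(line)  # the directive itself stays in the document
            for code_line in snippets[name].splitlines():
                out.append("    " + code_line if code_line else "")
            out.append(END)
            skipping = True
        elif skipping:
            if stripped == END:
                skipping = False
        else:
            out.append(line)
    if skipping:
        raise ValueError("directive without a matching end marker")
    return "\n".join(out) + "\n"


DOCUMENT = """\
Intro text.
.. include-code: area
.. end-code
Closing text.
"""

SNIPPETS = {"area": "def area(r):\n    return 3 * r"}
```

Because the directive and end marker survive rendering, running render on its own output gives the same text back, and running it after the code changes swaps in the new listing without touching the prose around it.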

If you’ve got an internal toolchain you think might be valuable as a reference for us, please let me know. And if you’ve got a couple of days and want to contribute to making Open Source documentation better, while making it easier to write good technical books, feel free to drop in (in person, or virtually) to the TurboGears sprint at PyCon next week and we’ll see what we can do.

9 Responses to “How to write a better book (or just better docs!)”

  1. The Red Bean Mercurial book mentions that its code samples are live and continually tested – I wonder if they’d share their test harness.

  2. I’ve been fighting a bit with this myself, and haven’t yet found a really good solution. For now I just keep the code open in one buffer in Emacs, the chapter I’m working on in another, and yank bits between them as needed. If you’d like to solve this problem on behalf of Python book authors everywhere, I’ll buy you a beer ;)

  3. Kevin Horn

    You could build something like this around docutils, I think. It has all kinds of hooks, etc. which allow various types of processing on the text of the document. I think it should be possible to kick off a test harness of some kind using it. Or at least easily grab code from your document and test it.

    That said, it may not be the _best_ option, as the internals of docutils are something of a beast…and the documentation is a bit scattered.

  4. Ken Kuhlman

    I haven’t had a chance to actually play with it yet, but I’ve got Crunchy filed in this space.

    Of course, I’ve never worried myself about writing a book, and that problem probably raises a huge set of formatting issues, but the Crunchy developers seem eager to tackle new challenges.

    FWIW, I think Johannes Woolard will be presenting about Crunchy at PyCon.

  5. doctest is pretty much designed for this type of stuff.

    doctests are documentation that is also a set of tests :)


  6. Rene,

    Yea, doctest is great. But I can’t actually imagine trying to write an entire book in doctest; there are just too many limitations, particularly when it comes to writing extended tutorials on how to write large projects. You really can’t do that all in an interpreter session.

    And even for smaller documentation projects, doctest can sometimes feel a little awkward as a full documentation tool. doctest’s use of the interpreter session metaphor can force your example code to look strangely unlike the code you would actually write when using the library.

  7. Jean


    I know that Bruce Eckel uses his own method and Python scripts to extract the code from his “Thinking in Java” book and test it. You might want to contact him.

    Best regards,

    – Jean

  8. The new Python standard documentation system, Sphinx, already has directives to include code blocks from external files. Maybe we should have a look at how easy or hard it would be to expand its capabilities to do what you propose. It produces beautiful docs from the reST sources and has good support for both reference and tutorial-style docs.

  9. I’ve been curious about this too. I’ve settled on Scrivener (OSX) which can export to MultiMarkdown ready for LaTeX. I can generate a PDF preview of the book in about 5 seconds and a few keypresses. And, MultiMarkdown also allows you to go to HTML or RTF.

    Next, I need to work out how to pull source code in for code listings in the book. I’m thinking of writing a script that will detect special comment tags in my code and then import them in.

    I’m really curious about how the Pragmatic Press are doing things. Does anyone know what they use?
