Relational Databases as an “implementaiton detail?”

I found this little tidbit in amidst the hugely overraught comment treads on Derik’s post describing his return to PHP from the wonderful world of Ruby on Rails:

I loved the part about the database as an “implementation detail.” That sings to me.

Daniel Waite

And, in spite of the poetic hyperbole of folks like Daniel, I think relational databases exist for good reasons and knowing about how they work is a good thing. Active record treats the database kinda like a big hash table in the sky, which is OK for some applications, but not so OK for others.

I find that large, complex apps have a way of growing in unexpected ways over time — and relational algebra is much better designed to handle unexpected requests for complex datasets than hash tables or even the most well designed objects. SQL, and relational algebra are implementation details in the same way that using a hammer is an implementation detail in building a tree house. Sure you can do without, but you will have a harder time, and will definitely end up building a different treehouse!


Don’t misunderstand me, Object’s are good things, and I sure as hell hope people understand them, use them, and make great software with them. But relational algebra is a good thing too. And it’s become clear that any really good database toolkit can’t forsake one in favor of the other, and that’s why SQLAlchemy, Storm, DejaVu, and Hibernate are all better tools than Rails’ Active Record.

As for Derik’s original article. In spite of all the brouhaha, Derek’s post basically says: “hey I liked Rails, and I learned a lot about good programming from it. And ultimately it was easier to get my site rewrite done in PHP because I learned rails first.”

I think it’s obvious that Rails could have done the job for Derek, but two things got in the way:

  • everything was already in PHP so Rails wasn’t and incremental update, it was a rewrite.
  • Active Record got in the way of using SQL as a relational algebra engine, rather than a glorified hash persistence mechanism.

TurboGears users should pay attention to both points. If you have a large complex app, you’ll be better off with an ORM that treats SQL as a full partner, which is why we’re migrating the defaults to SQLALchemy which does a fantastic job of making both relatinal algebra, and object oriented programming part of it’s core.

The second point is much harder to deal with, but if you can wrap existing application functionality in a larger TurboGears app, and migrate it piece by piece you’ll definitely have more success. I’ve been doing a lot of Web Service architecture stuff this year, and that’s proven to be a great way to connect existing resources to TurboGears applications.

8 Responses to “Relational Databases as an “implementaiton detail?””

  1. 1John M. Camara

    To make your case for TurboGears’ migration to SQLAlchemy you are making points on the following 2 issues.

    - relational vs non-relational databases
    - simple vs full featured ORMs

    You should just concentrate on the simple vs full featured ORM issue as the choice of database type is a separate issue. After all ORMs map to relational databases.

    On the relational vs non-relational databases issue.

    There are many types of hammers and choosing the ideal one for a particular job can make a world of difference in getting a task done. Would you use a 10lb sledge hammer to drive a tack into a will? I don’t think so, as you will likely smash your hand holding the tack.

    So what kind of hammer is a relational database? Is it a common hammer. What about BerkeleyDB? A nail gun as its faster and not as flexible as a common hammer. What about ZODB…

    Seriously, there are many types of data stores and they are all tools. The one who better knows the pros and cons of each tool can make wiser decisions on their use and will likely get the job done more efficiently. So don’t get fooled into believing that relational databases can solve all your problems and in a better way than other tools as in many cases they are the wrong tool to use.

    I prefer to “work smarter not harder” so I make sure my tool belt is full of tools.

  2. John,

    You have a good point. There’s plenty of room for non-relational data storage, from plain text files to ZODB.

    My point is not that you should always use relational data storage. In fact, that would be absurd! I don’t want to store my python source code in a relational database, I want plain text.

    Instead, my point is that if you need the ability to do post-hoc queries you need the “relational” part of relational data storage, and you shouldn’t use an ORM that hides that from you.

    If on the other hand you just want to persist some objects to disk can live with a slightly simplified data query syntax, ZODB is going to make your life a lot easier than ActiveRecord would.

    To stretch the original analogy a bit, if you’re building with concrete and steel, you may not need a hammer. But I definitely wouldn’t want to build a wooden tree house without one.

  3. 3John M. Camara

    If I was building with concrete and steel I would want my hammers. How would you put up the forms, chip concrete, show who’s the boss to piece of steel by beating it into submission with big old sledge hammer, etc. I can see using common hammers, framing, sledge, mason, jackhammers, hammer drills, chipping, etc. :)

    Any way, when you need ad-hoc queries use a relational database. If you need a relational database while programming with objects and you can live with the performance penalties you get with using an ORM go ahead and use an ORM. They will make your life easier.

    Note that most applications don’t need to worry about these performance issues so it is wiser to just use the ORMs so as to develop the code faster. Those that are concerned about the performance issues probable shouldn’t use a relation database to begin with. At least this would be my advice for typical web applications.

    When you use an ORM it is important to understand how the ORM works under the covers so that you don’t do dumb things with it. If you don’t take the time to understand how they work you can easily build applications that have terrible performance issues and give the ORM a bad name.

    Now what about the selection between a simple vs a full featured ORM. I agree with Mark that you shouldn’t use an ORM that tries to hide the database from you. I know it is very tempting for most developers to just use a simple ORM as it takes very little time to get up and running with it.

    But sooner or latter you will have performance issues or you will want to do something that the simple ORMs was not designed to handle. You will have to fight with it and will have to learn the inner workings of it to understand why you are running into these performance issues.

    If in the end, you’re going to put all that effort to learn how to use a simple ORM you would be better served by taking the time to learn SQLAlchemy. It will not require that much more effort to learn SQLAlchemy than a simple ORM. Although, you will have to put in a larger effort to get up and running with SA but the gains from the additional capabilities will more than make up for this extra effort.

    With SA you will not find it necessary to fight with it. At least no where near as often as you would with the simple ORMs. Plus when ever you need to do something that the ORM layer was not designed to handle you have the option to drop to a layer below the ORM which is not possible with the simple ORMs.

    What about Elixir (a declarative layer on top of SA)? My advice is not to use it. It will take less time to learn like the simple ORMs but sooner or latter you will have to learn more about SA and very likely the same amount of information as would be required had you just used straight SA’s ORM.

    So in the end you will learn more and the only gain will be that you can save yourself a little typing. To me it just doesn’t seam worth the extra effort. Of course if you do use Elixir you can still drop to the lower SA layers as necessary.

  4. Great post Mark! I love the decision to use SQLAlchemy as the default database layer within TurboGears 2.0. SQLAlchemy gives people every level of abstraction they could possibly want:

    * Users who want to do everything by hand can just use SQLAlchemy’s connection pool and configuration, and issue straight hand-written SQL against the connection. This offers a much cleaner and easier to work with option than the straight DB-API.

    * People who love SQL but want to be database independent, and want to avoid object-relational mapping can use SQLAlchemy’s table objects and SQL generation API.

    * There will of course be people who are interested in using the power of the SQLAlchemy ORM, who can leverage the great mapping capabilities of SQLAlchemy following the data mapper pattern for maximum flexibility.

    * People who want something a little bit simpler than SQLAlchemy’s ORM and who prefer the active record pattern can utilize Elixir with minimal switches flipped in their project, giving them a cleaner API, and they won’t sacrifice the great ability to drop down to SQLAlchemy’s table objects or custom mappers when needed.

    Its the best of all worlds, and will make TurboGears be able to not make the database layer just an implementation detail, but allow users to harness the raw power of their relational database of choice.

    Keep it up, Mark!

  5. Yes nice

  6. 6hernan43

    I’m definitely not a high end programmer by any means. But ActiveRecord does have support for writing custom SQL queries. It even has hooks to extend your models as such.

    But it is possible that I don’t get your meaning. I don’t really know the difference between what would be considered simple vs. full featured ORMs.

  7. Hernan43,

    I think the main thing I’m talking about is the ability to map objects to relations, not just tables. So in SQLALchemy (and many other ORM’s) you could for example, do a join across three tables, and map an object to that result set.

    In active record you can use the results of a custom SQ L query, but you can’t modify and save back the results.

    There are other features that I expect in a “full featured” ORM, like having an “identity map,” following the “unit of work” pattern, playing nice with database features like multicolumn primary keys, foreign key constraints, etc.

    I know I just threw out a couple of terms you may be unfamiliar with, if so, there’s a great book — Patterns of Enterprise Application Archetecture — by Martin Fowler, which explains a lot of the basic principles of database access. And it’s that book which coined the terms “identity map,” “unit of work,” and even “active record.” I highly recommend it.

  8. 8hernan43

    I think I see what you are talking about now. Thanks for the book reco I will check it out.

Comments are currently closed.