Guide to ORMs, part 2

06 July 2006

In the first part we outlined the worlds of the database and the application that, although superficially similar, are really different universes bridged by what are known as ORMs (or object-relational mappers). We also showed one problem of the mismatch between the universes, that of the identity of an entity (that is, a row in the database or an object in the class model). We did not try and sort out the identity problem, just noted that it was an issue.

This time, let's explore the mismatch space between the database and the object-oriented application a little more.

Suppose we want to design a UI control application that will provide controls to be rendered onto a screen. The developer calls certain APIs in our application and we'll serve up a fully-configured button or combobox or textbox and so on. Later, the developer can call other APIs and store a configured component under a particular ID for later retrieval. We'll be storing these objects in a database.

Because we're great OO designers, we have a nice little class model that defines all this. So we have an ancestor Control class with properties like ID (we're assuming that this is the identity field for a control), Left, Top, Width, and Height, and various descendants like Button (which has a property called Caption), TextBox (which has properties like Prompt, Text, and so on), and ComboBox (with an Items property containing a list of the items in the combobox and an SelectedIndex property for the selected item's index in the list).

But now comes the $64,000 question: what should our database schema look like? Think about it for a while before reading on.

First thing to realize is that relational databases have no concept of inheritance. Zero, nada, zilch. This is a huge problem for our example: a Button is a subclass of Control, a TextBox is too, and so on. How are we going to be able to store our Buttons, TextBoxes and ComboBoxes? This is another big object-oriented relational mismatch: the problem of storing an inheritance object model.

So it looks like we're going to have to fake inheritance in our database. Somehow.

There are three accepted ways of doing this, as it happens. The first one is to have one table per class. So, in our example we'll have a table for the ancestor class, Control, that will have columns called ID, Left, Top, Width, and Height. Then we'll have a table for Buttons that will have a column for Caption. The problem is that we need to link a row in the Button table with its corresponding row in the Control table so that we can get the Left, Top, etc, properties. The best way is to have the ID column replicated in the Button table and then we can join the Control table to the Button table through this column.

Notice that this solution to the inheritance model problem means that our SQL queries are going to always be pretty horrendous. Every time we execute a query that returns Buttons, say, we will have to join the Control table to the Button table.

SELECT (c.ID, c.Left, c.Top, c.Width, c.Height, b.Caption)
FROM Button as b
  JOIN Control c ON c.ID = b.ID
WHERE b.ID = @ID

And the SQL for adding or updating or deleting a button? I leave that as an easy exercise for my SQL-savvy readers.

Here's an even worse scenario: suppose we wanted all controls that were at aligned on the left at 10 pixels? The query would have to join the Control table to every other table that stored concrete controls like comboboxes and textboxes.

So, although the premise to the solution is easy to understand (one class per table), the mechanics of using it tends to be fairly nasty looking.

The second solution to the inheritance problem is the one table per concrete class solution, where "concrete" means "can be instantiated to provide a useful object". This one actually fits our problem quite nicely as we'll see.

The first thing to accept is that the Control class is never instantiated on its own. Yes, we may pass around Control instances or have arrays of Control instances, but they're all actually instances of other classes like Buttons, Textboxes and so on. The Control class is an ancestor class whose only reason for being is to store "common" data for all our controls. So, form the viewpoint of the database, we merely merge the properties of the ancestor class into each of the concrete control classes.

So our Button table in the database will contain columns for ID, Left, Top, Width, Height, and Caption. The TextBox table will have columns for ID, Left, Top, Width, Height, Prompt, and Text, and so on.

This makes the retrieval and update of our control objects a little easier to do since the SQL becomes a lot simpler. Our OO code in the application (the ORM, if you like) is a little more intricate, but OO code tends to be easier to write and test than SQL.

We still have the problem though of a complex query to the "get all controls that are aligned left at 10 pixels" issue with lots of joins all over the place. Nevertheless, this solution tends to be the one that everyone uses.

The third and final solution to the inheritance problem is the one table per inheritance tree solution. Here we just have one table for the entire class model. We'll store all types of controls in it. The table will have not only ID, Left, Top, Width and Height columns but it will also have a Caption column (for the Buttons), Prompt and Text columns (for the TextBoxes), and a SelectedIndex column (for the combobox, and I'm deliberately skipping the Items property here).

There's another column that needs to be added as well since we need to be able to tell whether a particular row defines a Button, a TextBox or a ComboBox entity. We could try and be clever and use tests to see whether various columns are null or not to determine the type of the control, but generally we add a discriminant column of some type (say a character) that defines the type of the row (so a B in this column means the row is a Button instance, a T means Textbox, and so on).

Although retrievals and updates with this solution are pretty easy (barring the need to worry about the discriminant column), the main issue with this solution is the number of empty values we'll be storing. A button, for example, needs no Prompt, Text, or SelectedIndex values so those columns will be empty for every button. Nevertheless, this is still an accepted and widely-used solution (I remember in my RPG days on the System/38 constructing views over data that replicated this solution to make it easier to leverage the RPG record processing cycle).

Next time we'll start thinking about how XPO makes it easier for us to solve these issues.

1 comment(s)
M. Khayyat
Dear Mr. Bucknall,
First of all, i wanna thank you for the wonderful article you wrote about ORMs.
Its really nice to find your own problems discussed by experts like you on logical way. All I want for the time being is to show me how to implement the (Primary Key - Forign Key) database relation into a meaningfull object relation.

Thanks in advance.
M. Khayyat
1 August, 2006

Please login or register to post comments.