Guide to ORMs, part 1

ctodx
30 June 2006

Yesterday evening I went along to the monthly meeting organized by what might grandly be called the Colorado Springs Delphi User Group. I say "grandly" because in reality it's all the people who were left over when meetup.com decided to charge for their services and everyone agreed that it wasn't worth it and we could just email each other to confirm the meetings.

The sad thing is that it seems no one there is using Delphi for substantial new work any more. Several still say that they're using it for experimental type stuff and the majority have existing codebases in Delphi that still need maintaining, but usually new work is now being done in C#. Now certainly it may be that my sample size is way too small to draw any conclusions from this, but I think DevCo (the codename of the future company that will soon be responsible for Delphi) is going to have some work on its hands to stem the tide of defection brought about by Borland's seemingly lackadaisical marketing and promotion of Delphi.

Anyway, this post is not about that (I'm collating information for a later larger post), but about ORMs. One of the attendees reported that he was using XPO (eXpress Persistent Objects) in his latest application and was enjoying the experience, never having used an ORM before. I was thinking about it on the way home and given that I tossed in a heavy duty post yesterday on the topic I think I'll do a series of posts about ORMs and XPO and discuss where we're heading with this library.

Why ORMs? What are they for? What are their benefits, their limitations?

Well, first you've got to look at the different worlds of the database and of the application, especially of the features and of the languages we use in both domains.

The relational databases we use (SQL Server, Oracle, Access, and the like) store data as rows in tables. (Note to the relational theory purists: I'm not going to use the standard terms like tuples, attributes, and the like in this article, but instead I'm going to stick to the common names.) Each row contains a bunch of values in columns, with the assumption that these values all relate to the entity described by the row. So, in a Customers table, for example, each row is a single customer, with column values that describe that particular customer (name, address, contact, what have you). The columns of course have the same type of data (and the same meaning of that data) for each row in the table. All this seems pretty easy enough.

Now, the thing is that we assume something more about each row: that each row is unique. Indeed some database engines enforce that sense of uniqueness by insisting on each table having a primary key, where each row has a unique value that identifies and isolates it from all the others. In general (although it doesn't have to be done that way), the primary key is identified by a single extra column (sometimes referred to as the identity column) that's tacked on just for the primary key. So we have things like Customer ID, Order ID, and the like. These tend to be integer values, although sometimes people use GUIDs instead.

So, each row in a table defines an entity (customer, order, order line, etc) in our system, and it's uniquely identified by a key value.

We use a language called SQL to access, maintain, and retrieve data from our database. SQL is a language that was designed to make it easy to identify rows in a table (or tables) and return them as a complete set to the caller. The caller can also specify that only certain columns are returned rather than the entire row or rows. So, viewed from a height of 50,000 feet, SQL is a language than returns sets of rows of data, or for inserting, changing or deleting data in tables in the database. SQL is not a language for displaying UI or for doing financial calculations or for creating reports or anything like that.

That kind of work is best suited to a high level application language. These days, I'll qualify that even further, by saying that it's an object-oriented language, like C#, VB, Delphi, Java and the like. These languages use class models and instances of those classes (usually called objects) to do the work. So we write classes like Customer and Order and give them properties like name and address and order date and so on. We can also create lists of these kinds of objects as other classes, like CustomerList and OrderList.

Hmm, superficially at least, it looks like we have a one-to-one mapping between objects in our object model and rows in our database. A row in the Customers table has a column called Name, and our Customer class has a property called Name too. It looks as if we can just use an object for a row that we've read from the database. Where's the problem?

The first problem is this. In a database, a row is uniquely different from any other row in the same table. The primary key enforces this. We can never create two rows that have the same identity; in other words, no two rows can have the same key.

What does this translate to in the object world? Well, big problems that's what. You see, in our application programs one object is different from another if they have different pointer addresses on the heap. So I can create two Customer objects, both with Customer ID of 1 and they will be different objects.

(OK, I'm skating on thin ice here. Talking C# now: if I compare two objects -- that is, typed as System.Object -- with the equality operator then I'm going to be comparing their addresses on the heap. If I use X.Equals(Y) instead, a virtual method gets called that presumably checks the internal state of both objects to see if they're equal. But only if I took the time to write code that does the checking of course.)

So we've already run into the first problem in what's known as the object-relational impedance mismatch: the problem of identity.

We'll look at some others next time.

no comments
No Comments

Please login or register to post comments.