XPO, LINQ and Distinct

XPO Team Blog
28 March 2008

In a recent forum post, Dusan Pupis mentioned that our LINQ support didn't currently include the Distinct extension method. I looked into this and I found a pretty complex situation that requires some explanation.

First, in XPO 8.1.1, we do in fact support Distinct(), i.e. the overload that doesn't take any further parameters, in some situations. For my test application, I'm using the following code based on three standard persistent classes:

XpoDefault.DataLayer = new SimpleDataLayer(
    new DataStoreLogger(
        new InMemoryDataStore(new DataSet( ), AutoCreateOption.SchemaOnly),
        Console.Out));

using (UnitOfWork uow = new UnitOfWork( )) {
    Artist eltonJohn = new Artist(uow) { Name = "Elton John" };
    eltonJohn.Albums.Add(new Album(uow) { Name = "Tumbleweed Connection", Year = 1970 });
    eltonJohn.Albums.Add(new Album(uow) { Name = "Don't Shoot Me I'm Only the Piano Player", Year = 1973 });
    eltonJohn.Albums.Add(new Album(uow) { Name = "Too Low for Zero", Year = 1983 });
    eltonJohn.Albums.Add(new Album(uow) { Name = "Captain Fantastic and the Brown Dirt Cowboy", Year = 1975 });
    eltonJohn.Albums.Add(new Album(uow) { Name = "Caribou", Year = 1974 });

    Artist rollingStones = new Artist(uow) { Name = "The Rolling Stones" };
    rollingStones.Albums.Add(new Album(uow) { Name = "Aftermath", Year = 1966 });
    rollingStones.Albums.Add(new Album(uow) { Name = "Their Satanic Majesties' Request", Year = 1967 });
    rollingStones.Albums.Add(new Album(uow) { Name = "Sticky Fingers", Year = 1971 });
    rollingStones.Albums.Add(new Album(uow) { Name = "It's Only Rock 'n' Roll", Year = 1974 });
    rollingStones.Albums.Add(new Album(uow) { Name = "Exile on Main Street", Year = 1972 });
    rollingStones.Albums.Add(new Album(uow) { Name = "Sucking in the Seventies", Year = 1981 });

    Artist procolHarum = new Artist(uow) { Name = "Procol Harum" };
    procolHarum.Albums.Add(new Album(uow) { Name = "Procol Harum", Year = 1967 });

    Artist jethroTull = new Artist(uow) { Name = "Jethro Tull" };
    jethroTull.Albums.Add(new Album(uow) { Name = "Thick as a Brick", Year = 1972 });
    jethroTull.Albums.Add(new Album(uow) { Name = "Minstrel in the Gallery", Year = 1975 });

    new TestData(uow) { Val1 = "One", Val2 = "Two" }.Save( );
    new TestData(uow) { Val1 = "Three", Val2 = "Four" }.Save( );
    new TestData(uow) { Val1 = "Four", Val2 = "Two" }.Save( );
    new TestData(uow) { Val1 = "One", Val2 = "Two" }.Save( );

    uow.CommitChanges( );
}

With this initialization, I can do a few things with Distinct. For example, I can find the distinct album years like this:

using (UnitOfWork uow = new UnitOfWork( )) {
    var albumYears =
        from album in new XPQuery<Album>(uow)
        select album.Year;

    ObjectDumper.Write(albumYears.Distinct( ));
}

Since I'm using the logger in my data layer, I can see that the query is this:

SelectData request with 1 queries:select('N0'.'Year') from('Album'.'N0') 
where(IsNull('N0'.'GCRecord')) order() group('N0'.'Year') having() params() ;

Along the same lines, I can also query more than one value, this time using the TestData objects:

var testdata =
    from test in new XPQuery<TestData>(uow)
    select new { Val1 = test.Val1, Val2 = test.Val2 };

ObjectDumper.Write(testdata.Distinct());

This creates the following query:

SelectData request with 1 queries:select('N0'.'Val1','N0'.'Val2') from('TestData'.'N0') 
where(IsNull('N0'.'GCRecord')) order() group('N0'.'Val1','N0'.'Val2') having() params() ;

As you can see, in both cases the XPQuery<T> has translated the query code into our query language, which in turn gets executed on the server. Note that Distinct is expressed as a grouping over the selected columns. This is really the whole point of LINQ to XPO: bridging from LINQ syntax and extensions into the XPO world, where our database-independent querying system can then execute the query efficiently on the server.
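The logged queries show Distinct being turned into a grouping on the selected columns. As a rough illustration of why that is a valid translation, the following sketch compares the two formulations in plain LINQ to Objects. It does not use XPO at all; the class name DistinctAsGroupBy and the in-memory Rows array are assumptions made for this example, with the values copied from the TestData objects above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctAsGroupBy {
    // Same Val1/Val2 pairs as the TestData objects in the test setup.
    static readonly string[][] Rows = {
        new[] { "One", "Two" },
        new[] { "Three", "Four" },
        new[] { "Four", "Two" },
        new[] { "One", "Two" },
    };

    // Distinct() over an anonymous-type projection. Anonymous types
    // compare by value, so the duplicate One/Two pair collapses.
    public static List<string> ViaDistinct() {
        return Rows
            .Select(r => new { Val1 = r[0], Val2 = r[1] })
            .Distinct()
            .Select(p => p.Val1 + "/" + p.Val2)
            .ToList();
    }

    // Grouping by the same columns and keeping only the group keys,
    // which is the shape of the query that appears in the log.
    public static List<string> ViaGroupBy() {
        return Rows
            .GroupBy(r => new { Val1 = r[0], Val2 = r[1] })
            .Select(g => g.Key.Val1 + "/" + g.Key.Val2)
            .ToList();
    }

    public static void Main() {
        Console.WriteLine(string.Join(", ", ViaDistinct()));
        Console.WriteLine(string.Join(", ", ViaGroupBy()));
    }
}
```

Both methods return the same three pairs, which is why a server that understands grouping can answer a Distinct request over simple column values.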

Now here's something that doesn't work:

var result =
    from artist in query
    select artist;

ObjectDumper.Write(result.Distinct());

Why not? Because the system can't guess what it is that makes one Artist different from another. Of course there's the primary key, but I don't need Distinct for that one. All other distinctions are much more complex to describe, so Microsoft introduced another overload of the Distinct method:

public static IQueryable<TSource> Distinct<TSource>(
    this IQueryable<TSource> source, IEqualityComparer<TSource> comparer);

This one takes a parameter for an IEqualityComparer<T> implementation, and of course that interface allows me to specify precisely when a certain object is to be regarded as equal to another object. However, this overload is not supported by XPO, and will likely never be supported.

There are two reasons for that. First, Microsoft decided to have this method take an interface as a parameter, not a lambda expression, like most of the other LINQ extension methods do. Of course we could create an overload ourselves that takes a lambda expression, but that brings us to reason number two: the code that determines equality is likely to be so complex that translating it into something that would run on the server would be impossible in the majority of cases. I don't know if reason two is perhaps also the source of reason one, but it seems likely.
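To make the second reason more concrete, here is a minimal sketch of a comparer in plain LINQ to Objects. Nothing in it involves XPO: the Artist class below is a hypothetical stand-in with only a Name property, and ArtistNameComparer and ComparerDemo are names invented for this example. The point is that Equals and GetHashCode are ordinary compiled methods, so a query provider only receives an object to call, not an expression it could inspect and translate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain stand-in class for illustration only; not an XPO persistent class.
public class Artist {
    public string Name { get; set; }
}

// Treats two artists as equal when their names match, ignoring case.
public class ArtistNameComparer : IEqualityComparer<Artist> {
    public bool Equals(Artist x, Artist y) {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return StringComparer.OrdinalIgnoreCase.Equals(x.Name, y.Name);
    }

    public int GetHashCode(Artist obj) {
        // Must agree with Equals: equal names give equal hash codes.
        if (obj == null || obj.Name == null) return 0;
        return StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name);
    }
}

public static class ComparerDemo {
    public static List<Artist> Sample() {
        return new List<Artist> {
            new Artist { Name = "Jethro Tull" },
            new Artist { Name = "jethro tull" },
            new Artist { Name = "Procol Harum" },
        };
    }

    // Without a comparer, class instances are compared by reference,
    // so every object counts as distinct.
    public static int CountDefault() {
        return Sample().Distinct().Count();
    }

    // With the comparer, the two Jethro Tull entries collapse into one.
    public static int CountByName() {
        return Sample().Distinct(new ArtistNameComparer()).Count();
    }

    public static void Main() {
        Console.WriteLine(CountDefault());
        Console.WriteLine(CountByName());
    }
}
```

Even this small comparer relies on a framework string comparison that a server-side query language would have to reproduce exactly, and real equality rules are usually more involved.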

There's one easy way to use this overload of Distinct with XPO: simply make sure to use Enumerable.Distinct, pass in your IEqualityComparer and suddenly everything works. How does it work? Very simple: the data is retrieved from the server and Distinct runs over the client-side results, where the comparer code can be executed. We could have made this happen by default in the cases where translation isn't possible, but we decided it would be safer not to do that. So it's left to you to be explicit about this if your intention is to get Distinct executed client-side. Here's how that works (for example - there are many different ways of making IEqualityComparer work for you):

// This is bad!!! Just an example. This results in Elton John
// and Procol Harum being treated as equal.
class Comparer : EqualityComparer<Artist> {
    public override bool Equals(Artist x, Artist y) {
        return GetHashCode(x) == GetHashCode(y);
    }

    public override int GetHashCode(Artist obj) {
        if (obj.Name == "Elton John" || obj.Name == "Procol Harum")
            return 734;
        return obj.GetHashCode( );
    }
}

...

var result =
    from artist in new XPQuery<Artist>(uow)
    select artist;

ObjectDumper.Write(((IEnumerable<Artist>) result).Distinct(new Comparer( )));

The cast to IEnumerable<Artist> makes sure that the Distinct extension method is taken from the Enumerable class (because it's declared as an extension on IEnumerable<T>). Of course there are other ways - you could call ToList() on the result before calling Distinct, or call Enumerable.Distinct without using the extension method syntax.
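The options mentioned above can be sketched side by side. This example deliberately avoids XPO: it assumes an in-memory array wrapped with AsQueryable() as a stand-in for a server-backed query, and the class name ClientSideDistinct is invented for the illustration. It also includes AsEnumerable(), a standard LINQ method that has the same effect as the cast.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClientSideDistinct {
    static readonly IEqualityComparer<string> IgnoreCase =
        StringComparer.OrdinalIgnoreCase;

    // Stand-in for a server-backed query (assumption for this sketch).
    public static IQueryable<string> Names() {
        return new[] { "Elton John", "elton john", "Procol Harum" }
            .AsQueryable();
    }

    // 1. Cast to IEnumerable<T>, so the compiler binds to Enumerable.Distinct.
    public static int ViaCast() {
        return ((IEnumerable<string>) Names()).Distinct(IgnoreCase).Count();
    }

    // 2. Materialize the results first, then run Distinct on the list.
    public static int ViaToList() {
        return Names().ToList().Distinct(IgnoreCase).Count();
    }

    // 3. Call Enumerable.Distinct directly instead of as an extension method.
    public static int ViaStaticCall() {
        return Enumerable.Distinct(Names(), IgnoreCase).Count();
    }

    // 4. AsEnumerable() changes the static type without copying the data.
    public static int ViaAsEnumerable() {
        return Names().AsEnumerable().Distinct(IgnoreCase).Count();
    }

    public static void Main() {
        Console.WriteLine(ViaCast());
        Console.WriteLine(ViaToList());
        Console.WriteLine(ViaStaticCall());
        Console.WriteLine(ViaAsEnumerable());
    }
}
```

All four variants run the comparer on the client. The practical difference is that ToList() copies the whole result into memory before Distinct starts, while the other three stream the results through Distinct as they are enumerated.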
