XPO, LINQ and Distinct

XPO Team Blog
28 March 2008

In a recent forum post, Dusan Pupis mentioned that our LINQ support didn't currently include the Distinct extension method. I looked into this and found a fairly complex situation that requires some explanation.

First, XPO 8.1.1 does in fact support Distinct() in some situations, that is, the overload that takes no further parameters. For my test application, I'm using the following code, based on three standard persistent classes (Artist, Album and TestData):

XpoDefault.DataLayer = new SimpleDataLayer(
  new DataStoreLogger(
    new InMemoryDataStore(new DataSet( ), AutoCreateOption.SchemaOnly),
    Console.Out));

using (UnitOfWork uow = new UnitOfWork( )) {
  Artist eltonJohn = new Artist(uow) { Name = "Elton John" };
  eltonJohn.Albums.Add(new Album(uow) { Name = "Tumbleweed Connection", Year = 1970 });
  eltonJohn.Albums.Add(new Album(uow) { Name = "Don't Shoot Me I'm Only the Piano Player", Year = 1973 });
  eltonJohn.Albums.Add(new Album(uow) { Name = "Too Low for Zero", Year = 1983 });
  eltonJohn.Albums.Add(new Album(uow) { Name = "Captain Fantastic and the Brown Dirt Cowboy", Year = 1975 });
  eltonJohn.Albums.Add(new Album(uow) { Name = "Caribou", Year = 1974 });

  Artist rollingStones = new Artist(uow) { Name = "The Rolling Stones" };
  rollingStones.Albums.Add(new Album(uow) { Name = "Aftermath", Year = 1966 });
  rollingStones.Albums.Add(new Album(uow) { Name = "Their Satanic Majesties' Request", Year = 1967 });
  rollingStones.Albums.Add(new Album(uow) { Name = "Sticky Fingers", Year = 1971 });
  rollingStones.Albums.Add(new Album(uow) { Name = "It's Only Rock 'n' Roll", Year = 1974 });
  rollingStones.Albums.Add(new Album(uow) { Name = "Exile on Main Street", Year = 1972 });
  rollingStones.Albums.Add(new Album(uow) { Name = "Sucking in the Seventies", Year = 1981 });

  Artist procolHarum = new Artist(uow) { Name = "Procol Harum" };
  procolHarum.Albums.Add(new Album(uow) { Name = "Procol Harum", Year = 1967 });

  Artist jethroTull = new Artist(uow) { Name = "Jethro Tull" };
  jethroTull.Albums.Add(new Album(uow) { Name = "Thick as a Brick", Year = 1972 });
  jethroTull.Albums.Add(new Album(uow) { Name = "Minstrel in the Gallery", Year = 1975 });

  new TestData(uow) { Val1 = "One", Val2 = "Two" }.Save( );
  new TestData(uow) { Val1 = "Three", Val2 = "Four" }.Save( );
  new TestData(uow) { Val1 = "Four", Val2 = "Two" }.Save( );
  new TestData(uow) { Val1 = "One", Val2 = "Two" }.Save( );

  uow.CommitChanges( );
}

With this initialization, I can do a few things with Distinct. For example, I can find the distinct Album years like this:

using (UnitOfWork uow = new UnitOfWork( )) {
  var albumYears =
    from album in new XPQuery<Album>(uow)
    select album.Year;

  ObjectDumper.Write(albumYears.Distinct( ));
}

Since I'm using the logger in my data layer, I can see that the query is this:

SelectData request with 1 queries:select('N0'.'Year') from('Album'.'N0') 
where(IsNull('N0'.'GCRecord')) order() group('N0'.'Year') having() params() ;
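The log shows that Distinct() over a single projected column has been expressed as a grouping on that column (group('N0'.'Year')). The following is a minimal sketch of why that translation is valid. It uses plain LINQ to Objects on an in-memory array that restates the album years created above. XPO is not involved, so it illustrates only the equivalence of the two forms, not XPO's behavior.

```csharp
using System;
using System.Linq;

// The album years from the sample data above, held in memory (no XPO here).
int[] albumYears =
{
    1970, 1973, 1983, 1975, 1974,        // Elton John
    1966, 1967, 1971, 1974, 1972, 1981,  // The Rolling Stones
    1967,                                // Procol Harum
    1972, 1975                           // Jethro Tull
};

// What the LINQ query asks for: each year once.
int[] viaDistinct = albumYears.Distinct().OrderBy(y => y).ToArray();

// What the logged query does: group on the year and return one key per group.
int[] viaGroupBy = albumYears.GroupBy(y => y).Select(g => g.Key).OrderBy(y => y).ToArray();

Console.WriteLine(string.Join(", ", viaDistinct));
Console.WriteLine(viaDistinct.SequenceEqual(viaGroupBy)); // True
```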

Along the same lines, I can also query distinct combinations of more than one value, this time using the TestData objects:

var testdata =
  from test in new XPQuery<TestData>(uow)
  select new { Val1 = test.Val1, Val2 = test.Val2 };

ObjectDumper.Write(testdata.Distinct());

This creates the following query:

SelectData request with 1 queries:select('N0'.'Val1','N0'.'Val2') from('TestData'.'N0') 
where(IsNull('N0'.'GCRecord')) order() group('N0'.'Val1','N0'.'Val2') having() params() ;

As you can see, in both cases XPQuery<T> has translated the query code into our query language. Distinct becomes a grouping on the selected columns, and the query in turn gets executed on the server. This is really the whole point of LINQ to XPO: bridging from LINQ syntax and extensions into the XPO world, where our database-independent querying system can then execute the query efficiently on the server.
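The two-column case works because the projection uses an anonymous type. The C# compiler generates Equals and GetHashCode for anonymous types that compare all of their properties, so two projected rows with the same Val1 and Val2 count as duplicates. The following is a small in-memory sketch in LINQ to Objects, not XPO. The tuple array is only a stand-in for the four TestData rows created above. It shows the duplicate "One"/"Two" row collapsing, which corresponds to the grouping on both columns in the log.

```csharp
using System;
using System.Linq;

// In-memory stand-in for the four TestData rows created above.
var rows = new[]
{
    (Val1: "One",   Val2: "Two"),
    (Val1: "Three", Val2: "Four"),
    (Val1: "Four",  Val2: "Two"),
    (Val1: "One",   Val2: "Two"),
};

// Same projection as the XPO query. Anonymous types compare all of their
// properties in the compiler-generated Equals/GetHashCode, so the second
// "One"/"Two" row is treated as a duplicate and dropped.
var distinctPairs = rows
    .Select(test => new { Val1 = test.Val1, Val2 = test.Val2 })
    .Distinct()
    .ToList();

foreach (var pair in distinctPairs)
    Console.WriteLine($"{pair.Val1} / {pair.Val2}");
```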

Now here's something that doesn't work:

var result =
  from artist in new XPQuery<Artist>(uow)
  select artist;

ObjectDumper.Write(result.Distinct());

Why not? Because the system can't guess what it is that makes one Artist different from another. Of course there's the primary key, but I don't need Distinct for that one, since every object already has a unique key. All other distinctions are much more complex to describe, so Microsoft introduced another overload of the Distinct method:

public static IQueryable<TSource> Distinct<TSource>(
  this IQueryable<TSource> source, IEqualityComparer<TSource> comparer);

This one takes a parameter for an IEqualityComparer<T> implementation, and of course that interface allows me to specify precisely when one object is to be regarded as equal to another. However, this overload is not supported by XPO, and likely never will be.
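The following is a minimal in-memory sketch of what a comparer adds. ArtistInfo and ArtistNameComparer are hypothetical stand-ins: plain classes, not XPO persistent types. Comparing by name is only one possible definition of equality. Without a comparer, Distinct falls back to default equality, which for a class that doesn't override Equals is reference equality. The comparer replaces that with its own rule, and it keeps Equals and GetHashCode consistent with each other.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Two separate instances describing the same artist, plus a third artist.
var artists = new List<ArtistInfo>
{
    new ArtistInfo("Elton John"),
    new ArtistInfo("Elton John"),
    new ArtistInfo("Jethro Tull"),
};

// No comparer: EqualityComparer<ArtistInfo>.Default means reference equality
// for a class that doesn't override Equals, so all three instances are kept.
int byReference = artists.Distinct().Count();

// With a comparer, "equal" means whatever the comparer says; here: same Name.
int byName = artists.Distinct(new ArtistNameComparer()).Count();

Console.WriteLine($"{byReference} {byName}"); // 3 2

// Hypothetical stand-in for the persistent Artist class (not an XPO type).
class ArtistInfo
{
    public ArtistInfo(string name) => Name = name;
    public string Name { get; }
}

// Hypothetical comparer: artists are equal when their names match (ordinal).
// Equals and GetHashCode use the same key, so objects that compare equal always
// return the same hash code, as the IEqualityComparer<T> contract requires.
class ArtistNameComparer : IEqualityComparer<ArtistInfo>
{
    public bool Equals(ArtistInfo? x, ArtistInfo? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    public int GetHashCode(ArtistInfo obj) => StringComparer.Ordinal.GetHashCode(obj.Name);
}
```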

There are two reasons why XPO doesn't support this overload. First, Microsoft decided to have this method take an interface as a parameter, not a lambda expression as most of the other LINQ extension methods do. Of course we could create an overload ourselves that takes a lambda expression, but that brings us to reason number two: the code that determines equality is likely to be so complex that translating it into something that runs on the server would be impossible in the majority of cases. I don't know whether reason two is also the source of reason one, but it seems likely.
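A lambda-based overload could look roughly like the following client-side sketch. DistinctByKey and KeyComparer are hypothetical names and are not part of XPO or the LINQ API. The lambda selects the key that defines equality, and the adapter wraps it in an IEqualityComparer<T>. Because the sketch takes a compiled delegate rather than an expression tree, it can only run on the client. It doesn't address the server-side translation problem described above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

string[] names = { "Elton John", "ELTON JOHN", "Procol Harum" };

// The lambda picks the key that decides equality; everything runs client-side.
List<string> distinct = names.DistinctByKey(n => n.ToUpperInvariant()).ToList();

Console.WriteLine(string.Join(" | ", distinct));

// Hypothetical lambda-based Distinct overload (not an XPO or LINQ API).
static class DistinctExtensions
{
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector) =>
        source.Distinct(new KeyComparer<T, TKey>(keySelector));
}

// Hypothetical adapter turning a key-selector lambda into an IEqualityComparer<T>.
// Equals and GetHashCode both work on the selected key, so they stay consistent.
sealed class KeyComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> keySelector;

    public KeyComparer(Func<T, TKey> keySelector) => this.keySelector = keySelector;

    public bool Equals(T? x, T? y)
    {
        if (x is null || y is null)
            return x is null && y is null;
        return EqualityComparer<TKey>.Default.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(T obj) =>
        EqualityComparer<TKey>.Default.GetHashCode(keySelector(obj)!);
}
```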

There's one easy way to use a comparer-based Distinct with XPO: make sure the call goes to Enumerable.Distinct, pass in your IEqualityComparer<T>, and suddenly everything works. How does it work? Very simple: the data is retrieved from the server and Distinct runs over the client-side results, where the comparer code can be executed. We could have made this happen automatically in the cases where translation isn't possible, but we decided it would be safer not to. So it's left to you to be explicit if you intend Distinct to be executed client-side. Here's how that works (this is just one example; there are many different ways of making IEqualityComparer<T> work for you):

// This is bad!!! Just an example. This results in Elton John
// and Procol Harum being considered equal. Equals only compares
// hash codes, so any two artists whose hash codes happen to
// collide would also be treated as equal.
class Comparer : EqualityComparer<Artist> {
  public override bool Equals(Artist x, Artist y) {
    return GetHashCode(x) == GetHashCode(y);
  }

  public override int GetHashCode(Artist obj) {
    if (obj.Name == "Elton John" || obj.Name == "Procol Harum")
      return 734;
    return obj.GetHashCode( );
  }
}

...

var result =
  from artist in new XPQuery<Artist>(uow)
  select artist;

ObjectDumper.Write(((IEnumerable<Artist>) result).Distinct(new Comparer( )));

The cast to IEnumerable<Artist> makes sure that the Distinct extension method is taken from the Enumerable class. Queryable.Distinct is declared as an extension on IQueryable<T>, so once the static type is IEnumerable<Artist>, only Enumerable.Distinct (declared as an extension on IEnumerable<T>) applies. Of course there are other ways: you could call ToList() on the result before calling Distinct, or call Enumerable.Distinct without using the extension method syntax.
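The options above can be compared side by side in a small sketch. XPO isn't available here, so an in-memory AsQueryable() sequence of names stands in for XPQuery<Artist>. StringComparer.OrdinalIgnoreCase serves as the IEqualityComparer<string>. AsEnumerable(), a standard LINQ operator, is included as one more way to switch over to the Enumerable methods. This in-memory IQueryable would also accept Queryable.Distinct with a comparer. The sketch therefore only shows that the forms bind to Enumerable.Distinct and agree on the result; it does not reproduce XPO's behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for XPQuery<Artist>: an IQueryable<string>.
IQueryable<string> result = new[] { "Elton John", "ELTON JOHN", "Procol Harum" }.AsQueryable();

// StringComparer implements IEqualityComparer<string>.
IEqualityComparer<string> comparer = StringComparer.OrdinalIgnoreCase;

// 1. Cast: the static type becomes IEnumerable<string>, so Enumerable.Distinct is used.
var viaCast = ((IEnumerable<string>)result).Distinct(comparer).ToList();

// 2. AsEnumerable(): same effect as the cast, without the cast syntax.
var viaAsEnumerable = result.AsEnumerable().Distinct(comparer).ToList();

// 3. ToList(): materializes the results first, then runs Distinct on the list.
var viaToList = result.ToList().Distinct(comparer).ToList();

// 4. Static call: names Enumerable.Distinct explicitly, no extension method syntax.
var viaStaticCall = Enumerable.Distinct(result, comparer).ToList();

Console.WriteLine($"{viaCast.Count} {viaAsEnumerable.Count} {viaToList.Count} {viaStaticCall.Count}"); // 2 2 2 2
```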

7 comment(s)
Alex Hoffman

Using the Distinct extension method that takes a parameter to compare complex types is interesting.

But as always for me, the example "side effects" are more interesting, i.e. your use of the DataStoreLogger to pipe output to the console.

28 March, 2008
Rinat Abdullin

Oliver,

Could someone please come up with a list of the LINQ features that are not supported by XPO-Linq?

It was quite a surprise for me to discover that nested where clauses throw NotImplementedException() (and this applies to any decent group or join operation, or any filtering of nested collections).

Best regards,

Rinat Abdullin

29 March, 2008
Jascha

Ditto Rinat's request. A list would be preferable to discovering them by trial and error and, as in the article above, not knowing the exact circumstances in which things will or won't work.

Regards,

Jascha

11 April, 2008
Jascha

Ping

16 April, 2008
Oliver Sturm (DevExpress)

My apologies, guys - the comment notification wasn't working, so I missed these last entries.

I have asked the XPO team to start creating a list of (non|badly) supported LINQ features. I'm going to get back to you as soon as I get it. I am told, though, that it's not particularly easy to isolate the actual use cases, so I can't be sure yet how successful this is going to be.

16 April, 2008
Frans Bouma

Isn't 'distinct' just a specification on the projection of the XPO query? After all, it's just a flag, so if your XPO supports distinct, why not just emit that flag for XPO and let it emit distinct as it would otherwise?

For queries that might expose bugs, you may borrow the queries from my blog posts about writing a LINQ provider ;). A lot of them, like the Contains series, are pretty nasty. Many more that could cause problems are conceivable, like multi-groupbys, multiple aggregates on the same groupby, queries in join sides etc.

17 April, 2008
Anonymous
Florian

Hi all,

I would still love to see that "list of (non|badly) supported LINQ features" Oliver announced in April 2008 (about 1 year ago).

Like many other users, I feel quite unhappy with the current implementation of XPQuery and with the information policy regarding which of its features work and which don't.

Some features I would really expect to work (e.g. simple joins) cause exceptions, with no chance for us to notice at design time.

So, Oliver, could you please let us know about the announced document?

Regards,

Florian

16 March, 2009
