LINQ, or Language-Integrated Query, is a general-purpose query language built directly into C# and Visual Basic. You can use LINQ to query in-memory collections, SQL databases, DataSets, XML, and entities, and LINQ can be extended to other technologies such as SharePoint and Active Directory. LINQ is a query technology with a lot of power and some subtle features. One such feature is the where clause, which filters data much as the WHERE clause does in SQL.
A LINQ where clause can contain a single predicate or several predicates combined with the Boolean And and Or operators. LINQ also supports multiple where clauses in a single query, which lets you break filtering into pieces. The benefit is that you can short-circuit query processing: when an element fails an early where predicate, the later where clauses, and any processor- or IO-intensive operations placed between them, are skipped for that element.
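The following is a minimal sketch, assuming an in-memory array in place of real files, of how a later where clause runs only for elements that pass an earlier one. ExpensiveCheck, ExpensiveCalls, and the other member names are hypothetical; the counter simply records how often the "expensive" predicate runs. The sketch also compares a single where clause: in Visual Basic, AndAlso short-circuits just as a second where clause does, while plain And always evaluates both operands.

```vb
Imports System
Imports System.Linq

Module WhereShortCircuitDemo
    ' Number of times the simulated expensive predicate has run.
    Public ExpensiveCalls As Integer = 0

    ' Hypothetical stand-in for a costly test such as reading a file.
    Public Function ExpensiveCheck(n As Integer) As Boolean
        ExpensiveCalls += 1
        Return n Mod 3 = 0
    End Function

    ' Cheap filter in the first where clause, expensive filter in the second.
    Public Function TwoWhereClauses(numbers As Integer()) As Integer()
        Return (From n In numbers
                Where n > 5
                Where ExpensiveCheck(n)
                Select n).ToArray()
    End Function

    ' One where clause; AndAlso skips ExpensiveCheck when n > 5 is False.
    Public Function OneWhereAndAlso(numbers As Integer()) As Integer()
        Return (From n In numbers
                Where n > 5 AndAlso ExpensiveCheck(n)
                Select n).ToArray()
    End Function

    ' One where clause; And always evaluates both operands.
    Public Function OneWhereAnd(numbers As Integer()) As Integer()
        Return (From n In numbers
                Where n > 5 And ExpensiveCheck(n)
                Select n).ToArray()
    End Function

    ' Runs one filter and prints its results and the expensive-call count.
    Private Sub Report(label As String, filter As Func(Of Integer(), Integer()), numbers As Integer())
        ExpensiveCalls = 0
        Dim results = filter(numbers)
        Dim joined = String.Join(",", results.Select(Function(x) x.ToString()).ToArray())
        Console.WriteLine("{0}: {1} (expensive calls: {2})", label, joined, ExpensiveCalls)
    End Sub

    Sub Main()
        Dim numbers = Enumerable.Range(1, 10).ToArray()
        Report("Two where clauses", AddressOf TwoWhereClauses, numbers)
        Report("AndAlso", AddressOf OneWhereAndAlso, numbers)
        Report("And", AddressOf OneWhereAnd, numbers)
    End Sub
End Module
```

For the numbers 1 through 10, only 6 through 10 pass the cheap test, so the two-where and AndAlso versions should each report 5 expensive calls, while the And version should report 10; all three return 6 and 9.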
File IO can be resource intensive. Suppose you want a LINQ query that searches the file system for files matching a pattern such as *.txt, reads them, and counts the words in each. Searching the file system, reading each matching file, and counting its words are all relatively expensive operations. You could do all of the filtering in a single where clause with Boolean operators, but splitting the work so that lightweight checks run first reduces the total amount of query processing. (The let clause works well here, too. Let assigns a temporary value that is computed once per iteration and can then be used multiple times in the rest of the query.)
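As a rough illustration of the let clause, here is a sketch that assumes short strings in place of file contents. CountWords and CountCalls are hypothetical names, and the helper merely stands in for an expensive step. With Let, the word count is computed once per element and reused by both the where and select clauses; without it, the helper runs again in the select clause for every element that passes the filter.

```vb
Imports System
Imports System.Linq

Module LetClauseDemo
    ' Number of times the simulated expensive step has run.
    Public CountCalls As Integer = 0

    ' Hypothetical helper standing in for an expensive step such as
    ' splitting a file's contents into words.
    Public Function CountWords(text As String) As Integer
        CountCalls += 1
        Return text.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries).Length
    End Function

    ' Let stores the word count once per element; where and select reuse it.
    Public Function WithLet(lines As String()) As Integer()
        Return (From line In lines
                Let numWords = CountWords(line)
                Where numWords > 2
                Select numWords).ToArray()
    End Function

    ' Without Let, the helper runs in the where clause and again in select.
    Public Function WithoutLet(lines As String()) As Integer()
        Return (From line In lines
                Where CountWords(line) > 2
                Select CountWords(line)).ToArray()
    End Function

    ' Joins integer results into a comma-separated string for display.
    Private Function Format(values As Integer()) As String
        Return String.Join(",", values.Select(Function(x) x.ToString()).ToArray())
    End Function

    Sub Main()
        Dim lines = {"one two three", "four", "five six seven eight"}

        CountCalls = 0
        Dim letResults = WithLet(lines)
        Console.WriteLine("With Let: {0} (calls: {1})", Format(letResults), CountCalls)

        CountCalls = 0
        Dim noLetResults = WithoutLet(lines)
        Console.WriteLine("Without Let: {0} (calls: {1})", Format(noLetResults), CountCalls)
    End Sub
End Module
```

Both versions return 3 and 4, but the Let version should report 3 helper calls and the other 5, because the two lines that pass the filter are counted a second time in the select clause.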
In Listing 1, the file system is queried for text files. The first where clause short-circuits on empty files. For files that contain data, all of the text is assigned to the temporary range variable content. The content variable is split into words to obtain each file's word count, and the second where clause filters out files that contain ten or fewer words. Finally, the projection (the output data) contains the filename and word count of each file that passes both tests.
Listing 1: A LINQ query with multiple where clauses. The first where clause short-circuits on empty files.
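Imports System
Imports System.IO
Imports System.Linq

Module Module1
    Sub Main()
        ' Characters treated as word boundaries (each listed once, as Char values).
        Dim separators = New Char() {","c, "."c, ";"c, ":"c, "!"c, "?"c, "/"c, " "c, _
                                     ControlChars.Tab, ControlChars.Cr, ControlChars.Lf}

        ' First where: a cheap FileInfo length check skips empty files before any file is read.
        ' Let: each remaining file is read and split once; the results are reused below.
        ' Second where: keep only files with more than ten words.
        ' RemoveEmptyEntries stops runs of separators (such as ". ") from counting as words.
        Dim wordCount = From filename In Directory.GetFiles("C:\temp", "*.txt", SearchOption.AllDirectories) _
                        Where (New FileInfo(filename)).Length > 0 _
                        Let content = File.ReadAllText(filename) _
                        Let words = content.Split(separators, StringSplitOptions.RemoveEmptyEntries) _
                        Where words.Length > 10 _
                        Select New With {.File = filename, .WordCount = words.Length}

        ' The query runs here, as the results are enumerated.
        For Each item In wordCount
            Console.WriteLine(item)
        Next

        Console.ReadLine()
    End Sub
End Module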