Paging is good. But throughout the stack, please.

22 May 2008

Sometimes I'm amazed about people's perception of how a web application works, especially when used with advanced controls like our ASPxGridView.

Let's, if you will, imagine a standard web application with a grid that displays data from some table or view in the database. By default, it's a multi-tier application. There's not a lot you can do about it; it's par for the course. Automatically, you have to think in terms of the individual tiers when you worry about loading, about memory usage, about performance. You cannot think of it as a whole, a gestalt, right up front; each tier matters in its own right.

At the tier everyone notices, because, well, it's the visual part of the whole thing, there's the display of the grid in the browser. There's some Javascript around so you can use the mouse and keyboard to do things like move/resize columns, change the focused row, and so on.

Most obviously, there are a couple of things to say about this presentation layer (oops, some architecture terminology crept in there). You want to minimize the amount of Javascript you download because (1) it takes a little extra time to download, and (2) it takes a little extra time when loading the page on the client for the Javascript to be parsed and executed. But most of all you want to minimize the amount of data you download to the browser; essentially all you want to download is enough to display the page in the grid you're looking at. Boiled down to its basics, you just want to download the HTML that displays the table that defines the grid.

Back in the old days (cue violin music) grids used to download the entire dataset to the browser, together with enough Javascript to float a battleship. These days, vendors are more lean and mean with their controls, recognizing that downloading such volumes of data is counter-productive, especially when the datasets are huge, and AJAX is around to do work in the background downloading data without going through a full page cycle on the server or page refreshes on the client. The pendulum has swung the other way.

On to the next tier. Well, this is where so many people go wrong and assume there is only one other tier. Sorry, but there isn't. There are at least two other tiers, not one, otherwise we would be justified in calling it a client/server application.

The first tier is the web server (the bit that executes the ASP.NET application) and the second the database server (the bit that dishes out the data). They are not the same, and you should not consider them the same. In general, they will run on completely different machines, if not for performance reasons then for security reasons.

Oh certainly, grid vendors -- even us! -- demonstrate their products at shows like TechEd with the web server and database server on the same machine. It's just easier that way. (For that matter, the browser is on the same machine too.) But that is not the primary scenario for a web application in production and so you should take these demos with a grain of salt performance-wise. Remember our slogan: mumble, COMPARE, mumble? The comparisons you're making should involve testing in a similar environment to your production one.

So let's take these tiers separately. The web server is next in the chain. Here we assume that the server is servicing many clients simultaneously, say 100 to use a round number. Because of the way web servers work, each time a client makes a request the server executes another instance of the web application, sets it up by creating the Page and all its controls, processes the request, renders and sends the reply, tears it down. Sure, the instance may come from the application pool, but in essence this is what's happening. As you can imagine, with 100 active clients the web server is going to be doing a lot of work.

And note that using AJAX doesn't change the equation too much here. Even with a callback the application is instanced on the server, the Page is created with all its controls, and execution is passed to the control which has to service the callback. Postback processing is avoided, of course. In essence, AJAX gives you a better experience at the browser and smaller HTML payloads, but as regards the server workload it's all pretty much of a muchness.

The general principles we had with the client -- minimizing data downloads (this time downloading from the database server), minimizing memory usage -- are just as important with the web server. It is pointless and counter-productive and a sheer waste of memory to download an entire dataset from the database server, just so you can perform grouping. Or just so you can filter the data. Or just so you can calculate a summary. Remember: if the control does this kind of processing, it's going to happen with up to 100 separate instances of the application at the same time. Memory usage on the web server is going to go through the roof. Swapping pages to/from the web server disk is going to be excessive. I know that these days 64-bit web servers can have gigabytes of memory, but in the end there is still only so much RAM you can plug in and pretty quickly your bottleneck will be your CPUs instead.

So, we have to write our ASP.NET applications to be frugal with memory too, the more simultaneous clients we envisage having, the more frugal we should strive to be. Controls we use shouldn't download entire datasets as part of their internal processing, they should page data as and when they need it. Sure, paging when the grid is only displaying records in a list is simple to implement and very effective, but the same should apply when the grid is doing more advanced work, presenting broader analytic functionality for the end-user.

Note that this principle has nothing to do with LINQ or any other data retrieval process. It's all about how much data is downloaded from the database server. The web application (and the controls it's using) must page its data. After all, instances of the application are being created and destroyed all the time. It doesn't make sense to have a memory-hungry, data-guzzling application since all this work setting it all up takes too much time and bashes performance.

The final tier I'm going to talk about is the database server. Get this: it has one client and one client only, the web server (lots of connections, maybe, but all asking for the same data). It can allocate as much memory as it wants to cache data for that one client. And guess what? As far as the database server can see, this client just keeps on asking for chunks of data from the same tables and views, bam, bam, bam. This client seems to suffer from extreme short-term memory loss. So the database server caches the heck out of the data in order to fling it back to the forgetful client as quickly as possible.

And what's more, the database engine has been thoroughly optimized to process analytic type queries far better than you or I could code up on the web server. Grouping? Should be done on the database server. Filtering? Ditto. Summaries? Most assuredly. Furthermore, you can tune the database server so that queries that run most frequently are optimized. It doesn't make sense from a processing viewpoint or a performance viewpoint or a networking viewpoint to issue a SELECT * query to the database server, load the entire result set into a List<T>, and then use LINQ to filter that list. "Hey, at least we can tell our customers we're using LINQ!"

This layman's analysis of the multi-tier aspect of web applications reinforces and validates the work we did with the ASPxGridView, work that no other vendor has done. We deliberately wrote a data controller that runs in conjunction with the grid code on the web server to page data from the database server. This data controller works with standard "tabular list of records" type views, it works with sorting and grouping those records, it works with filtering the records, and it can calculate the correct summaries both overall and per group. All without reading an entire dataset from the database server into the web server's memory.

If you are writing a web application you owe it to yourself to understand how the controls you're using process the data they display. Ask your vendor: when the grid is grouping records, is this processing done on the database server or on the web server? (If your grid vendor is DevExpress, the answer is: on the database server of course. Duh.) Turn on tracing in SQL Server and then test your application. Ignore the marketing-speak from the vendor and watch what the controls are really doing as you use the application. Then, decide.

4 comment(s)
Michal Hensen

Amen....

22 May, 2008
@dan90266

Great post.  Thanks for the insight.

22 May, 2008
Seth

Good comments! The only thing missing, however, is good documentation for the IListServer interface which allows the kind of interaction you describe. It is imperative that this is documented properly as not all users of your product will use data sources which implement that interface. If that is your requirement in order to use the advance features, that is just fine. Let us bridge the gap by implementing what is required.

23 May, 2008
Timothy McCurdy

I agree with Seth...

7 October, 2008

Please login or register to post comments.