in
Forums
Blogs
DevExpress.com
Client Center
Support Center
DevExpress Channel

Paul Kimmel's Blog

Finding the Median Value with MS Access SQL

Developer Express provides tons of sample content for our customers. The sample content includes CSS, source code, databases, XML, graphics, and much more. I was writing a sample for a custom aggregate value for Developer Express’ upcoming book Professional DevExpress ASP.NET Controls to be published by Wrox and realized I didn’t know of an easy way to calculate the median value of a result set using Access SQL. (Calculating the median value with Access SQL is not what the book is about. :) The Access SQL is actually mentioned in the context of demonstrating how the ASPxPivotGrid can provide custom summaries, like calculating the median value. In effect letting users get answers to questions through the GUI.)

Three of the tables the CarsDB.mdb database includes are Cars, Orders, and Customers. Cars and Orders are connected by Cars.ID = Orders.ProductID. The Orders and Customers tables are connected by Orders.CustomerID = Customers.ID. The figure shows the three tables used for the query.

image 
Figure 1: The three tables—Cars, Orders, Customers—used for the query to find the median value.

The basic query that defines the columns to return, including the PaymentAmount, and the relationships between those tables is provided in Listing 1. It is the median PaymentAmount that I am calculating and to reduce the result set a WHERE clause is added to refine the query to Alessandro and Associates. The resultset can  be refined by adding WHERE Customers.Company LIKE ‘Ales*’. By ordering the resultset by PaymentAmount and grabbing the top 50 percent of items the median value will be included in the result data set (see Listing 2).

Listing 1: This query returns all of the rows from Cars, Orders, and Customers and the columns specified in the SELECT clause.

SELECT Cars.Model, Cars.Price, Orders.PurchaseDate, Orders.PaymentType, Orders.PaymentAmount, Orders.Quantity, Customers.Company, Customers.City
FROM (Orders INNER JOIN Customers ON Orders.CustomerID = Customers.ID) INNER JOIN Cars ON Orders.ProductID = Cars.ID

Listing 2: This query returns the ordered top half of the result set.

SELECT TOP 50 PERCENT Cars.Model, Cars.Price, Orders.PurchaseDate, Orders.PaymentType, Orders.PaymentAmount, Orders.Quantity, Customers.Company, Customers.City FROM ((Orders INNER JOIN Customers ON Orders.CustomerID = Customers.ID) INNER JOIN Cars ON Orders.ProductID = Cars.ID) WHERE (Customers.Company LIKE 'Ales*') ORDER BY Orders.PaymentAmount

The above query returns the top half of the data set. The median item will roughly be the last item in the result set. Next, if you reverse the order of the result set then the top most value will be the median value. (You could sort the results in the query in Listing 2 but then you would get the bottom 50 percent of the items in the complete data set.) The query in Listing 3 contains the result set with the median value as the first row.

Listing 3: By selecting the result set again but reversing the order you get the median value as the topmost value of the result set.

select * from
(
SELECT TOP 50 PERCENT Cars.Model, Cars.Price, Orders.PurchaseDate, Orders.PaymentType, Orders.PaymentAmount, Orders.Quantity, Customers.Company, Customers.City FROM ((Orders INNER JOIN Customers ON Orders.CustomerID = Customers.ID) INNER JOIN Cars ON Orders.ProductID = Cars.ID) WHERE (Customers.Company LIKE 'Ales*') ORDER BY Orders.PaymentAmount
)
order by PaymentAmount desc

Finally, query the results in Listing 3 and request the TOP 1-th value. The final query in Listing 4 will return the median value (or pretty closely if the result set has an even number of items). I am not saying that three nested SELECT statements will be an efficient query to run often or against a lot of data. It will return the median (or middle) value in a result set. I tried some shorter versions and variations on the final query. For example, the TOP 1 modifier was added to the SELECT * FROM—the middle SELECT—but placing it there actually returned three rows containing similar values; I just wanted one and the query seems to work.

Listing 4: The final query—its not efficient or pretty but it seems to work.

SELECT TOP 1 * FROM
(SELECT * FROM
(SELECT TOP 50 PERCENT Cars.Model, Cars.Price, Orders.PurchaseDate, Orders.PaymentType, Orders.PaymentAmount, Orders.Quantity, Customers.Company, Customers.City FROM ((Orders INNER JOIN Customers ON Orders.CustomerID = Customers.ID) INNER JOIN Cars ON Orders.ProductID = Cars.ID) WHERE (Customers.Company LIKE 'Ales*') ORDER BY Orders.PaymentAmount) ORDER BY PaymentAmount DESC)

I am not saying that this is the kind of query you routinely should or want to write. In this instance it may not even be a great query. If you are writing an Access query to calculate the median value and you have no solution then try this one. This query is actually used as a negative counter pointer in the upcoming Professional DevExpress ASP.NET Controls book. The point mentioned in the book is that if your users have a requirement whether spoken or unspoken then they may cobble an ad hoc solution together. The section this query appears in covers custom summaries in the ASPxPivotGrid. The ASPxPivotGrid has several built in summary types and supports a custom summary event handler, letting you implement your own custom events.

Published Aug 03 2009, 07:49 PM by Paul Kimmel (Developer Express)

Comments

 

Twitted by jrguay said:

Pingback from  Twitted by jrguay

August 3, 2009 3:29 PM

Leave a Comment

(required)  
(optional)
(required)  
Verification code: Required
   
Add
Copyright © 1998-2010 Developer Express Inc.
ALL RIGHTS RESERVED