Welcome To The CPearson.com Excel Newsletter
Author: Chip Pearson, email: chip@cpearson.com
at Pearson Software Consulting, LLC
Release Date: 8-October-2007
Website: www.cpearson.com
Document Format: XHTML 1.1
Issue: #10
This Issue URL: www.cpearson.com/Newsletter/Content/2007_10_08.htm.
You are receiving this email because you subscribed to the CPearson.com Excel Newsletter. If you are
receiving this message in error or you wish to change or cancel your subscription, please go to the
CPearson Newsletter Management Page or email
Chip Pearson at chip@cpearson.com
Back issues of the CPearson Excel Newsletter are available in the Archives.
The Excel Data Functions
In this issue, we'll take a look at Excel's Data family of functions. These functions are used to extract information from data in a table on a worksheet. Collectively, these functions are often called the "D-Functions", a term we will use here to refer to the collection of functions. There are twelve D-Functions, as shown in the list below. They all work essentially the same way, so we will look at only the DSUM function in this article. Once you understand how DSUM works, you will be able to use any of the D-Functions. The D-Functions are:
You will likely find it useful to download the example workbook that contains the data and formulas used in this article.
D-Function Syntax
All of the D-Functions share the same or very similar syntax. This syntax is:
=DFUNCTION(DataTable, ColumnIdentifer, CriteriaRange)
where DFUNCTION is the name of the function, DataTable is the worksheet cell range containing the data whose values are to be examined, ColumnIdentifier is the title or number of the column in
DataTable that contains the values to be processed or summarized,
and CriteriaRange
is a range that contains the selection criteria that are used to select rows from the DataTable.
The data table and the criteria range are described below.
There are two halves to a D-Function: a range of data to be examined and a range that specifies the criteria to use when testing data values. If you are familiar with databases and SQL, you can think of the D-Function as a SELECT statement, the data table like the database for a SELECT statement and the criteria range as the WHERE clause of the SELECT statement. Unlike SQL, though, a D-Function can return only a single field (column) of a single row of the table. It cannot return multiple values. If you don't know SQL, don't worry, it isn't used with the D-Functions.
The Data Table
The data table is simply a rectangular range of data values, where each row of the data is one record (e.g., one sales transaction) and each column of the row is a data value (e.g., quantity sold in the transaction) of the record. The first row of the data range must be column titles that will match the titles in the criteria range, described later. For example, the following data table, in cells A1:E16 and given a defined name of Data will be used in the examples in this article. Note that the column titles are required in the first row of the data table.
Product Salesman Quantity Price Total TV1 Bob 1 10 10 TV2 Bob 2 20 40 Stereo1 Bob 1 30 30 Stereo1 Joe 1 30 30 Stereo1 Joe 2 30 60 Stereo2 Joe 33 40 1320 Stereo2 Mike 4 40 160 Stereo3 Mike 5 50 250 IPod1 Mike 1 60 60 IPod1 David 2 60 120 IPod1 David 3 60 180 IPod2 David 4 60 240 IPod2 Bob 5 70 350 IPod2 David 6 70 420 IPod2 Mike 7 70 490
The Criteria Range
The criteria range is a bit more complicated. It is used to control which values in the data table are processed by the D-Function. The first row of the criteria range should be the column titles in the first row of the data table. In the example here, that would be:
Product Salesman Quantity Price TotalThe values entered into the cells below the column titles are used to specify the selection criteria. This serves the same purpose of a the WHERE clause of a SELECT statement in an SQL query. Before moving forward with an example, we need one more piece to fill out the structure of a D-Function. That is to create a formula that will properly address the criteria range. This formula is the key to creating a well designed workbook that uses the D-Functions.
The criteria range must not contain empty rows. Empty rows will change the logic of the criteria selection, resulting in incorrect calculations. Moreover, as we will see later, some criteria ranges will have duplicate column titles to implement logical AND operations. If we were to refer to the criteria range by its cell address, every D-Function formula would need to be changed each time the criteria range is modified to reflect the addition or deletion of criteria elements. Therefore, it will be very helpful to create a formula that returns the address of the used portion of the criteria range. This way the criteria range will resize itself when criteria values are added or deleted.
First, you need to determine the maximum number of rows that your criteria range will need. Typically this will be no more than three or four (beyond that, the logic gets quite complicated), so for safety's sake, we'll define here the criteria range to be 11 rows: 19:29. The first row of this is the column titles, the same titles as are in the first row of the data table. The next 10 rows are the actual criteria expressions. Now, select the rows 20:29 (you must select the entire rows -- click on the first row number on the left of the screen, hold down the SHIFT key and click on the last row number), and type CriteriaData in the Name box (the box to the left of the formula bar, above the A column heading). This defines the name CriteriaData to include the entire rows of the actual criteria range, excluding the column titles. Next, on the row above the column titles of the CriteriaData range, define the name DynaCrit to cell D18. In that cell, paste the followng formula:
=ADDRESS(ROW(CriteriaData)-1,COLUMN(CriteriaData))&":"& ADDRESS(MAX(ROW(CriteriaData)*(CriteriaData<>"")), MAX(COLUMN(CriteriaData)*(CriteriaData<>"")))The formula is split here for readability, but should be all on a single line in Excel. NOTE: This is an array formula so you must press CTRL SHIFT ENTER rather than just ENTER when you first enter the formula and whenever you edit it later. If you do this properly, Excel will display the formula in the formula bar enclosed in curly braces { }. If you do not properly enter this formula as an array formula, you will get a #VALUE error result and the D-Functions will fail. (See my page on Array Formulas for a detailed discussion of array formulas.) This formula returns the address of the used portion of the criteria range and this address will be used in the D-Functions. Because this formula returns the address of only the used portion of the criteria range, we need not worry about blank lines in the criteria range -- they will be excluded by the formula. That's why we were able to safely use ten rows in our criteria range, even if we never use all ten rows.This formula and reference assume that you have nothing to the right of the criteria range. All the cells to the right of the criteria range must be empty. If they are not empty, the formula will return an incorrect result .
Product Salesman Quantity Price Total David
Operations In The Criteria Range
If you put a simple value in a criteria range cell, such as we did above with David, the function
treats that as a search for an exact match. You are not, however, restricted to exact matches. You can enter a comparison function in a
criteria cell. For example, examine the criteria range below:
Product Salesman Quantity Price Total < 100This will sum the Quantity elements of those rows in Data where the Price is less than 100. You can use < for less than, > for greater than, <= for less than or equal to, >= for greater than or equal to, and <> for not equal. To test for a between condition (e.g., Price between 50 and 100), you need to add a second Price column to the criteria range (not the data range) and put a greater than (or greater than or equal to) condition in one Price column and a less than (or less than or equal to) condition in the second Price column. For example,
Product Salesman Quantity Price Price Total >= 50 <= 100This will return values for rows whose Price columns is between 50 and 100 (inclusive). Because the comparison conditions are in the same row of the criteria range, they are combined to produce a logical AND operation. In other words, the logic is If Price >= 50 And Price <= 100.
You can use the cells in the CriteriaData range to perform logical operations with the DSUM (or any of the D-Functions). This is why we made the CriteriaData 10 rows tall. In general, if two criteria values are on the same line within the criteria range, then a logical AND operation is performed. For example, look at the criteria range:
Product Salesman Quantity Price Total IPod1 David
If values appear on different lines in the criteria range, then a logical OR is peformed. For example, to return the quantity of either IPods1 or IPods2 were sold (by any salesman), you would change the criteria range to the range below. Note that this OR operation should be taken in the context of If Product = IPod1 Or Product = IPod2. Even though you are returning the results of IPods1 And IPods2, the actual logic is an OR not an AND.
Product Salesman Quantity Price Total IPod1 IPod2
Product Salesman Quantity Price Total Stereo1 Joe Stereo2 IPod2 David
If you are testing text strings in your criteria range, you can use the ? and * wildcard characters to limit matching strings. For example,
Product Salesman Quantity Quantity Price Total TV*will match all Products that begin with the characters TV.
Earlier, I alluded to the fact that column titles may be duplicated in the titles row of the criteria range. This allows us to use a logical AND operation on single data category. Insert some cells next to the Quantity column in the criteria table, one column by ten rows, and use criteria values like:
Product Salesman Quantity Quantity Price Total >= 4 <= 20
By combining various constraints on the same rows and in different rows within the criteria range, you can create quite complex logical queries on the data.
Other D-Functions
All of the other D-Function listed at the beginning of this article work just like the DSUM function. The only difference is what operation is performed. For example, DCOUNTA counts, rather than sums, values from the data table. The syntax is the same or very similar for all D-Functions. The criteria range works identically for all the D-Functions.
It is worth noting that the cells in the criteria range may contain formulas in addition to the static values we used in the examples, so by combining the logic of formulas with the logical operation allowed by placing elements on the same or on different rows within the criteria range, you can create very powerful data queries.
D-Functions Versus Array Formulas
If you are familiar with array formulas (see www.cpearson.com/Excel/ArrayFormulas.aspx for a discussion of array formulas), you have probably noticed that the D-Functions provide the same functionality that can be achieved with array formulas. So, which should you use, D-Functions or array formulas? Broadly speaking, the D-Functions are faster than their array formula counterparts, so if you have a large and complicated workbook, you may see some significant peformance gains by rewriting the array formulas a D-Functions. Be aware that not all array formulas can be written as D-Function. Sometimes an array formula is the only solution. The other drawback to D-Functions is that the criteria must be in a range of cells. You cannot provide the criteria in the formula itself. The formula requires a reference to a range of cells. Finally, a D-Function can return only a single value to a single cell. Array formulas can be written to return many values to a range of cells.