CPearson.com Excel Newsletter

Welcome To The CPearson.com Excel Newsletter

Author: Chip Pearson, email: chip@cpearson.com at Pearson Software Consulting, LLC
Release Date: 8-October-2007
Website: www.cpearson.com
Document Format: XHTML 1.1
Issue: #10
This Issue URL: www.cpearson.com/Newsletter/Content/2007_10_08.htm.

You are receiving this email because you subscribed to the CPearson.com Excel Newsletter. If you are receiving this message in error or you wish to change or cancel your subscription, please go to the CPearson Newsletter Management Page or email Chip Pearson at chip@cpearson.com

Back issues of the CPearson Excel Newsletter are available in the Archives.

The Excel Data Functions

In this issue, we'll take a look at Excel's Data family of functions. These functions are used to extract information from data in a table on a worksheet. Collectively, these functions are often called the "D-Functions", a term we will use here to refer to the collection of functions. There are twelve D-Functions, as shown in the list below. They all work essentially the same way, so we will look at only the DSUM function in this article. Once you understand how DSUM works, you will be able to use any of the D-Functions. The D-Functions are:

DAVERAGE returns the Average of the data specified by the criteria range
DCOUNT returns the Count (numeric values only) of the data specified by the criteria range
DCOUNTA returns the CountA (all non-empty) of the data specified by the criteria range
DGET returns a single element of the data specified by the criteria range
DMAX returns the Maximum of the data specified by the criteria range
DMIN returns the Minimum of the data specified by the criteria range
DPRODUCT returns the Product (multiplication) of the data specified by the criteria range
DSTDEV returns the Standard Deviation of a sample on the data specified by the criteria range
DSTDEVP returns the Standard Deviation of the population of the data specified by the criteria range
DSUM returns the Sum of the data specified by the criteria range
DVAR returns the Variance of the sample of the population of the data specified by the criteria range
DVARP returns the Variance of entire population of the data specified by the criteria range

You will likely find it useful to download the example workbook that contains the data and formulas used in this article.

D-Function Syntax

All of the D-Functions share the same or very similar syntax. This syntax is:

=DFUNCTION(DataTable, ColumnIdentifer, CriteriaRange)

where DFUNCTION is the name of the function, DataTable is the worksheet cell range containing the data whose values are to be examined, ColumnIdentifier is the title or number of the column in DataTable that contains the values to be processed or summarized, and CriteriaRange is a range that contains the selection criteria that are used to select rows from the DataTable. The data table and the criteria range are described below.

There are two halves to a D-Function: a range of data to be examined and a range that specifies the criteria to use when testing data values. If you are familiar with databases and SQL, you can think of the D-Function as a SELECT statement, the data table like the database for a SELECT statement and the criteria range as the WHERE clause of the SELECT statement. Unlike SQL, though, a D-Function can return only a single field (column) of a single row of the table. It cannot return multiple values. If you don't know SQL, don't worry, it isn't used with the D-Functions.

The Data Table

The data table is simply a rectangular range of data values, where each row of the data is one record (e.g., one sales transaction) and each column of the row is a data value (e.g., quantity sold in the transaction) of the record. The first row of the data range must be column titles that will match the titles in the criteria range, described later. For example, the following data table, in cells A1:E16 and given a defined name of Data will be used in the examples in this article. Note that the column titles are required in the first row of the data table.

Product	Salesman	Quantity	Price	Total
TV1            Bob           1         10       10
TV2            Bob           2         20       40
Stereo1        Bob           1         30       30
Stereo1        Joe           1         30       30
Stereo1        Joe           2         30       60
Stereo2        Joe          33         40     1320
Stereo2        Mike          4         40      160
Stereo3        Mike          5         50      250
IPod1          Mike          1         60       60
IPod1          David         2         60      120
IPod1          David         3         60      180
IPod2          David         4         60      240
IPod2          Bob           5         70      350
IPod2          David         6         70      420
IPod2          Mike          7         70      490

The Criteria Range

The criteria range is a bit more complicated. It is used to control which values in the data table are processed by the D-Function. The first row of the criteria range should be the column titles in the first row of the data table. In the example here, that would be:

Product	Salesman	Quantity	Price	Total

The values entered into the cells below the column titles are used to specify the selection criteria. This serves the same purpose of a the WHERE clause of a SELECT statement in an SQL query. Before moving forward with an example, we need one more piece to fill out the structure of a D-Function. That is to create a formula that will properly address the criteria range. This formula is the key to creating a well designed workbook that uses the D-Functions.

The criteria range must not contain empty rows. Empty rows will change the logic of the criteria selection, resulting in incorrect calculations. Moreover, as we will see later, some criteria ranges will have duplicate column titles to implement logical AND operations. If we were to refer to the criteria range by its cell address, every D-Function formula would need to be changed each time the criteria range is modified to reflect the addition or deletion of criteria elements. Therefore, it will be very helpful to create a formula that returns the address of the used portion of the criteria range. This way the criteria range will resize itself when criteria values are added or deleted.

First, you need to determine the maximum number of rows that your criteria range will need. Typically this will be no more than three or four (beyond that, the logic gets quite complicated), so for safety's sake, we'll define here the criteria range to be 11 rows: 19:29. The first row of this is the column titles, the same titles as are in the first row of the data table. The next 10 rows are the actual criteria expressions. Now, select the rows 20:29 (you must select the entire rows -- click on the first row number on the left of the screen, hold down the SHIFT key and click on the last row number), and type CriteriaData in the Name box (the box to the left of the formula bar, above the A column heading). This defines the name CriteriaData to include the entire rows of the actual criteria range, excluding the column titles. Next, on the row above the column titles of the CriteriaData range, define the name DynaCrit to cell D18. In that cell, paste the followng formula:

=ADDRESS(ROW(CriteriaData)-1,COLUMN(CriteriaData))&":"&
ADDRESS(MAX(ROW(CriteriaData)*(CriteriaData<>"")),
MAX(COLUMN(CriteriaData)*(CriteriaData<>"")))

The formula is split here for readability, but should be all on a single line in Excel. NOTE: This is an array formula so you must press CTRL SHIFT ENTER rather than just ENTER when you first enter the formula and whenever you edit it later. If you do this properly, Excel will display the formula in the formula bar enclosed in curly braces { }. If you do not properly enter this formula as an array formula, you will get a #VALUE error result and the D-Functions will fail. (See my page on Array Formulas for a detailed discussion of array formulas.) This formula returns the address of the used portion of the criteria range and this address will be used in the D-Functions. Because this formula returns the address of only the used portion of the criteria range, we need not worry about blank lines in the criteria range -- they will be excluded by the formula. That's why we were able to safely use ten rows in our criteria range, even if we never use all ten rows.This formula and reference assume that you have nothing to the right of the criteria range. All the cells to the right of the criteria range must be empty. If they are not empty, the formula will return an incorrect result .

Let's start with an example of the DSUM function. DSUM calculates the sum of one column in the data table summing those rows that meet the criteria in the criteria range. Put the value David in the Salesman column of the criteria range:

Product   Salesman   Quantity   Price   Total
           David

Next, in a cell below the end of the CriteriaData range, enter the following formula:

=DSUM(Data,"Quantity",INDIRECT(DynaCrit))

This formula will sum the values in the Quantity column of the data table that meet the criteria in the range whose address is stored in the DynaCrit cell. In this example, DSUM looks in the Salesman column to match the word David. For each match found in the Salesman column, DSUM retrieves the value in the Quantity column. Because the formula in the cell named DynaCrit returns the address of the used portion of the criteria range, we use the INDIRECT function to refer to the actual range whose address is in DynaCrit. (See www.cpearson.com/Excel/Indirect.htm for additional information about the INDIRECT function.) The DSUM function will sum all the values in the Quantity column where the value in the Salesman column is equal to David. All of the examples of criteria ranges below will assume that we are using this formula. To sum another column, just change "Quantity" to the title of the desired column. (The quotes are required as shown in the example formula. You can also use a cell reference in place of a hard-coded column title. If you use a cell reference, that cell should contain the column title without the quotes.)

Operations In The Criteria Range

If you put a simple value in a criteria range cell, such as we did above with David, the function treats that as a search for an exact match. You are not, however, restricted to exact matches. You can enter a comparison function in a criteria cell. For example, examine the criteria range below:

Product   Salesman   Quantity   Price   Total
                                < 100

This will sum the Quantity elements of those rows in Data where the Price is less than 100. You can use < for less than, > for greater than, <= for less than or equal to, >= for greater than or equal to, and <> for not equal. To test for a between condition (e.g., Price between 50 and 100), you need to add a second Price column to the criteria range (not the data range) and put a greater than (or greater than or equal to) condition in one Price column and a less than (or less than or equal to) condition in the second Price column. For example,

Product   Salesman   Quantity   Price   Price   Total
                                >= 50   <= 100

This will return values for rows whose Price columns is between 50 and 100 (inclusive). Because the comparison conditions are in the same row of the criteria range, they are combined to produce a logical AND operation. In other words, the logic is If Price >= 50 And Price <= 100.

You can use the cells in the CriteriaData range to perform logical operations with the DSUM (or any of the D-Functions). This is why we made the CriteriaData 10 rows tall. In general, if two criteria values are on the same line within the criteria range, then a logical AND operation is performed. For example, look at the criteria range:

Product   Salesman   Quantity   Price   Total
  IPod1     David

Because IPod1 and David are in the same row of the criteria table, only rows in Data where David sold an IPod1 are processed by DSUM. This returns a result of 5. To return the Total sales amounts, not quantities, simply change the formula to

=DSUM(Data,"Total",INDIRECT(DynaCrit))

If values appear on different lines in the criteria range, then a logical OR is peformed. For example, to return the quantity of either IPods1 or IPods2 were sold (by any salesman), you would change the criteria range to the range below. Note that this OR operation should be taken in the context of If Product = IPod1 Or Product = IPod2. Even though you are returning the results of IPods1 And IPods2, the actual logic is an OR not an AND.

Product   Salesman   Quantity   Price   Total
 IPod1     
 IPod2

Because IPod1 and IPod2 are on separate rows of the criteria range, DSUM will sum the Quantity columns of Data where Product is either IPod1 OR IPod2. You can combine elements in criteria range to perform compound logical functions. For example, consider the criteria range:

Product   Salesman   Quantity   Price   Total
  Stereo1   Joe     
  Stereo2
  IPod2     David

The DSUM formula will sum the Quantity values where the Procuct is Stereo1 and sold by Joe, all Stereo2 products, and IPod2 products sold by David. The result of the DSUM formula is 50.

If you are testing text strings in your criteria range, you can use the ? and * wildcard characters to limit matching strings. For example,

Product   Salesman   Quantity   Quantity    Price   Total
                       TV*

will match all Products that begin with the characters TV.

Earlier, I alluded to the fact that column titles may be duplicated in the titles row of the criteria range. This allows us to use a logical AND operation on single data category. Insert some cells next to the Quantity column in the criteria table, one column by ten rows, and use criteria values like:

Product   Salesman   Quantity   Quantity    Price   Total
                        >= 4      <= 20

Because the Quantity column title appears twice, we can put values on the same row to force an AND operation, and this criteria range will cause DSUM to sum the Quantity values only if the value is greater than or equal to 4 AND is less than or equal to 20. The result of the formula is 31.

By combining various constraints on the same rows and in different rows within the criteria range, you can create quite complex logical queries on the data.

Other D-Functions

All of the other D-Function listed at the beginning of this article work just like the DSUM function. The only difference is what operation is performed. For example, DCOUNTA counts, rather than sums, values from the data table. The syntax is the same or very similar for all D-Functions. The criteria range works identically for all the D-Functions.

It is worth noting that the cells in the criteria range may contain formulas in addition to the static values we used in the examples, so by combining the logic of formulas with the logical operation allowed by placing elements on the same or on different rows within the criteria range, you can create very powerful data queries.

D-Functions Versus Array Formulas

If you are familiar with array formulas (see www.cpearson.com/Excel/ArrayFormulas.aspx for a discussion of array formulas), you have probably noticed that the D-Functions provide the same functionality that can be achieved with array formulas. So, which should you use, D-Functions or array formulas? Broadly speaking, the D-Functions are faster than their array formula counterparts, so if you have a large and complicated workbook, you may see some significant peformance gains by rewriting the array formulas a D-Functions. Be aware that not all array formulas can be written as D-Function. Sometimes an array formula is the only solution. The other drawback to D-Functions is that the criteria must be in a range of cells. You cannot provide the criteria in the formula itself. The formula requires a reference to a range of cells. Finally, a D-Function can return only a single value to a single cell. Array formulas can be written to return many values to a range of cells.