Monday, January 17, 2011

DAX Time Intelligence Functions

Time intelligence functions are an advanced feature of DAX. They are typically composite functions built on top of other more basic DAX features and have more stringent requirements for the structure of the underlying database. But since time related calculations are fundamental to almost all BI projects, early adopters of DAX started using the time intelligence functions almost immediately after the first release of DAX. Due to the advanced nature of these functions, questions naturally arise about how they work and why they work that way even when people managed to get the desired results.
Marius Dumitru, architect of AS engine team, gave the following recipe to users of time intelligence functions.
1.            Never use the datetime column from the fact table in time functions.
2.            Always create a separate Time table with contiguous dates (i.e. without missing day gaps in the date values).
3.            Create relationships between fact tables and the Time table.
4.            Make sure that relationships are based on a datetime column (and NOT based on another artificial key column).
5.            Make sure you have full years’ worth of data in the Time table. For instance, even though your transaction table may only have sales data up to May 2010, the Time table should have dates up to December 2010 (same goes for fiscal years).
6.            The datetime column in the Time table should be at day granularity (without fractions of a day).
In this post, I’ll expose more details behind the implementation of time intelligence functions to shed light on the rationale behind Marius’ advice. This post is meant to supplement online documentation of SQL Server 2008 R2. I will not cover basic usage examples of time intelligence functions in typical BI applications. DAX is a young functional language that evolves rapidly. What is covered here applies to SQL Server 2008 R2.
I assume you are already familiar with advanced DAX concepts such as row context, filter context, Calculate function, Values function, measures, etc. I’ll start with some common features across all time intelligence functions before I delve into individual functions.
The <dates> argument
All time intelligence functions have a special <dates> argument. FirstNonBlank and LastNonBlank operate on non-date/time columns but the conversion rules descibed below still apply. With the exception of DatesBetween and DatesInPeriod, the <dates> argument can be one of three forms:
1.       A reference to a date/time column.
2.       A table expression that returns a single column of date/time values.
3.       A boolean expression that defines a single-column table of date/time values.
Internally, all three forms are converted to a DAX table expression in the following fashion:
Format number
<dates> format
Internal table expression
1
T[C]
CalculateTable(Values(T[C]))
2
Single column table expression
As is
3
Well-formed scalar expression
Filter(All(T[C]), <scalar expression>)


What is a well-formed scalar expression? Vaguely speaking, it is a scalar expression that DAX engine can analyze and extract a fully qualified column reference to a date/time column. T[C] = Date(2011, 1, 17) is a well-formed expression as DAX engine can extract T[C] from it. T[C] * 2 is a well-formed expression for the same reason even though it is not a boolean expression DAX engine will cast it to boolean data type implicitly. Date(2011, 1, 17) is not a well-formed scalar expression as no column reference can be extracted from it.
Note that internally inserted CalculateTable in the first format automatically converts current row contexts to filter contexts. So
MaxX(Distinct(DimDate[CalendarYear]), OpeningBalanceYear([m], DimDate[Datekey]))
will give the maximum opening balance among all years, but
                MaxX(Distinct(DimDate[CalendarYear]), OpeningBalanceYear([m], Values(DimDate[Datekey])))
will not as the <dates> argument is in the second format hence not correlated to the current calendar year on the row context.
A special rule regarding datetime filter inside Calculate/CalculateTable
If a Calculate filter has a unique column that is of data type date/time, all previous filters on all columns from the table which contains this date/time column are removed. This hacky feature implies that
                Calculate(<expression>, <TI function>) = Calculate(<expression>, <TI function>, All(DateTable)).
This is the reason behind Marius’ recommendation #4. Let’s say you have some years on the pivot-table row and then you drag a measure which uses time-intelligence function DateAdd to show sales from the previous year. The author of the measure formula may not realize that DateAdd function only returns a single column of Datekey which overwrites existing filter on the same column. This special rule makes sure that the filter on the CalendarYear column, which comes from the pivot-table, is also removed so you get back the expected result. Without this special rule, Calculate(<expression>, <TI function>) would set days of the previous year on the Datekey column but leave the previous year as the filter on the CalendarYear column. The conflicting filters would have produced a blank result.
In PowerPivot v1, the only way to mark a column as unique is to create an incoming relationship to this column, hence Marius’ recommendation #3. You obviously also need a separate Date table in order to create a relationship.
In practice, people want to use integer keys to create relationship between Fact table and Date table. If you want to use time intelligence functions in that case, you must add All(DateTable) filter yourself to the measure expression.
In future versions of PowerPivot, users may be able to mark a date/time column as unique without creating an incoming relationship to the column. When that happens, recommendation #3 would no longer be needed.
Calendar
DAX engine creates an internal calendar data structure for each date/time column used in the time intelligence functions. The calendar contains a hierarchy of year-quarter-month-day. The minimum year and the maximum year in the calendar are determined by the actual dates in the date/time column. Internally, intermediate results are represented as arrays of days. So to represent January 2011, intermediate result would hold the 31 days in that month. Although a calendar contains all days in the years involved, it marks which days actually exist in the column. Calendar member functions return result days only if they are marked as exists. So if you have missing dates in the date/time column, they cannot be returned as a result of time intelligence functions.
When a calendar data structure is initially populated, it raises an error if two values from the date/time column correspond to the same day. So if you have a date/time column below the day granularity, it cannot be used in time intelligence functions.
Obviously calendar is an internal implementation detail which is subject to change in future releases of DAX. I mention it here to explain some of the limitations imposed on the current version of time intelligence functions. I’ll refer to calendar again when I discuss individual functions next.
Primitive time intelligence functions
As I mentioned at the beginning of the post, time intelligence functions are an advanced feature of DAX. They are built on top of other DAX features and functions, so there is nothing primitive about them. But many time-intelligence functions are composed from more basic time-intelligence functions which have their native implementations, I call the latter primitive ones.
FirstDate/LastDate/StartOfMonth/EndOfMonth/StartOfQuarter/EndOfQuarter/StartOfYear/EndOfYear return a table of a single column and a single row. They can be used anywhere a table expression or a scalar expression is needed due to implicit table to scalar cast.
As I described in the section on the <dates> argument, no matter which format the user uses, internally they are all converted to a table expression. From now on I will use <dates table> to denote the table expression equivalent to <dates>. I will use DimDate[Datekey] or simply [Datekey] to denote the date/time column extracted from <dates> expression.
FirstDate(<dates>)
LastDate(<dates>)
These two functions return a single column, single row table with values equivalent to
                MinX(<dates table>, [Datekey]), or
                MaxX(<dates table>, [Datekey]).
FirstNonBlank(<column>, <expression>)
LastNonBlank(<column>, <expression>)
These two functions are not pure time-intelligence functions as the first argument is not limited to <dates> as in all other time-intelligence functions. They are internally rewritten as
                Top1(Filter(<column table>, Not(IsBlank(<expression>)), [column]), and
                Bottom1(Filter(<column table>, Not(IsBlank(<expression>)), [column]) respectively.
Top1 and Bottom1 are internal functions similar to MinX and MaxX functions but support all DAX data types include boolean and string data types.
Note that <expression> is not automatically wrapped in Calculate, therefore,
                FirstNonBlank(DimDate[Datekey], Sum(FactSales[SalesAmount]))
does not give you what you want, but
                FirstNonBlank(DimDate[Datekey], Calculate(Sum(FactSales[SalesAmount])))
will produce the expected result.
StartOfMonth(<dates>)
StartOfQuarter(<dates>)
StartOfYear(<dates>, <year_end_date>)
Although not implemented this way, these functions first find FirstDate(<dates>), then jump to first day that exists in the same month/quarter/year.
EndOfMonth(<dates>)
EndOfQuarter(<dates>)
EndOfYear(<dates>, <year_end_date>)
Although not implemented this way, these functions first find LastDate(<dates>), then jump to the last day that exists in the same month/quarter/year.
DateAdd(<dates>, <number_of_intervals>, <interval>)
ParallelPeriod(<dates>, <number_of_intervals>, <interval>)
SamePeriodLastYear(<dates>)
SamePeriodLastYear(<dates>) is identical to DateAdd(<dates>, -1, Year).
Both DateAdd and ParallelPeriod invoke a function, Move, on the calendar object. The only difference is that ParallelPeriod requires the result days to fill an entire month/quarter/year, while DateAdd does not have this requirement.
Currently, DateAdd has a limitation that <dates table> must contain continuous days so that the result days can be continuous too.  We have this limitation because when you move all 28 days in February one month forward, like below,
                DateAdd(filter(All(DimDate[Datekey]), Year([Datekey]) = 2006 && Month([Datekey]) = 2), 1, Month)
you get back 31 days in March! As we mentioned in the section about Calendar, all intermediate results of time intelligence functions are represented as an array of days. So there is no difference between all days in February and February itself. But what if you have one day missing in the middle of February? Should we return one day missing in March or 27 days in March? This seems to be a tricky but less important question, so we left it undecided and raised an error instead.
The calendar object’s Move function is the most intelligent part of all time intelligence functions. It knows that moving both 3/30 and 3/31 one month forward end up on the same day 4/30. Internally it only moves by number of days or months, number of quarters or years is simply translated to corresponding number of months. Sometimes the Move function can be too smart for the user. When it moves a continuous range of days forward N months, it checks the start day and the end day of the range to see if you are moving whole months or just individual days. This works well when all days exist in the date/time column. But if you have missing days, the Move logic still tries to guess whether you are moving whole months based on days that exist, this may not be what you would expect, hence Marius’ recommendation #2.
DatesBetween(<dates>, <start_date>, <end_date>)
DatesInPeriod(<dates>, <start_date>, <number_of_intervals>, <interval>)
These two functions are different from the other time intelligence functions in that the <dates> argument can only be a fully-qualified column reference to a date/time column. Note that online document incorrectly states that the <dates> argument in DatesInPeriod can be any of the three formats.
Internally, both functions are rewritten as
                Filter(All(DimDate[Datekey]), DatesRange([Datekey], Earlier(<start_date>), Earlier(<end_date>))) or
                Filter(All(DimDate[Datekey]), DatesRange([Datekey], Earlier(<start_date>, Earlier(<number_of_intervals>), <interval>))
In the formulas above, Earlier is an internal extended version of the public Earlier function that evaluates an entire DAX expression by skipping the last row context. DatesRange is an internal boolean function that returns true if the value of the first argument falls within an interval. In case of DatesRange(<date_value>, <start_date>, <end_date>), the interval is defined as [start_date, end_date]. In case of DatesRange(<date_value>, <start_date>, <number_of_intervals>, <interval>), the interval is defined as [start_date, end_date) where end_date is calculated by calling the Move function on the calendar object, so it is a smart moveJ Note that since <number_of_intervals> can be a negative number, like when you move backwards in time, the interval specified by DatesInPeriod can be in the reverse order. DatesBetween does not allow reversed end points.
Also note that All(DimDate[Datekey]) implies that existing filter context does not apply to the date/time column, unlike when all other time intelligence functions use the name-pair syntax.
PreviousDay(<dates>)
PreviousMonth(<dates>)
PreviousQuarter(<dates>)
PreviousYear(<dates>, <year_end_date>)
Although not implemented this way, these function are equivalent to
FirstDate(DateAdd(<dates>, -1, Day)) in case of PreviousDay, or
DatesInPeriod(
<dates>,
StartOfMonth/StartOfQuarter/StartOfYear(
DateAdd(<dates>, -1, Month/Quarter/Year)
),
1,
Month/Quarter/Year
) in case of PreviousMonth/PreviousQuarter/PreviousYear.
NextDay(<dates>)
NextMonth(<dates>)
NextQuarter(<dates>)
NextYear(<dates>, <year_end_date>)
Although not implemented this way, these functions are equivalent to
LastDate(DateAdd(<dates>, 1, Day)) in case of NextDay, or
DatesInPeriod(
<dates>,
EndOfMonth/EndOfQuarter/EndOfYear(
DateAdd(<dates>, 1, Month/Quarter/Year)
),
-1,
Month/Quarter/Year
) in case NextMonth/NextQuarter/NextYear.

Composite time intelligence functions
The remaining time intelligence functions listed below are always internally rewritten using other time intelligence function.
DatesMTD(<dates>)
                DatesBetween(<dates>, StartOfMonth(LastDate(<dates>)), LastDate(<dates>))
DatesQTD(<dates>)
                DatesBetween(<dates>, StartOfQuarter(LastDate(<dates>)), LastDate(<dates>))
DatesYTD(<dates>, <year_end_date>)
                DatesBetween(<dates>, StartOfYear(LastDate(<dates>), <year_end_date>), LastDate(<dates>))
TotalMTD(<expression>, <dates>, <filter>)
                Calculate(<expression>, DatesMTD(<dates>), <filter>)
TotalQTD(<expression>, <dates>, <filter>)
                Calculate(<expression>, DatesQTD(<dates>), <filter>)
TotalYTD(<expression>, <dates>, <filter>, <year_end_date>)
                Calculate(<expression>, DatesYTD(<dates>, <year_end_date>), <filter>)           

OpeningBalanceMonth(<expression>, <dates>, <filter>)
                Calculate(<expression>, PreviousDay(StartOfMonth(<dates>)), filter)
OpeningBalanceQuarter(<expression>, <dates>, <filter>)
                Calculate(<expression>, PreviousDay(StartOfQuarter(<dates>)), filter)
OpeningBalanceYear(<expression>, <dates>, <filter>, <year_end_date>)
                Calculate(<expression>, PreviousDay(StartOfYear(<dates>, <year_end_date>)), filter)

ClosingBalanceMonth(<expression>, <dates>, <filter>)
                Calculate(<expression>, EndOfMonth(<dates>), <filter>)
ClosingBalanceQuarter(<expression>, <dates>, <filter>)
                Calculate(<expression>, EndOfQuarter(<dates>), <filter>)
ClosingBalanceYear(<expression>, <dates>, <filter>, <year_end_date>)
                Calculate(<expression>, EndOfYear(<dates>, <year_end_date>), <filter>)

2 comments:

  1. too good piece of information, I had come to know about your site from my friend sajid, bangalore,i have read atleast 11 posts of yours by now, and let me tell you, your web-page gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new post, once again hats off to you! Thanks a lot once again, Regards, Single Row Function in sql

    ReplyDelete
  2. attractive piece of information, I had come to know about your blog from my friend arjun, ahmedabad,i have read atleast eleven posts of yours by now, and let me tell you, your website gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new posts, once again hats off to you! Thanks a lot once again, Regards, Single Row Function


    ReplyDelete