Instruction
TYPE2_FLG is usually used in slowly changing dimensions in
BI Applications. This flag indicates if the dimension is type 2, and it determines
the data storing behavior in ETL. This blog is to give you a better understanding
on how TYPE2_FLG works in ETL.
Background
Slowly Changing dimension
There are many fundamental dimensions such as Customer,
Product, Location and Employee in BI application. The attributes in these
dimensions are revised from time to time. Sometimes the revised attributes
merely correct an error in the data. But many times the revised attributes
represent a true change at a point in time. These changes arrive unexpectedly,
sporadically and far less frequently than fact table measurements, so we call
this topic slowly changing dimensions (SCDs).
Slowly changing dimensions (SCD) entities like Employee,
Customer, Product and others determine how the historical changes in the
dimension tables are handled and decide how to respond to the changes. There
are three different kinds of responses are needed: slowly changing dimension
(SCD) Types 1, 2 and 3.
Type 1: Overwrite
the attributes with new changes
Type 2: Add a New
Dimension Record
Type 3: Add a New
Field
We are talking about type 2 in this blog. In the Type 2 SCD
model the whole history is stored in the database. An additional dimension
record is created and the segmenting between the old record values and the new
(current) value is easy to extract and the history is clear. A minimum of three
additional columns should be added to the dimension row with type 2 changes: 1)
row effective date or date/time stamp (EFFECTIVE_FROM_DT); 2) row expiration
date or date/time stamp (EFFECTIVE_END_DT); and 3) current row indicator
(CURRENT_FLG).
SRC_EFF_FROM_DT and EFFECTIVE_FROM_DT
The two columns have different concepts though they have
similar name. We saw many customers getting confused about the two columns.
SRC_EFF_FROM_DT is
extracted from the effective start date of the source (mainly from the main
driven source) if the source has the history. If the source doesn’t store
history or the history is not extracted, it is hard coded as #LOW_DATE.
EFFECTIVE_FROM_DT is a system
column in dimension table to track the history. Remember that we use the
knowledge modules (KM) for repeatable logic that can be reused across ETL
tasks. Updating the SCD related columns, such as EFFECTIVE_FROM_DT, is usually
handled by KM. EFFECTIVE_FROM_DT is modified when inserting a new type 2 record
in incremental run, and it is usually modified to the same date as the changed
on date from the source. EFFECTIVE_FROM_DT does not always map to the Source
Effective Dates.
In type 2 SCD model, EFFECTIVE_FROM_DT is the date used to
track the history.
TYPE2_FLG in BI Application
TYPE2_FLG is a flag used to indicate if the dimension is
type 2 or not. This flag is used in many dimensions in BI application, such as
employee, user, position, and so on. This flag is very important because it
determines the history storing behavior.
TYPE2_FLG has two values: ‘Y’ and ‘N’. ‘Y’ means the
dimension is a type 2, and ‘N’ means the dimension is type 1. Type 2 dimensions
store the history, while type 1 dimensions only store the current record.
For example, if the supervisor is changed from Peter to
Susan for an employee on 01/02/2012:
Type 1
EMPLOYEE_ID
|
SUPERVISOR_NAME
|
CURRENT_FLG
|
123
|
Susan
|
Y
|
Type 2
EMPLOYEE_ID
|
EFFECTIVE_FROM_DT
|
EFFECTIVE_TO_DT
|
SUPERVISOR_NAME
|
CURRENT_FLG
|
123
|
01/02/2012
|
Future
|
Susan
|
‘Y’
|
123
|
01/01/1999
|
01/02/2012
|
Peter
|
‘N’
|
As shown above, type 1 dimension overwrites the supervisor
with the new supervisor, and only stores the current record. Type 2 dimension
inserts a new record with the new supervisor name and keeps the old record as a
history. The EFFECTIVE_FROM_DT, EFFECTIVE_TO_DT and CURRENT_FLG are modified
accordingly: EFFECTIVE_TO_DT is changed to 01/02/2012 and CURRENT_FLG is set as
‘N’ for the old record. The ‘CURRENT_FLG’ is set as ‘Y’ for the new record with
the new EFFECTIVE_FROM_DT.
How to Setup TYPE2_FLG
The out of the box code in BI application should have setup
the default values. For the type 2 dimensions, it is usually set as ‘Y’.
The TYPE2_FLG can be configured in BIACM. This variable is
configured by different dimension groups.
The screenshot above shows that you can configure the value
of this flag for difference dimension groups by clicking the parameter value
and overwriting it to a different value.
Note: You can only configure the TYPE2_FLG for the
dimension groups that are in this BIACM list. The dimension groups that are not
in the list cannot be configured.
You should set the value of TYPE2_FLG carefully. If you
override the TYPE2_FLG to ‘N’ for a type 2 dimension, you may meet some issues.
I will describe more details in the next session.
Possible Issues Related to TYPE2_FLG
As mentioned earlier, sometimes for some reason, the value
of TYPE2_FLG may be set to ‘N’ for the type 2 dimension. This may cause some issues.
In BI application, SDE mapping brings the history from the
source in the initial full load in
some adapters, such as EBS. TYPE2_FLG affects the storing behavior for these
historic records. Here compares the different behaviors when setting TYPE2_FLG
to ‘Y’ and ‘N’ for a type 2 dimension.
Case 1-TYPE2_FLG = ‘Y’
Let’s take employee dimension (type 2 dimension) as an example
Source
EMPLOYEE_ID
|
SRC_EFF_FROM_DT
|
SUPERVISOR_NAME
|
ROW #
|
123
|
01/01/1999
|
Peter
|
1
|
123
|
01/02/2012
|
Susan
|
2
|
When loading the data into data warehouse in the initial
full run, both the rows (including the historical record #1) will be loaded.
TYPE2_FLG is ‘Y’ in this case, KM, which will handle the loading behavior, uses
this value to determine the type of employee dimension, and accordingly the
storing method.
KM will modify EFFECTIVE_TO_DT and CURRENT_FLG for the two
records as TYPE2_FLG=’Y’ in this case.
EMPLOYEE_ID
|
EFFECTIVE_FROM_DT
|
EFFECTIVE_TO_DT
|
SUPERVISOR_NAME
|
CURRENT_FLG
|
123
|
01/02/2012
|
Future
|
Susan
|
‘Y’
|
123
|
01/01/1999
|
01/02/2012
|
Peter
|
‘N’
|
Case 2 - TYPE2_FLG =’N’
This time, the TYPE2_FLG is set as ‘N’ for employee
dimension (type 2 dimension), which is incorrect. KM will treat it as type 1
rather than type 2.
Source
EMPLOYEE_ID
|
SRC_EFF_FROM_DT
|
SUPERVISOR_NAME
|
ROW #
|
123
|
01/01/1999
|
Peter
|
1
|
123
|
01/02/2012
|
Susan
|
2
|
When loading the data into data warehouse, both the rows
will be loaded because the history from the source is stored. However, because TYPE2_FLG is ‘N’, KM
won’t modify EFFECTIVE_TO_DT and CURRENT_FLG accordingly, and this will cause
issues.
Employee Table in
Data warehouse
EMPLOYEE_ID
|
EFFECTIVE_FROM_DT
|
EFFECTIVE_TO_DT
|
SUPERVISOR_NAME
|
CURRENT_FLG
|
123
|
01/02/2012
|
Future
|
Susan
|
‘Y’
|
123
|
01/01/1999
|
Future
|
Peter
|
‘Y’
|
As shown above, the two records are in an overlapping time
range, and both have CURRENT_FLG as ‘Y’. It may give duplicates when resolving
the employee from the facts. For example, the transaction date 02/04/2013 will
fall into the time range of the two records, so both will be extracted, thus
causing the duplicates in the facts.
How to Debug TYPE2_FLG Issues
As discussed in the previous session, in order to avoid this
kind of issues, you should set the value of TYPE2_FLG carefully, and set it as
‘Y’ for out of the box TYPE2 dimensions.
In addition, when you get the duplicates in the fact, you
can do the following checks.
- Check where the duplicates come from in the fact, and find out the problematic dimension if they are from the dimension.
- Check the data in the dimension for the duplicates to see if you see the similar loading behavior as the one in use case 2 of the previous session. You can first simply see if multiple records having CURRENT_FLG=’Y’.
- Check the value of the TYPE2_FLG in ODI repository.
1. Open the session log of the task
2. Open ‘Definition’
3. Expand ‘Variable and Sequence Values’
4. Find TYPE2_FLG and check the value
5. If the value is ‘N’ but the dimension is type 2,
you may hit the issue described in the previous session.
I also would like to provide you some tips to find out the
type of a dimension here. You can find out this information in ODI repository.
- For one dimension, such as employee dimension, you should first know the dimension table name, for example, W_EMPLOYEE_D
- Go to ODI repository->’Designer’->’Models’
- Find out the dimension table and open it by double clicking it
- Go to ‘Definition’ and check the OLAP type. The type of slowly changing dimension tells you that this dimension is type 2
- You can also find out which attributes are type 2 by checking the column attribute
1. Expand the dimension table, for example,
W_EMPLOYEE_D and then expand Columns
2. Open the attribute of a column by double
clicking it
3. Go to ‘Description’ and check ‘Slowly Changing
Dimension Behavior’
As shown above, ‘Add Rows on
Change’ option tells you that this attribute is type 2.
Conclusion
This blog helps you understand how TYPE2_FLG works in ETL and
recognize the importance of this flag. It also gives you a way to debug the
possible TYPE2_FLG issue.
No comments:
Post a Comment