DAC: Building and Running Micro ETL Execution Plans

Micro ETL execution plans are ETL processes that you schedule at very frequent intervals, such as hourly or half-hourly. They usually handle small subject areas or subsets of larger subject areas. DAC tracks refresh dates for tables in micro ETL execution plans separately from other execution plans and uses these refresh dates in the change capture process.

After a micro ETL execution plan runs, DAC populates refresh date values in the Refresh Dates child tab of the Execution Plans tab. If a subject area is used in a regular execution plan (an execution plan with the Keep Separate Refresh Dates option not selected) as well as a micro ETL execution plan, DAC maintains refresh dates for the tables in the regular execution plan in the Refresh Dates child tab of the Physical Data Sources tab (Setup view).

For example, if a subject area is used in both a regular and a micro ETL execution plan, and the micro ETL execution plan is suspended for a few days while the regular execution plan continues to run nightly, DAC automatically detects the last refresh date for the tables common to both execution plans and intelligently extracts only the most recent records for the micro ETL execution plan.

Micro ETL processes can cause data inconsistencies and data availability issues, and they place additional load on the transactional database. Therefore, you should consider the following factors before implementing a micro ETL process:
  • For related star schemas, if one schema is omitted from a micro ETL execution plan, the cross-star reports may be inaccurate. For example, if the Person fact table is refreshed more frequently than the Revenue fact table, a report that spans the Person and Revenue dimensional schemas may produce inconsistent results.
  • If you omit dimension tables from a micro ETL execution plan, the foreign keys for the fact tables will point to Unspecified rows for the new dimension records. The foreign key references will be resolved when the Complete ETL execution plan is run, but users of the reports should be aware of such inconsistencies.
  • If you do not include aggregate tables in micro ETL execution plans, the reports that use data from these tables will be inconsistent with the reports that use data from the detailed fact tables. However, if aggregate tables are included in the micro ETL execution plan, the aggregate calculations are performed for each ETL process, which will take a constant amount of time and may be inefficient to perform at such frequent intervals.
  • Hierarchy tables are rebuilt during every ETL execution plan by querying the base dimension tables. This operation takes a constant amount of time. If the base tables are big, this operation may take a long time and may be inefficient if the micro ETL execution plan runs several times a day. However, if you avoid populating the hierarchy tables during micro ETL processes, data inconsistencies will occur.
  • With micro ETL execution plans, caching will occur more frequently, which may have performance implications.
  • Micro ETL execution plans will put more load on the transactional database because of the frequent extracts.
To build and run a micro ETL execution plan
  1. In the DAC toolbar, select the appropriate source system container from the drop-down list.
  2. From the Menu bar, select Views, then select Design, then select Subject Areas.
  3. In the Subject Areas tab, assemble a small subject area.
  4. In the Tasks child tab, inactivate all tasks that are not required for the execution plan.
  5. Create a new execution plan.
    1. Navigate to the Execute view, then select the Execution Plans tab.
    2. Enter a name for the execution plan.
    3. Select the Keep Separate Refresh Dates check box.
    4. Click Save.
  6. Associate one or more subject areas with the execution plan. The subject areas can belong to one or more source systems.
    1. Click the Subject Areas child tab.
    2. Click Add/Remove in the bottom pane toolbar.
    3. In the Choose Subject Areas dialog, select the appropriate source system container.
    4. Query for the subject area you want to associate with the execution plan.
    5. Select the subject area and click Add.
      You can associate multiple subject areas with an execution plan, but all the subject areas must be from the same source system container.
    6. Click OK to close the window.
  7. Generate the runtime execution plan parameters.
    1. Click the Parameters subtab, and then click Generate in the bottom pane toolbar.
    2. In the Generating Parameters dialog box, enter the number of copies for each source system container, and then click OK.
    3. Click OK in the informational message.
      DAC automatically generates the parameters required for each copy. Not all copies require all of the possible parameters.
    4. On the Parameters subtab, edit the parameters for each copy of the source system container as follows:
      For each data source type, select the appropriate value from the Value drop-down list.
      Note:
      For the data source type of FlatFileConnection, make sure you have copied all files into the directory specified in the DAC system property InformaticaParameterFileLocation.
      For each Informatica SIL and SDE folder, select the appropriate value in the Value drop-down list.
    5. For each data source type, enter the appropriate name in the Value field.
    6. (Optional) If you are extracting data from more than one source system container and want to stagger the data extracts, in the Delay field for the appropriate data source, enter a value for the number of minutes you want to delay the extract.
  8. In the top pane of the Execution Plans tab, make sure the new execution plan is highlighted, and click Build.
    DAC builds the execution plan.
  9. Click the Ordered Tasks child tab and verify the following:
    1. Click Details in the toolbar, and review each task's predecessor and successor tasks to confirm tasks common to multiple sources are ordered in a manner consistent with the priority of the source connection.
    2. Confirm that load tasks appear only once even if there are multiple extracts for tables common to multiple sources.
    3. For tasks common to multiple sources, click Preview Run Details in the toolbar, and confirm the following:
      The first common task truncates the common target table and the following tasks do not.
    The execution plan is now ready to run as a micro ETL execution plan.
  10. Create a schedule for the micro ETL execution plan.

Registering BI Applications Data Sources

BI Applications 11.1.1.7.1 introduced a number of changes, adding new components and processes and enhancing others. As a result, many customers and partners are re-learning how to install, configure, and manage the application, and naturally a number of questions have arisen. This post walks through the steps of registering a source application system in BI Applications, and then looks a little deeper into what has actually been set up behind the scenes and how to troubleshoot those connections.

Registering a Data Source in Configuration Manager

To register a data source, log into Configuration Manager and navigate to the Define Business Intelligence Applications Instance tab.
Note: Configuration Manager is the only place to register data sources. The rest of the details in this post are there to help you troubleshoot; whether you are a novice or an expert, Configuration Manager is where you register a data source.
Click on the green ‘+’ sign to start the Register Source wizard.
In the first page of the wizard, provide a clear name for the source under Source Instance Name.  A description is optional.  Assign a unique Data Source Number – this value ties records in the data warehouse to the specific source system where those records came from.
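For illustration (the warehouse table shown here is just an example, though DATASOURCE_NUM_ID is the standard warehouse column that carries this value), you could later identify rows loaded from this source with a query such as:
 SELECT COUNT(*) FROM W_ORG_D WHERE DATASOURCE_NUM_ID = 310;
where 310 is whatever Data Source Number you assigned in the wizard.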


On the second page of the wizard, the physical connections for the source are set up. Each source system has both a relational and a flat file technology associated with it. Most ETL tasks extract from the source database, but several ETL tasks extract from flat files that are associated with that particular source system. The relational and flat file details are registered directly into ODI as separate Data Servers, and each Data Server must have a unique name.
When configuring the relational connection, you are defining the details required to connect to the source system; the ETL process will use this information to connect to the source system in order to extract data. First, provide the ODI Data Server Name. As discussed, this must be unique, so an example might be: EBS R12.1 EMEA Dev Database. Next, provide the details required to connect to the source application database using a JDBC connection, i.e. the JDBC URL, the Database Username, and the Database Password. For example, to connect to an E-Business Suite database you generally provide the APPS database user and corresponding password, and to connect to a Siebel database you provide the SIEBEL database user. To be clear, these are not the credentials for the BI Apps Component Repository database schema or the BI Applications Data Warehouse database schema; they are the credentials for the source application database.
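As a sketch of what these values typically look like (the host, port, and service name below are placeholders, not values from any particular environment), an Oracle thin-driver JDBC URL takes one of these forms:
 jdbc:oracle:thin:@ebsdb.example.com:1521:EBSDEV
 jdbc:oracle:thin:@//ebsdb.example.com:1521/EBSDEV
with the Database Username set to APPS for an E-Business Suite source or SIEBEL for a Siebel source, as noted above.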

You must also define the details for the flat file connection. A value for the ODI Data Server Name must be provided; this must be distinct from the relational connection, so an example for this connection might be: EBS R12.1 EMEA Dev Source Files. The Host Name is the machine where the flat files are stored; this host must be accessible by the ODI agent, and it is generally the same machine where the ODI agent is deployed or a network path accessible from that machine. Path to file is a relative or absolute path to the location where the files associated with the source system can be found. Each source system needs to have its own dedicated subfolder under the SrcFiles folder.
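As an illustration only (the root path below is hypothetical and depends on where your source files actually live), a dedicated subfolder for this source might look like:
 /u01/biapps/etl/SrcFiles/EBS_R121_EMEA_Dev
and the Path to file entered in the wizard would point to that EBS_R121_EMEA_Dev folder.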


Looking at the Data Source Definition in ODI Topology

Registering the data source in Configuration Manager, as detailed above, results in connections being entered into the ODI repository, ready for ETL execution. To view the connection information in ODI, either ODI Studio or ODI Console can be used.
If you look at the Topology Navigator in ODI prior to registering any sources in Configuration Manager, you will see the BI Apps-specific physical servers and schemas already defined and populated with the appropriate credentials and paths; these were provisioned during the installation process from the details you provided to the installer. You cannot edit these properties from Configuration Manager; if you ever need to change them (which is not recommended), the changes must be made using ODI Studio.
ODI Topology prior to Source Configuration, showing Only DW Connections
After registering a data source in Configuration Manager, you will see the connections in the Physical Architecture in ODI Studio. Continuing the example, you can see that we now have an Oracle-based entry and a File-based entry; the names correspond to the ODI Data Server Names provided in Configuration Manager.
ODI Topology after Source Configuration, showing DW and Source Connections
Important: Do not edit these entries directly in ODI; any changes must be made in Configuration Manager. Changes made in ODI will not be reflected in Configuration Manager, which can lead to errors. Use these steps only to confirm that the sources were correctly registered in ODI.
For reference, you can inspect the properties of the server and schema that were created in ODI Studio.

Example: Oracle DBMS and File details for an EBS R12.1.3 source

Database Server Connection (screenshots): Physical Server Definition, Physical Server JDBC, Physical Schema, Context and mapping to Logical Schema, and Flexfields showing the Data Source Num ID attribute.

File Server Connection (screenshots): Physical Server Definition, Physical Server JDBC, Physical Schema, Context and mapping to Logical Schema, and Flexfields showing the Data Source Num ID attribute.



Making Changes to Data Sources

Correcting Connection Details

If you entered the wrong details for a connection to a data source, these can be corrected as follows:
1.    Select the data source and click the 'Edit' button.

2.    Make corrections to the details.

3.    Restart the ODI Agent. The agent does not automatically pick up changes to the connection; you must restart it, otherwise it will continue to use the old credentials. The ODI Agent can be restarted from Enterprise Manager or the WebLogic Server Console.

Restarting the ODI Agent from Enterprise Manager:

Connect to Enterprise Manager and, under 'Deployments', scroll down and drill down on 'oraclediagent'.


Click on the ‘Application Deployment’ menu option and select ‘Shut Down…’ and ‘Start Up’

Restarting the ODI Agent from the WebLogic Server Console

Log into the WLS console, click on ‘Deployments’ under ‘Domain Structure’
Scroll down to ‘oraclediagent’ and check the checkbox
Click the ‘Stop’ and ‘Start’ buttons.


Changing and disabling a data source connection

The Configuration Manager UI does not support deleting connections. If a connection exists for the correct type of application (also known as a product line version) but the details are incorrect, update the existing connection with the correct details. If the connection is for a product line version that you do not need to connect to, simply disable it.

Using a ‘Read-Only’ or other alternate user to access the OLTP tables

When registering a source in BIACM, we assume that the database user defined is the owner of the OLTP tables, or at least that the user's schema is populated with objects that can be selected from. ODI will generate SQL in the form:
 SELECT … FROM <Database User Defined in BIACM>.TABLE
If you define a read-only user that has select privileges on the OLTP tables but there are no objects in this user's schema, the SQL will fail with 'Table or View not found' errors. You must either populate this user's schema with objects that can be selected from (replicas of the actual tables populated with data, or synonyms or views that point to the actual tables in the base schema), or go into ODI Studio and update the physical schema under the OLTP data server definition to provide the schema name where the tables can be found. The latter approach allows ODI to log on with the alternate ETL user but generate SQL in the following form:
 SELECT … FROM <Base Schema>.TABLE
A future blog article will provide more information on how to configure an alternate ETL user.  The most important thing to be aware of is that in some cases, ETL tasks may execute packages or procedures on the OLTP database (this is most commonly done when extracting from an Oracle eBusiness Suite database) – you must be sure to grant Execute privileges on these to your ETL user.
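As a minimal sketch of that setup (the user, table, and package names below are hypothetical, chosen only to show the pattern), the grants and synonyms for a read-only ETL user might look like:
 GRANT SELECT ON APPS.SOME_OLTP_TABLE TO BIAPPS_RO;
 CREATE SYNONYM BIAPPS_RO.SOME_OLTP_TABLE FOR APPS.SOME_OLTP_TABLE;
 GRANT EXECUTE ON APPS.SOME_ETL_PACKAGE TO BIAPPS_RO;
With the synonym in place, a generated statement such as SELECT … FROM BIAPPS_RO.SOME_OLTP_TABLE resolves to the base table in the APPS schema.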

DAC Refresh Dates

Refresh dates refer to the date of the last ETL process (the last time data was extracted from tables in a given database or loaded into tables in a given database). The DAC uses the refresh dates to determine whether to run the incremental load commands or to run full load commands and whether to truncate the target tables.
Refresh dates are tracked only for tables that are either a primary source or a primary target on tasks in a completed run of an execution plan. The DAC runs the full load command for tasks on which a table is a primary source or target if the refresh date against the table is null. When there are multiple primary sources, the earliest of the refresh dates will trigger a full load or an incremental load. If any one of the primary source tables has no refresh date, then the DAC will run the full load command.
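To make the distinction concrete (the table and column names below are purely illustrative and are not the SQL that DAC actually generates), an incremental load restricts the extract by the stored refresh date, while a full load omits the filter:
 SELECT … FROM SOME_SOURCE_TABLE WHERE LAST_UPDATE_DATE > TO_DATE('2013-06-01 02:00:00', 'YYYY-MM-DD HH24:MI:SS')
If any primary source or primary target table has a null refresh date, DAC falls back to the full load command instead, as summarized in the table below.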
The following table shows the possible scenarios regarding refresh dates.
Table 12. Refresh Date Scenarios

Scenario           | Table Type (in Tasks child tabs) | Refresh Date | Command DAC Will Use | Truncate Target Table?
1                  | Primary Source                   | Null         | Full Load            | Yes
                   | Primary Target                   | Null         |                      |
2 (see note below) | Primary Source                   | Null         | Full Load            | No
                   | Primary Target                   | Not Null     |                      |
3 (see note below) | Primary Source                   | Not Null     | Full Load            | Yes
                   | Primary Target                   | Null         |                      |
4                  | Primary Source                   | Not Null     | Incremental Load     | No
                   | Primary Target                   | Not Null     |                      |
  • Scenario 2. When two or more source tables load into the same target table as separate tasks, the source table in the second task may have a null refresh date while the target has a refresh date.
  • Scenario 3. When a source loads into more than one target table in separate tasks, the refresh date may be null for the second target table while the source table has a refresh date.