DAC:Building and Running Micro ETL Execution Plans

  1. Micro ETL execution plans are ETL processes that you schedule at very frequent intervals, such as hourly or half-hourly. 
  2. They usually handle small subject areas or subsets of larger subject areas. DAC tracks refresh dates for tables in micro ETL execution plans separately from other execution plans and uses these refresh dates in the change capture process.
  3. After a micro ETL execution plan runs, DAC populates refresh date values in the Refresh Dates child tab of the Execution Plans tab. 
  4. If a subject area is used in a regular execution plan (an execution plan with the Keep Separate Refresh Dates option not selected) as well as a micro ETL execution plan, DAC maintains refresh dates for the tables in the regular execution plan in the Refresh Dates child tab of the Physical Data Sources tab (Setup view).
  5. In cases of a subject area being used in both a regular and micro ETL execution plan and the micro ETL execution plan is suspended for a few days but the regular execution plan runs nightly, DAC automatically detects the last refresh date for the tables common to both execution plans and intelligently extracts only the most recent records for the micro ETL execution plan.

Micro ETL processes can cause issues with data inconsistencies, data availability, and additional load on the transactional database. Therefore, you should consider the following factors before implementing a micro ETL process:
  • For related star schemas, if one schema is omitted from a micro ETL execution plan, the cross-star reports may be inaccurate. For example, if the Person fact table is refreshed more frequently than the Revenue fact table, a report that spans the Person and Revenue dimensional schemas may produce inconsistent results.
  • If you omit dimension tables from a micro ETL execution plan, the foreign keys for the fact tables will point to Unspecified rows for the new dimension records. The foreign key references will be resolved when the Complete ETL execution plan is run, but users of the reports should be aware of such inconsistencies.
  • If you do not include aggregate tables in micro ETL execution plans, the reports that use data from these tables will be inconsistent with the reports that use data from the detailed fact tables. However, if aggregate tables are included in the micro ETL execution plan, the aggregate calculations are performed for each ETL process, which will take a constant amount of time and may be inefficient to perform at such frequent intervals.
  • Hierarchy tables are rebuilt during every ETL execution plan by querying the base dimension tables. This operation takes a constant amount of time. If the base tables are big, this operation may take a long time and may be inefficient if the micro ETL execution plan runs several times a day. However, if you avoid populating the hierarchy tables during micro ETL processes, data inconsistencies will occur.
  • With micro ETL execution plans, caching will occur more frequently, which may have performance implications.
  • Micro ETL execution plans will put more load on the transactional database because of the frequent extracts.
To a build and run a micro ETL execution plan
  1. In the DAC toolbar, select the appropriate source system container from the drop-down list in the toolbar.
  2. From the Menu bar, select Views, then select Design, then select Subject Areas.
  3. In the Subject Areas tab, assemble a small subject area.
  4. In the Tasks child tab, inactivate all tasks that are not required for the execution plan.
  5. Create a new execution plan.
    1. Navigate to the Execute view, then select the Execution Plans tab.
    2. Enter a name for the execution plan
    3. Select the Keep Separate Refresh Dates check box.
    4. Click Save.
  6. Associate one or more subject areas with the execution plan. The subject areas can belong to one or more source systems.
    1. Click the Subject Areas child tab.
    2. Click Add/Remove in the bottom pane toolbar.
    3. In the Choose Subject Areas dialog, select the appropriate source system container.
    4. Query for the subject area you want to associate with the execution plan.
    5. Select the subject area and click Add.
      You can associate multiple subject areas with an execution plan, but all the subject areas must be from the same source system container.
    6. Click OK to close the window.
  7. Generate the runtime execution plan parameters.
    1. Click the Parameters subtab, and then click Generate in the bottom pane toolbar.
    2. In the Generating Parameters dialog box, enter the number of copies for each source system container, and then click OK.
    3. Click OK in the informational message.
      DAC automatically generates the parameters required for each copy. Not all copies require all of the possible parameters.
    4. On the Parameters subtab, edit the parameters for each copy of the source system container as follows:
      For each data source type, select the appropriate value from the Value drop-down list.
      Note:
      For the data source type of FlatFileConnection, make sure you have copied all files into the directory specified in the DAC system property InformaticaParameterFileLocation.
      For each Informatica SIL and SDE folder, select the appropriate value in the Value drop-down list.
    5. For each data source type, enter the appropriate name in the Value field.
    6. (Optional) If you are extracting data from more than one source system container and want to stagger the data extracts, in the Delay field for the appropriate data source, enter a value for the number of minutes you want to delay the extract.
  8. In the top pane of the Execution Plans tab, make sure the new execution plan is highlighted, and click Build.
    DAC builds the execution plan.
  9. Click the Ordered Tasks child tab and verify the following:
    1. Click Details in the toolbar, and review each task's predecessor and successor tasks to confirm tasks common to multiple sources are ordered in a manner consistent with the priority of the source connection.
    2. Confirm that load tasks appear only once even if there are multiple extracts for tables common to multiple sources.
    3. For tasks common to multiple sources, click Preview Run Details in the toolbar, and confirm the following:
      The first common task truncates the common target table and the following tasks do not.
      The first common task truncates the common target table and the following tasks do not.
    The execution plan is now ready to run as a micro ETL execution plan.
  10. Create a schedule for the micro ETL execution plan.

1 comment: