OBIA Training: Restarting Load Plans

This section provides information about restarting load plans after a failure. It includes the following sections:

Overview of Load Plan Restartability
About Restartability Grain
Restarting Load Plans
Troubleshooting Load Plans
Alternate Options for Restarting Load Plans
Related Features and Considerations

1. Overview of Load Plan Restartability

When you run ETL to load data from a source system into the Oracle Business Analytics Warehouse (OBAW), it is possible that you may need to restart the ETL load after a failure. This section details all such cases, explains the options available for restart, and describes the implications of using each of those options.

Examples of circumstances and reasons for load plan failure include:

Problem with access either to source or target database due to network failure or expired or otherwise incorrect user names and passwords.
Failure of ODI agent.
Problem with space or storage. Able to connect to source or target database but the query fails to run due to lack of temp space, disk space, and so on. For files it could due to inadequate space where the file needs to be placed.
Problem with data, for example incorrect data with lengths larger than the target column can hold, or null values in Not Null columns.

After such a failure during ETL, to avoid restarting the entire load plan after a failure, which would require inefficient re-runs of all ETL tasks, you must restart the load from the same point in its execution once the cause of failure has been diagnosed and resolved.

2. About Restartability Grain

When you restart a load plan after a failure, you may not restart again from the exact point of failure, depending on where it occurred and on dependencies between load plan steps. The point of restartability is that the end result of the load plan execution is the same regardless of any load plan failure.

The following example describes one such dependency-driven requirement for re-running a step which has already completed: In a load plan with two steps, the first step truncates the table and the second inserts records into the table, intermittently committing the records. The load plan is run and fails at the second step due to a space issue. After the issue is resolved, restarting the load plan from the second step would be incorrect because the target has some inserted rows. Restart should instead begin with the first step so that the target table is truncated again and newly inserted data does not cause any duplicates.

To maintain data integrity in the case of restart, the grain would vary depending on the location in the step hierarchy of the failed step and on the Restart setting for the step in the Load Plan Manager.

Within the Steps Hierarchy in Load Plan Manager, you can view and edit the Restart setting of a step in the Restart column. The default settings for different steps in the hierarchy support data integrity in restarts:

Root steps are set to 'Restart From Failure' if Serial and 'Restart from failed Children' if Parallel.
Sub steps are set to 'Restart From Failure' if Serial and 'Restart from failed Children' if Parallel.
Scenario steps are set to 'Restart from Failed Step'

The below examples highlight the implications for each type of load plan step.

Serial Load Plan Step

Serial steps are represented by a horizontal icon in the Steps hierarchy in Load Plan Manager and by default have a Restart setting of Restart from Failure. In a case where the load plan fails when running such a step to load a Dimension Group with multiple serial sub-steps loading individual dimensions, the load plan on restart would start from the individual sub-step that failed. Any successfully completed serial sub-steps would not be run again.

Parallel Load Plan Step

Parallel steps are represented by a vertical icon in the Steps hierarchy in Load Plan Manager and by default have a Restart setting of Restart from Failed Children. In a typical run, a parallel step with five parallel sub-steps under it would have all five sub-steps executed in parallel, subject to free sessions being available. If two of those five steps completed and then the load plan were to fail, when the load plan was restarted all the steps that did not complete or failed would be started again.

Scenario Step

At the lowest order in any load plan are the scenario steps. While the parent steps, whether serial or parallel, are used to set the dependencies, the scenario steps are those which load the tables. A scenario step in turn could have one or more sub-steps, corresponding to the number of steps inside the package.

In the case of a scenario step failure during execution, consider that the scenario step may have multiple steps, all under the same session in Operator log but identified with different step numbers: 0, 1, 2, and so on. In the case of restart, the scenario would execute from the failed parent scenario step, re-running all sub-steps.

Note:

If you use the Load Plan Generator to generate any load plan it would automatically conform to the above standard. If you are manually altering a generated load plan or creating a new load plan without using Load Plan Generator, then you should ensure that you conform to the above standard.

3. Restarting Load Plans

Use ODI Studio or ODI Console to restart a load plan. This section describes how to restart load plans. It includes the following sections:

Restarting Using ODI Studio
Restarting Using ODI Console

Restarting Using ODI Studio

Follow this procedure to restart a load plan using ODI Studio. The restart option is enabled only on the last run for a load plan. A load plan can be restarted any number of times and each time it progresses from the last failure.

To restart a load plan using ODI Studio:

In ODI Operator, navigate to the Operator log and select the last failed run for a load plan.
Double-click the load plan run and select the Restart Option. You can also right-click the last run in the Operator log and select Restart.

Restarting Using ODI Console

Follow this procedure to restart a load plan using ODI Console. The restart option is enabled only on the last run for a load plan. A load plan can be restarted any number of times and each time it progresses from the last failure.

To restart a load plan using ODI Operator:

In ODI Console, navigate to Runtime > Sessions/Load Plan Executions and select the load plan execution that has failed.
Click the Restart button. The Restart button is displayed only when the selected load plan is the most recent run of the load plan.

1) Troubleshooting Load Plans

A load plan must be restarted when it has stopped with an error. An alternate case where restart may be required is when a load plan is not doing anything at all, for example when a load plan is executed and nothing has changed after 30 minutes. The following checklist can be used to assist in troubleshooting a non-responsive load plan.

1) Check the maximum number of sessions set to run against the agent. In ODI Operator, verify that the number of sessions running is equal to the maximum. If so, then the other sessions are waiting for the running sessions to complete. Proceed to the next step.

2) Clean out stale sessions. Stale sessions are sessions that are incorrectly left in a running state after an agent or repository crash. If an agent crashes or loses its connection the repository after it has started a session, it is not be able to update the status of the session in the repository, and such a session becomes stale. Until the stale session is cleaned it shows up as running in the repository but actually is not.

Stale sessions are cleaned in multiple ways. Some examples are listed below:

o You can manually request specific agents to clean stale sessions in Operator Navigator or Topology Navigator.

o Stale sessions are cleaned when you restart an agent.

o When an agent starts any new session, it checks for and resolves stale sessions. However, if the agent has lost connection to the repository, then it cannot clean stale sessions.

3) Check if the agent is alive. To test the agent to see if it is running and still has a connection to the repository, open it in the Topology Navigator in ODI Studio and select the Test tab. If the agent test fails, then restart the agent after fixing the issue.

4) Verify that the ODI Repository and the server hosting it are running and have not experienced a failure.

5) If your load plan is in error and you have verified all of the above, then restart the Load plan.

Alternate Options for Restarting Load Plans

This section describes alternate ways to approach restarting failed load plans. It includes the following sections:

Using Mark as Complete
Running a Scenario Standalone

Using Mark as Complete

In most cases the load plan restart method described earlier in this section is the recommended approach. This approach ensures data integrity and leaves no scope for manual error. However, at times you may want to run a load plan step manually. For example, if a step is inserting duplicate records which are causing failure, rerunning the step would still insert duplicates. In such a case, you may need to manually correct the data outside of the load plan and then skip that step when you restart the load plan. For this kind of situation, you can use the Mark as Complete option.

When you mark a load plan step as complete, it ensures that when the load plan is restarted, the marked step is not executed. It is then the responsibility of the person making this setting to ensure that the load for that step is carried out outside the load plan.

To mark a step as complete, right-click the step and select Mark As Complete. This can be done at the scenario step or at any step higher than that.

Marking a step complete at a higher level in the step hierarchy would mean that none of the child steps under that parent step would be executed upon load plan restart, even if they are otherwise eligible. For this reason, marking a step as complete should be treated as an advanced task and must be done only with a full understanding of its impact. There is no single recommendation that pertains in all cases, so the setting must be done carefully and only on a case-by-case basis.

Running a Scenario Standalone

When you are monitoring a load plan, you may not completely know how to fix a scenario step failure, but may wish to use the 'mark as complete' option for the failed scenario step instead of waiting for complete resolution. This prevents a step failure from precluding an entire load plan completing, while allowing you to inform the ETL team about the failed scenario step and work on a resolution. The ETL team might then fix the scenario and want to run it standalone outside the load plan to complete the load.

As in marking a step as complete, running a scenario standalone should be treated as an advanced task and the person running the scenario must be aware of the following:

A scenario run outside of a load plan by itself invokes the Table Maintenance process. This could, depending on the setting, truncate the table before the load.

To understand this, consider that when a scenario is run inside a load plan table maintenance tasks are carried out as explicit steps (the parent step name would be either Initialize or Finalize). The scenario by itself does not call the Table Maintenance process when run from within the load plan. Rather, this is controlled by the EXECUTION_ID variable, which is set to the load plan instance ID. If this variable has a value greater than 0 when a scenario is run, the Table Maintenance process is not invoked, as would be the case when a scenario is run from within a load plan with an instance ID. However, if this variable does not have a value greater than 0, then the scenario invokes the Table Maintenance process. This is the case when a scenario is run outside the load plan. If you set a value for the EXECUTION_ID when invoking the scenario from outside a load plan, the table maintenance steps would not be called.

A scenario step could have many variable values set, either dynamically in the case of a refresh variable or explicitly by overriding its value at that scenario step in the load plan. When running a scenario outside the load plan, all the scenario variables would have only their default values. For this reason, care should be taken to set the variables appropriately before calling a scenario from outside the load plan. You can check the variable values that are present in the load plan by looking at the Operator log, provided the log level was set to 6 when the load plan ran. The Oracle BI Applications Configuration Manager uses Oracle Diagnostic Logging. For information about managing log files and diagnostic data, see Oracle Fusion Middleware Administrator's Guide.

Related Features and Considerations

This section lists some of the Oracle BI Applications features that are related to restartability and describes some related considerations.

Using CKM to Filter Erroneous Data

If a scenario step is failing due to bad source data, it may sometimes be desirable to enable the CKM option to load the valid records and route the error records to a separate table. Examples of situations where this may be appropriate are the load of null values when they should have a value or data lengths longer than allowed target column lengths. Once the load completes, you could correct the erroneous data and have it automatically picked up in a subsequent load.

Note:

Use of CKM can slow the load considerably because every record and column could potentially be checked before loading. For this reason, this is not an option that you want to turn on across the entire load plan.

Regenerating a Scenario During a Load Plan Execution

Consider a case where a load plan is started and fails at a scenario step. You fix the issue and regenerate the scenario, then restart the load plan and may expect it to pick the new scenario, but this is not what happens. If a load plan has been started and a scenario regenerated, the regenerated scenario code is not picked up when the load plan is restarted. To force the regenerated scenario to be picked up, you have two options:

Start a new load plan run, accepting the overhead associated with restarting the load from the beginning.
Run the regenerated scenario as stand-alone outside the load plan, marking that scenario step as complete in the load plan before restarting the load plan.
Restarting Long Running Jobs

Consider a case where you have a scenario that takes two hours to run. The scenario fails at the insert new rows step, after loading the C$ and I$ steps. On restart, the scenario attempts to reload the C$ again. You instead want it to restart from the insert new rows steps only.This use is not supported. The restartability mechanism has been put in place in such a way that restarting a load plan is all you need to do. You do not need to clean up any data in between load plan executions, because the data is committed to the target table only after all the Knowledge Module steps are successful. If the load fails before complete success, no data is committed to that specific target table as part of the failed session. (Note: C$ and I$ tables are created afresh on restart and hence data to these tables would be committed in between).

Note:

New C$ and I$ tables are created on restart and hence data to these tables would be committed in between load plan start and restart.

Stale C$ and I$ Tables

On restart, new C$ and I$ tables are created, but since the previous load did not complete, these tables from the previous session are not dropped. The load plan generated using Load Plan Generator has a step at the end of the load plan called Clean Stale Work Tables which takes a variable called ETL_DTOP_STG_OLDER_THAN_DAYS whose default value is 30 days. When this step runs, it drops any C$ and I$ tables that are older than the specified variable value.

Note:

C$ and I$ tables are useful tables when you want to debug a failed scenario. It might not be advisable to set the ETL_DTOP_STG_OLDER_THAN_DAYS value as too small—for example, one day—as you might lose valuable information for debugging.

OBIA Training

Restarting Load Plans

No comments:

Post a Comment