
Monday, May 28, 2012

Looping of Workflows in Informatica

Scenario:

You have three Oracle source schemas: test_schema1, test_schema2 and test_schema3. Each schema contains three tables: EMP, DEPT and MANUFACTURERS. The structure of the tables is exactly the same in each schema. You have a target schema, test_schema_target, which contains three tables: EMP_TARGET, DEPT_TARGET and MANUFACTURERS_TARGET. The structure of these tables is identical to those in the source schemas.

Now you have to provide an Informatica solution wherein the tables in the target schema are loaded dynamically from the corresponding tables in the source schemas, i.e. there should be only one set of mappings for the loading, and this set should be called three times (once per source schema). Each time, the schema name should change dynamically.

Solution:

Assumption: The user in the target schema has SELECT access to the tables in all three source schemas.

Create a control table as follows:
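The original screenshot is not reproduced here, so the following DDL is a minimal sketch inferred from the queries used later in this post. The column names come from those queries; the datatypes and the exact row set are assumptions.

-- Minimal sketch of the control table; datatypes are assumed
CREATE TABLE CONTROL_TABLE (
    SCHEMA_NAME VARCHAR2(30),
    TABLE_NAME  VARCHAR2(30),
    FLAG        CHAR(1) DEFAULT 'N'
);

-- One row per source schema and table; every flag starts as 'N'
INSERT INTO CONTROL_TABLE VALUES ('test_schema1', 'EMP',           'N');
INSERT INTO CONTROL_TABLE VALUES ('test_schema1', 'DEPT',          'N');
INSERT INTO CONTROL_TABLE VALUES ('test_schema1', 'MANUFACTURERS', 'N');
INSERT INTO CONTROL_TABLE VALUES ('test_schema2', 'EMP',           'N');
INSERT INTO CONTROL_TABLE VALUES ('test_schema2', 'DEPT',          'N');
INSERT INTO CONTROL_TABLE VALUES ('test_schema2', 'MANUFACTURERS', 'N');
INSERT INTO CONTROL_TABLE VALUES ('test_schema3', 'EMP',           'N');
INSERT INTO CONTROL_TABLE VALUES ('test_schema3', 'DEPT',          'N');
INSERT INTO CONTROL_TABLE VALUES ('test_schema3', 'MANUFACTURERS', 'N');
COMMIT;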


Build a mapping (m_par) which will select one row from the control table.


The port v_Schema_Name will be set as SETVARIABLE($$schema_name, SCHEMA_NAME),

where $$schema_name is a mapping variable (SETVARIABLE works on mapping variables, not parameters). The source qualifier query is as follows:


SELECT
    CONTROL_TABLE.SCHEMA_NAME,
    CONTROL_TABLE.FLAG
FROM
    CONTROL_TABLE
WHERE
    CONTROL_TABLE.FLAG = 'N'
    AND ROWNUM = 1


The idea here is to assign the schema name returned from the above query to the mapping variable $$schema_name. From there, the cycle works like this:

1. The value of $$schema_name is assigned, post-session, to a workflow variable.
2. The workflow variable is assigned, pre-session, to the mapping parameters of the EMP, DEPT and MANUFACTURERS mappings.
3. The flag in the control table is updated for each successful mapping.
4. The workflow is called again using a CMD task.

This whole process is explained below.

Now, build a mapping for each target: EMP_TARGET, DEPT_TARGET and MANUFACTURERS_TARGET.

The source qualifier query for the EMP mapping will be as follows:


SELECT
    $$schema_name_e.EMP.EMPNO,
    $$schema_name_e.EMP.ENAME,
    $$schema_name_e.EMP.JOB,
    $$schema_name_e.EMP.MGR,
    $$schema_name_e.EMP.HIREDATE,
    $$schema_name_e.EMP.SAL,
    $$schema_name_e.EMP.COMM,
    $$schema_name_e.EMP.DEPTNO
FROM
    $$schema_name_e.EMP

Here, $$schema_name_e is a mapping parameter defined in this mapping.

There will be a Post-SQL on EMP_TARGET as follows:

update CONTROL_TABLE set FLAG='Y' where CONTROL_TABLE.SCHEMA_NAME='$$schema_name_e' and CONTROL_TABLE.TABLE_NAME='EMP'

Similar mappings will be created for DEPT and MANUFACTURERS as well. These mappings will have the mapping parameters $$schema_name_d and $$schema_name_m respectively, with a similar source qualifier query and Post-SQL on the target.
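For example, the Post-SQL on DEPT_TARGET would be the following (an illustrative statement mirroring the EMP Post-SQL above, using DEPT and $$schema_name_d):

update CONTROL_TABLE set FLAG='Y' where CONTROL_TABLE.SCHEMA_NAME='$$schema_name_d' and CONTROL_TABLE.TABLE_NAME='DEPT'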

The workflow (wf_Child) contains the following sessions:

s_par points to the control table mapping (m_par).
s_emp points to the EMP mapping.
s_dept points to the DEPT mapping.
s_manufacturers points to the MANUFACTURERS mapping.

In the properties of the workflow, create a workflow variable named $$schema_name1.



Now, within the workflow, double-click on the session s_par and go to the Components tab.

Click on "Post-session on success variable assignment" and assign the mapping variable $$schema_name to the workflow variable $$schema_name1:

$$schema_name1 = $$schema_name

This effectively assigns the schema name extracted from the control table to the workflow variable.

Create a decision task (dec_count) with the condition $s_par.SrcSuccessRows > 0.

So, if any row is still pending in CONTROL_TABLE, the workflow will proceed. (The links leaving dec_count should carry the condition $dec_count.Condition = TRUE, so that the downstream sessions run only while pending rows remain.)
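If you want to see what the loop still has to process at any point, a quick check against the control table (an illustrative query, not part of the original setup) is:

-- illustrative: rows still pending; the loop ends when this returns nothing
SELECT SCHEMA_NAME, TABLE_NAME
FROM   CONTROL_TABLE
WHERE  FLAG = 'N';

The decision condition becomes false, and the loop terminates, exactly when this query returns no rows.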

After dec_count, place the sessions s_emp, s_dept and s_manufacturers in parallel. Double-click on s_emp and go to the Components tab.

Set the "Pre-session variable assignment" as follows:


This effectively assigns the value of Workflow variable(derive in the previous step) to the mapping parameter
$$schema_name_e before the actual run of this session. As a result the source qualifier query will execute with this parameter value.

Repeat this exactly for s_manufacturers and s_dept to set the values of $$schema_name_m and $$schema_name_d.

There will be a final command task (cmd2) in this workflow, with the following command:

pmcmd startworkflow -sv Local_IS -d "domain_name" -u "user_name" -p password -f "folder_name" wf_Parent

This calls another workflow, wf_Parent, which in turn calls wf_Child.

The command task s_call in wf_Parent contains a call to the child workflow:

pmcmd startworkflow -sv Local_IS -d "domain_name" -u "user_name" -p password -f "folder_name" wf_Child

For both workflows, set the property "Configure concurrent execution" to "Enabled". This is needed because each workflow starts the other via pmcmd before its own run has finished, so the two runs briefly overlap.



The FLAG column in the control table keeps getting updated as the workflow runs for each schema and table. Once all schemas and tables are covered, FLAG is 'Y' for every row, the control-table query returns no rows, and the loop stops.
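If you ever need to run the full load again after the loop has stopped, the flags would first have to be reset; a one-line reset (not part of the original setup) would be:

-- reset every flag so the loop processes all schemas and tables again
update CONTROL_TABLE set FLAG='N';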



In this way, the Informatica workflow will run in a loop loading tables in different schemas.

Hope this helps!

4 comments:

Prafull Dangore said...

Hi, it was nice to read the explanation you provided. The only question in my mind is: how are you going to assign the connection value for the source through a mapping variable/parameter?

Gaurav Kant Goel said...

The schema connection is provided in the overridden query in the source qualifier using a mapping parameter. This mapping parameter's value is extracted from the control table on each run!

Dharti said...

I have the same doubt as Prafull. The schema gets picked from the mapping parameter, but at the session level the connection will not change, right? Every schema has a different login, at least in Oracle. Can we dynamically change the connection too?


Babu said...

You saved my day. I needed this solution, and it was readily available because of your blog. Only extraordinary people mention that they are ordinary.