Google
Information Storage and Retrieval: Performance Tuning in Informatica

Pages

Wednesday, February 3, 2010

Performance Tuning in Informatica

Performance Tuning is done to improve session performance. To tune a session, performance bottlenecks are identified and eliminated. The bottlenecks should be looked in following places in the given order:

1. Target
2. Source
3. Mapping
4. Session
5. System

General Techniques to identify bottlenecks:

1. Thread statistics should be analysed to identify source, target or transformation bottlenecks. By default, the Integration Service uses 1 reader thread, 1 tranformation thread and 1 writer thread to process a session. The thread with the highest busy percentage identifies the bottleneck.

2. Add a filter transformation after each source qualifier. Set the filter condition to false so that no data is passed fromk filter. If the time to run the session remains about the same, there is a source bottleneck.

3. Add a filter transformation before each target definition. Set the filter condition to false so that no data is loaded into the target. If the time taken to run the session is almost same as before, there is a mapping bottleneck.

 General Techniques to improve session performance:

1. If a target bottleneck is identified, consider dropping key constraints and indexes on the target tables before the session is run. Rebuild those constraints and indexes after the session run.

2. Use bulk loading incase of insertion of large amount of data. When bulk loading, the Integration Service bypasses the database log which speeds up performance. But there is a trade-off here. Without writing to the database log, the target database can not perform rollback.

3. If multiple source tables are joined in one source qualifier, optimize the query by using hints and/or adding indexes on the joined columns, GROUP BY or ORDER BY columns. Configure the database to run parallel queries to improve performance.

4. Reduce unnecessary transformations in the mapping and delete unnecessary links between transformations to optimize the mapping.

5. If the session reads from a flat file source, the performance can be improved by setting the number of bytes the Integration Service reads per line.By default, the Integration service reads 1024 bytes per line. If each line in source file is less than the default setting, the 'line sequential buffer length' should be decreased in the session properties.

6. Use filters as early as possible in the mapping.

7. Eliminate unnecessary datatype conversions. Try to use integer values in place of other datatypes when performing comparisons using lookup and filter transformations.

8. The integration service reads expressions written with operators faster than the expressions with functions. Wherever possible, operators should be used. e.g in place of CONCAT function use || operator.

9. When Aggregator Transformation is used, following guidelines should be used -

  • The group by should be done on simple columns. If possible, the number type columns should be preffered for GROUP BY.

  • Use sorted input. When sorted input option is used, the Integration Service assumes that all data is sorted by group. It does not wait for the whole data to come into memory for aggrgation.

  • Use incemental aggrgation if the changes in source affect less than half the target. In this case, the Integration Service updates the target incrementally, rather than processing the entire source and recalculating the same calculations everytime you run the session.

  • Filter data before aggregation. This will avoid unnecessary aggregation.

  • Unncessary port connections should be avoided to reduce the amount of data the aggregator transformation stores in the data cache.
10. When using Joiner Transformation, following things should be considered:

  • In case of large data sets, configure joiner transformation to use sorted input. In this case, the Integration service improves performance by minimizing disk input/output. Also, designate the master source as the source with fewer duplicate key values. When Integration Service processes a sorted joiner transformation, it caches rows for 100 unique keys at a time. If the master source contains many rows with the same key value, the Integration Service needs to cache more rows and hence performance can be slowed.

  • Designate master source as the source with fewer rows. This is because during a session run, the joiner transformation compares the value of each row of the detail source against the master source. The fewer rows in the master, the fewer iterations of the join comparison occur, which speds the join process.
  • Perform join in the database or source qualifier transformation if possible
11. When using Lookup Transformation, following things should be considered:

  • If you have privileges to modify the database containing a lookup table, you can improve lookup initialization time by adding an index to the lookup table.
  • You can improve performance by indexing the columns in the lookup ORDER BY.
  • By default, the Integration Service generates an ORDER BY clause for a cached lookup. The ORDER BY clause contains all lookup ports. To increase performance, you can suppress the default ORDER BY clause and enter an override ORDER BY with fewer columns. Place two dashes ‘--’ as a comment notation after the ORDER BY clause to suppress the ORDER BY clause that the Integration Service generates.
  • If you include more than one lookup condition, place the conditions in the following order to optimize lookup performance:- Equal to (=),Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=), Not equal to (!=)
  • Improve session performance by caching small lookup tables.
  • If the lookup table is on the same database as the source table in the mapping and caching is not feasible, join the tables in the source database rather than using a Lookup transformation.
12. If possible, run sessions and workflows concurrently to improve performance.

13. Increase the commit interval. By doing so, the number of times, Integration Service commits data decreases and hence performance is gained. However, if the commit interval is too high, the Integration Service will fill the database log file and cause the session to fail.

14. If the session runs if 'high precision' setting enabled, then disabling this property may reult in performance gain.

15. Set the proper error tracing level in session. e.g Verbose Data setting should only be used for debugging mappings and should be avoided in the normal run.

16. Use pipeline partitioning to improve session performance. Increasing the number of partitions or partition points increases the number of threads. If the Integration Service nodes contain enough CPU bandwidth, this approach can significantly improve performance.

3 comments:

Bernie said...

Just had to pop over and see who ED's 400th follower was.
You will love reading her blog, she is hilarious but don't tell her I said she has a beautiful heart as well......:-) Hugs

Anonymous said...

Thank you so much for being the 400th follower, it is very much appreciated...

Bernie - I caught you!!!!

James Zicrov said...

I feel Informatica is always the best way of providing complex IT solutions and hence it actually provides opportunity for creating dashboards and end reports for users.

Informatica Read Soap API