Informatica优化

浏览: 2203

原文:Mapping design optimization techniques

Challenge

Optimizing PowerCenter to create an efficient execution environment

 优化驱动中心,创造高效的执行环境

Description

Although PowerCenter environments vary widely, most sessions and/or mappings can benefit from the implementation of common objects and optimization procedures. Follow these procedures and rules of thumb when creating mappings to help ensure optimization.

 尽管驱动中心环境设置复杂,大多数会话和映射还是可以从普通对象实现和优化规程中得到性能改善。创建映射时遵循下面这些规程和经验法则可以保证优化。

General Suggestions for Optimizing

  1. Reduce the number of transformations. There is always      overhead involved in moving data between transformations. 减少转换的数量。通常转换间涉及了不必要的数据移动。
  2. Consider more shared memory for large number of      transformations. Session shared memory between 12MB and 40MB should      suffice. 考虑给转换更大的共享内存。应满足会话共享内存大小在12MB到40MB之间。
  3. Calculate once, use many times. 计算一次,多处复用
    • Avoid calculating or testing the same value over and       over. 避免重复计算
    • Calculate it once in an expression, and set a       True/False flag. 只在表达式中计算一次,然后设置真假标志位
    • Within an expression, use variable ports to calculate       a value than can be used multiple times within that transformation. 表达式中,使用变量端口去计算一个值,以便在该整个转换中多处复用
  4. Only connect what is used. 只连接需要用到的
    • Delete unnecessary links between transformations to       minimize the amount of data moved, particularly in the Source Qualifier. 删除转换间不必要的连接,减少数据移动,尤其是在源限定符转换中。
    • This is also helpful for maintenance. If a       transformation needs to be reconnected, it is best to only have necessary       ports set as input and output to reconnect. 一个有利于维护的技巧是,转换需要重新连接时,最好设置必须的端口作为输入输出。
    • In lookup transformations, change unused ports to be       neither input nor output. This makes the transformations cleaner looking.       It also makes the generated SQL override as small as possible, which cuts       down on the amount of cache necessary and thereby improves performance. 在查找表中,不用的端口不要设置成注入或者输出。这使得查找更加简洁,也生成了更小的SQL,降低了缓存的使用和提高性能。
  5. Watch the data types. 审查数据类型
    • The engine automatically converts compatible types. 引擎自动转换不兼容的类型
    • Sometimes data conversion is excessive. Data types are       automatically converted when types are different between connected ports.       Minimize data type changes between transformations by planning data flow       prior to developing the mapping.  有时数据转换是过度的。数据类型自动在两个不同的连接端口中进行类型转换。开发映射前,计划数据流向,减小数据类型的变化。
  6. Facilitate reuse. 促进复用
    • Plan for reusable transformations upfront. 先计划复用转换
    • Use variables. Use both mapping variables as well as       ports that are variables. Variable ports are especially beneficial when       they can be used to calculate a complex expression or perform a       disconnected lookup call only once instead of multiple times 使用变量
    • Use mapplets to encapsulate multiple reusable       transformations. 使用映射集合封装多个可重用转换
    • Use mapplets to leverage the work of critical       developers and minimize mistakes when performing similar functions. 执行相似功能时使用映射集合影响临界开发者的工作,减少错误。
  7. Only manipulate      data that needs to be moved and transformed.  只操作必要的数据
    • Reduce the number of non-essential records that are       passed through the entire mapping. 减少不必要的记录在数据流中流动
    • Use active transformations that reduce the number of       records as early in the mapping as possible (i.e., placing filters,       aggregators as close to source as possible). 尽可能今早减少数据流量(将过滤器,聚合器放置在贴近源的地方)
    • Select appropriate driving/master table while using       joins. The table with the lesser number of rows should be the       driving/master table for a faster join.  使用连接时选择合适的驱动表和主表。
  8. Utilize single-pass reads. 利用一次性读
    • Redesign mappings to utilize one Source Qualifier to       populate multiple targets. This way the server reads this source only       once. If you have different Source Qualifiers for the same source (e.g.,       one for delete and one for update/insert), the server reads the source       for each Source Qualifier. 重新设计映射,利用一个源限定符转换去填充多个目标。这种方法服务器只读取一次源。如果同一个源有不同的源限定符转换,
    • Remove or reduce field-level stored procedures. 移除或者减少字段级别的存储过程
    • If you use field-level stored procedures, the       PowerCenter server has to make a call to that stored procedure for every       row, slowing performance. 如果使用字段级别的存储过程,驱动中心服务器需要每行都调用相应的存储过程,降低了性能。

Lookup Transformation Optimizing Tips 查找表转换优化技巧

  1. When your source is large, cache lookup table columns for      those lookup tables of 500,000 rows or less. This typically improves      performance by 10 to 20 percent. 当你的源很大时,缓存查找表列数低于50万。这将很明显提升10%-20%性能。
  2. The rule of thumb is not to cache any table over      500,000 rows. This is only true if the standard row byte count is 1,024 or      less. If the row byte count is more than 1,024, then the 500k rows will      have to be adjusted down as the number of bytes increase (i.e., a 2,048      byte row can drop the cache row count to between 250K and 300K, so the      lookup table should not be cached in this case). This is just a general      rule though. Try running the session with a large lookup cached and not      cached. Caching is often still faster on very large lookup tables.
  3. When using a Lookup Table Transformation, improve      lookup performance by placing all conditions that use the equality      operator = first in the list of conditions under the condition tab.
  4. Cache only lookup tables if the number of lookup calls      is more than 10 to 20 percent of the lookup table rows. For fewer number      of lookup calls, do not cache if the number of lookup table rows is large.      For small lookup tables(i.e., less than 5,000 rows), cache for more than 5      to 10 lookup calls.
  5. Replace lookup with decode or IIF (for small sets of      values).
  6. If caching lookups and performance is poor, consider      replacing with an unconnected, uncached lookup.
  7. For overly large lookup tables, use dynamic caching      along with a persistent cache. Cache the entire table to a persistent file      on the first run, enable the update else insert option on the dynamic      cache and the engine will never have to go back to the database to read      data from this table. You can also partition this persistent cache at run      time for further performance gains.
  8. Review complex expressions.

 

  • Examine mappings via Repository Reporting and      Dependency Reporting within the mapping.
  • Minimize aggregate function calls.
  • Replace Aggregate Transformation object with an      Expression Transformation object and an Update Strategy Transformation for      certain types of Aggregations.


Operations and Expression Optimizing Tips 操作和表达式优化技巧

  1. Numeric operations are faster than string operations. 数值操作快于字符操作
  2. Optimize char-varchar comparisons (i.e., trim spaces      before comparing). 优化字符比较
  3. Operators are faster than functions (i.e., || vs.      CONCAT). 操作符快于函数
  4. Optimize IIF expressions. 优化IIF表达式
  5. Avoid date comparisons in lookup; replace with string. 避免在查找表中进行日期比较,使用字符串
  6. Test expression timing by replacing with constant.
  7. Use flat files.

 

  • Using flat files located on the server machine loads      faster than a database located in the server machine.
  • Fixed-width files are faster to load than delimited      files because delimited files require extra parsing.
  • If processing intricate transformations, consider      loading first to a source flat file into a relational database, which      allows the PowerCenter mappings to access the data in an optimized fashion      by using filters and custom SQL Selects where appropriate.
  1. If working with data that is not able to return sorted      data (e.g., Web Logs), consider using the Sorter Advanced External      Procedure.
  2. Use a Router Transformation to separate data flows      instead of multiple Filter Transformations.
  3. Use a Sorter Transformation or hash-auto keys      partitioning before an Aggregator Transformation to optimize the      aggregate. With a Sorter Transformation, the Sorted Ports option can be      used, even if the original source cannot be ordered.
  4. Use a Normalizer Transformation to pivot rows rather      than multiple instances of the same target.
  5. Rejected rows from an update strategy are logged to the      bad file. Consider filtering before the update strategy if retaining these      rows is not critical because logging causes extra overhead on the engine.      Choose the option in the update strategy to discard rejected rows.
  6. When using a Joiner Transformation, be sure to make the      source with the smallest amount of data the Master source.
  7. If an update override is necessary in a load, consider      using a Lookup transformation just in front of the target to retrieve the      primary key. The primary key update will be much faster than the      non-indexed lookup override.

Suggestions for Using Mapplets

A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and can contain as many transformations as necessary. Use the Mapplet Designer to create mapplets.

  1. Create a mapplet when you want to use a standardized      set of transformation logic in several mappings. For example, if you have      several fact tables that require a series of dimension keys, you can      create a mapplet containing a series of Lookup transformations to find      each dimension key. You can then use the mapplet in each fact table      mapping, rather than recreate the same lookup logic in each mapping.
  2. To create a mapplet, add, connect, and configure      transformations to complete the desired transformation logic. After you      save a mapplet, you can use it in a mapping to represent the      transformations within the mapplet. When you use a mapplet in a mapping,      you use an instance of the mapplet. All uses of a mapplet are tied to the      parent mapplet. Hence, all changes made to the parent mapplet logic are      inherited by every child instance of the mapplet. When the server runs a      session using a mapplet, it expands the mapplet. The server then runs the      session as it would any other session, passing data through each transformation      in the mapplet as designed.
  3. A mapplet can be active or passive depending on the      transformations in the mapplet. Active mapplets contain at least one      active transformation. Passive mapplets only contain passive      transformations. Being aware of this property when using mapplets can save      time when debugging invalid mappings.
  4. Unsupported transformations that should not be used in      a mapplet include: COBOL source definitions, normalizer, non-reusable      sequence generator, pre- or post-session stored procedures, target      definitions, and PowerMart 3.5-style lookup functions.
  5. Do not reuse mapplets if you only need one or two      transformations of the mapplet while all other calculated ports and      transformations are obsolete.
  6. Source data for a mapplet can originate from one of two      places:

 

  • Sources within the mapplet . Use one or more source definitions connected to a      Source Qualifier or ERP Source Qualifier transformation. When you use the      mapplet in a mapping, the mapplet provides source data for the mapping and      is the first object in the mapping data flow.
  • Sources outside the mapplet . Use a mapplet Input transformation to define input      ports. When you use the mapplet in a mapping, data passes through the      mapplet as part of the mapping data flow.
  1. To pass data out of a mapplet, create mapplet output      ports. Each port in an Output transformation connected to another      transformation in the mapplet becomes a mapplet output port.

 

  • Active mapplets with more than one Output      transformations. You need one target in the mapping      for each Output transformation in the mapplet. You cannot use only one      data flow of the mapplet in a mapping.
  • Passive mapplets with more than one Output      transformations. Reduce to one Output      Transformation; otherwise you need one target in the mapping for each      Output transformation in the mapplet. This means you cannot use only one      data flow of the mapplet in a mapping.
推荐 0
本文由 wffger 创作,采用 知识共享署名-相同方式共享 3.0 中国大陆许可协议 进行许可。
转载、引用前需联系作者,并署名作者且注明文章出处。
本站文章版权归原作者及原出处所有 。内容为作者个人观点, 并不代表本站赞同其观点和对其真实性负责。本站是一个个人学习交流的平台,并不用于任何商业目的,如果有任何问题,请及时联系我们,我们将根据著作权人的要求,立即更正或者删除有关内容。本站拥有对此声明的最终解释权。

0 个评论

要回复文章请先登录注册