今天下午自己亲手尝试了ETL中增量抽取的一种方式:CDC。增量抽取三种方式:时间戳、CDC(Change data capture),全字段对比。非常感谢老头子分享ORACLE CDC操作手册。今天暂且不说这几种增量抽取是如何实现的。使用sql_trace命令对应用了CDC捕获改变数据与未设置CDC的表插入同样的数据进行性能对比。
因为各公司用到的CDC方案不太一样,所以现在将CDC方案添加到附件中,供参考~
首先,创建两张表:person 和 person_temp。person是已经创建好的CDC源表,person_temp 和 person表结构一致。(PS:两张表都为空)
对两张表执行insert操作:
insert into person select object_id,substr(object_name,1,0),'s' from dba_objects;
commit;--插入79551条数据
insert into person_temp select object_id,substr(object_name,1,0),'s' from dba_objects;
commit;--插入79551条数据
好了,剩下的就来查看两张表的执行计划和sql_trace文件追踪记录吧。
第一条SQL是对于person表的执行,之后简称SQL1;第二条SQL是对person_temp的执行,简称SQL2。
SQL1的执行计划主体:
SQL_ID g92fxb52xmv7p, child number 0
-------------------------------------
insert into person select object_id,substr(object_name,1,0),'s' from
dba_objects
Plan hash value: 4085989552
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | O/1/M |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 2 | | | 276 (100) | | 0 |00:00:26.19 | 857K| 5339 | | | |
| 1 | LOAD TABLE CONVENTIONAL | | 2 | | | | | 0 |00:00:26.19 | 857K| 5339 | | | |
| 2 | VIEW | DBA_OBJECTS | 2 | 71810 | 5540K| 276 (2) | 00:00:04 | 159K|00:00:01.07 | 5540 | 3041 | | | |
| 3 | UNION-ALL | | 2 | | | | | 159K|00:00:01.02 | 5540 | 3041 | | | |
|* 4 | FILTER | | 2 | | | | | 159K|00:00:00.92 | 5538 | 3039 | | | |
|* 5 | HASH JOIN | | 2 | 80041 | 8598K| 275 (2) | 00:00:04 | 161K|00:00:00.75 | 2004 | 1223 | 1079K| 1079K| 2/0/0|
| 6 | INDEX FULL SCAN | I_USER2 | 2 | 107 | 428 | 1 (0) | 00:00:01 | 214 |00:00:00.01 | 2 | 0 | | | |
|* 7 | HASH JOIN | | 2 | 80041 | 8285K| 273 (2) | 00:00:04 | 161K|00:00:00.51 | 2002 | 1223 | 981K| 981K| 2/0/0|
| 8 | INDEX FULL SCAN | I_USER2 | 2 | 107 | 2354 | 1 (0) | 00:00:01 | 214 |00:00:00.01 | 2 | 0 | | | |
|* 9 | TABLE ACCESS FULL | OBJ$ | 2 | 80041 | 6565K| 272 (2) | 00:00:04 | 161K|00:00:00.33 | 2000 | 1223 | | | |
|* 10 | TABLE ACCESS BY INDEX ROWID| IND$ | 17724 | 1 | 8 | 2 (0)| 00:00:01 | 15190 |00:00:00.12 | 3534 | 1816 | | | |
|* 11 | INDEX UNIQUE SCAN | I_IND1 | 17724 | 1 | | 1 (0)| 00:00:01 | 17724 |00:00:00.04 | 494 | 38 | | | |
| 12 | NESTED LOOPS | | 0 | 1 | 29 |2 (0)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 | | | |
|* 13 | INDEX FULL SCAN | I_USER2 | 0 | 1 | 20 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 | | | |
|* 14 | INDEX RANGE SCAN | I_OBJ4 | 0 | 1 | 9 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 | | | |
| 15 | NESTED LOOPS | | 2 | 1 | 83 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 2 | 2 | | | |
| 16 | INDEX FULL SCAN | I_LINK1 | 2 | 1 | 79 | 0 (0)| | 0 |00:00:00.01 | 2 | 2 | | | |
|* 17 | INDEX RANGE SCAN | I_USER2 | 0 | 1 | 4 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL2的执行计划主体:
请对比一下SQL1 和 SQL2执行计划的Cost (%CPU) 以及 A-Time,发现差异很大,为什么?
虽然自己猜想对于CDC源表person来说,CBO除了要插入数据到person表之外,还要维护person_ct(person的改变表),是否是这个原因导致两个执行计划如此大差异呢?
问题先保留给各位大神,希望能抛砖引玉- -
下面继续看SQL1 和 SQL2的sql_trace
SQL1的sql_race如下:
SQL ID: g92fxb52xmv7p
Plan Hash: 4085989552
insert into person select object_id,substr(object_name,1,0),'s' from
dba_objects
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 9.25 9.19 1754 2925 83431 79551
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 9.25 9.19 1754 2925 83431 79551
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 99
Rows Row Source Operation
------- ---------------------------------------------------
0 LOAD TABLE CONVENTIONAL (cr=85862 pr=3936 pw=0 time=0 us)
79551 VIEW DBA_OBJECTS (cr=2770 pr=1673 pw=0 time=515796 us cost=276 size=5672990 card=71810)
79551 UNION-ALL (cr=2770 pr=1673 pw=0 time=488298 us)
79551 FILTER (cr=2769 pr=1672 pw=0 time=446093 us)
80818 HASH JOIN (cr=1002 pr=757 pw=0 time=367001 us cost=275 size=8804510 card=80041)
107 INDEX FULL SCAN I_USER2 (cr=1 pr=0 pw=0 time=0 us cost=1 size=428 card=107)(object id 47)
80818 HASH JOIN (cr=1001 pr=757 pw=0 time=246805 us cost=273 size=8484346 card=80041)
107 INDEX FULL SCAN I_USER2 (cr=1 pr=0 pw=0 time=0 us cost=1 size=2354 card=107)(object id 47)
80818 TABLE ACCESS FULL OBJ$ (cr=1000 pr=757 pw=0 time=165860 us cost=272 size=6723444 card=80041)
7595 TABLE ACCESS BY INDEX ROWID IND$ (cr=1767 pr=915 pw=0 time=0 us cost=2 size=8 card=1)
8862 INDEX UNIQUE SCAN I_IND1 (cr=247 pr=19 pw=0 time=0 us cost=1 size=0 card=1)(object id 41)
0 NESTED LOOPS (cr=0 pr=0 pw=0 time=0 us cost=2 size=29 card=1)
0 INDEX FULL SCAN I_USER2 (cr=0 pr=0 pw=0 time=0 us cost=1 size=20 card=1)(object id 47)
0 INDEX RANGE SCAN I_OBJ4 (cr=0 pr=0 pw=0 time=0 us cost=1 size=9 card=1)(object id 39)
0 NESTED LOOPS (cr=1 pr=1 pw=0 time=0 us cost=1 size=83 card=1)
0 INDEX FULL SCAN I_LINK1 (cr=1 pr=1 pw=0 time=0 us cost=0 size=79 card=1)(object id 137)
0 INDEX RANGE SCAN I_USER2 (cr=0 pr=0 pw=0 time=0 us cost=1 size=4 card=1)(object id 47)
SQL2的sql_trace跟踪如下:
SQL ID: 5psjv0u50tsdm
Plan Hash: 4085989552
insert into person_temp select object_id,substr(object_name,1,0),'s' from
dba_objects
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.22 0.23 1690 3123 2295 79551
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.23 0.24 1690 3123 2295 79551
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 99
Rows Row Source Operation
------- ---------------------------------------------------
0 LOAD TABLE CONVENTIONAL (cr=3235 pr=1691 pw=0 time=0 us)
79551 VIEW DBA_OBJECTS (cr=2770 pr=1684 pw=0 time=224453 us cost=276 size=5672990 card=71810)
79551 UNION-ALL (cr=2770 pr=1684 pw=0 time=212815 us)
79551 FILTER (cr=2769 pr=1683 pw=0 time=179563 us)
80818 HASH JOIN (cr=1002 pr=768 pw=0 time=161122 us cost=275 size=8804510 card=80041)
107 INDEX FULL SCAN I_USER2 (cr=1 pr=0 pw=0 time=106 us cost=1 size=428 card=107)(object id 47)
80818 HASH JOIN (cr=1001 pr=768 pw=0 time=97723 us cost=273 size=8484346 card=80041)
107 INDEX FULL SCAN I_USER2 (cr=1 pr=0 pw=0 time=106 us cost=1 size=2354 card=107)(object id 47)
80818 TABLE ACCESS FULL OBJ$ (cr=1000 pr=768 pw=0 time=57891 us cost=272 size=6723444 card=80041)
7595 TABLE ACCESS BY INDEX ROWID IND$ (cr=1767 pr=915 pw=0 time=0 us cost=2 size=8 card=1)
8862 INDEX UNIQUE SCAN I_IND1 (cr=247 pr=18 pw=0 time=0 us cost=1 size=0 card=1)(object id 41)
0 NESTED LOOPS (cr=0 pr=0 pw=0 time=0 us cost=2 size=29 card=1)
0 INDEX FULL SCAN I_USER2 (cr=0 pr=0 pw=0 time=0 us cost=1 size=20 card=1)(object id 47)
0 INDEX RANGE SCAN I_OBJ4 (cr=0 pr=0 pw=0 time=0 us cost=1 size=9 card=1)(object id 39)
0 NESTED LOOPS (cr=1 pr=1 pw=0 time=0 us cost=1 size=83 card=1)
0 INDEX FULL SCAN I_LINK1 (cr=1 pr=1 pw=0 time=0 us cost=0 size=79 card=1)(object id 137)
0 INDEX RANGE SCAN I_USER2 (cr=0 pr=0 pw=0 time=0 us cost=1 size=4 card=1)(object id 47)
发现两个sql_trace文件各项指标差异也很大,到底是什么原因导致两个SQL性能的差异,还请各位大神详细指点~
PS:第一次写博客,纠结了很久到底是要写博客还是写一个问题,最后下定决心写出来,不要笑话新手哦~