| hai's profileOracle & StarcraftBlogLists | Help |
|
January 24 hash join 与 10104 eventOPTIMAL MODE:
Optimal方式的hash join相对来说非常简单,在这个例子(optimal.doc)里面,一共需要2个partition,这也是hash join中最小的partition分配数,随着work area的增大,partition的数目也会增多,partition的分配规律遵守2的n次方这样的规则分配。在经过build table的scan以后,hash table被建立,每条记录被hash后对应的hash bucket也被填充,与此相对应的bitmap vector也bit位也被设置,这个时候开始probe table的scan,首先通过同样的hash function对join key进行运算,然后对比bitmap vector中的bit是否被标志,如果没被标志那么表示这条记录不符合关联的要求直接被丢弃。如果bit位被设置,那么表示这个值可能与hash bucket中保存的值一致,因为hash算法会有碰撞,所以一个bucket中会有不同的值存在,所以将会对比这个值和hash bucket中保存的值是不是相同,如果相同那么输出给客户端,等probe table扫描完毕,所有结果都输出给客户端。
optimal.doc 点击下载trace 文件
ONEPASS MODE:
### Partition Distribution ### Partition:0 rows:49933 clusters:5 slots:1 kept=0 Partition:1 rows:49674 clusters:5 slots:1 kept=0 Partition:2 rows:49943 clusters:5 slots:4 kept=0 Partition:3 rows:50451 clusters:5 slots:5 kept=1
onepass.doc 点击下载trace 文件
MULTIPASS MODE:
### Partition Distribution ### Partition:0 rows:99876 clusters:10 slots:1 kept=0 Partition:1 rows:100125 clusters:10 slots:4 kept=0 首先是build table的scan, Work area里面容纳了partition 1的4个slot和partition 0的一个slot,另外还有一个slot是为了i/o用的。每当一个slot(可能是一个block)满后将会被写出到磁盘,等build table scan完毕,磁盘上保存了2个partition的内容,并且bitmap vector也被建立了。这时开始了probe table的scan,系统为probe table的每个partition分配了一个slot,还分配一个slot作为读入probe table的缓冲,通过bitmap vector的过滤,符合条件的记录被填充到probe partition的slot中并与build table partition的slot来比较确切值。这个步骤完了以后work area里面应该存在一个build table partition 0的slot,一个build table partition 1的slot,一个写出缓冲的slot,一个读入缓冲的slot,一个probe table partition 0的slot,一个probe table partition 1的slot,一共6个slot,刚好填满所有可分配的slot。这个过程中间可能会有在内存中匹配的行返回,然后开始从磁盘读取partition pair开始join,经过10次probe partition的读取。partition 0也是经过了10次probe partition的读取,这就带来了很多i/o,导致hash join的性能急剧下降。Oracle在进行multipass的过程中会有2次hash function的存在。Oracle将会读入磁盘上的build table partition再进行一次hash生成一些subpartition,这样的话每次读入probe table subpartition即可,而不用多次读入probe partition导致过大的i/o.不过在这个trace文件中并没有发现这个过程。的个和的一个另外还有一个是为了用的。每当一个(可能是一个)满后将会被写出到磁盘,等完毕,磁盘上保存了个的内容并且也被建立了。这时开始了的,系统为的每个分配了一个,还分配一个作为读入的缓冲,通过的过滤,符合条件的记录被填充到的中并与的来比较确切值。这个步骤完了以后里面应该存在一个的,一个的,一个写出缓冲的一个读入缓冲的,一个的,一个的,一共个,刚好填满所有可分配的。这个过程中间可能会有在内存中匹配的行返回,然后开始从磁盘读取开始,经过次的读取。也是经过了次的读取,这就带来了很多,导致的性能急剧下降。在进行的过程中会有次的存在。将会读入磁盘上的再进行一次生成一些,这样的话每次读入即可,而不用多次读入导致过大的不过在这个文件中并没有发现这个过程。里面容纳了的个和的一个另外还有一个是为了用的。每当一个(可能是一个)满后将会被写出到磁盘,等完毕,磁盘上保存了个的内容并且也被建立了。这时开始了的,系统为的每个分配了一个,还分配一个作为读入的缓冲,通过的过滤,符合条件的记录被填充到的中并与的来比较确切值。这个步骤完了以后里面应该存在一个的,一个的,一个写出缓冲的一个读入缓冲的,一个的,一个的,一共个,刚好填满所有可分配的。这个过程中间可能会有在内存中匹配的行返回,然后开始从磁盘读取开始,经过次的读取。也是经过了次的读取,这就带来了很多,导致的性能急剧下降。在进行的过程中会有次的存在。将会读入磁盘上的再进行一次生成一些,这样的话每次读入即可,而不用多次读入导致过大的不过在这个文件中并没有发现这个过程。另外还有一个trace文件反映了ROLE REVERSAL,oracle在进行join时会去评估build partition和probe partition的大小,如果发行probe partition小于build partition,那么会对换两者的角色,对原来的probe partition建立hash table,拿原来的build partition来匹配,这种方法也在一定程度上减少了i/o,提高了效率。
*** HASH JOIN GET FLUSHED PARTITIONS (PHASE 2) *** Getting a pair of flushed partions. BUILD PARTION: nrows:24948 size=(3 slots, 384K) PROBE PARTION: nrows:12410 size=(2 slots, 256K) ROLE REVERSAL OCCURRED
multipass.doc 点击下载trace 文件 上面这些只是本人基于trace文件对hash join运行模式的推断,不一定能反正hash join真实的运行情况,所以欢迎大家来讨论! November 03 运用transport tablespace进行快速oracle版本升级(9i-10g)大家可能都知道在进行oracle数据库版本升级的时候会有2种方式 1.通过dbua(database upgrade assistant) 2.exp/imp 通过dbua来升级的话由于不涉及到数据文件的改变,所以速度会比较快,但是如果dbua一旦在升级的过程中出现问题可能会导致原来的库不可用。 而通过exp/imp虽然对原来的库不会有影响,但是如果数据库比较大的话那么升级的时间将会是不可接受的(尤其对24*7)的应用来说。 针对这个问题,这次的2006 oracle openworld法国的amadeus公司提供了一个非常有创意的点子,就是利用dataguard和transport tablespace功能来实现最短时间内的安全升级。 首先让我们来了解一下amadeus公司
艾玛迪斯全球旅游分销系统公司(Amadeus Global Travel Distribution SA)是全球领先的旅游行业技术及分销供应商。1987年艾玛迪斯总部建立于西班牙马德里。在 Sophia Antipolis(法国尼斯附近)和美国波士顿设立有市场及开发部门。公司的数据中心位于德国慕尼黑附近的Erding。公司提供各种先进的旅游行业技术解决方案,至今已成为成长最快并被最广泛使用的全球分销系统(GDS)。 作为卓越的技术合作伙伴,艾玛迪斯把最先进的信息技术带入旅游行业,使众多的旅游供应商、休闲及商务旅游服务商从中获益。通过设立服务于当地市场的national marketing companies(NMCs),艾玛迪斯用其庞大的信息技术资源向全世界200个国家和地区提供优质的技术解决方案。 我们再来看一下跟它们的数据库相关的信息 他们的业务系统达到99.99%的可用率,每秒钟有30万次的数据库请求,每天有2亿8千万次transaction,这是一个相当大的数据库系统,如果用dbua或者exp/imp他们都不能接受升级的风险,于是他们的技术人员就想出了用dataguard和transport tablespace功能来实现最短时间内的安全升级。 具体的实现方法是这样的 1.先为主库建立一个dataguard数据库(可以在线做) 2.在dataguard库上安装10g软件(可以在线做) 3.整理一些不能通过transport tablespace搞定的东西,比如sequence,synonyms,grants...... 4.停止主库这边所有write的应用,提供read的服务(写入停止,提供查询) 5.强制归档主库redo log并传到dataguard恢复(写入停止,提供查询) 6.利用transport tablespace来转换数据库版本,并创建sequencee,synonyms,grants等(写入停止,提供查询)。 7.验证新环境的过程,在验证过程中如果发现有问题,则可以切换会原来的系统(写入停止,提供查询)。 8.切换应用到10g数据库(提供服务) amadeus在演习时做到10分钟内完成4,5,6,7并成功切换了系统,考虑到他们的数据库繁忙程度和数据库容量非常大,这真是一项伟大的成就。我们可以在以后的数据库版本的升级过程中借鉴他们的方法。
我们再从技术上验证一下transport tablespace可以运用在版本升级 在9i的库上创建一个test tablespace create tablespace test 创建一张表在test表空间上 create table test1(a number) tablespace test; insert into test1 values(1); SQL 9i>select * from test1; A
把test表空间置为read only模式
到处test tablespace的metadata exp \'sys/sys as sysdba\' transport_tablespace=y tablespaces=(TEST) file=test.dmp log=test.log 传输dmp文件和数据文件(在amadeus的案例里面由于10g的库和9i的库在同一台机器上,所以避免了拷贝数据文件的时间,这也是整个方案的重点之一)到远程 scp test.dmp oracle@10.0.100.115:/opt/oracle/ scp /opt/oracle/test.dbf oracle@10.0.100.115:/opt/oracle/
在目标库上导入metadata数据 imp \'sys/sys as sysdba\' transport_tablespace=y tablespaces=(TEST) file='/opt/oracle/test.dmp' datafiles= ('/opt/oracle/test.dbf') tts_owners=test fromuser=test touser=test log=tts_i.log
查看test1表,发现数据一致
A 把test表空间置为read write模式 alter tablespace test read write; insert into test1 values(2);
A 一切正常,测试完毕
这个测试简单的模仿了transport tablespace升级数据库的可能性,当然在实际过程中我们要校验是否自包含表空间,是否需要创建sequence等,但是总体来说这种方案能提供最短时间内的数据库版本升级。 October 16 itl deadlock昨天有同事碰到itl竞争导致的deadlock,这在以前比较少见,那么在这里我们来作个实验重现一下,
从理论上解释为什么itl竞争会导致deadlock。 create table test(a number); insert into test values(1);
SQL 10G>insert into test select * from test;
1 row created.
SQL 10G>/
2 rows created.
SQL 10G>/
4 rows created.
SQL 10G>/
8 rows created.
SQL 10G>/
16 rows created.
SQL 10G>/
32 rows created.
SQL 10G>/
64 rows created.
SQL 10G>/
128 rows created.
SQL 10G>/
256 rows created.
SQL 10G>/
512 rows created.
SQL 10G>/
1024 rows created.
SQL 10G>/
2048 rows created.
SQL 10G>/
4096 rows created.
SQL 10G>/
8192 rows created.
SQL 10G>/
16384 rows created.
SQL 10G>commit;
Commit complete.
SQL 10G>alter table test move pctfree 0;
Table altered.
move表,缩小pctfree为0使表不能创建多余的itl
SQL 10G>select ini_trans from dba_tables where owner='TEST' and table_name='TEST';
INI_TRANS
---------- 1 Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0010.023.0000229f 0x01c01177.05f5.1c C--- 0 scn 0x0005.e88ed5fb 0x02 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000 0x03 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000 这里可以看到虽然dba_tables里面的ini_trans为1,但是实际上block里面默认的itl是3个,从下面的实验中可以发现。
SQL 10G>select distinct dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block# from test order by block#; BLOCK#
---------- 61612 61613 61614 61615 61616 61618 61619 61620 61621 61622 61623 BLOCK#
---------- 61624 61626 61627 61628 61629 61630 61631 61632 61634 61635 72057 BLOCK#
---------- 72058 72059 72060 72061 72062 72063 72064 72065 72066 72067 72068 BLOCK#
---------- 72069 72070 72071 72072 72817 72818 72819 72820 72821 72822 72823 BLOCK#
---------- 72824 45 rows selected.
我们挑2个block来做实验,block 61612,block 61613
session 1: update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=1; update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=1; session1更新分别更新61612,61613的第2行,分别占据61612,61613的一个itl entry
session 2:
update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=2; update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=2; session2更新分别更新61612,61613的第3行,分别占据61612,61613的另一个itl entry
session 3:
update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=0; session3 更新61612的第1行,占据61612的第3个itl entry
session 4: update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=3; session4 更新61613的第4行,占据61613的第3个itl entry,这个时候2个block的itl entry都被用完,
每个block itl都是3个并且不能扩展 session 4: update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=3; session4 更新61612的第4行,这时候出现"enq: TX - allocate ITL entry"等待
session 3:
update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=0; session3 更新61613的第1行,出现"enq: TX - allocate ITL entry"等待
session 4:
SQL 10G>update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=3; update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=3 * ERROR at line 1: ORA-00060: deadlock detected while waiting for resource 而当session3 更新61613的第1行时,session4这边报出了deadlock,看下图
Deadlock graph: ---------Blocker(s)-------- ---------Waiter(s)--------- Resource Name process session holds waits process session holds waits TX-0013002e-0001ac43 18 154 X 19 143 S TX-0014002a-0001c990 19 143 X 18 154 S session 154: DID 0001-0012-0001B9E7 session 143: DID 0001-0013-00000DB7 session 143: DID 0001-0013-00000DB7 session 154: DID 0001-0012-0001B9E7 Rows waited on: Session 143: no row Session 154: no row Information on the OTHER waiting sessions: Session 143: pid=19 serial=60295 audsid=6783 user: 55/TEST O/S info: user: oracle, term: pts/1, ospid: 664, machine: csdba program: sqlplus@csdba (TNS V1-V3) application name: SQL*Plus, hash value=3669949024 Current SQL Statement: update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=0 End of information on OTHER waiting sessions. September 25 autotrace,explain plan与绑定变量create table fenbu as select 1 id,'Y' flag from dba_objects where rownum<100001; insert into fenbu values(1,'N'); create index IDX_FENBU_FLAG on fenbu(flag); analyze table fenbu compute statistics for table for all columns for all indexes; var a varchar2(32); SQL 10G>set autotrace trace exp; SQL 10G>alter session set events'10046 trace name context forever,level 12'; Session altered. SQL 10G>select * from fenbu where flag=:a;
---------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter("FLAG"=:A) Note
Session altered.
很显然可以看到set autotrace的执行计划是错的,这是因为set auotrace,explain plan等操作
Misses in library cache during parse: 1 Rows Row Source Operation September 01 oracle compress table II上一篇文章提到压缩表发生update后会导致行迁移,但是在上篇文章里面没有做过多描述 首先创建测试表 create table test2(a varchar2(10),b varchar2(10),c varchar2(10)); begin 对压缩表添加一个列 SQL 10G>SQL 10G>SQL 10G> alter table testcom4 add d number; Table altered. 定位到一条记录,找出所在文件号,块号,文件号和rowid SQL 10G>select dbms_rowid.ROWID_RELATIVE_FNO(rowid) file#, FILE# BLOCK# ROW# SQL 10G>select rowid from testcom4 where rownum<2; ROWID 更新这条记录 SQL 10G>update testcom4 set d=1 where rowid='AAAT9AAAMAAAO+0AAA'; 1 row updated. SQL 10G>commit; Commit complete. SQL 10G>select dbms_rowid.ROWID_RELATIVE_FNO(rowid) file#, FILE# BLOCK# ROW# SQL 10G>alter system dump datafile 12 block 61364; System altered. perm_9ir2[3]={ 2 0 1 } SQL 10G>select dbms_utility.DATA_BLOCK_ADDRESS_FILE(to_number('300f085','xxxxxxxxxx')) FILE# BLOCK# dump新的block SQL 10G>alter system dump datafile 12 block 61573; System altered. block_row_dump: 可以看到新的block里面已经是非压缩的数据格式了,从这里可以看出对压缩表的更新确实是会导致 SQL 10G>alter table testcom4 drop column d; 报错了,提示“unsupported add/drop column operation on compressed tables” --------------------------- SQL 10G>alter table testcom4 set unused column d; Table altered. SQL 10G>alter table testcom4 drop unused columns; 希望下次下载一个patch可以解决这些问题。 August 29 library cache lock wait event说起library cache lock wait event就必须先说一下library cache lock是什么东西,library lock是oracle用来对library cache object进行并发控制的两种数据结构之一,另外一种数据结构是pin,lock将会在pin之前获得,并被加载到library cache handle上。library cache lock有3种形式,share,null,exclusive,具体的library cache lock的分析请看我的另一篇帖子。
通常我们举的关于library cache lock wait event的例子都是用procedure来模拟的,eygle发表过这方面的文章,今天我举的是另一个例子,也是一次真实的事故,事故发生的原因就是因为ddl导致cursor invalidation,但是ddl持续运行了很长时间,导致后来的session需要parse sql的时候不能获得library cache lock,进一步导致app服务器的connection pool满,导致应用crash。下面我们用一个简单的例子模拟一下这个场景。
session 1:
SQL 10G>alter table test add constraint uk_test unique(a,b,c,d,e) using index online compute statistics;
Table altered.
session 2:
SQL 10G>select sid from v$mystat where rownum<2;
SID
---------- 142 SQL 10G>select count(*) from test; session 3: SQL 10G> select event from v$session_wait where sid=142; EVENT -------------------------------------------------------------------------------- library cache lock 当test表过大,创建uk constraint时间过长时会有越来越多的session等待library cache lock从而进一步导致应用crash,这样提醒我们当对大表进行ddl时一定要分解ddl操作,不要一句语句搞定所有操作,象alter table test add constraint uk_test unique(a,b,c,d,e) using index online compute statistics;这种语句虽然非常便捷,但是不一定适合在大并发的时候去执行。我们应该把这个操作分解成几步来做,
1. create index ind_test on test(a,b,c,d,e) online compute statistics;
2. alter table test add constraint uk_test unique(a,b,c,d,e) using index ind_test novalidate;
3. 校验表里面的数据是否符合唯一约束
4. alter table test modify constraint uk_test validate;
这里的要点就是要把每一个ddl分解成小的操作来执行。值得指出的一点是alter table test modify constraint uk_test validate;这个步骤虽然会可能会耗时非常久,但是这个步骤是不会加载library cache exclusive lock的,所以它不会导致其他session等待library cache lock。但是constraint的状态从disable到enable的转换则还是会加载library cache exclusive lock,导致其他进程等待library cache lock。oracle可能在这个地方做了不同的处理,针对disable->enable和novalidate->validate采取了不同的lock类型,显然我们也可以知道disable->enable和novalidate->validate对library cache object的影响是不一样的,一个是可以说是从无到有,一个是只是改变了一下状态,我的猜测是disable->enable的话oracle依然加载了exclusive lock,而novalidate->validate的话oracle加载了null or share lock,有心的人可以dump一下sga来测试一下。
August 04 dbms_stats和leaf_blocksdbms_stats是oracle用来代替原有的analyze功能的一个包,与analyze相比dbms_stats具有很多优势,比如并行,比如分区信息统计等,但是dbms_stats再分析index的时候处理方式并不是太理想,dbms_stats分析index时将会只统计leaf_blocks为当前有数据的leaf block,而analyze则会统计为所有曾经被使用过的leaf block number,很显然dbms_stats的统计结果会使index fast full scan的成本被严重低估,在某些情况下会错误得选择index fast full scan做为执行路径。下面来看一个例子:
先清空原来的表
SQL 10G>truncate table t1;
Table truncated.
插入数据
SQL 10G>insert into t1 select
2 rownum id, 3 trunc(100 * dbms_random.normal) val, 4 rpad('x',100) padding 5 from 6 all_objects 7 where 8 rownum <= 10000 9 ; 10000 rows created.
SQL 10G>commit;
Commit complete.
创建索引,并限制pctfree为99,模拟大索引的产生
SQL 10G>create index ind_t1 on t1(id) pctfree 99;
Index created.
使用analyze分析索引
SQL 10G>ANALYZE INDEX IND_T1 COMPUTE STATISTICS;
Index analyzed.
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM user_indexes where table_name='T1';
INDEX_NAME BLEVEL LEAF_BLOCKS
------------------------------ ---------- ----------- IND_T1 2 10000 再使用dbms_stats分析,可以看到在这个时候基本统计数据是相同的
SQL 10G>EXEC DBMS_STATS.GATHER_INDEX_STATS('TEST','IND_T1');
PL/SQL procedure successfully completed.
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM user_indexes where table_name='T1';
INDEX_NAME BLEVEL LEAF_BLOCKS
------------------------------ ---------- ----------- IND_T1 2 10010 删除数据使表中只保留一条记录
SQL 10G>delete from t1 where rownum<10000;
9999 rows deleted.
SQL 10G>commit;
Commit complete.
再用analyze分析索引,可以发现leaf_blocks依然是10000
SQL 10G>ANALYZE INDEX IND_T1 COMPUTE STATISTICS;
Index analyzed.
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM user_indexes where table_name='T1';
INDEX_NAME BLEVEL LEAF_BLOCKS
------------------------------ ---------- ----------- IND_T1 2 10000 看cost这一项显示index fast full scan的成本为2679,这是正确的
SQL 10G>set autotrace trace exp;
SQL 10G>select/*+ index_ffs(t1,ind_t1)*/ count(*) from t1; Execution Plan
-------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| ------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2679 (19)| | 1 | SORT AGGREGATE | | 1 | | | 2 | INDEX FAST FULL SCAN| IND_T1 | 1 | 2679 (19)| ------------------------------------------------------------- 使用dbms_stats分析索引,leaf_blocks被统计为1,只统计了当前在用的leaf block
SQL 10G>EXEC DBMS_STATS.GATHER_INDEX_STATS('TEST','IND_T1');
PL/SQL procedure successfully completed.
SQL 10G>set autotrace off;
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM user_indexes where table_name='T1'; INDEX_NAME BLEVEL LEAF_BLOCKS
------------------------------ ---------- ----------- IND_T1 2 1 看cost这一项显示index fast full scan的成本为1,这显然是出现了错误
SQL 10G>set autotrace trace;
SQL 10G>select/*+ index_ffs(t1,ind_t1)*/ count(*) from t1; Execution Plan
-------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| ------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1 (0)| | 1 | SORT AGGREGATE | | 1 | | | 2 | INDEX FAST FULL SCAN| IND_T1 | 1 | 1 (0)| ------------------------------------------------------------- 再来看看它究竟需要读取多少个块,是不是cost=1就够了
SQL 10G> ALTER SESSION SET EVENTS 'immediate trace name flush_cache';
Session altered.
SQL 10G>select/*+ index_ffs(t1,ind_t1)*/ count(*) from t1;
Execution Plan ---------------------------------------------------------- -------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| ------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1 (0)| | 1 | SORT AGGREGATE | | 1 | | | 2 | INDEX FAST FULL SCAN| IND_T1 | 1 | 1 (0)| ------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 10035 consistent gets 10016 physical reads 0 redo size 411 bytes sent via SQL*Net to client 385 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed 显然,这里发生了10016个physical reads,cost=1是远远不够的。不知道oracle会不会就这个问题有改进方案,大家拭目以待。
July 24 10g r2 db_file_multiblock_read_count在cbo中的变化10g r2和前几个版本比起来对db_file_multiblock_read_count在cbo成本计算中的公式做了调整,看一下下面我实验后的对照表。 SQL*Plus: Release 10.2.0.1.0 - Production on Mon Jul 24 17:07:42 2006 Connected to: 分别测试了两个版本下1000行,10000行,100000行记录的6个对比 exec dbms_stats.SET_TABLE_STATS(OWNNAME=>'TEST',TABNAME=>'T1',NUMBLKS=>1000);
10g r2版本
July 01 今夜,阿根廷与世界杯再见悲情的阿根廷的主教练,他继承了阿根廷的悲情的传统。巴西莱、帕萨雷拉、贝尔萨、在这一刻灵魂附体,佩克尔曼一个人他代表了阿根廷足球悲情的传统,在这一刻他不是一个人在流泪,他不是一个人!
坎比亚索,面对这个点球。他面对的全世界阿根廷球迷的目光和期待。
莱曼曾经在之前扑出阿亚拉点球,坎比亚索应该深知这一点,他还能够微笑着面对他面前的这个人吗?10秒钟以后他会是怎样的表情? 球被扑了!比赛结束了!阿根廷队失败了。他们再一次倒在法西斯的球队面前,悲情的阿根廷的主教练!佩克尔曼今天倒下!阿根廷别再为我哭泣! 这个点球是一个绝对理论上的臭脚。绝对的臭脚,阿根廷队淘汰出了四强! 这个失败属于阿根廷,属于阿亚拉,属于弗朗哥,属于佩克尔曼,属于所有热爱阿根廷足球的人! 阿根廷队真的会后悔的,佩克尔曼在下半时他们领先一球的情况下打得太保守、太沉稳了,他失去了自己在小组赛的那种勇气,面对德国悠久的历史,他失去了他在小组赛中那种痛打落水狗的作风,他终于自食其果。阿根廷队该回家了,也许他们不用回遥远的阿根廷,他们不用回家,因为他们大多数人都在欧洲生活,再见! June 30 first_rows_n和all_rowsfirst_rows_n和all_rows都是oracle optimizer_mode的选项,他们有什么区别呢,会对优化器产生怎么样的影响呢?让我们一起来解开迷题. all_rows模式: first_rows_n模式:
准备好测试表和索引后来看看测试脚本 all_rows模式: first_rows_1模式: first_rows_10模式: first_rows_100模式: 由于篇幅太长,所以把10053的trace文件简化了一下,只留下join这一部分的内容,并把merge join的部分去除了 all_rows: ************************** HA Join
在All Rows模式下最终优化器选择了Best join order: 1,Cost: 195.3030,
*********************** 经过重新计算后, *********************** 计算第二种join顺序的成本值,T1做驱动表,T做内部表, *********************** 在FIRST_Rows_1模式下最终优化器选择了Best join order: 1,Cost: 4.0039,
FIRST_Rows_10: Final - First K Rows Plan: Best join order: 1 FIRST_Rows_100: Final - First K Rows Plan: Best join order: 1 值得注意,FIRST_Rows_100选择了hash
ALL_ROWS: FIRST_Rows_1: FIRST_Rows_10: FIRST_Rows_100: 总结来说,first_rows_n基于成本计算,根据优先返回行数N重新计算各个对象的访问成本, June 23 and_equal,index_join,index_combineand_equal,index_join,index_combine这三种都是oracle利用索引关联获得数据的方法,三者的目的都是为了最大限度的利用索引,减少回表的代价.但是三者的实现方法是有区别的,下面一一来分析.
and_equal:
这种方式需要查询条件里面包括所有索引列,然后取得每个索引中得到的rowid列表,然后对这些列表做merge join,过滤出相同的rowid后再去表中获取数据或者直接从索引中获得数据.and_equal有一些限制,比如它只对单列索引有效,只对非唯一索引有效,使用到的索引不能超过5个,查询条件只能是"=".在10g中,and_equal已经被废弃了,只能通过hint才能生效. create table test as select * from dba_objects;
create index ind_test_owner on test(owner);
create index ind_test_object_name on test(object_name);
SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test' and object_name='test'; Execution Plan
---------------------------------------------------------- -------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 29 | 2 (0)| |* 1 | AND-EQUAL | | | | | |* 2 | INDEX RANGE SCAN| IND_TEST_OWNER | 1 | | 1 (0)| |* 3 | INDEX RANGE SCAN| IND_TEST_OBJECT_NAME | 2 | | 1 (0)| ------------------------------------------------------------------------------- 如果查询条件只包含owner
SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test'; Execution Plan
---------------------------------------------------------- -----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 29 | 2 (0)| | 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 29 | 2 (0)| |* 2 | INDEX RANGE SCAN | IND_TEST_OWNER | 1 | | 1 (0)| ----------------------------------------------------------------------------------- 修改owner,object_name为非空
alter table test modify(owner not null);
alter table test modify(object_name not null); SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test'; Execution Plan
---------------------------------------------------------- -----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 29 | 2 (0)| | 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 29 | 2 (0)| |* 2 | INDEX RANGE SCAN | IND_TEST_OWNER | 1 | | 1 (0)| ----------------------------------------------------------------------------------- 效果一样
查询条件是">"的情况
SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner>'test' and object_name='test';
Execution Plan
---------------------------------------------------------- -----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | 2 (0)| |* 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 40 | 2 (0)| |* 2 | INDEX RANGE SCAN | IND_TEST_OBJECT_NAME | 2 | | 1 (0)| ----------------------------------------------------------------------------------------- 查询条件是in的情况
SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner in('test','dba') and object_name='test'; Execution Plan
---------------------------------------------------------- -----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | 2 (0)| |* 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 40 | 2 (0)| |* 2 | INDEX RANGE SCAN | IND_TEST_OBJECT_NAME | 2 | | 1 (0)| ----------------------------------------------------------------------------------------- 再来看一下回表的情况 SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner='test' and object_name='test';
Execution Plan
---------------------------------------------------------- -----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | 2 (0)| |* 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 40 | 2 (0)| | 2 | AND-EQUAL | | | | | |* 3 | INDEX RANGE SCAN | IND_TEST_OWNER | 1 | | 1 (0)| |* 4 | INDEX RANGE SCAN | IND_TEST_OBJECT_NAME | 2 | | 1 (0)| ----------------------------------------------------------------------------------------- 先通过and_equal取得rowid列表,然后从表中返回数据.
index_join:
index join顾名思义是对index进行关联,oracle通过hash index join的方式实现了避免对表的访问.所有的数据都从索引中直接获得.它不受查询条件影响,可以是唯一索引,也可以是多列索引. SQL 10G>select/*+ index_join(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test' and object_name='test';
Execution Plan
---------------------------------------------------------- --------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 29 | 3 (34)| |* 1 | VIEW | index$_join$_001 | 1 | 29 | 3 (34)| |* 2 | HASH JOIN | | | | | |* 3 | INDEX RANGE SCAN| IND_TEST_OWNER | 1 | 29 | 1 (0)| |* 4 | INDEX RANGE SCAN| IND_TEST_OBJECT_NAME | 1 | 29 | 1 (0)| -------------------------------------------------------------------------------- 可以不带查询条件,只不过由index range scan变成了index fast full scan
SQL 10G>select/*+ index_join(test ind_test_owner ind_test_object_name)*/ owner,object_name from test
2 ; Execution Plan
---------------------------------------------------------- ------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 51984 | 1472K| 478 (2)| | 1 | VIEW | index$_join$_001 | 51984 | 1472K| 478 (2)| |* 2 | HASH JOIN | | | | | | 3 | INDEX FAST FULL SCAN| IND_TEST_OWNER | 51984 | 1472K| 153 (2)| | 4 | INDEX FAST FULL SCAN| IND_TEST_OBJECT_NAME | 51984 | 1472K| 322 (1)| ------------------------------------------------------------------------------------ 如果不是所有数据都能从索引获得,那么将不会使用index join SQL 10G>select/*+ index_join(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner='test' and object_name='test';
Execution Plan
---------------------------------------------------------- -----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | 2 (0)| |* 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 40 | 2 (0)| |* 2 | INDEX RANGE SCAN | IND_TEST_OWNER | 1 | | 1 (0)| ----------------------------------------------------------------------------------- index_combine: index combine最早是用在bitmap index上的,在9i开始oracle默认可以使用在btree索引上,这是由_b_tree_bitmap_plans参数来控制的.oracle将btree索引中获得的rowid信息通过BITMAP CONVERSION FROM ROWIDS的步骤转换成bitmap进行匹配,然后匹配完成后通过BITMAP CONVERSION TO ROWIDS再转换出rowid获得数据或者回表获得数据. SQL 10G>select/*+ index_combine(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test' and object_name='test';
Execution Plan
---------------------------------------------------------- ---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 29 | 2 (0)| | 1 | BITMAP CONVERSION TO ROWIDS | | 1 | 29 | 2 (0)| | 2 | BITMAP AND | | | | | | 3 | BITMAP CONVERSION FROM ROWIDS| | | | | |* 4 | INDEX RANGE SCAN | IND_TEST_OWNER | | | 1 (0)| | 5 | BITMAP CONVERSION FROM ROWIDS| | | | | |* 6 | INDEX RANGE SCAN | IND_TEST_OBJECT_NAME | | | 1 (0)| --------------------------------------------------------------------------------------------- 回表取数据的情况
SQL 10G>select/*+ index_combine(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner='test' and object_name='test';
Execution Plan
---------------------------------------------------------- ----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | 2 (0)| | 1 | TABLE ACCESS BY INDEX ROWID | TEST | 1 | 40 | 2 (0)| | 2 | BITMAP CONVERSION TO ROWIDS | | | | | | 3 | BITMAP AND | | | | | | 4 | BITMAP CONVERSION FROM ROWIDS| | | | | |* 5 | INDEX RANGE SCAN | IND_TEST_OWNER | | | 1 (0)| | 6 | BITMAP CONVERSION FROM ROWIDS| | | | | |* 7 | INDEX RANGE SCAN | IND_TEST_OBJECT_NAME | | | 1 (0)| ---------------------------------------------------------------------------------------------- 不带查询条件的情况,index combine将不被使用
SQL 10G>select/*+ index_combine(test ind_test_owner ind_test_object_name)*/ owner,object_name from test
2 ; Execution Plan
---------------------------------------------------------- -----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 51984 | 1472K| 1480 (1)| | 1 | TABLE ACCESS BY INDEX ROWID| TEST | 51984 | 1472K| 1480 (1)| | 2 | INDEX FULL SCAN | IND_TEST_OWNER | 51984 | | 123 (2)| ----------------------------------------------------------------------------------- index_combine会是and_equal的很好的替代者,随着and_equal的退出,index_combine将更多得被我们看到. April 06 转载小惠惠的文章
March 22 pctversion in lob segment在oracle的官方文档上有这么一段话来解释pctversion
PCTVERSION integerSpecify the maximum percentage of overall LOB storage space used for maintaining old versions of the LOB. The default value is 10, meaning that older versions of the LOB data are not overwritten until they consume 10% of the overall LOB storage space. You can specify the 由于lob segment采用了自己特殊的一致性读的实现,不是使用undo tablespace来保留前映象,而是当发生update时在lob segment内分配一个chunk去插入一条新的记录。这样的话,如果一条记录修改了多次,那么它就存在多个版本,对于很大的lob对象来说,这是十分浪费空间的。所以oracle需要有一个办法来控制这个保留前映象的空间的大小,pctversion就是为了实现这个功能的。pctversion是一个百分比的值,就是当前所有lob空间中用来存放前映象的百分比,如果前映象的空间大于pctversion了,那么oracle将会重用这些前映象空间而不去扩展。
对比一下不同pctversion对空间占用的影响
SQL 10G>select TABLE_NAME,SEGMENT_NAME,CHUNK,PCTVERSION,RETENTION from user_lobs where segment_name='TEXT_LOB';
TABLE_NAME SEGMENT_NAME CHUNK PCTVERSION RETENTION
------------------------------ ------------------------------ ---------- ---------- ---------- TESTLOB TEXT_LOB 8192 1 SQL 10G>truncate table testlob;
Table truncated.
SQL 10G>select segment_name,bytes/1024/1024||'M' from user_segments where segment_name='TEXT_LOB';
SEGMENT_NAME BYTES/1024/1024||'M'
--------------------------------------------------------------------------------- ----------------------------------------- TEXT_LOB .0625M SQL 10G>insert into testlob values(1,rpad('a',4000)); 1 row created.
SQL 10G>commit;
Commit complete.
SQL 10G>select dbms_lob.getlength(text) from testlob;
DBMS_LOB.GETLENGTH(TEXT)
------------------------ 4000 SQL 10G>begin
2 for i in 1..10 loop 3 update testlob set text=text||text; 4 commit; 5 end loop; 6 end; 7 / PL/SQL procedure successfully completed.
SQL 10G>select segment_name,bytes/1024/1024||'M' from user_segments where segment_name='TEXT_LOB';
SEGMENT_NAME BYTES/1024/1024||'M'
--------------------------------------------------------------------------------- ----------------------------------------- TEXT_LOB 13M SQL 10G>truncate table testlob;
Table truncated.
SQL 10G>alter table testlob modify lob(text) (pctversion 99);
Table altered.
SQL 10G>insert into testlob values(1,rpad('a',4000));
1 row created.
SQL 10G>commit;
Commit complete.
SQL 10G>select segment_name,bytes/1024/1024||'M' from user_segments where segment_name='TEXT_LOB';
SEGMENT_NAME BYTES/1024/1024||'M'
--------------------------------------------------------------------------------- ----------------------------------------- TEXT_LOB .0625M SQL 10G>begin
2 for i in 1..10 loop 3 update testlob set text=text||text; 4 commit; 5 end loop; 6 end; 7 / PL/SQL procedure successfully completed.
SQL 10G>select segment_name,bytes/1024/1024||'M' from user_segments where segment_name='TEXT_LOB';
SEGMENT_NAME BYTES/1024/1024||'M'
--------------------------------------------------------------------------------- ----------------------------------------- TEXT_LOB 17M SQL 10G>select dbms_lob.getlength(text) from testlob;
DBMS_LOB.GETLENGTH(TEXT)
------------------------ 4096000 可以看到pctversion的设置对lob空间大小还是有影响的,在很大的lob segment中这种影响会表现的十分明显,有兴趣的人可以对大lob段做测试。 March 16 parallel query in partition table上一篇文章里面提到parallel query相关的checkpoint,也了解了direct read会对系统性能带来的潜在影响,这次让我们看一下parallel query在partition table上的表现。
创建2种分区表
CREATE TABLE TESTHASH(A NUMBER) PARTITION BY HASH (A) (PARTITION P1, PARTITION P2, PARTITION P3, PARTITION P4) CREATE TABLE TESTLIST(A NUMBER) PARTITION BY LIST (A) (PARTITION P1 VALUES(1), PARTITION P2 VALUES(2), PARTITION P3 VALUES(3), PARTITION P4 VALUES(4)) SQL 10G>select distinct a from testhash partition (p1); 6 SQL 10G>select distinct a from testhash partition (p2);
9 SQL 10G>select distinct a from testhash partition (p3);
2 SQL 10G>select distinct a from testhash partition (p4);
1 SQL 10G>select count(*) from testhash partition (p1);
1048623 SQL 10G>select count(*) from testhash partition (p2);
1048623 SQL 10G>select count(*) from testhash partition (p3);
1048623 SQL 10G>select count(*) from testhash partition (p4);
1048623 SQL 10G>select distinct a from testlist partition (p1); 1 SQL 10G>select distinct a from testlist partition (p2); 2 SQL 10G>select distinct a from testlist partition (p3); 3 SQL 10G>select distinct a from testlist partition (p4); 4 SQL 10G>select count(*) from testlist partition (p1); 1048623 SQL 10G>select count(*) from testlist partition (p2); 1048623 SQL 10G>select count(*) from testlist partition (p3); 1048623 SQL 10G>select count(*) from testlist partition (p4); 1048623 两张分区表都是4个分区,每个分区包含1048623条记录 另外还有几个考查direct read与否的指标 SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#; NAME VALUE ---------------------------------------------------------------- ---------- consistent gets from cache 137327 consistent gets direct 574 physical reads direct 574 table scans (direct read) 0 我们运行几个sql来看parallel query在分区表上的表现 SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist; COUNT(*) ---------- 4194492 SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#; NAME VALUE ---------------------------------------------------------------- ---------- consistent gets from cache 137449 consistent gets direct 6934 physical reads direct 6934 table scans (direct read) SQL 10G>select 137449-137327 "consistent gets from cache",6934-574 "consistent gets direct",6934-574 "physical reads direct" from dual; consistent gets from cache consistent gets direct physical reads direct -------------------------- ---------------------- --------------------- 122 6360 6360 可以看到当parallel query包含所有分区的时候将会去做direct read 看看一个分区的情况 SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#; NAME VALUE ---------------------------------------------------------------- ---------- consistent gets from cache 137449 consistent gets direct 6934 physical reads direct 6934 table scans (direct read) 55 SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist where a=1; COUNT(*) ---------- 1048623 SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#; NAME VALUE
---------------------------------------------------------------- ---------- consistent gets from cache 139307 consistent gets direct 6934 physical reads direct 6934 table scans (direct read) 55 SQL 10G>select 139307-137449 "consistent gets from cache",6934-6934 "consistent gets direct",6934-6934 "physical reads direct" from dual; consistent gets from cache consistent gets direct physical reads direct -------------------------- ---------------------- --------------------- 1858 0 0 当查询一个分区时可以看到并没有采用direct read,而是采用普通的consistent get 两个分区的情况 SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#; NAME VALUE ---------------------------------------------------------------- ---------- consistent gets from cache 139307 consistent gets direct 6934 physical reads direct 6934 table scans (direct read) 55 SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist where a in(1,2);
COUNT(*) ---------- 2097246 SQL 10G>SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#; NAME VALUE ---------------------------------------------------------------- ---------- consistent gets from cache 142788 consistent gets direct 6934 physical reads direct 6934 table scans (direct read) 55 SQL 10G>select 142788-139307"consistent gets from cache",6934-6934 "consistent gets direct",6934-6934 "physical reads direct" from dual; consistent gets from cache consistent gets direct physical reads direct -------------------------- ---------------------- --------------------- 3481 0 0 同样没有发生direct read 三个分区呢? SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#; NAME VALUE ---------------------------------------------------------------- ---------- consistent gets from cache 142788 consistent gets direct 6934 physical reads direct 6934 table scans (direct read) 55 SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist where a in(1,2,3); COUNT(*) ---------- 3145869 SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#; NAME VALUE
---------------------------------------------------------------- ---------- consistent gets from cache 142907 consistent gets direct 11704 physical reads direct 11704 table scans (direct read) 110 SQL 10G>select 142907-142788 "consistent gets from cache",11704-6934 "consistent gets direct",11704-6934 "physical reads direct" from dual; consistent gets from cache consistent gets direct physical reads direct -------------------------- ---------------------- --------------------- 119 4770 4770 可以看到发生了3个分区的direct read 如果我们查询不存在的分区值呢 SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#; NAME VALUE ---------------------------------------------------------------- ---------- consistent gets from cache 142907 consistent gets direct 11704 physical reads direct 11704 table scans (direct read) 110 SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist where a in(5,6,7);
COUNT(*) ---------- 0 SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
NAME VALUE ---------------------------------------------------------------- ---------- consistent gets from cache 142907 consistent gets direct 11704 physical reads direct 11704 table scans (direct read) 110 SQL 10G>select 142907-142907 "consistent gets from cache",11704-11704"consistent gets direct",11704-11704"physical reads direct" from dual; consistent gets from cache consistent gets direct physical reads direct -------------------------- ---------------------- --------------------- 0 0 0 很好,没有读testlist表的任何block 再来看一下hash partition table的表现 select /*+ parallel(testhash,4)*/count(*) from testhash; SQL 10G>select 149410-149287 "consistent gets from cache",18064-11704 "consistent gets direct",18064-11704 "physical reads direct" from dual; consistent gets from cache consistent gets direct physical reads direct -------------------------- ---------------------- --------------------- 123 6360 6360 select /*+ parallel(testhash,4)*/count(*) from testhash where a in(1); SQL 10G>select 151269-149410 "consistent gets from cache",18064-18064 "consistent gets direct",18064-18064 "physical reads direct" from dual; consistent gets from cache consistent gets direct physical reads direct -------------------------- ---------------------- --------------------- 1859 0 0 select /*+ parallel(testhash,4)*/count(*) from testhash where a in(1,2); SQL 10G>select 154751-151269 "consistent gets from cache",18064-18064 "consistent gets direct",18064-18064 "physical reads direct" from dual; consistent gets from cache consistent gets direct physical reads direct -------------------------- ---------------------- --------------------- 3482 0 0 select /*+ parallel(testhash,4)*/count(*) from testhash where a in(1,2,6); SQL 10G>select 154871-154751 "consistent gets from cache",22834-18064 "consistent gets direct",22834-18064 "physical reads direct" from dual; consistent gets from cache consistent gets direct physical reads direct -------------------------- ---------------------- --------------------- 120 4770 4770 select /*+ parallel(testhash,4)*/count(*) from testhash where a in(3); SQL 10G>select 160572-158713 "consistent gets from cache",37144-37144 "consistent gets direct",37144-37144 "physical reads direct" from dual; consistent gets from cache consistent gets direct physical reads direct -------------------------- ---------------------- --------------------- 1859 0 0 可以看到hash partition table在其他情况都和list partition table一样,但是在查询不存在的分区值时还是会去读取testhash表的block,当然这是由于hash partition的机制决定的。 总结来说,parallel query在partition table上并不是时时在做direct read,它是和查询包含的partition number有关的,至于如何利用好parallel query的优势那又是另外的话题了,大家看完这2篇文章后可以自己去找寻正确的结论。 March 15 DBWR parallel query checkpoint buffers writtenDBWR parallel query checkpoint buffers written是oracle的一种特殊的checkpoint的产物,这种checkpoint在oracle8 以前称作Extent-based Checkpoint,当发生parallel query时oracle必须做checkpoint把脏数据写入磁盘,因为parallel query走的时direct read,直接从文件读入pga,如果有脏数据没有被写回磁盘,那么读出来的结果将会不一致。在oracle8以前,如果发生direct read,那么当每一个extent被读入的时候oracle会先去检查data buffer中有没和这个extent相关的dirty block,如果有就会驱动dbwr来写出,同时增加一次checkpoint的次数。这样的话如果这个表有N个extent都存在dirty block,那么将会发生N次checkpoint,这样的话将会使parallel query的执行时间大大超过normal query。在oracle8中这种情况有了改进,Extent-based Checkpoint变成了object-based checkpoint,不再会对单独的extent做checkpoint。
让我们再回到主题,DBWR parallel query checkpoint buffers written
首先来看一下v$statname
SQL 10G>select statistic#,name from v$statname where name like '%DBWR%';
STATISTIC# NAME
---------- ---------------------------------------------------------------- 69 DBWR checkpoint buffers written 70 DBWR thread checkpoint buffers written 71 DBWR tablespace checkpoint buffers written 72 DBWR parallel query checkpoint buffers written 73 DBWR object drop buffers written 74 DBWR transaction table writes 75 DBWR undo block writes 76 DBWR revisited being-written buffer 77 DBWR make free requests 78 DBWR lru scans 79 DBWR checkpoints STATISTIC# NAME
---------- ---------------------------------------------------------------- 80 DBWR fusion writes 再来创建一张测试表
create table test(a number)
SQL 10G>select count(*) from test;
COUNT(*)
---------- 4194496 SQL 10G>select distinct a from test;
A
---------- 1 这个测试表包含了4194496条值为1的记录
先刷新data buffer
SQL 10G>ALTER SESSION SET EVENTS 'immediate trace name flush_cache';
Session altered.
我们来更新其中一条记录
SQL 10G>update test set a=2 where rownum<2;
1 row updated.
SQL 10G>commit;
Commit complete.
执行parallel query
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(72,79) and a.statistic#=b.statistic#;
NAME VALUE
---------------------------------------------------------------- ---------- DBWR parallel query checkpoint buffers written 28615 DBWR checkpoints 552 SQL 10G>select/*+ parallel(test,4)*/distinct a from test;
A
---------- 2 1 SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(72,79) and a.statistic#=b.statistic#; NAME VALUE
---------------------------------------------------------------- ---------- DBWR parallel query checkpoint buffers written 28616 DBWR checkpoints 553 checkpoint和buffer written都增加1 再来更新分布在多个extent上的block来证实一下是不是现在的oracle版本用的是object-based checkpoint
SQL 10G>update test set a=10 where mod(dbms_rowid.ROWID_BLOCK_NUMBER(rowid),1000)=0 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=0;
6 rows updated.
SQL 10G>commit;
Commit complete.
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(72,79) and a.statistic#=b.statistic#;
NAME VALUE
---------------------------------------------------------------- ---------- DBWR parallel query checkpoint buffers written 28616 DBWR checkpoints 553 SQL 10G>select/*+ parallel(test,4)*/distinct a from test;
A
---------- 10 2 1 SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(72,79) and a.statistic#=b.statistic#;
NAME VALUE
---------------------------------------------------------------- ---------- DBWR parallel query checkpoint buffers written 28622 DBWR checkpoints 554 可以看到写出了6个块,但是只发生一次checkpoint,所以我们可以知道oracle已经采用了object-based checkpoint。
总结来说,DBWR parallel query checkpoint buffers written就是当发生parallel query时导致checkpoint而写出的块数。
February 14 ora_rowscn10g里面有了一项新功能,我们可以查看某个表的某一行最后一次改动的scn,这个就是由ora_rowscn来提供。
SQL 10G>create table test (a number);
Table created.
SQL 10G>select ora_rowscn from test;
no rows selected
SQL 10G>insert into test values(1);
1 row created.
SQL 10G>column ora_rowscn format 999999999999999999999
insert 完成后我们可以查询到ora_rowscn
SQL 10G>select ora_rowscn from test; ORA_ROWSCN
---------------------- 21749185488 SQL 10G>commit;
Commit complete.
当事务递交,ora_rowscn发生变化
SQL 10G>select ora_rowscn from test;
ORA_ROWSCN
---------------------- 21749185507 SQL 10G>insert into test values(2);
1 row created.
再插入一条记录,查看ora_rowscn,发现两条记录的ora_rowscn相同
SQL 10G>select ora_rowscn from test;
ORA_ROWSCN
---------------------- 21749185507 21749185507 SQL 10G>commit;
Commit complete.
事务递交后再次发现ora_rowscn发生变化
SQL 10G>select ora_rowscn from test;
ORA_ROWSCN
---------------------- 21749186216 21749186216 SQL 10G>select dbms_rowid.ROWID_RELATIVE_FNO(rowid),dbms_rowid.ROWID_BLOCK_NUMBER(rowid) from test;
DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) ------------------------------------ ------------------------------------ 1 69650 1 69650 SQL 10G>alter system dump datafile 1 block 69650;
System altered.
SQL 10G>exit
Start dump data blocks tsn: 0 file#: 1 minblk 69650 maxblk 69650
buffer tsn: 0 rdba: 0x00411012 (1/69650) scn: 0x0005.105a3ea8 seq: 0x02 flg: 0x02 tail: 0x3ea80602 frmt: 0x02 chkval: 0x0000 type: 0x06=trans data Hex dump of block: st=0, typ_found=1 Block header dump: 0x00411012 Object id on Block? Y seg/obj: 0xff72 csc: 0x05.105a3bd0 itc: 2 flg: O typ: 1 - DATA fsl: 0 fnx: 0x0 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x000d.026.000004da 0x01c01470.0184.2c --U- 1 fsc 0x0000.105a3be3 0x02 0x000e.008.000004e7 0x01c018a1.017d.17 --U- 1 fsc 0x0000.105a3ea8 data_block_dump,data header at 0xb72f845c =============== tsiz: 0x1fa0 hsiz: 0x16 pbl: 0xb72f845c bdba: 0x00411012 76543210 flag=-------- ntab=1 nrow=2 frre=-1 fsbo=0x16 fseo=0x1f94 avsp=0x1f78 tosp=0x1f78 0xe:pti[0] nrow=2 offs=0 0x12:pri[0] offs=0x1f9a 0x14:pri[1] offs=0x1f94 block_row_dump: tab 0, row 0, @0x1f9a tl: 6 fb: --H-FL-- lb: 0x1 cc: 1 col 0: [ 2] c1 02 tab 0, row 1, @0x1f94 tl: 6 fb: --H-FL-- lb: 0x2 cc: 1 col 0: [ 2] c1 03 end_of_block_dump SQL 10G>select to_number('5105a3ea8','xxxxxxxxxxxxxxxxxx') ora_rowscn from dual;
ORA_ROWSCN
------------------ 21749186216 从上面的实验中可以发现,当表以常规方式创建的时候,ora_rowscn取自data block header的scn,而每行数据并没保存自己的rowscn,下面来看一下以rowdependencies创建的表是什么情况。
SQL 10G>create table test (a number) rowdependencies;
Table created.
SQL 10G>select ora_rowscn from test;
no rows selected
SQL 10G>insert into test values(1);
1 row created.
insert后查询ora_rowscn ,发现ora_rowscn为空
SQL 10G>select ora_rowscn from test;
ORA_ROWSCN
------------------ SQL 10G>commit; Commit complete.
事务递交后,可以查询到ora_rowscn
SQL 10G>select ora_rowscn from test;
ORA_ROWSCN
------------------ 21749198001 SQL 10G>insert into test values(2);
1 row created.
插入第2条数据,依然发现有一个ora_rowscn为空
SQL 10G>select ora_rowscn from test;
ORA_ROWSCN
------------------ 21749198001 SQL 10G>commit; Commit complete.
递交后查询ora_rowscn,发现2条记录的ora_rowscn并不一样,这和上面的情况不一样,我们知道这是因为ora_rowscn被保存在行内的缘故
SQL 10G>select ora_rowscn from test;
ORA_ROWSCN
------------------ 21749198001 21749198017 SQL 10G>alter system dump datafile 1 block 69650;
System altered.
SQL 10G>exit
Start dump data blocks tsn: 0 file#: 1 minblk 69650 maxblk 69650
buffer tsn: 0 rdba: 0x00411012 (1/69650) scn: 0x0005.105a6cc1 seq: 0x02 flg: 0x02 tail: 0x6cc10602 frmt: 0x02 chkval: 0x0000 type: 0x06=trans data Hex dump of block: st=0, typ_found=1 seg/obj: 0xff73 csc: 0x05.105a687c itc: 2 flg: O typ: 1 - DATA fsl: 0 fnx: 0x0 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0010.005.000004e5 0x01c02e78.01c5.1e --U- 1 fsc 0x0000.105a6cb1 0x02 0x000c.02c.000004d4 0x01c00feb.01a5.24 --U- 1 fsc 0x0000.105a6cc1 data_block_dump,data header at 0xc70e45c =============== tsiz: 0x1fa0 hsiz: 0x16 pbl: 0x0c70e45c bdba: 0x00411012 76543210 flag=--R----- ntab=1 nrow=2 frre=-1 fsbo=0x16 fseo=0x1f88 avsp=0x1f6c tosp=0x1f6c 0xe:pti[0] nrow=2 offs=0 0x12:pri[0] offs=0x1f94 0x14:pri[1] offs=0x1f88 block_row_dump: tab 0, row 0, @0x1f94 tl: 12 fb: --H-FL-- lb: 0x1 cc: 1 dscn 0x0000.00000000 col 0: [ 2] c1 02 tab 0, row 1, @0x1f88 tl: 12 fb: --H-FL-- lb: 0x2 cc: 1 dscn 0x0000.00000000 col 0: [ 2] c1 03 end_of_block_dump dump block后发现,现在dscn还是0,我们查询出来的ora_rowscn实际上是从Scn/Fsc
中获得的,真正ora_rowscn被保存在行内是在itl发生cleanout时会把Scn/Fsc刷到dscn SQL 10G>update test set a=1 where a=1;
1 row updated.
SQL 10G>commit;
Commit complete.
SQL 10G>update test set a=1 where a=1;
1 row updated.
SQL 10G>commit;
Commit complete.
SQL 10G>alter system dump datafile 1 block 69650;
System altered.
SQL 10G>exit
Start dump data blocks tsn: 0 file#: 1 minblk 69650 maxblk 69650
buffer tsn: 0 rdba: 0x00411012 (1/69650) scn: 0x0005.105a6dff seq: 0x02 flg: 0x02 tail: 0x6dff0602 frmt: 0x02 chkval: 0x0000 type: 0x06=trans data Hex dump of block: st=0, typ_found=1 Object id on Block? Y seg/obj: 0xff73 csc: 0x05.105a6dfd itc: 2 flg: O typ: 1 - DATA fsl: 0 fnx: 0x0 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0012.024.000004db 0x01c024fd.016a.08 C--- 0 scn 0x0005.105a6df8 0x02 0x000d.00f.000004dc 0x01c031b3.0194.28 --U- 1 fsc 0x0000.105a6dff data_block_dump,data header at 0xc70e45c =============== tsiz: 0x1fa0 hsiz: 0x16 pbl: 0x0c70e45c bdba: 0x00411012 76543210 flag=--R----- ntab=1 nrow=2 frre=-1 fsbo=0x16 fseo=0x1f88 avsp=0x1f6c tosp=0x1f6c 0xe:pti[0] nrow=2 offs=0 0x12:pri[0] offs=0x1f94 0x14:pri[1] offs=0x1f88 block_row_dump: tab 0, row 0, @0x1f94 tl: 12 fb: --H-FL-- lb: 0x2 cc: 1 dscn 0x0005.105a6df8 col 0: [ 2] c1 02 tab 0, row 1, @0x1f88 tl: 12 fb: --H-FL-- lb: 0x0 cc: 1 dscn 0x0005.105a6cc1 col 0: [ 2] c1 03 end_of_block_dump 另外,oracle还提供了ora_rowscn到timestamp的转换
SQL 10G>select scn_to_timestamp(ora_rowscn) from test;
SCN_TO_TIMESTAMP(ORA_ROWSCN)
--------------------------------------------------------------------------- 14-FEB-06 03.55.05.000000000 PM 14-FEB-06 03.44.05.000000000 PM 更多ora_rowscn的用途请各位自己去摸索吧。 January 19 oracle rowid搞oracle的人都很清楚rowid这个东西,rowid就是唯一标志记录物理位置的一个id,在oracle 8版本以前,rowid由file#+block#+row#组成,占用6个bytes的空间,10 bit 的 file# ,22bit 的 block# ,16 bit 的 row#。
从oracle 8开始rowid变成了extend rowid,由data_object_id#+rfile#+block#+row#组成,占用10个bytes的空间, 32bit的 data_object_id#,10 bit 的 rfile#,22bit 的 block#,16 bit 的 row#.由于rowid的组成从file#变成了rfile#,所以数据文件数的限制也从整个库不能超过1023个变成了每个表空间不能超过1023个数据文件。
说了rowid的组成,那么我们再来看看rowid在索引里面占用的字节数又是什么样子的。在oracle 8以前索引中存储的rowid占用字节数也是6bytes,在oracle8之后,虽然oracle使用了extend rowid,但是在普通索引里面依然存储了bytes的rowid,只有在global index中存储的是10bytes的extend rowid,而extend rowid也是global index出现的一个必要条件,下面我们会解释原因。
为什么golbal index需要把data_object_id#也包含在index rowid entry中呢?如果不包含会这么样?首先我们需要知道index的rowid entry的存在是为了能根据它找到表的这条记录存在哪个具体的物理位置,我们需要知道它在哪个数据文件,在哪个block,在那一行,普通的索引oracle根据rfile#,block#,row#就可以知道了,但是partition table可以分布在多个表空间,也就是可以分布在多个数据文件,当我们建立local index时,index rowid entry并不包含data_object_id#,因为oracle可以知道这个index对应的是哪一个table分区,并可以得到table分区的ts#(tablespace号),那么oracle根据ts#和rfile#就可以找到具体的数据文件。但是如果换成是golbal index,如果不包含data_object_id#,那么我们并不能知道这个索引对应着哪个表分区,也自然不能知道它的rfile#和file#的转换关系,所以它将找不到所对应的记录。包含data_object_id#后,oracle可以根据data_object_id#实现rfile#和file#的转换然后找到记录对应的物理位置。需要注意的是要理解以上概念我们还是需要了解file#和rfile#的区别。
关于file#和rfile#的区别可以参考biti_rainy的一篇blog
继续上面的话题,我们猜想oracle实现rfile#和file#的转换是不是由一些递归sql来实现,所以我们做了一个测试来看是否oracle从file$等基表中实现呢?我们做了一个10046 trace,发现并不存在相应的递归sql,那oracle怎么实现呢?可能是直接取一些x$table中的数据了吧,那哪个x$table保留这些信息呢?
x$kccfe!!!
SQL 10G>desc x$kccfe
Name Null? Type ----------------------------------------- -------- ---------------------------- ADDR RAW(4) INDX NUMBER INST_ID NUMBER FENUM NUMBER FECSZ NUMBER FEBSZ NUMBER FESTA NUMBER FECRC_SCN VARCHAR2(16) FECRC_TIM VARCHAR2(20) FECRC_THR NUMBER FECRC_RBA_SEQ NUMBER FECRC_RBA_BNO NUMBER FECRC_RBA_BOF NUMBER FECRC_ETB RAW(132) FECPS VARCHAR2(16) FECPT VARCHAR2(20) FECPC NUMBER FESTS VARCHAR2(16) FESTT VARCHAR2(20) FEBSC VARCHAR2(16) FEFNH NUMBER FEFNT NUMBER FEDUP NUMBER FEURS VARCHAR2(16) FEURT VARCHAR2(20) FEOFS VARCHAR2(16) FEONC_SCN VARCHAR2(16) FEONC_TIM VARCHAR2(20) FEONC_THR NUMBER FEONC_RBA_SEQ NUMBER FEONC_RBA_BNO NUMBER FEONC_RBA_BOF NUMBER FEONC_ETB RAW(132) FEPOR NUMBER FETSN NUMBER FETSI NUMBER FERFN NUMBER FEPFT NUMBER FEDOR NUMBER FEPDI NUMBER FEFDB NUMBER FEPLG_SCN VARCHAR2(16) FEPAX NUMBER FEFLG NUMBER 从这个x$table中oracle可以实现file和rfile的转换。
最后我们来看一个例子
SQL 10G>desc test
Name Null? Type ----------------------------------------- -------- ---------------------------- A NUMBER B VARCHAR2(32) test是一张分区表
SQL 10G>select table_name,partition_name,tablespace_name from dba_tab_partitions where table_name='TEST';
TABLE_NAME PARTITION_NAME TABLESPACE_NAME
------------------------------ ------------------------------ ------------------------------ TEST P1 TESTROWID TEST P2 TESTROWID 这条记录所在的物理位置
SQL 10G>select dbms_rowid.ROWID_OBJECT(rowid) data_object_id#,dbms_rowid.ROWID_RELATIVE_FNO(rowid) rfile#,dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block#,dbms_rowid.ROWID_ROW_NUMBER(rowid) row# from test ;
DATA_OBJECT_ID# RFILE# BLOCK# ROW#
--------------- ---------- ---------- ---------- 63665 28 15 0 我们对它创建全局索引
SQL 10G>create index ind_test on test(a);
Index created.
再创建本地索引
SQL 10G>create index ind_test_local on test(b) local;
Index created. SQL 10G>select dump(rowid,16) rid from test;
RID
---------------------------------------------------------------------------- Typ=69 Len=10: 0,0,f8,b1,7,0,0,f,0,0 去看看全局索引和本地索引中rowid entry的区别
SQL 10G>select header_block,header_file from dba_segments where segment_name='IND_TEST';
HEADER_BLOCK HEADER_FILE
------------ ----------- 1403 4 SQL 10G>select header_block,header_file from dba_segments where segment_name='IND_TEST_LOCAL';
HEADER_BLOCK HEADER_FILE
------------ ----------- 11 33 11 34 SQL 10G>ALTER SYSTEM DUMP DATAFILE 4 BLOCK 1404;
System altered.
SQL 10G>ALTER SYSTEM DUMP DATAFILE 33 BLOCK 12;
System altered.
全局索引ind_test的rowid entry
col 1; len 10; (10): 00 00 f8 b1 07 00 00 0f 00 00 本地索引ind_test_local的rowid entry
col 1; len 6; (6): 07 00 00 0f 00 00
可以看出本地索引存储了6bytes rowid,全局索引存储了10bytes rowid
再来看一下00 00 f8 b1 07 00 00 0f 00 00
转换成bit就是
00000000 00000000 11111000 10110001 00000111 00000000 00000000 00001111 00000000 00000000
32bit的 data_object_id#,10 bit 的 rfile#,22bit 的 block#,16 bit 的 row#.
00000000 00000000 11111000 10110001 data_object_id#
转换成10进制就是
2^15+2^14+2^13+2^12+2^11+2^7+2^5+2^4+2^0=63665
00000111 00 rfile#
2^4+2^3+2^2=28
0000000000000000001111 block#
2^3+2^2+2^1+2^0=15
0000000000000000 rowi# 0
SQL 10G>select dbms_rowid.ROWID_OBJECT(rowid) data_object_id#,dbms_rowid.ROWID_RELATIVE_FNO(rowid) rfile#,dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block#,dbms_rowid.ROWID_ROW_NUMBER(rowid) row# from test ;
DATA_OBJECT_ID# RFILE# BLOCK# ROW#
--------------- ---------- ---------- ---------- 63665 28 15 0 验证通过
再来看一下如果file#超过1023后oracle会怎么处理
SQL 10G>select file#,ts#,rfile# from v$datafile where file#>1023;
FILE# TS# RFILE#
---------- ---------- ---------- 1024 14 1 1025 14 3 1026 14 4 1027 14 5 1028 14 6 1029 14 7 1030 14 8 1031 14 9 1032 14 10 1033 14 11 1034 14 12 FILE# TS# RFILE#
---------- ---------- ---------- 1035 14 13 1036 14 14 1037 15 14 可以看到在一个tablespace里面rfile#从1开始到1023
SQL 10G>select file#,rfile# from v$datafile where ts#=14 order by file#;
FILE# RFILE#
---------- ---------- 2 2 15 15 16 16 17 17 18 18 19 19 20 20 21 21 22 22 23 23 24 24 。。。。。。 FILE# RFILE#
---------- ---------- 1015 1015 1016 1016 1017 1017 1018 1018 1019 1019 1020 1020 1021 1021 1022 1022 1023 1023 1024 1 1025 3 FILE# RFILE# ---------- ---------- 1026 4 1027 5 1028 6 1029 7 1030 8 1031 9 1032 10 1033 11 1034 12 1035 13 1036 14 January 16 what is data_object_id前些天有人问data_object_id究竟是个什么东东,既然有了object_id那还要data_object_id干吗?
其实object_id和data_object_id同样是表示数据库对象的一个唯一标志,但是object_id表示的是逻辑id,data_object_id表示的是物理id。如果一些object没有物理属性的话那它就不存在data_object_id,例如procedure,function,package,data type,db link,mv定义,view定义,临时表,分区表定义等等这些object都是没有对应着某个segment,因此它们的data_object_id都为空。
当表刚创建的时候它的object_id和data_object_id都是相等的,但是如果表经过move或truncate后那么data_object_id将会有变化。
SQL 10G>select object_id,data_object_id from user_objects where object_name='T'; OBJECT_ID DATA_OBJECT_ID
---------- -------------- 63053 63053 刚开始创建表时object_id=data_object_id
SQL 10G>SELECT HEADER_FILE,HEADER_BLOCK,BLOCKS FROM DBA_SEGMENTS WHERE SEGMENT_NAME='T' AND OWNER='TEST';
HEADER_FILE HEADER_BLOCK BLOCKS
----------- ------------ ---------- 4 2019 8 SQL 10G>alter table t move;
Table altered.
SQL 10G>select object_id,data_object_id from user_objects where object_name='T';
OBJECT_ID DATA_OBJECT_ID
---------- -------------- 63053 63463 在move以后可以看到data_object_id发生变化了
SQL 10G>SELECT HEADER_FILE,HEADER_BLOCK,BLOCKS FROM DBA_SEGMENTS WHERE SEGMENT_NAME='T' AND OWNER='TEST';
HEADER_FILE HEADER_BLOCK BLOCKS
----------- ------------ ---------- 4 467 8 SQL 10G>truncate table t;
Table truncated.
SQL 10G>select object_id,data_object_id from user_objects where object_name='T';
OBJECT_ID DATA_OBJECT_ID
---------- -------------- 63053 63464 SQL 10G>SELECT HEADER_FILE,HEADER_BLOCK,BLOCKS FROM DBA_SEGMENTS WHERE SEGMENT_NAME='T' AND OWNER='TEST';
HEADER_FILE HEADER_BLOCK BLOCKS
----------- ------------ ---------- 4 467 8 truncate之后虽然segment的位置没有移动,但是data_object_id还是发生变化了。
SQL 10G>alter table t add(b number);
Table altered.
SQL 10G>select object_id,data_object_id from user_objects where object_name='T';
OBJECT_ID DATA_OBJECT_ID ---------- -------------- 63053 63464 我们对t表加一个字段,发现data_object_id没有发生变化,我们可以认为只有当segment发生变化时data_object_id才会随之变化。我们来看看data_object_id从哪里来的
SQL 10G> select text from dba_views where view_name='DBA_OBJECTS';
select u.name, o.name, o.subname, o.obj#, o.dataobj#,
decode(o.type#, 0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE', 3, 'CLUSTER', 4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE', 7, 'PROCEDURE', 8, 'FUNCTION', 9, 'PACKAGE', 11, 'PACKAGE BODY', 12, 'TRIGGER', 13, 'TYPE', 14, 'TYPE BODY', 19, 'TABLE PARTITION', 20, 'INDEX PARTITION', 21, 'LOB', 22, 'LIBRARY', 23, 'DIRECTORY', 24, 'QUEUE', 28, 'JAVA SOURCE', 29, 'JAVA CLASS', 30, 'JAVA RESOURCE', 32, 'INDEXTYPE', 33, 'OPERATOR', 34, 'TABLE SUBPARTITION', 35, 'INDEX SUBPARTITION', 40, 'LOB PARTITION', 41, 'LOB SUBPARTITION', 42, NVL((SELECT distinct 'REWRITE EQUIVALENCE' FROM sum$ s
WHERE s.obj#=o.obj# and bitand(s.xpflags, 8388608) = 8388608), 'MATERIALIZED VIEW'), 43, 'DIMENSION', 44, 'CONTEXT', 46, 'RULE SET', 47, 'RESOURCE PLAN', 48, 'CONSUMER GROUP', 51, 'SUBSCRIPTION', 52, 'LOCATION', 55, 'XML SCHEMA', 56, 'JAVA DATA', 57, 'SECURITY PROFILE', 59, 'RULE', 60, 'CAPTURE', 61, 'APPLY', 62, 'EVALUATION CONTEXT', 66, 'JOB', 67, 'PROGRAM', 68, 'JOB CLASS', 69, 'WINDOW', 72, 'WINDOW GROUP', 74, 'SCHEDULE', 79, 'CHAIN',
81, 'FILE GROUP', 'UNDEFINED'), o.ctime, o.mtime, to_char(o.stime, 'YYYY-MM-DD:HH24:MI:SS'), decode(o.status, 0, 'N/A', 1, 'VALID', 'INVALID'), decode(bitand(o.flags, 2), 0, 'N', 2, 'Y', 'N'), decode(bitand(o.flags, 4), 0, 'N', 4, 'Y', 'N'), decode(bitand(o.flags, 16), 0, 'N', 16, 'Y', 'N') from sys.obj$ o, sys.user$ u where o.owner# = u.user# and o.linkname is null and (o.type# not in (1 /* INDEX - handled below */, 10 /* NON-EXISTENT */)
or (o.type# = 1 and 1 = (select 1 from sys.ind$ i where i.obj# = o.obj# and i.type# in (1, 2, 3, 4, 6, 7, 9)))) and o.name != '_NEXT_OBJECT' and o.name != '_default_auditing_options_' union all select u.name, l.name, NULL, to_number(null), to_number(null), 'DATABASE LINK', l.ctime, to_date(null), NULL, 'VALID','N','N', 'N' from sys.link$ l, sys.user$ u where l.owner# = u.user#
可以看到data_object_id是从obj$.dataobj#来的,而obj$.dataobj#又对应着seg$.HWMINCR
SQL 10G>select max(HWMINCR) from sys.seg$;
MAX(HWMINCR) ------------ 63464 这里的HWMINCR就是data_object_id的来源,每次seg$里生成新的一条记录都会增加HWMINCR这个值,同时obj$.dataobj#也会跟着变化。
December 28 dynamic samplingdynamic sampling对于我们来说其实不算太陌生,从9i r2开始,dynamic sampling其实已经不动声色地融入到数据库中了。
我们经常会碰到由于一些表没有分析导致执行计划错误的情况,但是dynamic sampling的出现一定程度的减少了错误的产生。dynamic sampling针对没有分析过的表可以采样估计表的选择性,对于生产正确的执行计划有一定的帮助。
dynamic sampling分为10个级别,从0-10,由参数optimizer_dynamic_sampling控制
下面是取自"Performance Tuning Guide and Reference "的对10种级别的定义
dynamic_sampling hint的级别定义如下
下面做一些测试
SQL> create table test as select object_id a from dba_objects; Table created.
SQL> show parameter optimizer_dynamic_sampling NAME TYPE VALUE
Execution Plan
SQL> alter session set optimizer_dynamic_sampling=10; Session altered.
SQL> select count(*) from test; Execution Plan
SQL 10G>alter session set optimizer_dynamic_sampling=2; Session altered.
SQL 10G>select count(*) from test;
Execution Plan ------------------------------------------------------------------- Note
看看关联查询的情况
SQL> create table test1 as select * from test; Table created.
SQL> select count(*) from test a,test1 b where a.a=b.a; Execution Plan
如果两个表都没分析,那么dynamic sampling不起作用,分析其中一个表
SQL> analyze table test1 compute statistics; Table analyzed.
SQL> select count(*) from test a,test1 b where a.a=b.a; Execution Plan 4 2 TABLE ACCESS (FULL) OF 'TEST' (Cost=6 Card=23706 Bytes
dynamic sampling起作用了。
在10g里面有点区别,即使2个表都没分析过,dynamic sampling也可以起作用
SQL 10G>alter session set optimizer_dynamic_sampling=1; Session altered.
SQL 10G>select count(*) from test a,test1 b where a.a=b.a; Execution Plan -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 26 | | 151 (4)| 00:00 | 1 | SORT AGGREGATE | | 1 | 26 | | | |* 2 | HASH JOIN | | 43244 | 1097K| 1056K| 151 (4)| 00:00 | 3 | TABLE ACCESS FULL| TEST1 | 43235 | 548K| | 21 (5)| 00:00 | 4 | TABLE ACCESS FULL| TEST | 45439 | 576K| | 21 (5)| 00:00 --------------------------------------------------------------------------------
2 - access("A"."A"="B"."A")
Note
其实归根到底是由于10g没有了基于规则的优化器,在9i里面如果一个查询中的所有表没有统计数据,那么他将选择基于规则的优化器而忽略dynamic sampling,而在10g因为这个原因所以dynamic sampling生效。
|
|
|