hai's profileOracle & StarcraftBlogLists Tools Help
April 23

本blog停止更新

新的更新全部会在www.orawh.com
January 24

hash join 与 10104 event

OPTIMAL MODE:

 Optimal方式的hash join相对来说非常简单,在这个例子(optimal.doc)里面,一共需要2partition,这也是hash join中最小的partition分配数,随着work area的增大,partition的数目也会增多,partition的分配规律遵守2n次方这样的规则分配。在经过build tablescan以后,hash table被建立,每条记录被hash后对应的hash bucket也被填充,与此相对应的bitmap vectorbit位也被设置,这个时候开始probe tablescan,首先通过同样的hash functionjoin key进行运算,然后对比bitmap vector中的bit是否被标志,如果没被标志那么表示这条记录不符合关联的要求直接被丢弃。如果bit位被设置,那么表示这个值可能与hash bucket中保存的值一致,因为hash算法会有碰撞,所以一个bucket中会有不同的值存在,所以将会对比这个值和hash bucket中保存的值是不是相同,如果相同那么输出给客户端,等probe table扫描完毕,所有结果都输出给客户端。

 

optimal.doc  点击下载trace 文件

ONEPASS MODE:


当我们修改了hash_area_size2m后,hash join的运行模式变成了onepass状态,从上面的trace(onepass.doc)文件中可以发现,partition变成了4

### Partition Distribution ###

Partition:0    rows:49933      clusters:5      slots:1      kept=0

Partition:1    rows:49674      clusters:5      slots:1      kept=0

Partition:2    rows:49943      clusters:5      slots:4      kept=0

Partition:3    rows:50451      clusters:5      slots:5      kept=1


其中partition 3可以被放在work area中,其他几个partition不可避免的要写出到磁盘上。接下来开始probe tablescan,在扫描的过程中经过bitmap vector的过滤,不符合的记录被丢弃,因为bitmapVector中也保存了每个hash bucket是否在内存还是在磁盘,所以符合的记录就会去查到底这个hash bucket位于哪里,如果在内存,直接比较确切值,丢弃或者输出给客户端(从trace文件中看不出这个过程,trace文件看起来是dump所有不能放入内存的partitition,然后读入partition pairhash join)。如果在磁盘,那么probe table的这条记录也写出到磁盘,形成一个partition pair
partition 3的数据join完毕,接下来是partition 2的所有部分被读入内存,同时probe table相对的partition也被读入内存进行hash join,然后是partition 1,partition 0。可以看到这里都只读了一次probe table partition,所以我们称之为onepass hash join.

onepass.doc  点击下载trace 文件

MULTIPASS MODE:


 这个就是multipass hash join的过程,trace文件异常的长,首先分配了2partitions,没有一个partition能被完全放在work area里面

### Partition Distribution ###

 Partition:0    rows:99876      clusters:10     slots:1      kept=0

Partition:1    rows:100125     clusters:10     slots:4      kept=0

首先是build tablescan Work area里面容纳了partition 14slotpartition 0的一个slot,另外还有一个slot是为了i/o用的。每当一个slot(可能是一个block)满后将会被写出到磁盘,等build table scan完毕,磁盘上保存了2partition的内容,并且bitmap vector也被建立了。这时开始了probe tablescan,系统为probe table的每个partition分配了一个slot,还分配一个slot作为读入probe table的缓冲,通过bitmap vector的过滤,符合条件的记录被填充到probe partitionslot中并与build table partitionslot来比较确切值。这个步骤完了以后work area里面应该存在一个build table partition 0slot,一个build table partition 1slot,一个写出缓冲的slot,一个读入缓冲的slot,一个probe table partition 0slot,一个probe table partition 1slot,一共6slot,刚好填满所有可分配的slot。这个过程中间可能会有在内存中匹配的行返回,然后开始从磁盘读取partition pair开始join,经过10probe partition的读取。partition 0也是经过了10probe partition的读取,这就带来了很多i/o,导致hash join的性能急剧下降。Oracle在进行multipass的过程中会有2hash function的存在。Oracle将会读入磁盘上的build table partition再进行一次hash生成一些subpartition,这样的话每次读入probe table subpartition即可,而不用多次读入probe partition导致过大的i/o.不过在这个trace文件中并没有发现这个过程。的个和的一个另外还有一个是为了用的。每当一个(可能是一个)满后将会被写出到磁盘,等完毕,磁盘上保存了个的内容并且也被建立了。这时开始了的,系统为的每个分配了一个,还分配一个作为读入的缓冲,通过的过滤,符合条件的记录被填充到的中并与的来比较确切值。这个步骤完了以后里面应该存在一个的,一个的,一个写出缓冲的一个读入缓冲的,一个的,一个的,一共个,刚好填满所有可分配的。这个过程中间可能会有在内存中匹配的行返回,然后开始从磁盘读取开始,经过次的读取。也是经过了次的读取,这就带来了很多,导致的性能急剧下降。在进行的过程中会有次的存在。将会读入磁盘上的再进行一次生成一些,这样的话每次读入即可,而不用多次读入导致过大的不过在这个文件中并没有发现这个过程。里面容纳了的个和的一个另外还有一个是为了用的。每当一个(可能是一个)满后将会被写出到磁盘,等完毕,磁盘上保存了个的内容并且也被建立了。这时开始了的,系统为的每个分配了一个,还分配一个作为读入的缓冲,通过的过滤,符合条件的记录被填充到的中并与的来比较确切值。这个步骤完了以后里面应该存在一个的,一个的,一个写出缓冲的一个读入缓冲的,一个的,一个的,一共个,刚好填满所有可分配的。这个过程中间可能会有在内存中匹配的行返回,然后开始从磁盘读取开始,经过次的读取。也是经过了次的读取,这就带来了很多,导致的性能急剧下降。在进行的过程中会有次的存在。将会读入磁盘上的再进行一次生成一些,这样的话每次读入即可,而不用多次读入导致过大的不过在这个文件中并没有发现这个过程。另外还有一个trace文件反映了ROLE REVERSALoracle在进行join时会去评估build partitionprobe partition的大小,如果发行probe partition小于build partition,那么会对换两者的角色,对原来的probe partition建立hash table,拿原来的build partition来匹配,这种方法也在一定程度上减少了i/o,提高了效率。

*** HASH JOIN GET FLUSHED PARTITIONS (PHASE 2) ***

 Getting a pair of flushed partions.  

BUILD PARTION: nrows:24948 size=(3 slots, 384K) 

PROBE PARTION: nrows:12410 size=(2 slots, 256K)    

ROLE REVERSAL OCCURRED


### Hash table overall statistics ###
Total buckets: 262144 Empty buckets: 250012 Non-empty buckets: 12132
Total number of rows: 12410
Maximum number of rows in a bucket: 3
Average number of rows in non-empty buckets: 1.022915

multipass.doc  点击下载trace 文件

上面这些只是本人基于trace文件对hash join运行模式的推断,不一定能反正hash join真实的运行情况,所以欢迎大家来讨论!  

November 03

运用transport tablespace进行快速oracle版本升级(9i-10g)

大家可能都知道在进行oracle数据库版本升级的时候会有2种方式

1.通过dbua(database upgrade assistant)

2.exp/imp

通过dbua来升级的话由于不涉及到数据文件的改变,所以速度会比较快,但是如果dbua一旦在升级的过程中出现问题可能会导致原来的库不可用。

而通过exp/imp虽然对原来的库不会有影响,但是如果数据库比较大的话那么升级的时间将会是不可接受的(尤其对24*7)的应用来说。

 针对这个问题,这次的2006 oracle openworld法国的amadeus公司提供了一个非常有创意的点子,就是利用dataguard和transport tablespace功能来实现最短时间内的安全升级。

 首先让我们来了解一下amadeus公司

 

艾玛迪斯全球旅游分销系统公司(Amadeus Global Travel Distribution SA)是全球领先的旅游行业技术及分销供应商。1987年艾玛迪斯总部建立于西班牙马德里。在 Sophia Antipolis(法国尼斯附近)和美国波士顿设立有市场及开发部门。公司的数据中心位于德国慕尼黑附近的Erding。公司提供各种先进的旅游行业技术解决方案,至今已成为成长最快并被最广泛使用的全球分销系统(GDS)。

作为卓越的技术合作伙伴,艾玛迪斯把最先进的信息技术带入旅游行业,使众多的旅游供应商、休闲及商务旅游服务商从中获益。通过设立服务于当地市场的national marketing companies(NMCs),艾玛迪斯用其庞大的信息技术资源向全世界200个国家和地区提供优质的技术解决方案。

我们再来看一下跟它们的数据库相关的信息

他们的业务系统达到99.99%的可用率,每秒钟有30万次的数据库请求,每天有2亿8千万次transaction,这是一个相当大的数据库系统,如果用dbua或者exp/imp他们都不能接受升级的风险,于是他们的技术人员就想出了用dataguard和transport tablespace功能来实现最短时间内的安全升级。

 具体的实现方法是这样的

1.先为主库建立一个dataguard数据库(可以在线做)

2.在dataguard库上安装10g软件(可以在线做)

3.整理一些不能通过transport tablespace搞定的东西,比如sequence,synonyms,grants......

4.停止主库这边所有write的应用,提供read的服务(写入停止,提供查询)

5.强制归档主库redo log并传到dataguard恢复(写入停止,提供查询)

6.利用transport tablespace来转换数据库版本,并创建sequencee,synonyms,grants等(写入停止,提供查询)。

7.验证新环境的过程,在验证过程中如果发现有问题,则可以切换会原来的系统(写入停止,提供查询)。

8.切换应用到10g数据库(提供服务)

amadeus在演习时做到10分钟内完成4,5,6,7并成功切换了系统,考虑到他们的数据库繁忙程度和数据库容量非常大,这真是一项伟大的成就。我们可以在以后的数据库版本的升级过程中借鉴他们的方法。

 

我们再从技术上验证一下transport tablespace可以运用在版本升级

在9i的库上创建一个test tablespace 

create tablespace test
datafile '/opt/oracle/test.dbf' size 10m
extent management local autoallocate;

创建一张表在test表空间上 

create table test1(a number) tablespace test;

insert into test1 values(1);
commit;

SQL 9i>select * from test1;

         A
----------
         1

 

把test表空间置为read only模式


alter tablespace test read only;

 到处test tablespace的metadata

exp \'sys/sys as sysdba\'  transport_tablespace=y tablespaces=(TEST)  file=test.dmp log=test.log

传输dmp文件和数据文件(在amadeus的案例里面由于10g的库和9i的库在同一台机器上,所以避免了拷贝数据文件的时间,这也是整个方案的重点之一)到远程

scp test.dmp oracle@10.0.100.115:/opt/oracle/

scp /opt/oracle/test.dbf oracle@10.0.100.115:/opt/oracle/

 

在目标库上导入metadata数据

imp \'sys/sys as sysdba\' transport_tablespace=y tablespaces=(TEST) file='/opt/oracle/test.dmp' datafiles=

('/opt/oracle/test.dbf') tts_owners=test fromuser=test touser=test log=tts_i.log

 

查看test1表,发现数据一致


SQL 10G>select * from test1;

         A
----------
         1

把test表空间置为read write模式

alter tablespace test read write;

insert into test1 values(2);


SQL 10G>select * from test1;

         A
----------
         1
         2

一切正常,测试完毕

 

这个测试简单的模仿了transport tablespace升级数据库的可能性,当然在实际过程中我们要校验是否自包含表空间,是否需要创建sequence等,但是总体来说这种方案能提供最短时间内的数据库版本升级。

October 16

itl deadlock

昨天有同事碰到itl竞争导致的deadlock,这在以前比较少见,那么在这里我们来作个实验重现一下,
从理论上解释为什么itl竞争会导致deadlock。
create table test(a number);
insert into test values(1);
SQL 10G>insert into test select * from test;
1 row created.
SQL 10G>/
2 rows created.
SQL 10G>/
4 rows created.
SQL 10G>/
8 rows created.
SQL 10G>/
16 rows created.
SQL 10G>/
32 rows created.
SQL 10G>/
64 rows created.
SQL 10G>/
128 rows created.
SQL 10G>/
256 rows created.
SQL 10G>/
512 rows created.
SQL 10G>/
1024 rows created.
SQL 10G>/
2048 rows created.
SQL 10G>/
4096 rows created.
SQL 10G>/
8192 rows created.
SQL 10G>/
16384 rows created.
SQL 10G>commit;
Commit complete.
SQL 10G>alter table test move pctfree 0;
Table altered.
move表,缩小pctfree为0使表不能创建多余的itl
SQL 10G>select ini_trans from  dba_tables where owner='TEST' and  table_name='TEST';
 INI_TRANS
----------
         1
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0010.023.0000229f  0x01c01177.05f5.1c  C---    0  scn 0x0005.e88ed5fb
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
这里可以看到虽然dba_tables里面的ini_trans为1,但是实际上block里面默认的itl是3个,从下面的实验中可以发现。

SQL 10G>select distinct dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block# from test order by block#;
    BLOCK#
----------
     61612
     61613
     61614
     61615
     61616
     61618
     61619
     61620
     61621
     61622
     61623
    BLOCK#
----------
     61624
     61626
     61627
     61628
     61629
     61630
     61631
     61632
     61634
     61635
     72057
    BLOCK#
----------
     72058
     72059
     72060
     72061
     72062
     72063
     72064
     72065
     72066
     72067
     72068
    BLOCK#
----------
     72069
     72070
     72071
     72072
     72817
     72818
     72819
     72820
     72821
     72822
     72823
    BLOCK#
----------
     72824
45 rows selected.
我们挑2个block来做实验,block 61612,block 61613

session 1:
update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=1;
update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=1;
session1更新分别更新61612,61613的第2行,分别占据61612,61613的一个itl entry
session 2:
update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=2;
update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=2;
session2更新分别更新61612,61613的第3行,分别占据61612,61613的另一个itl entry
session 3:
update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=0;
session3 更新61612的第1行,占据61612的第3个itl entry

session 4:
update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=3;
session4 更新61613的第4行,占据61613的第3个itl entry,这个时候2个block的itl entry都被用完,
每个block itl都是3个并且不能扩展

session 4:
update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=3;
session4 更新61612的第4行,这时候出现"enq: TX - allocate ITL entry"等待
session 3:
update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=0;
session3 更新61613的第1行,出现"enq: TX - allocate ITL entry"等待
session 4:
SQL 10G>update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=3;
update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=3
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
而当session3 更新61613的第1行时,session4这边报出了deadlock,看下图
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-0013002e-0001ac43        18     154     X             19     143           S
TX-0014002a-0001c990        19     143     X             18     154           S
session 154: DID 0001-0012-0001B9E7     session 143: DID 0001-0013-00000DB7
session 143: DID 0001-0013-00000DB7     session 154: DID 0001-0012-0001B9E7
Rows waited on:
Session 143: no row
Session 154: no row
Information on the OTHER waiting sessions:
Session 143:
  pid=19 serial=60295 audsid=6783 user: 55/TEST
  O/S info: user: oracle, term: pts/1, ospid: 664, machine: csdba
            program: sqlplus@csdba (TNS V1-V3)
  application name: SQL*Plus, hash value=3669949024
  Current SQL Statement:
  update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=0
End of information on OTHER waiting sessions.
September 25

autotrace,explain plan与绑定变量

create table fenbu as select 1 id,'Y' flag from dba_objects where rownum<100001;

insert into fenbu values(1,'N');
commit;

create index IDX_FENBU_FLAG on fenbu(flag);

analyze table fenbu compute statistics for table for all columns for all indexes;

var a varchar2(32);
exec :a:='N';

SQL 10G>set autotrace trace exp;

SQL 10G>alter session set events'10046 trace name context forever,level 12';  

Session altered.

SQL 10G>select * from fenbu    where   flag=:a;


Execution Plan
----------------------------------------------------------

----------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT  |       | 50001 |   244K|   202  (80)|
|*  1 |  TABLE ACCESS FULL| FENBU | 50001 |   244K|   202  (80)|
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG"=:A)

Note
-----
   - 'PLAN_TABLE' is old version

 


SQL 10G>alter session set events'10046 trace name context off';

Session altered.

 

很显然可以看到set autotrace的执行计划是错的,这是因为set auotrace,explain plan等操作
并不会发生bind peeking,它并不会把绑定变量的值反映到执行计划里面,不会去看直方图的
数据分布,所以它生成的计划并不可信,我们可以来看一下10046的真实计划。


select *
from
 fenbu    where   flag=:a


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0          4          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 55 

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID FENBU (cr=4 pr=0 pw=0 time=102 us)
      1   INDEX RANGE SCAN IDX_FENBU_FLAG (cr=3 pr=0 pw=0 time=78 us)(object id 83455)

September 01

oracle compress table II

上一篇文章提到压缩表发生update后会导致行迁移,但是在上篇文章里面没有做过多描述
,这次我们来仔细看一下update compressed table会发生什么事情。此外我们再来看看压
缩表结构修改是怎么处理的。

首先创建测试表

create table test2(a varchar2(10),b varchar2(10),c varchar2(10));

begin
for i in 1000000000..1000100000 loop
insert into test2 values(i,'1',to_char(mod(i,100)));
commit;
end loop;
end;
/
SQL 10G>create table testcom4 compress as select * from test2 order by c;
Table created.

对压缩表添加一个列

SQL 10G>SQL 10G>SQL 10G>

alter table testcom4 add d number;

Table altered.

定位到一条记录,找出所在文件号,块号,文件号和rowid

SQL 10G>select dbms_rowid.ROWID_RELATIVE_FNO(rowid) file#,
dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block#,
dbms_rowid.ROWID_ROW_NUMBER(rowid) row# from
testcom4 where rownum&lt;2;

FILE# BLOCK# ROW#
---------- ---------- ----------
12 61364 0

SQL 10G>select rowid from testcom4 where rownum&lt;2;

ROWID
------------------
AAAT9AAAMAAAO+0AAA

更新这条记录

SQL 10G>update testcom4 set d=1 where rowid='AAAT9AAAMAAAO+0AAA';

1 row updated.

SQL 10G>commit;

Commit complete.

SQL 10G>select dbms_rowid.ROWID_RELATIVE_FNO(rowid) file#,
dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block#,
dbms_rowid.ROWID_ROW_NUMBER(rowid) row# from testcom4
where rowid='AAAT9AAAMAAAO+0AAA';

FILE# BLOCK# ROW#
---------- ---------- ----------
12 61364 0
dump这个block看看行迁移是怎么发生的

SQL 10G>alter system dump datafile 12 block 61364;

System altered.

perm_9ir2[3]={ 2 0 1 }
...
block_row_dump:
tab 0, row 0, @0x1f79
tl: 7 fb: --H-FL-- lb: 0x0 cc: 2
col 0: [ 1] 31
col 1: [ 1] 30
bindmp: 01 bc 02 c9 31 c9 30
tab 1, row 0, @0x1f69
tl: 9 fb: --H----- lb: 0x2 cc: 0
nrid: 0x0300f085.0 这里指向了新的数据块
bindmp: 20 02 00 03 00 f0 85 00 00
定位新的块

SQL 10G>select dbms_utility.DATA_BLOCK_ADDRESS_FILE(to_number('300f085','xxxxxxxxxx'))
file#,dbms_utility.DATA_BLOCK_ADDRESS_BLOCK(to_number('300f085','xxxxxxxxxx'))
block# from dual;

FILE# BLOCK#
---------- ----------
12 61573

dump新的block

SQL 10G>alter system dump datafile 12 block 61573;

System altered.

block_row_dump:
tab 0, row 0, @0x1f65
tl: 27 fb: ----FL-- lb: 0x1 cc: 4
hrid: 0x0300efb4.0
col 0: [10] 31 30 30 30 30 39 33 30 30 30
col 1: [ 1] 31
col 2: [ 1] 30
col 3: [ 2] c1 02

可以看到新的block里面已经是非压缩的数据格式了,从这里可以看出对压缩表的更新确实是会导致
压缩失效。
那么能不能删除新加的列呢?试一下

SQL 10G>alter table testcom4 drop column d;
alter table testcom4 drop column d
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

报错了,提示“unsupported add/drop column operation on compressed tables”
metalink上说这是oracle的一个bug,在10g修复,但是在我的10g r2的版本上还是
不通过。9i的版本更加离谱,连add column都不行。

---------------------------
SQL 9I> alter table testcom4 add d number;
alter table testcom4 add d number
*
ERROR at line 1:
ORA-22856: cannot add columns to object tables
---------------------------
10g可以进行set unused的操作

SQL 10G>alter table testcom4 set unused column d;

Table altered.
但是drop unused columns依然报错,依然是一个bug

SQL 10G>alter table testcom4 drop unused columns;
alter table testcom4 drop unused columns
*
ERROR at line 1:
ORA-12996: cannot drop system-generated virtual column

希望下次下载一个patch可以解决这些问题。

August 29

library cache lock wait event

 
说起library cache lock wait event就必须先说一下library cache lock是什么东西,library lock是oracle用来对library cache object进行并发控制的两种数据结构之一,另外一种数据结构是pin,lock将会在pin之前获得,并被加载到library cache handle上。library cache lock有3种形式,share,null,exclusive,具体的library cache lock的分析请看我的另一篇帖子
 
通常我们举的关于library cache lock wait event的例子都是用procedure来模拟的,eygle发表过这方面的文章,今天我举的是另一个例子,也是一次真实的事故,事故发生的原因就是因为ddl导致cursor invalidation,但是ddl持续运行了很长时间,导致后来的session需要parse sql的时候不能获得library cache lock,进一步导致app服务器的connection pool满,导致应用crash。下面我们用一个简单的例子模拟一下这个场景。
 
session 1:
SQL 10G>alter table test add constraint uk_test unique(a,b,c,d,e) using index online compute statistics;
Table altered.
 
 
session 2:
 
SQL 10G>select sid from v$mystat where rownum<2;
       SID
----------
       142

SQL 10G>select count(*) from test;
 

session 3:
SQL 10G> select event from v$session_wait where sid=142;
EVENT
--------------------------------------------------------------------------------
library cache lock
当test表过大,创建uk constraint时间过长时会有越来越多的session等待library cache lock从而进一步导致应用crash,这样提醒我们当对大表进行ddl时一定要分解ddl操作,不要一句语句搞定所有操作,象alter table test add constraint uk_test unique(a,b,c,d,e) using index online compute statistics;这种语句虽然非常便捷,但是不一定适合在大并发的时候去执行。我们应该把这个操作分解成几步来做,
1. create index ind_test on test(a,b,c,d,e) online compute statistics;
2.  alter table test add constraint uk_test unique(a,b,c,d,e) using index ind_test novalidate;
3.  校验表里面的数据是否符合唯一约束
4.  alter table test modify constraint uk_test validate;
 
这里的要点就是要把每一个ddl分解成小的操作来执行。值得指出的一点是alter table test modify constraint uk_test validate;这个步骤虽然会可能会耗时非常久,但是这个步骤是不会加载library cache exclusive lock的,所以它不会导致其他session等待library cache lock。但是constraint的状态从disable到enable的转换则还是会加载library cache exclusive lock,导致其他进程等待library cache lock。oracle可能在这个地方做了不同的处理,针对disable->enable和novalidate->validate采取了不同的lock类型,显然我们也可以知道disable->enable和novalidate->validate对library cache object的影响是不一样的,一个是可以说是从无到有,一个是只是改变了一下状态,我的猜测是disable->enable的话oracle依然加载了exclusive lock,而novalidate->validate的话oracle加载了null or share lock,有心的人可以dump一下sga来测试一下。
 
 
 
August 04

dbms_stats和leaf_blocks

dbms_stats是oracle用来代替原有的analyze功能的一个包,与analyze相比dbms_stats具有很多优势,比如并行,比如分区信息统计等,但是dbms_stats再分析index的时候处理方式并不是太理想,dbms_stats分析index时将会只统计leaf_blocks为当前有数据的leaf block,而analyze则会统计为所有曾经被使用过的leaf block number,很显然dbms_stats的统计结果会使index fast full scan的成本被严重低估,在某些情况下会错误得选择index fast full scan做为执行路径。下面来看一个例子:
 
先清空原来的表
SQL 10G>truncate table t1;
Table truncated.
 
插入数据
SQL 10G>insert into t1 select
  2  rownum id,
  3  trunc(100 * dbms_random.normal) val,
  4  rpad('x',100) padding
  5  from
  6  all_objects
  7  where
  8  rownum <= 10000
  9  ;
10000 rows created.
 
SQL 10G>commit;
Commit complete.
 
创建索引,并限制pctfree为99,模拟大索引的产生
SQL 10G>create index ind_t1 on t1(id) pctfree 99;
Index created.
 
使用analyze分析索引
SQL 10G>ANALYZE INDEX IND_T1 COMPUTE STATISTICS;
Index analyzed.
 
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM  user_indexes where table_name='T1';
INDEX_NAME                         BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
IND_T1                                  2       10000
 
再使用dbms_stats分析,可以看到在这个时候基本统计数据是相同的
SQL 10G>EXEC DBMS_STATS.GATHER_INDEX_STATS('TEST','IND_T1');
PL/SQL procedure successfully completed.
 
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM  user_indexes where table_name='T1';
INDEX_NAME                         BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
IND_T1                                  2       10010
 
删除数据使表中只保留一条记录
SQL 10G>delete from t1 where rownum<10000;
9999 rows deleted.
 
SQL 10G>commit;
Commit complete.
 
再用analyze分析索引,可以发现leaf_blocks依然是10000
SQL 10G>ANALYZE INDEX IND_T1 COMPUTE STATISTICS;
Index analyzed.
 
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM  user_indexes where table_name='T1';
INDEX_NAME                         BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
IND_T1                                  2       10000
 
看cost这一项显示index fast full scan的成本为2679,这是正确的
SQL 10G>set autotrace trace exp;

SQL 10G>select/*+ index_ffs(t1,ind_t1)*/ count(*) from t1;
Execution Plan
-------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Cost (%CPU)|
-------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |     1 |  2679  (19)|
|   1 |  SORT AGGREGATE       |        |     1 |            |
|   2 |   INDEX FAST FULL SCAN| IND_T1 |     1 |  2679  (19)|
-------------------------------------------------------------
 
使用dbms_stats分析索引,leaf_blocks被统计为1,只统计了当前在用的leaf block
SQL 10G>EXEC DBMS_STATS.GATHER_INDEX_STATS('TEST','IND_T1');
PL/SQL procedure successfully completed.
 
SQL 10G>set autotrace off;

SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM  user_indexes where table_name='T1';
INDEX_NAME                         BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
IND_T1                                  2           1
 
看cost这一项显示index fast full scan的成本为1,这显然是出现了错误
SQL 10G>set autotrace trace;
SQL 10G>select/*+ index_ffs(t1,ind_t1)*/ count(*) from t1;
Execution Plan
-------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Cost (%CPU)|
-------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |     1 |     1   (0)|
|   1 |  SORT AGGREGATE       |        |     1 |            |
|   2 |   INDEX FAST FULL SCAN| IND_T1 |     1 |     1   (0)|
-------------------------------------------------------------
 
再来看看它究竟需要读取多少个块,是不是cost=1就够了
 
SQL 10G>   ALTER SESSION SET EVENTS 'immediate trace name flush_cache';
Session altered.
SQL 10G>select/*+ index_ffs(t1,ind_t1)*/ count(*) from t1;

Execution Plan
----------------------------------------------------------
-------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Cost (%CPU)|
-------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |     1 |     1   (0)|
|   1 |  SORT AGGREGATE       |        |     1 |            |
|   2 |   INDEX FAST FULL SCAN| IND_T1 |     1 |     1   (0)|
-------------------------------------------------------------
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      10035  consistent gets
      10016  physical reads
          0  redo size
        411  bytes sent via SQL*Net to client
        385  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
显然,这里发生了10016个physical reads,cost=1是远远不够的。不知道oracle会不会就这个问题有改进方案,大家拭目以待。
 
 
July 24

10g r2 db_file_multiblock_read_count在cbo中的变化

10g r2和前几个版本比起来对db_file_multiblock_read_count在cbo成本计算中的公式做了调整,看一下下面我实验后的对照表。
 
我的测试环境
 
[oracle@csdba ~]$ uname -a
Linux csdba 2.6.9-11.ELsmp #1 SMP Fri May 20 18:26:27 EDT 2005 i686 i686 i386 GNU/Linux
 

SQL*Plus: Release 10.2.0.1.0 - Production on Mon Jul 24 17:07:42 2006
Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning and Data Mining Scoring Engine options
 

分别测试了两个版本下1000行,10000行,100000行记录的6个对比

exec dbms_stats.SET_TABLE_STATS(OWNNAME=>'TEST',TABNAME=>'T1',NUMBLKS=>1000);


exec dbms_stats.SET_TABLE_STATS(OWNNAME=>'TEST',TABNAME=>'T1',NUMBLKS=>10000);


exec dbms_stats.SET_TABLE_STATS(OWNNAME=>'TEST',TABNAME=>'T1',NUMBLKS=>100000);

 

10g r2版本

MBRC BLOCKS COST ADJUSTED_MBRC
4 1000 377 2.652519894
8 1000 273 3.663003663
16 1000 221 4.524886878
32 1000 195 5.128205128
64 1000 182 5.494505495
128 1000 176 5.681818182
MBRC BLOCKS COST ADJUSTED_MBRC
4 10000 3763 2.657454159
8 10000 2722 3.673769287
16 10000 2201 4.543389368
32 10000 1941 5.151983514
64 10000 1811 5.521811154
128 10000 1745 5.730659026
MBRC BLOCKS COST ADJUSTED_MBRC
4 100000 37615 2.658513891
8 100000 27199 3.676605758
16 100000 21990 4.547521601
32 100000 19386 5.158361704
64 100000 18084 5.529750055
128 100000 17433 5.736247347


           
9i r2

MBRC BLOCKS COST ADJUSTED_MBRC
4 1000 241 4.149377593
8 1000 153 6.535947712
16 1000 98 10.20408163
32 1000 63 15.87301587
64 1000 40 25
128 1000 26 38.46153846
MBRC BLOCKS COST ADJUSTED_MBRC
4 10000 2397 4.171881519
8 10000 1519 6.583278473
16 10000 963 10.38421599
32 10000 611 16.36661211
64 10000 388 25.77319588
128 10000 246 40.6504065
MBRC BLOCKS COST ADJUSTED_MBRC
4 100000 23953 4.1748424
8 100000 15179 6.588049279
16 100000 9619 10.39609107
32 100000 6096 16.40419948
64 100000 3863 25.88661662
128 100000 2449 40.83299306


最后得出的测试结果是在10g r2里面,db_file_multiblock_read_count
对cost计算的影响明显变小,看起来oracle对db_file_multiblock_read_count
采取了更谨慎的态度,这样一来不会因为设置了
db_file_multiblock_read_count为一个较大的值而导致数据库倾向于全表扫描。

 

 

July 01

今夜,阿根廷与世界杯再见

 
悲情的阿根廷的主教练,他继承了阿根廷的悲情的传统。巴西莱、帕萨雷拉、贝尔萨、在这一刻灵魂附体,佩克尔曼一个人他代表了阿根廷足球悲情的传统,在这一刻他不是一个人在流泪,他不是一个人!
 
坎比亚索,面对这个点球。他面对的全世界阿根廷球迷的目光和期待。

  莱曼曾经在之前扑出阿亚拉点球,坎比亚索应该深知这一点,他还能够微笑着面对他面前的这个人吗?10秒钟以后他会是怎样的表情?

  球被扑了!比赛结束了!阿根廷队失败了。他们再一次倒在法西斯的球队面前,悲情的阿根廷的主教练!佩克尔曼今天倒下!阿根廷别再为我哭泣!

  这个点球是一个绝对理论上的臭脚。绝对的臭脚,阿根廷队淘汰出了四强!

  这个失败属于阿根廷,属于阿亚拉,属于弗朗哥,属于佩克尔曼,属于所有热爱阿根廷足球的人!

  阿根廷队真的会后悔的,佩克尔曼在下半时他们领先一球的情况下打得太保守、太沉稳了,他失去了自己在小组赛的那种勇气,面对德国悠久的历史,他失去了他在小组赛中那种痛打落水狗的作风,他终于自食其果。阿根廷队该回家了,也许他们不用回遥远的阿根廷,他们不用回家,因为他们大多数人都在欧洲生活,再见!

June 30

first_rows_n和all_rows

first_rows_n和all_rows都是oracle optimizer_mode的选项,他们有什么区别呢,会对优化器产生怎么样的影响呢?让我们一起来解开迷题.

all_rows模式:
all_rows是oracle优化器默认的模式,它将选择一种在最短时间内返回所有数据的执行计划,它将基于整体成本的考虑.

first_rows_n模式:
first_rows_n是从9i开始引入来代替以前的first_rows模式,虽然first_rows模式仍然存在,但是oracle已经不推荐使用.因为它基本上是基于oracle可执行文件硬编码的很多规则实现,比如它会尝试彻底去避免hash join或者merge join除非nest loop的非驱动表会进行全表扫描,first_rows也会偏向于使用索引而不是全表扫描,这在某些情况下也会带来反面的效果.所以oracle引入first_rows_n来代替first_rows,first_rows_n是根据成本而不是基于硬编码的规则来选择执行计划.n可以是1,10,100,1000或者直接用first_rows(n) hint指定任意正数.这里的n是我们想获取结果集的前n条记录,举个例子,如果n为1,那么oracle会选择一个最快速度返回结果集第一条记录的执行计划而不管是否它获取结果集的所有记录的执行成本是不是最优.这种需求在很多分页语句的需求中会碰到.


那么oracle是怎么判断first_rows_n的成本并作出选择的呢,10053跟踪事件能给我们答案


create table t as select * from dba_objects;
create table t1 as select * from t;
create index ind_object_id on t(object_id) compute statistics;
create index ind_t1_object_id on t1(object_id) compute statistics;
analyze table t compute statistics for table for all columns;
analyze table t1 compute statistics for table for all columns;

准备好测试表和索引后来看看测试脚本

all_rows模式:
alter session set events'10053 trace name context forever,level 1';
alter session set optimizer_mode=all_rows;
select t.owner from t,t1 where t.object_id  =  t1.object_id;
alter session set events'10053 trace name context off';

first_rows_1模式:
alter session set events'10053 trace name context forever,level 1';
alter session set optimizer_mode=first_rows_1;
select t.owner from t,t1 where t.object_id   =  t1.object_id;
alter session set events'10053 trace name context off';

first_rows_10模式:
alter session set events'10053 trace name context forever,level 1';
alter session set optimizer_mode=first_rows_10;
select t.owner from t,t1 where t.object_id   =  t1.object_id;
alter session set events'10053 trace name context off';

first_rows_100模式:
alter session set events'10053 trace name context forever,level 1';
alter session set optimizer_mode=first_rows_100;
select t.owner from t,t1 where t.object_id   =  t1.object_id;
alter session set events'10053 trace name context off';

由于篇幅太长,所以把10053的trace文件简化了一下,只留下join这一部分的内容,并把merge join的部分去除了
测试环境是10g r2

all_rows:

**************************
GENERAL PLANS
**************************
Considering cardinality-based initial join order.
***********************
Join order[1]:   T[T]#0  T1[T1]#1
***************
Now joining: T1[T1]#1
***************
NL Join
  Outer table: Card: 51986.00  Cost: 164.59  Resp: 164.59  Degree: 1  Bytes: 9
  Inner table: T1  Alias: T1
  Access Path: TableScan
    NL Join:  Cost: 8493121.71  Resp: 8493121.71  Degree: 0
      Cost_io: 8358538.00  Cost_cpu: 839658589661
      Resp_io: 8358538.00  Resp_cpu: 839658589661
  Access Path: index (index (FFS))
    Index: IND_T1_OBJECT_ID
    resc_io: 25.16  resc_cpu: 7056806
    ix_sel: 0.0000e+00  ix_sel_with_filters: 1
  Inner table: T1  Alias: T1
  Access Path: index (FFS)
    NL Join:  Cost: 1366740.53  Resp: 1366740.53  Degree: 0
      Cost_io: 1307937.00  Cost_cpu: 366871247240
      Resp_io: 1307937.00  Resp_cpu: 366871247240
  Access Path: index (AllEqJoinGuess)
    Index: IND_T1_OBJECT_ID
    resc_io: 1.00  resc_cpu: 8371
    ix_sel: 1.9239e-05  ix_sel_with_filters: 1.9239e-05
    NL Join: Cost: 52220.34  Resp: 52220.34  Degree: 1
      Cost_io: 52148.00  Cost_cpu: 451348998
      Resp_io: 52148.00  Resp_cpu: 451348998
  Best NL cost: 52220.34
          resc: 52220.34 resc_io: 52148.00 resc_cpu: 451348998
          resp: 52220.34 resp_io: 52148.00 resp_cpu: 451348998
Join Card:  51982.00 = outer (51986.00) * inner (51986.00) * sel (1.9234e-05)
Join Card - Rounded: 51982 Computed: 51982.00

HA Join
  Outer table:
    resc: 164.59  card 51986.00  bytes: 9  deg: 1  resp: 164.59
  Inner table: T1  Alias: T1
    resc: 28.13  card: 51986.00  bytes: 4  deg: 1  resp: 28.13
    using dmeth: 2  #groups: 1
    Cost per ptn: 2.58  #ptns: 1
    hash_area: 0 (max=0)   Hash join: Resc: 195.30  Resp: 195.30  [multiMatchCost=0.00]
HA Join (swap)
  Outer table:
    resc: 28.13  card 51986.00  bytes: 4  deg: 1  resp: 28.13
  Inner table:  T  Alias: T
    resc: 164.59  card: 51986.00  bytes: 9  deg: 1  resp: 164.59
    using dmeth: 2  #groups: 1
    Cost per ptn: 2.58  #ptns: 1
    hash_area: 0 (max=0)   Hash join: Resc: 195.30  Resp: 195.30  [multiMatchCost=0.00]
  HA cost: 195.30
     resc: 195.30 resc_io: 189.00 resc_cpu: 39324090
     resp: 195.30 resp_io: 189.00 resp_cpu: 39324090
Best:: JoinMethod: Hash
       Cost: 195.30  Degree: 1  Resp: 195.30  Card: 51982.00  Bytes: 13
***********************
Best so far: Table#: 0  cost: 164.5888  card: 51986.0000  bytes: 467874
             Table#: 1  cost: 195.3030  card: 51982.0000  bytes: 675766
            
计算第一种join顺序的成本值,T做驱动表,T1做内部表,
Best:: JoinMethod: Hash
       Cost: 195.30  Degree: 1  Resp: 195.30  Card: 51982.00  Bytes: 13
在这里可以看到最优join方式是hash join,
最终的成本是195.30,返回结果集记录数是51982


***********************
Join order[2]:  T1[T1]#1   T[T]#0
***************
Now joining:  T[T]#0
***************
NL Join
  Outer table: Card: 51986.00  Cost: 28.13  Resp: 28.13  Degree: 1  Bytes: 4
  Inner table:  T  Alias: T
  Access Path: TableScan
    NL Join:  Cost: 8492985.25  Resp: 8492985.25  Degree: 0
      Cost_io: 8358403.00  Cost_cpu: 839649495148
      Resp_io: 8358403.00  Resp_cpu: 839649495148
  Access Path: index (AllEqJoinGuess)
    Index: IND_OBJECT_ID
    resc_io: 2.00  resc_cpu: 15913
    ix_sel: 1.9239e-05  ix_sel_with_filters: 1.9239e-05
    NL Join (ordered): Cost: 104132.73  Resp: 104132.73  Degree: 1
      Cost_io: 103999.00  Cost_cpu: 834303785
      Resp_io: 103999.00  Resp_cpu: 834303785
  Best NL cost: 104132.73
          resc: 104132.73 resc_io: 103999.00 resc_cpu: 834303785
          resp: 104132.73 resp_io: 103999.00 resp_cpu: 834303785
Join Card:  51982.00 = outer (51986.00) * inner (51986.00) * sel (1.9234e-05)
Join Card - Rounded: 51982 Computed: 51982.00
HA Join
  Outer table:
    resc: 28.13  card 51986.00  bytes: 4  deg: 1  resp: 28.13
  Inner table:  T  Alias: T
    resc: 164.59  card: 51986.00  bytes: 9  deg: 1  resp: 164.59
    using dmeth: 2  #groups: 1
    Cost per ptn: 2.58  #ptns: 1
    hash_area: 0 (max=0)   Hash join: Resc: 195.30  Resp: 195.30  [multiMatchCost=0.00]
  HA cost: 195.30
     resc: 195.30 resc_io: 189.00 resc_cpu: 39324090
     resp: 195.30 resp_io: 189.00 resp_cpu: 39324090
Join order aborted: cost > best plan cost
计算第二种join顺序的成本值,T1做驱动表,T做内部表,
Join order aborted: cost > best plan cost
第二种join顺序被放弃,因为成本大于已经第一种join顺序的最优成本


***********************
(newjo-stop-1) k:0, spcnt:0, perm:2, maxperm:2000
*********************************
Number of join permutations tried: 2
*********************************
(newjo-save)    [1 0 ]
Final - All Rows Plan:  Best join order: 1
  Cost: 195.3030  Degree: 1  Card: 51982.0000  Bytes: 675766
  Resc: 195.3030  Resc_io: 189.0000  Resc_cpu: 39324090
  Resp: 195.3030  Resp_io: 189.0000  Resc_cpu: 39324090

在All Rows模式下最终优化器选择了Best join order: 1,Cost: 195.3030,
尝试了2种join 顺序(Number of join permutations tried: 2)

 

 


first_rows_1模式:


***************************************
GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
***********************
Join order[1]:   T[T]#0  T1[T1]#1
***************
Now joining: T1[T1]#1
***************
NL Join
  Outer table: Card: 51986.00  Cost: 164.59  Resp: 164.59  Degree: 1  Bytes: 9
  Inner table: T1  Alias: T1
  Access Path: TableScan
    NL Join:  Cost: 8493121.71  Resp: 8493121.71  Degree: 0
      Cost_io: 8358538.00  Cost_cpu: 839658589661
      Resp_io: 8358538.00  Resp_cpu: 839658589661
  Access Path: index (index (FFS))
    Index: IND_T1_OBJECT_ID
    resc_io: 25.16  resc_cpu: 7056806
    ix_sel: 0.0000e+00  ix_sel_with_filters: 1
  Inner table: T1  Alias: T1
  Access Path: index (FFS)
    NL Join:  Cost: 1366740.53  Resp: 1366740.53  Degree: 0
      Cost_io: 1307937.00  Cost_cpu: 366871247240
      Resp_io: 1307937.00  Resp_cpu: 366871247240
  Access Path: index (AllEqJoinGuess)
    Index: IND_T1_OBJECT_ID
    resc_io: 1.00  resc_cpu: 8371
    ix_sel: 1.9239e-05  ix_sel_with_filters: 1.9239e-05
    NL Join: Cost: 52220.34  Resp: 52220.34  Degree: 1
      Cost_io: 52148.00  Cost_cpu: 451348998
      Resp_io: 52148.00  Resp_cpu: 451348998
  Best NL cost: 52220.34
          resc: 52220.34 resc_io: 52148.00 resc_cpu: 451348998
          resp: 52220.34 resp_io: 52148.00 resp_cpu: 451348998
Join Card:  51982.00 = outer (51986.00) * inner (51986.00) * sel (1.9234e-05)
Join Card - Rounded: 51982 Computed: 51982.00
HA Join
  Outer table:
    resc: 164.59  card 51986.00  bytes: 9  deg: 1  resp: 164.59
  Inner table: T1  Alias: T1
    resc: 28.13  card: 51986.00  bytes: 4  deg: 1  resp: 28.13
    using dmeth: 2  #groups: 1
    Cost per ptn: 2.58  #ptns: 1
    hash_area: 0 (max=0)   Hash join: Resc: 195.30  Resp: 195.30  [multiMatchCost=0.00]
HA Join (swap)
  Outer table:
    resc: 28.13  card 51986.00  bytes: 4  deg: 1  resp: 28.13
  Inner table:  T  Alias: T
    resc: 164.59  card: 51986.00  bytes: 9  deg: 1  resp: 164.59
    using dmeth: 2  #groups: 1
    Cost per ptn: 2.58  #ptns: 1
    hash_area: 0 (max=0)   Hash join: Resc: 195.30  Resp: 195.30  [multiMatchCost=0.00]
  HA cost: 195.30
     resc: 195.30 resc_io: 189.00 resc_cpu: 39324090
     resp: 195.30 resp_io: 189.00 resp_cpu: 39324090
Best:: JoinMethod: Hash
       Cost: 195.30  Degree: 1  Resp: 195.30  Card: 51982.00  Bytes: 13
***********************
Best so far: Table#: 0  cost: 164.5888  card: 51986.0000  bytes: 467874
             Table#: 1  cost: 195.3030  card: 51982.0000  bytes: 675766
*********************************
Number of join permutations tried: 1
*********************************
(newjo-save)    [1 0 ]
Final - All Rows Plan:  Best join order: 1
  Cost: 195.3030  Degree: 1  Card: 51982.0000  Bytes: 675766
  Resc: 195.3030  Resc_io: 189.0000  Resc_cpu: 39324090
  Resp: 195.3030  Resp_io: 189.0000  Resc_cpu: 39324090
kkoipt: Query block SEL$1 (#0)
******* UNPARSED QUERY IS *******
SELECT /*+ NO_STAR_TRANSFORMATION NO_EXPAND */ "T"."OWNER" "OWNER" FROM "TEST"."T" "T","TEST"."T1" "T1" WHERE "T"."OBJECT_ID"="T1"."OBJECT_ID"
kkoqbc-end
          : call(in-use=32712, alloc=49112), compile(in-use=35284, alloc=36696)
First K Rows: K/N ratio = 0.000019237428341, qbc=0x905f2620
First K Rows: Setup end
***********************
 
在FIRST_ROWS_1模式下,oracle会先按ALL_ROWS模式计算一种join顺序(Number of join permutations tried: 1)
,得到返回结果集的大小,
从而计算出FIRST_ROWS_1中的1条记录和所有结果集记录的一个比率值,
Join Card - Rounded: 51982 Computed: 51982.00
First K Rows: K/N ratio = 1/51982=0.000019237428341
通过这个K/N ratio,oracle会重新计算join cost


SINGLE TABLE ACCESS PATH (First K Rows)
 
Table:  T  Alias: T    
    Card: Original: 2  Rounded: 2  Computed: 2.00  Non Adjusted: 2.00
  Access Path: TableScan
    Cost:  2.00  Resp: 2.00  Degree: 0
      Cost_io: 2.00  Cost_cpu: 7541
      Resp_io: 2.00  Resp_cpu: 7541
  Best:: AccessPath: TableScan
         Cost: 2.00  Degree: 1  Resp: 2.00  Card: 2.00  Bytes: 9
***************************************
SINGLE TABLE ACCESS PATH (First K Rows)
  Table: T1  Alias: T1    
    Card: Original: 25996  Rounded: 25996  Computed: 25996.00  Non Adjusted: 25996.00
  Access Path: TableScan
    Cost:  83.30  Resp: 83.30  Degree: 0
      Cost_io: 82.00  Cost_cpu: 8079850
      Resp_io: 82.00  Resp_cpu: 8079850
  Access Path: index (index (FFS))
    Index: IND_T1_OBJECT_ID
    resc_io: 14.00  resc_cpu: 3532204
    ix_sel: 0.0000e+00  ix_sel_with_filters: 1
  Access Path: index (FFS)
    Cost:  14.57  Resp: 14.57  Degree: 1
      Cost_io: 14.00  Cost_cpu: 3532204
      Resp_io: 14.00  Resp_cpu: 3532204
  Access Path: index (FullScan)
    Index: IND_T1_OBJECT_ID
    resc_io: 59.00  resc_cpu: 5618765
    ix_sel: 1  ix_sel_with_filters: 1
    Cost: 59.90  Resp: 59.90  Degree: 1
  Best:: AccessPath: IndexFFS  Index: IND_T1_OBJECT_ID
         Cost: 14.57  Degree: 1  Resp: 14.57  Card: 25996.00  Bytes: 4
First K Rows: unchanged join prefix len = 1

***********************
Join order[1]:   T[T]#0  T1[T1]#1
***************
Now joining: T1[T1]#1
***************
NL Join
  Outer table: Card: 2.00  Cost: 2.00  Resp: 2.00  Degree: 1  Bytes: 9
  Inner table: T1  Alias: T1
  Access Path: TableScan
    NL Join:  Cost: 166.59  Resp: 166.59  Degree: 0
      Cost_io: 164.00  Cost_cpu: 16167241
      Resp_io: 164.00  Resp_cpu: 16167241
  Access Path: index (index (FFS))
    Index: IND_T1_OBJECT_ID
    resc_io: 13.50  resc_cpu: 3532204
    ix_sel: 0.0000e+00  ix_sel_with_filters: 1
  Inner table: T1  Alias: T1
  Access Path: index (FFS)
    NL Join:  Cost: 30.13  Resp: 30.13  Degree: 0
      Cost_io: 29.00  Cost_cpu: 7071948
      Resp_io: 29.00  Resp_cpu: 7071948
  Access Path: index (AllEqJoinGuess)
    Index: IND_T1_OBJECT_ID
    resc_io: 1.00  resc_cpu: 8371
    ix_sel: 3.8475e-05  ix_sel_with_filters: 3.8475e-05
    NL Join: Cost: 4.00  Resp: 4.00  Degree: 1
      Cost_io: 4.00  Cost_cpu: 24284
      Resp_io: 4.00  Resp_cpu: 24284
  Best NL cost: 4.00
          resc: 4.00 resc_io: 4.00 resc_cpu: 24284
          resp: 4.00 resp_io: 4.00 resp_cpu: 24284
Join Card:  1.00 = outer (2.00) * inner (25996.00) * sel (1.9234e-05)
Join Card - Rounded: 1 Computed: 1.00

HA Join
  Outer table:
    resc: 164.59  card 51986.00  bytes: 9  deg: 1  resp: 164.59
  Inner table: T1  Alias: T1
    resc: 14.57  card: 25996.00  bytes: 4  deg: 1  resp: 14.57
    using dmeth: 2  #groups: 1
    Cost per ptn: 2.17  #ptns: 1
    hash_area: 0 (max=0)   Hash join: Resc: 181.32  Resp: 181.32  [multiMatchCost=0.00]
HA Join (swap)
  Outer table:
    resc: 28.13  card 51986.00  bytes: 4  deg: 1  resp: 28.13
  Inner table:  T  Alias: T
    resc: 2.00  card: 2.00  bytes: 9  deg: 1  resp: 2.00
    using dmeth: 2  #groups: 1
    Cost per ptn: 1.75  #ptns: 1
    hash_area: 0 (max=0)   Hash join: Resc: 31.88  Resp: 31.88  [multiMatchCost=0.00]
  HA cost: 31.88
     resc: 31.88 resc_io: 29.00 resc_cpu: 17981913
     resp: 31.88 resp_io: 29.00 resp_cpu: 17981913
Best:: JoinMethod: NestedLoop
       Cost: 4.00  Degree: 1  Resp: 4.00  Card: 1.00  Bytes: 13
***********************
Best so far: Table#: 0  cost: 2.0012  card: 2.0000  bytes: 18
             Table#: 1  cost: 4.0039  card: 1.0000  bytes: 13
***********************

经过重新计算后,
计算第一种join顺序的成本值,T做驱动表,T1做内部表,
Best:: JoinMethod: NestedLoop
       Cost: 4.00  Degree: 1  Resp: 4.00  Card: 1.00  Bytes: 13
在这里可以看到最优join方式是nest loop,这和ALL_ROWS下选择有了区别
最终的成本是4.00,返回结果集记录数是1(Join Card - Rounded: 1)

***********************
Join order[2]:  T1[T1]#1   T[T]#0
***************
Now joining:  T[T]#0
***************
NL Join
  Outer table: Card: 2.00  Cost: 2.00  Resp: 2.00  Degree: 1  Bytes: 4
  Inner table:  T  Alias: T
  Access Path: TableScan
    NL Join:  Cost: 166.59  Resp: 166.59  Degree: 0
      Cost_io: 164.00  Cost_cpu: 16167061
      Resp_io: 164.00  Resp_cpu: 16167061
  Access Path: index (AllEqJoinGuess)
    Index: IND_OBJECT_ID
    resc_io: 2.00  resc_cpu: 15913
    ix_sel: 3.8475e-05  ix_sel_with_filters: 3.8475e-05
    NL Join (ordered): Cost: 5.01  Resp: 5.01  Degree: 1
      Cost_io: 5.00  Cost_cpu: 31647
      Resp_io: 5.00  Resp_cpu: 31647
  Best NL cost: 5.01
          resc: 5.01 resc_io: 5.00 resc_cpu: 31647
          resp: 5.01 resp_io: 5.00 resp_cpu: 31647
Join Card:  1.00 = outer (2.00) * inner (25996.00) * sel (1.9234e-05)
Join Card - Rounded: 1 Computed: 1.00
HA Join
  Outer table:
    resc: 28.13  card 51986.00  bytes: 4  deg: 1  resp: 28.13
  Inner table:  T  Alias: T
    resc: 83.30  card: 25996.00  bytes: 9  deg: 1  resp: 83.30
    using dmeth: 2  #groups: 1
    Cost per ptn: 2.17  #ptns: 1
    hash_area: 0 (max=0)   Hash join: Resc: 113.59  Resp: 113.59  [multiMatchCost=0.00]
HA Join (swap)
  Outer table:
    resc: 164.59  card 51986.00  bytes: 9  deg: 1  resp: 164.59
  Inner table: T1  Alias: T1
    resc: 2.00  card: 2.00  bytes: 4  deg: 1  resp: 2.00
    using dmeth: 2  #groups: 1
    Cost per ptn: 1.75  #ptns: 1
    hash_area: 0 (max=0)   Hash join: Resc: 168.34  Resp: 168.34  [multiMatchCost=0.00]
  HA cost: 168.34
     resc: 168.34 resc_io: 164.00 resc_cpu: 27076246
     resp: 168.34 resp_io: 164.00 resp_cpu: 27076246
Join order aborted: cost > best plan cost

计算第二种join顺序的成本值,T1做驱动表,T做内部表,
Join order aborted: cost > best plan cost
第二种join顺序被放弃,因为成本大于已经第一种join顺序的最优成本

***********************
(newjo-stop-1) k:0, spcnt:0, perm:2, maxperm:1000
*********************************
Number of join permutations tried: 2
*********************************
(newjo-save)    [1 0 ]
Final - First K Rows Plan:  Best join order: 1
  Cost: 4.0039  Degree: 1  Card: 1.0000  Bytes: 13
  Resc: 4.0039  Resc_io: 4.0000  Resc_cpu: 24284
  Resp: 4.0039  Resp_io: 4.0000  Resc_cpu: 24284
kkoipt: Query block SEL$1 (#0)

在FIRST_Rows_1模式下最终优化器选择了Best join order: 1,Cost: 4.0039,
尝试了2种join 顺序(Number of join permutations tried: 2)
实际上是3种,包括了一次在ALL_ROWS模式下的计算


另外再看一下
FIRST_Rows_10
FIRST_Rows_100
最终的执行计划选择和成本计算

FIRST_Rows_10:

Final - First K Rows Plan:  Best join order: 1
  Cost: 13.0163  Degree: 1  Card: 10.0000  Bytes: 130
  Resc: 13.0163  Resc_io: 13.0000  Resc_cpu: 101517
  Resp: 13.0163  Resp_io: 13.0000  Resc_cpu: 101517

FIRST_Rows_100:

Final - First K Rows Plan:  Best join order: 1
  Cost: 31.8883  Degree: 1  Card: 51982.0000  Bytes: 1143604
  Resc: 31.8883  Resc_io: 29.0000  Resc_cpu: 18019724
  Resp: 31.8883  Resp_io: 29.0000  Resc_cpu: 18019724

值得注意,FIRST_Rows_100选择了hash


再看一下执行计划

ALL_ROWS:
-------------------------------------------------------------------------------
| Id  | Operation             | Name             | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                  | 51982 |   659K|   195   (4)|
|*  1 |  HASH JOIN            |                  | 51982 |   659K|   195   (4)|
|   2 |   INDEX FAST FULL SCAN| IND_T1_OBJECT_ID | 51986 |   203K|    28   (4)|
|   3 |   TABLE ACCESS FULL   | T                | 51986 |   456K|   165   (2)|
-------------------------------------------------------------------------------

FIRST_Rows_1:
----------------------------------------------------------------------------
| Id  | Operation          | Name             | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                  |     1 |    13 |     4   (0)|
|   1 |  NESTED LOOPS      |                  |     1 |    13 |     4   (0)|
|   2 |   TABLE ACCESS FULL| T                | 25996 |   228K|     2   (0)|
|*  3 |   INDEX RANGE SCAN | IND_T1_OBJECT_ID |     1 |     4 |     1   (0)|
----------------------------------------------------------------------------

FIRST_Rows_10:
----------------------------------------------------------------------------
| Id  | Operation          | Name             | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                  |    10 |   130 |    13   (0)|
|   1 |  NESTED LOOPS      |                  |    10 |   130 |    13   (0)|
|   2 |   TABLE ACCESS FULL| T                | 47264 |   415K|     2   (0)|
|*  3 |   INDEX RANGE SCAN | IND_T1_OBJECT_ID |     1 |     4 |     1   (0)|
----------------------------------------------------------------------------

FIRST_Rows_100:
-------------------------------------------------------------------------------
| Id  | Operation             | Name             | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                  | 51982 |  1116K|    32  (10)|
|*  1 |  HASH JOIN            |                  | 51982 |  1116K|    32  (10)|
|   2 |   INDEX FAST FULL SCAN| IND_T1_OBJECT_ID | 51986 |   203K|    28   (4)|
|   3 |   TABLE ACCESS FULL   | T                | 51986 |   456K|     2   (0)|
-------------------------------------------------------------------------------

总结来说,first_rows_n基于成本计算,根据优先返回行数N重新计算各个对象的访问成本,
从而生成最快返回前N条记录的执行计划.

June 23

and_equal,index_join,index_combine

and_equal,index_join,index_combine这三种都是oracle利用索引关联获得数据的方法,三者的目的都是为了最大限度的利用索引,减少回表的代价.但是三者的实现方法是有区别的,下面一一来分析.
and_equal:
这种方式需要查询条件里面包括所有索引列,然后取得每个索引中得到的rowid列表,然后对这些列表做merge join,过滤出相同的rowid后再去表中获取数据或者直接从索引中获得数据.and_equal有一些限制,比如它只对单列索引有效,只对非唯一索引有效,使用到的索引不能超过5个,查询条件只能是"=".在10g中,and_equal已经被废弃了,只能通过hint才能生效.
create table test as select * from dba_objects;
create index ind_test_owner on test(owner);
create index ind_test_object_name on  test(object_name);

SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test' and object_name='test';
Execution Plan
----------------------------------------------------------
-------------------------------------------------------------------------------
| Id  | Operation         | Name                 | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                      |     1 |    29 |     2   (0)|
|*  1 |  AND-EQUAL        |                      |       |       |            |
|*  2 |   INDEX RANGE SCAN| IND_TEST_OWNER       |     1 |       |     1   (0)|
|*  3 |   INDEX RANGE SCAN| IND_TEST_OBJECT_NAME |     2 |       |     1   (0)|
-------------------------------------------------------------------------------
如果查询条件只包含owner

SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test';
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |     1 |    29 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |     1 |    29 |     2   (0)|
|*  2 |   INDEX RANGE SCAN          | IND_TEST_OWNER |     1 |       |     1   (0)|
-----------------------------------------------------------------------------------
修改owner,object_name为非空
alter table test modify(owner not null);
alter table test modify(object_name not null);

SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test';
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |     1 |    29 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |     1 |    29 |     2   (0)|
|*  2 |   INDEX RANGE SCAN          | IND_TEST_OWNER |     1 |       |     1   (0)|
-----------------------------------------------------------------------------------
效果一样
查询条件是">"的情况
SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner>'test' and object_name='test';
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      |     1 |    40 |     2   (0)|
|*  1 |  TABLE ACCESS BY INDEX ROWID| TEST                 |     1 |    40 |     2   (0)|
|*  2 |   INDEX RANGE SCAN          | IND_TEST_OBJECT_NAME |     2 |       |     1   (0)|
-----------------------------------------------------------------------------------------
查询条件是in的情况
SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner in('test','dba') and object_name='test';
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      |     1 |    40 |     2   (0)|
|*  1 |  TABLE ACCESS BY INDEX ROWID| TEST                 |     1 |    40 |     2   (0)|
|*  2 |   INDEX RANGE SCAN          | IND_TEST_OBJECT_NAME |     2 |       |     1   (0)|
-----------------------------------------------------------------------------------------

再来看一下回表的情况
SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner='test' and object_name='test';
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      |     1 |    40 |     2   (0)|
|*  1 |  TABLE ACCESS BY INDEX ROWID| TEST                 |     1 |    40 |     2   (0)|
|   2 |   AND-EQUAL                 |                      |       |       |            |
|*  3 |    INDEX RANGE SCAN         | IND_TEST_OWNER       |     1 |       |     1   (0)|
|*  4 |    INDEX RANGE SCAN         | IND_TEST_OBJECT_NAME |     2 |       |     1   (0)|
-----------------------------------------------------------------------------------------
先通过and_equal取得rowid列表,然后从表中返回数据.
 
index_join:
index join顾名思义是对index进行关联,oracle通过hash index join的方式实现了避免对表的访问.所有的数据都从索引中直接获得.它不受查询条件影响,可以是唯一索引,也可以是多列索引.
SQL 10G>select/*+ index_join(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test' and object_name='test';
Execution Plan
----------------------------------------------------------
--------------------------------------------------------------------------------
| Id  | Operation          | Name                 | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                      |     1 |    29 |     3  (34)|
|*  1 |  VIEW              | index$_join$_001     |     1 |    29 |     3  (34)|
|*  2 |   HASH JOIN        |                      |       |       |            |
|*  3 |    INDEX RANGE SCAN| IND_TEST_OWNER       |     1 |    29 |     1   (0)|
|*  4 |    INDEX RANGE SCAN| IND_TEST_OBJECT_NAME |     1 |    29 |     1   (0)|
--------------------------------------------------------------------------------
可以不带查询条件,只不过由index range scan变成了index fast full scan
SQL 10G>select/*+ index_join(test ind_test_owner ind_test_object_name)*/ owner,object_name from test
  2  ;
Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------
| Id  | Operation              | Name                 | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                      | 51984 |  1472K|   478   (2)|
|   1 |  VIEW                  | index$_join$_001     | 51984 |  1472K|   478   (2)|
|*  2 |   HASH JOIN            |                      |       |       |            |
|   3 |    INDEX FAST FULL SCAN| IND_TEST_OWNER       | 51984 |  1472K|   153   (2)|
|   4 |    INDEX FAST FULL SCAN| IND_TEST_OBJECT_NAME | 51984 |  1472K|   322   (1)|
------------------------------------------------------------------------------------

如果不是所有数据都能从索引获得,那么将不会使用index join
SQL 10G>select/*+ index_join(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner='test' and object_name='test';
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |     1 |    40 |     2   (0)|
|*  1 |  TABLE ACCESS BY INDEX ROWID| TEST           |     1 |    40 |     2   (0)|
|*  2 |   INDEX RANGE SCAN          | IND_TEST_OWNER |     1 |       |     1   (0)|
-----------------------------------------------------------------------------------

index_combine:
index combine最早是用在bitmap index上的,在9i开始oracle默认可以使用在btree索引上,这是由_b_tree_bitmap_plans参数来控制的.oracle将btree索引中获得的rowid信息通过BITMAP CONVERSION FROM ROWIDS的步骤转换成bitmap进行匹配,然后匹配完成后通过BITMAP CONVERSION TO ROWIDS再转换出rowid获得数据或者回表获得数据.
 
SQL 10G>select/*+ index_combine(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test' and object_name='test';
Execution Plan
----------------------------------------------------------
---------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                 | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                      |     1 |    29 |     2   (0)|
|   1 |  BITMAP CONVERSION TO ROWIDS    |                      |     1 |    29 |     2   (0)|
|   2 |   BITMAP AND                    |                      |       |       |            |
|   3 |    BITMAP CONVERSION FROM ROWIDS|                      |       |       |            |
|*  4 |     INDEX RANGE SCAN            | IND_TEST_OWNER       |       |       |     1   (0)|
|   5 |    BITMAP CONVERSION FROM ROWIDS|                      |       |       |            |
|*  6 |     INDEX RANGE SCAN            | IND_TEST_OBJECT_NAME |       |       |     1   (0)|
---------------------------------------------------------------------------------------------
回表取数据的情况
SQL 10G>select/*+ index_combine(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner='test' and object_name='test';
Execution Plan
----------------------------------------------------------
----------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                 | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                      |     1 |    40 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID     | TEST                 |     1 |    40 |     2   (0)|
|   2 |   BITMAP CONVERSION TO ROWIDS    |                      |       |       |            |
|   3 |    BITMAP AND                    |                      |       |       |            |
|   4 |     BITMAP CONVERSION FROM ROWIDS|                      |       |       |            |
|*  5 |      INDEX RANGE SCAN            | IND_TEST_OWNER       |       |       |     1   (0)|
|   6 |     BITMAP CONVERSION FROM ROWIDS|                      |       |       |            |
|*  7 |      INDEX RANGE SCAN            | IND_TEST_OBJECT_NAME |       |       |     1   (0)|
----------------------------------------------------------------------------------------------
不带查询条件的情况,index combine将不被使用
SQL 10G>select/*+ index_combine(test ind_test_owner ind_test_object_name)*/ owner,object_name from test
  2  ;
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                | 51984 |  1472K|  1480   (1)|
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST           | 51984 |  1472K|  1480   (1)|
|   2 |   INDEX FULL SCAN           | IND_TEST_OWNER | 51984 |       |   123   (2)|
-----------------------------------------------------------------------------------
index_combine会是and_equal的很好的替代者,随着and_equal的退出,index_combine将更多得被我们看到.
April 06

转载小惠惠的文章

原文链接请看http://spaces.msn.com/roujiaweize/ 

替小惠惠的blog做一下广告哈

oracle cursor 游标(二)

 

关于 ORA-01000: maximum open cursors exceeded 这个问题的相关东西,我搜了一些文章贴一下。
 
ORA-01000 maximum open cursors exceeded

Cause: A host language program attempted to open too many cursors. The initialization parameter OPEN_CURSORS determines the maximum number of cursors per user.

Action: Modify the program to use fewer cursors. If this error occurs often, shut down Oracle, increase the value of OPEN_CURSORS, and then restart Oracle.

 
 
关于cursor 的参数有这么几个:
SQL> show parameter cursor
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cursor_sharing                       string      EXACT
cursor_space_for_time                boolean     FALSE
open_cursors                         integer     800
session_cached_cursors               integer     0
 
这里,cursor_sharing跟open_cursors的数量没有关系。open_cursors的数量,它包括了oracle服务器端session_cached_cursors的数量,以及应用服务器端cursor cache size的数量。
 
关于session_cached_cursors这个参数,可以看 汪海 写的2篇文章:
 
 
这个问题的根源,除了上面3个参数,主要在于程序的问题,在itpub 上我们可以看到:
 

很多朋友在Java开发中,使用Oracle数据库的时候,经常会碰到有ORA-01000: maximum open cursors exceeded.的错误。

实际上,这个错误的原因,主要还是代码问题引起的。
ora-01000: maximum open cursors exceeded.
表示已经达到一个进程打开的最大游标数。

这样的错误很容易出现在Java代码中的主要原因是:Java代码在执行conn.createStatement()和conn.prepareStatement()的时候,实际上都是相当与在数据库中打开了一个cursor。尤其是,如果你的createStatement和prepareStatement是在一个循环里面的话,就会非常容易出现这个问题。因为游标一直在不停的打开,而且没有关闭。

一般来说,我们在写Java代码的时候,createStatement和prepareStatement都应该要放在循环外面,而且使用了这些Statment后,及时关闭。最好是在执行了一次executeQuery、executeUpdate等之后,如果不需要使用结果集(ResultSet)的数据,就马上将Statment关闭。

对于出现ORA-01000错误这种情况,单纯的加大open_cursors并不是好办法,那只是治标不治本。实际上,代码中的隐患并没有解除。
而且,绝大部分情况下,open_cursors只需要设置一个比较小的值,就足够使用了,除非有非常特别的要求。

March 22

pctversion in lob segment

在oracle的官方文档上有这么一段话来解释pctversion
 
PCTVERSION integer

Specify the maximum percentage of overall LOB storage space used for maintaining old versions of the LOB. The default value is 10, meaning that older versions of the LOB data are not overwritten until they consume 10% of the overall LOB storage space.

You can specify the PCTVERSION parameter whether the database is running in manual or automatic undo mode. PCTVERSION is the default in manual undo mode. RETENTION is the default in automatic undo mode.

 
 
由于lob segment采用了自己特殊的一致性读的实现,不是使用undo tablespace来保留前映象,而是当发生update时在lob segment内分配一个chunk去插入一条新的记录。这样的话,如果一条记录修改了多次,那么它就存在多个版本,对于很大的lob对象来说,这是十分浪费空间的。所以oracle需要有一个办法来控制这个保留前映象的空间的大小,pctversion就是为了实现这个功能的。pctversion是一个百分比的值,就是当前所有lob空间中用来存放前映象的百分比,如果前映象的空间大于pctversion了,那么oracle将会重用这些前映象空间而不去扩展。
 
对比一下不同pctversion对空间占用的影响
 
SQL 10G>select TABLE_NAME,SEGMENT_NAME,CHUNK,PCTVERSION,RETENTION from user_lobs where segment_name='TEXT_LOB';
TABLE_NAME                     SEGMENT_NAME                        CHUNK PCTVERSION  RETENTION
------------------------------ ------------------------------ ---------- ---------- ----------
TESTLOB                        TEXT_LOB                             8192          1
 
SQL 10G>truncate table testlob;
Table truncated.
 
SQL 10G>select segment_name,bytes/1024/1024||'M' from user_segments where segment_name='TEXT_LOB';
SEGMENT_NAME                                                                      BYTES/1024/1024||'M'
--------------------------------------------------------------------------------- -----------------------------------------
TEXT_LOB                                                                          .0625M
 

SQL 10G>insert into testlob values(1,rpad('a',4000));
1 row created.
 
SQL 10G>commit;
Commit complete.
 
SQL 10G>select dbms_lob.getlength(text) from testlob;
DBMS_LOB.GETLENGTH(TEXT)
------------------------
                    4000
 
SQL 10G>begin        
  2  for i in 1..10 loop
  3  update testlob set text=text||text;
  4  commit;
  5  end loop;
  6  end;
  7  /
PL/SQL procedure successfully completed.
 
SQL 10G>select segment_name,bytes/1024/1024||'M' from user_segments where segment_name='TEXT_LOB';
SEGMENT_NAME                                                                      BYTES/1024/1024||'M'
--------------------------------------------------------------------------------- -----------------------------------------
TEXT_LOB                                                                          13M
 
SQL 10G>truncate table testlob;
Table truncated.
 
SQL 10G>alter table testlob modify lob(text) (pctversion 99);
Table altered.
 
SQL 10G>insert into testlob values(1,rpad('a',4000));
1 row created.
 
SQL 10G>commit;
Commit complete.
 
SQL 10G>select segment_name,bytes/1024/1024||'M' from user_segments where segment_name='TEXT_LOB';
SEGMENT_NAME                                                                      BYTES/1024/1024||'M'
--------------------------------------------------------------------------------- -----------------------------------------
TEXT_LOB                                                                          .0625M
 
SQL 10G>begin
  2  for i in 1..10 loop
  3  update testlob set text=text||text;
  4  commit;
  5  end loop;
  6  end;
  7  /
PL/SQL procedure successfully completed.
 
SQL 10G>select segment_name,bytes/1024/1024||'M' from user_segments where segment_name='TEXT_LOB';
SEGMENT_NAME                                                                      BYTES/1024/1024||'M'
--------------------------------------------------------------------------------- -----------------------------------------
TEXT_LOB                                                                          17M
 
SQL 10G>select dbms_lob.getlength(text) from testlob;
DBMS_LOB.GETLENGTH(TEXT)
------------------------
                 4096000
 
 
 
可以看到pctversion的设置对lob空间大小还是有影响的,在很大的lob segment中这种影响会表现的十分明显,有兴趣的人可以对大lob段做测试。
March 16

parallel query in partition table

上一篇文章里面提到parallel query相关的checkpoint,也了解了direct read会对系统性能带来的潜在影响,这次让我们看一下parallel query在partition table上的表现。
 
 
创建2种分区表
 
CREATE TABLE TESTHASH(A NUMBER)
PARTITION BY HASH (A)
(PARTITION P1,
PARTITION P2,
PARTITION P3,
PARTITION P4)
 
 
CREATE TABLE TESTLIST(A NUMBER)
PARTITION BY LIST (A)
(PARTITION P1 VALUES(1),
PARTITION P2 VALUES(2),
PARTITION P3 VALUES(3),
PARTITION P4 VALUES(4))
 
SQL 10G>select distinct a from testhash partition (p1);
         6
SQL 10G>select distinct a from testhash partition (p2);
         9
SQL 10G>select distinct a from testhash partition (p3);
         2
SQL 10G>select distinct a from testhash partition (p4);
         1
 
SQL 10G>select count(*) from testhash partition (p1);
   1048623
SQL 10G>select count(*) from testhash partition (p2);
   1048623
SQL 10G>select count(*) from testhash partition (p3);
   1048623
SQL 10G>select count(*) from testhash partition (p4);
   1048623
 
 
 
SQL 10G>select distinct a from testlist partition (p1);
         1
SQL 10G>select distinct a from testlist partition (p2);
         2
SQL 10G>select distinct a from testlist partition (p3);
         3
SQL 10G>select distinct a from testlist partition (p4);
         4
 
SQL 10G>select count(*) from testlist partition (p1);
   1048623
SQL 10G>select count(*) from testlist partition (p2);
   1048623
SQL 10G>select count(*) from testlist partition (p3);
   1048623
SQL 10G>select count(*) from testlist partition (p4);
   1048623
 
两张分区表都是4个分区,每个分区包含1048623条记录
 
 
另外还有几个考查direct read与否的指标
 
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
NAME                                                                  VALUE
----------------------------------------------------------------          ----------
consistent gets from cache                                           137327
consistent gets direct                                                  574
physical reads direct                                                     574
table scans (direct read)                                                 0
 
我们运行几个sql来看parallel query在分区表上的表现
 
 
SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist;
  COUNT(*)
----------
   4194492
 
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets from cache                                           137449
consistent gets direct                                                 6934
physical reads direct                                                  6934
table scans (direct read)
 
 
SQL 10G>select 137449-137327 "consistent gets from cache",6934-574 "consistent gets direct",6934-574 "physical reads direct" from dual;
 
consistent gets from cache consistent gets direct physical reads direct
-------------------------- ---------------------- ---------------------
                       122                   6360                  6360
 
可以看到当parallel query包含所有分区的时候将会去做direct read
 
看看一个分区的情况
 
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets from cache                                           137449
consistent gets direct                                                 6934
physical reads direct                                                  6934
table scans (direct read)                                                55
 
SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist where a=1;
  COUNT(*)
----------
   1048623
 
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets from cache                                           139307
consistent gets direct                                                 6934
physical reads direct                                                  6934
table scans (direct read)                                                55
 
SQL 10G>select 139307-137449 "consistent gets from cache",6934-6934 "consistent gets direct",6934-6934 "physical reads direct" from dual;
 
consistent gets from cache consistent gets direct physical reads direct
-------------------------- ---------------------- ---------------------
                      1858                      0                     0
 
 
当查询一个分区时可以看到并没有采用direct read,而是采用普通的consistent get
 
 
两个分区的情况
 
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets from cache                                           139307
consistent gets direct                                                 6934
physical reads direct                                                  6934
table scans (direct read)                                                55
 
SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist where a in(1,2);
 
  COUNT(*)
----------
   2097246
 
 
SQL 10G>SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets from cache                                           142788
consistent gets direct                                                 6934
physical reads direct                                                  6934
table scans (direct read)                                                55
 
SQL 10G>select 142788-139307"consistent gets from cache",6934-6934 "consistent gets direct",6934-6934 "physical reads direct" from dual;
 
consistent gets from cache consistent gets direct physical reads direct
-------------------------- ---------------------- ---------------------
                      3481                      0                     0
 
同样没有发生direct read
 
 
三个分区呢?
 
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets from cache                                           142788
consistent gets direct                                                 6934
physical reads direct                                                  6934
table scans (direct read)                                                55
 
SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist where a in(1,2,3);
  COUNT(*)
----------
   3145869
 
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets from cache                                           142907
consistent gets direct                                                11704
physical reads direct                                                 11704
table scans (direct read)                                               110
 
 
SQL 10G>select 142907-142788 "consistent gets from cache",11704-6934 "consistent gets direct",11704-6934 "physical reads direct" from dual;
 
consistent gets from cache consistent gets direct physical reads direct
-------------------------- ---------------------- ---------------------
                       119                   4770                  4770
 
可以看到发生了3个分区的direct read
 
如果我们查询不存在的分区值呢
 
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets from cache                                           142907
consistent gets direct                                                11704
physical reads direct                                                 11704
table scans (direct read)                                               110
 
SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist where a in(5,6,7);
  COUNT(*)
----------
         0
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets from cache                                           142907
consistent gets direct                                                11704
physical reads direct                                                 11704
table scans (direct read)                                               110
SQL 10G>select 142907-142907 "consistent gets from cache",11704-11704"consistent gets direct",11704-11704"physical reads direct" from dual;
 
consistent gets from cache consistent gets direct physical reads direct
-------------------------- ---------------------- ---------------------
                         0                      0                     0
 
很好,没有读testlist表的任何block
 
再来看一下hash partition table的表现
 
select /*+ parallel(testhash,4)*/count(*) from testhash;
 
SQL 10G>select 149410-149287 "consistent gets from cache",18064-11704 "consistent gets direct",18064-11704 "physical reads direct" from dual;
 
consistent gets from cache consistent gets direct physical reads direct
-------------------------- ---------------------- ---------------------
                       123                   6360                  6360
                      
select /*+ parallel(testhash,4)*/count(*) from testhash where a in(1);
 
SQL 10G>select 151269-149410 "consistent gets from cache",18064-18064 "consistent gets direct",18064-18064 "physical reads direct" from dual;
 
consistent gets from cache consistent gets direct physical reads direct
-------------------------- ---------------------- ---------------------
                      1859                      0                     0
                     
select /*+ parallel(testhash,4)*/count(*) from testhash where a in(1,2);
 
SQL 10G>select 154751-151269 "consistent gets from cache",18064-18064 "consistent gets direct",18064-18064 "physical reads direct" from dual;
 
consistent gets from cache consistent gets direct physical reads direct
-------------------------- ---------------------- ---------------------
                      3482                      0                     0
 
select /*+ parallel(testhash,4)*/count(*) from testhash where a in(1,2,6);
 
SQL 10G>select 154871-154751 "consistent gets from cache",22834-18064 "consistent gets direct",22834-18064 "physical reads direct" from dual;
 
consistent gets from cache consistent gets direct physical reads direct
-------------------------- ---------------------- ---------------------
                       120                   4770                  4770
                      
select /*+ parallel(testhash,4)*/count(*) from testhash where a in(3);  
                   
SQL 10G>select 160572-158713 "consistent gets from cache",37144-37144 "consistent gets direct",37144-37144 "physical reads direct" from dual;
 
consistent gets from cache consistent gets direct physical reads direct
-------------------------- ---------------------- ---------------------
                      1859                      0                     0
 
可以看到hash partition table在其他情况都和list partition table一样,但是在查询不存在的分区值时还是会去读取testhash表的block,当然这是由于hash partition的机制决定的。
 
总结来说,parallel query在partition table上并不是时时在做direct read,它是和查询包含的partition number有关的,至于如何利用好parallel query的优势那又是另外的话题了,大家看完这2篇文章后可以自己去找寻正确的结论。
 
March 15

DBWR parallel query checkpoint buffers written

DBWR parallel query checkpoint buffers written是oracle的一种特殊的checkpoint的产物,这种checkpoint在oracle8 以前称作Extent-based Checkpoint,当发生parallel query时oracle必须做checkpoint把脏数据写入磁盘,因为parallel query走的时direct read,直接从文件读入pga,如果有脏数据没有被写回磁盘,那么读出来的结果将会不一致。在oracle8以前,如果发生direct read,那么当每一个extent被读入的时候oracle会先去检查data buffer中有没和这个extent相关的dirty block,如果有就会驱动dbwr来写出,同时增加一次checkpoint的次数。这样的话如果这个表有N个extent都存在dirty block,那么将会发生N次checkpoint,这样的话将会使parallel query的执行时间大大超过normal query。在oracle8中这种情况有了改进,Extent-based Checkpoint变成了object-based checkpoint,不再会对单独的extent做checkpoint。
 
让我们再回到主题,DBWR parallel query checkpoint buffers written
 
首先来看一下v$statname
 
SQL 10G>select statistic#,name from v$statname where name like '%DBWR%';
STATISTIC# NAME
---------- ----------------------------------------------------------------
        69 DBWR checkpoint buffers written
        70 DBWR thread checkpoint buffers written
        71 DBWR tablespace checkpoint buffers written
        72 DBWR parallel query checkpoint buffers written
        73 DBWR object drop buffers written
        74 DBWR transaction table writes
        75 DBWR undo block writes
        76 DBWR revisited being-written buffer
        77 DBWR make free requests
        78 DBWR lru scans
        79 DBWR checkpoints
STATISTIC# NAME
---------- ----------------------------------------------------------------
        80 DBWR fusion writes
 
再来创建一张测试表
 
create table test(a number)
 
 
SQL 10G>select count(*) from test;
  COUNT(*)
----------
   4194496
 
SQL 10G>select distinct a from test;
         A
----------
         1
 
这个测试表包含了4194496条值为1的记录
 
先刷新data buffer
 
SQL 10G>ALTER SESSION SET EVENTS 'immediate trace name flush_cache';
Session altered.
 
我们来更新其中一条记录
 
SQL 10G>update test set a=2 where rownum<2;
1 row updated.
 
SQL 10G>commit;
Commit complete.
 
执行parallel query
 
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(72,79) and a.statistic#=b.statistic#;
NAME                                                                         VALUE
----------------------------------------------------------------                ----------
DBWR parallel query checkpoint buffers written                 28615
DBWR checkpoints                                                        552
 
SQL 10G>select/*+ parallel(test,4)*/distinct a from test;
         A
----------
         2
         1

SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(72,79) and a.statistic#=b.statistic#;
NAME                                                                        VALUE
----------------------------------------------------------------                ----------
DBWR parallel query checkpoint buffers written                 28616
DBWR checkpoints                                                        553

checkpoint和buffer written都增加1
 
再来更新分布在多个extent上的block来证实一下是不是现在的oracle版本用的是object-based checkpoint
 
SQL 10G>update test set a=10 where mod(dbms_rowid.ROWID_BLOCK_NUMBER(rowid),1000)=0 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=0;
6 rows updated.
 
SQL 10G>commit;
Commit complete.
 
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(72,79) and a.statistic#=b.statistic#;
NAME                                                                         VALUE
----------------------------------------------------------------                 ----------
DBWR parallel query checkpoint buffers written                 28616
DBWR checkpoints                                                        553
 
SQL 10G>select/*+ parallel(test,4)*/distinct a from test;
         A
----------
        10
         2
         1
 
SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(72,79) and a.statistic#=b.statistic#;
NAME                                                                         VALUE
----------------------------------------------------------------                 ----------
DBWR parallel query checkpoint buffers written                 28622
DBWR checkpoints                                                        554
 
 
可以看到写出了6个块,但是只发生一次checkpoint,所以我们可以知道oracle已经采用了object-based checkpoint。
 
 
总结来说,DBWR parallel query checkpoint buffers written就是当发生parallel query时导致checkpoint而写出的块数。
 
 
 
February 14

ora_rowscn

10g里面有了一项新功能,我们可以查看某个表的某一行最后一次改动的scn,这个就是由ora_rowscn来提供。
 
SQL 10G>create table test (a number);
Table created.
 
SQL 10G>select ora_rowscn from test;
no rows selected
 
SQL 10G>insert into test values(1);
1 row created.
 
SQL 10G>column ora_rowscn format 999999999999999999999
 
insert 完成后我们可以查询到ora_rowscn

SQL 10G>select ora_rowscn from test;
            ORA_ROWSCN
----------------------
           21749185488
 
SQL 10G>commit;
Commit complete.
 
当事务递交,ora_rowscn发生变化
 
SQL 10G>select ora_rowscn from test;
            ORA_ROWSCN
----------------------
           21749185507
 
SQL 10G>insert into test values(2);
1 row created.
 
再插入一条记录,查看ora_rowscn,发现两条记录的ora_rowscn相同
 
SQL 10G>select ora_rowscn from test;
            ORA_ROWSCN
----------------------
           21749185507
           21749185507
 
SQL 10G>commit;
Commit complete.
 
事务递交后再次发现ora_rowscn发生变化
 
SQL 10G>select ora_rowscn from test;
            ORA_ROWSCN
----------------------
           21749186216
           21749186216
 
SQL 10G>select dbms_rowid.ROWID_RELATIVE_FNO(rowid),dbms_rowid.ROWID_BLOCK_NUMBER(rowid) from test;

DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------------ ------------------------------------
                                   1                                69650
                                   1                                69650
 
SQL 10G>alter system dump datafile 1 block 69650;
System altered.
 
SQL 10G>exit
 
 
Start dump data blocks tsn: 0 file#: 1 minblk 69650 maxblk 69650
buffer tsn: 0 rdba: 0x00411012 (1/69650)
scn: 0x0005.105a3ea8 seq: 0x02 flg: 0x02 tail: 0x3ea80602
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Block header dump:  0x00411012
 Object id on Block? Y
 seg/obj: 0xff72  csc: 0x05.105a3bd0  itc: 2  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000d.026.000004da  0x01c01470.0184.2c  --U-    1  fsc 0x0000.105a3be3
0x02   0x000e.008.000004e7  0x01c018a1.017d.17  --U-    1  fsc 0x0000.105a3ea8
 
data_block_dump,data header at 0xb72f845c
===============
tsiz: 0x1fa0
hsiz: 0x16
pbl: 0xb72f845c
bdba: 0x00411012
     76543210
flag=--------
ntab=1
nrow=2
frre=-1
fsbo=0x16
fseo=0x1f94
avsp=0x1f78
tosp=0x1f78
0xe:pti[0]      nrow=2  offs=0
0x12:pri[0]     offs=0x1f9a
0x14:pri[1]     offs=0x1f94
block_row_dump:
tab 0, row 0, @0x1f9a
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 2]  c1 02
tab 0, row 1, @0x1f94
tl: 6 fb: --H-FL-- lb: 0x2  cc: 1
col  0: [ 2]  c1 03
end_of_block_dump
 
 
 
SQL 10G>select to_number('5105a3ea8','xxxxxxxxxxxxxxxxxx') ora_rowscn from dual;
        ORA_ROWSCN
------------------
       21749186216
从上面的实验中可以发现,当表以常规方式创建的时候,ora_rowscn取自data block header的scn,而每行数据并没保存自己的rowscn,下面来看一下以rowdependencies创建的表是什么情况。
 
 
SQL 10G>create table test (a number) rowdependencies;
Table created.
 
SQL 10G>select ora_rowscn from test;
no rows selected
 
SQL 10G>insert into test values(1);
1 row created.
 
insert后查询ora_rowscn ,发现ora_rowscn为空
 
SQL 10G>select ora_rowscn from test;
        ORA_ROWSCN
------------------

SQL 10G>commit;
Commit complete.
 
事务递交后,可以查询到ora_rowscn
 
 
SQL 10G>select ora_rowscn from test;
        ORA_ROWSCN
------------------
       21749198001
 
SQL 10G>insert into test values(2);
1 row created.
 
插入第2条数据,依然发现有一个ora_rowscn为空
 
SQL 10G>select ora_rowscn from test;
        ORA_ROWSCN
------------------
       21749198001

SQL 10G>commit;
Commit complete.
 
递交后查询ora_rowscn,发现2条记录的ora_rowscn并不一样,这和上面的情况不一样,我们知道这是因为ora_rowscn被保存在行内的缘故
 
SQL 10G>select ora_rowscn from test;
        ORA_ROWSCN
------------------
       21749198001
       21749198017
 
 
 
SQL 10G>alter system dump datafile 1 block 69650;
System altered.
 
SQL 10G>exit
 
Start dump data blocks tsn: 0 file#: 1 minblk 69650 maxblk 69650
buffer tsn: 0 rdba: 0x00411012 (1/69650)
scn: 0x0005.105a6cc1 seq: 0x02 flg: 0x02 tail: 0x6cc10602
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
seg/obj: 0xff73  csc: 0x05.105a687c  itc: 2  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0010.005.000004e5  0x01c02e78.01c5.1e  --U-    1  fsc 0x0000.105a6cb1
0x02   0x000c.02c.000004d4  0x01c00feb.01a5.24  --U-    1  fsc 0x0000.105a6cc1
 
data_block_dump,data header at 0xc70e45c
===============
tsiz: 0x1fa0
hsiz: 0x16
pbl: 0x0c70e45c
bdba: 0x00411012
     76543210
flag=--R-----
ntab=1
nrow=2
frre=-1
fsbo=0x16
fseo=0x1f88
avsp=0x1f6c
tosp=0x1f6c
0xe:pti[0]      nrow=2  offs=0
0x12:pri[0]     offs=0x1f94
0x14:pri[1]     offs=0x1f88
block_row_dump:
tab 0, row 0, @0x1f94
tl: 12 fb: --H-FL-- lb: 0x1  cc: 1
dscn 0x0000.00000000
col  0: [ 2]  c1 02
tab 0, row 1, @0x1f88
tl: 12 fb: --H-FL-- lb: 0x2  cc: 1
dscn 0x0000.00000000
col  0: [ 2]  c1 03
end_of_block_dump 
 
 
dump block后发现,现在dscn还是0,我们查询出来的ora_rowscn实际上是从Scn/Fsc
中获得的,真正ora_rowscn被保存在行内是在itl发生cleanout时会把Scn/Fsc刷到dscn
 
SQL 10G>update test set a=1 where a=1;
1 row updated.
 
SQL 10G>commit;
Commit complete.
 
 
SQL 10G>update test set a=1 where a=1;
1 row updated.
 
SQL 10G>commit;
Commit complete.
 
SQL 10G>alter system dump datafile 1 block 69650;
System altered.
 
SQL 10G>exit
 
Start dump data blocks tsn: 0 file#: 1 minblk 69650 maxblk 69650
buffer tsn: 0 rdba: 0x00411012 (1/69650)
scn: 0x0005.105a6dff seq: 0x02 flg: 0x02 tail: 0x6dff0602
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
 Object id on Block? Y
 seg/obj: 0xff73  csc: 0x05.105a6dfd  itc: 2  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0012.024.000004db  0x01c024fd.016a.08  C---    0  scn 0x0005.105a6df8
0x02   0x000d.00f.000004dc  0x01c031b3.0194.28  --U-    1  fsc 0x0000.105a6dff
 
data_block_dump,data header at 0xc70e45c
===============
tsiz: 0x1fa0
hsiz: 0x16
pbl: 0x0c70e45c
bdba: 0x00411012
     76543210
flag=--R-----
ntab=1
nrow=2
frre=-1
fsbo=0x16
fseo=0x1f88
avsp=0x1f6c
tosp=0x1f6c
0xe:pti[0]      nrow=2  offs=0
0x12:pri[0]     offs=0x1f94
0x14:pri[1]     offs=0x1f88
block_row_dump:
tab 0, row 0, @0x1f94
tl: 12 fb: --H-FL-- lb: 0x2  cc: 1
dscn 0x0005.105a6df8
col  0: [ 2]  c1 02
tab 0, row 1, @0x1f88
tl: 12 fb: --H-FL-- lb: 0x0  cc: 1
dscn 0x0005.105a6cc1
col  0: [ 2]  c1 03
end_of_block_dump
 
 
另外,oracle还提供了ora_rowscn到timestamp的转换
 
 
SQL 10G>select scn_to_timestamp(ora_rowscn) from test;
SCN_TO_TIMESTAMP(ORA_ROWSCN)
---------------------------------------------------------------------------
14-FEB-06 03.55.05.000000000 PM
14-FEB-06 03.44.05.000000000 PM
 
更多ora_rowscn的用途请各位自己去摸索吧。
January 19

oracle rowid

搞oracle的人都很清楚rowid这个东西,rowid就是唯一标志记录物理位置的一个id,在oracle 8版本以前,rowid由file#+block#+row#组成,占用6个bytes的空间,10 bit 的 file# ,22bit 的 block# ,16 bit 的 row#。
 
从oracle 8开始rowid变成了extend rowid,由data_object_id#+rfile#+block#+row#组成,占用10个bytes的空间, 32bit的 data_object_id#,10 bit 的 rfile#,22bit 的 block#,16 bit 的 row#.由于rowid的组成从file#变成了rfile#,所以数据文件数的限制也从整个库不能超过1023个变成了每个表空间不能超过1023个数据文件。
 
 
说了rowid的组成,那么我们再来看看rowid在索引里面占用的字节数又是什么样子的。在oracle 8以前索引中存储的rowid占用字节数也是6bytes,在oracle8之后,虽然oracle使用了extend rowid,但是在普通索引里面依然存储了bytes的rowid,只有在global index中存储的是10bytes的extend rowid,而extend rowid也是global index出现的一个必要条件,下面我们会解释原因。
 
 
为什么golbal index需要把data_object_id#也包含在index rowid entry中呢?如果不包含会这么样?首先我们需要知道index的rowid entry的存在是为了能根据它找到表的这条记录存在哪个具体的物理位置,我们需要知道它在哪个数据文件,在哪个block,在那一行,普通的索引oracle根据rfile#,block#,row#就可以知道了,但是partition table可以分布在多个表空间,也就是可以分布在多个数据文件,当我们建立local index时,index rowid entry并不包含data_object_id#,因为oracle可以知道这个index对应的是哪一个table分区,并可以得到table分区的ts#(tablespace号),那么oracle根据ts#和rfile#就可以找到具体的数据文件。但是如果换成是golbal index,如果不包含data_object_id#,那么我们并不能知道这个索引对应着哪个表分区,也自然不能知道它的rfile#和file#的转换关系,所以它将找不到所对应的记录。包含data_object_id#后,oracle可以根据data_object_id#实现rfile#和file#的转换然后找到记录对应的物理位置。需要注意的是要理解以上概念我们还是需要了解file#和rfile#的区别。
 
关于file#和rfile#的区别可以参考biti_rainy的一篇blog
 
继续上面的话题,我们猜想oracle实现rfile#和file#的转换是不是由一些递归sql来实现,所以我们做了一个测试来看是否oracle从file$等基表中实现呢?我们做了一个10046 trace,发现并不存在相应的递归sql,那oracle怎么实现呢?可能是直接取一些x$table中的数据了吧,那哪个x$table保留这些信息呢?
 
x$kccfe!!!
 
 
SQL 10G>desc x$kccfe
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ADDR                                               RAW(4)
 INDX                                               NUMBER
 INST_ID                                            NUMBER
 FENUM                                              NUMBER
 FECSZ                                              NUMBER
 FEBSZ                                              NUMBER
 FESTA                                              NUMBER
 FECRC_SCN                                          VARCHAR2(16)
 FECRC_TIM                                          VARCHAR2(20)
 FECRC_THR                                          NUMBER
 FECRC_RBA_SEQ                                      NUMBER
 FECRC_RBA_BNO                                      NUMBER
 FECRC_RBA_BOF                                      NUMBER
 FECRC_ETB                                          RAW(132)
 FECPS                                              VARCHAR2(16)
 FECPT                                              VARCHAR2(20)
 FECPC                                              NUMBER
 FESTS                                              VARCHAR2(16)
 FESTT                                              VARCHAR2(20)
 FEBSC                                              VARCHAR2(16)
 FEFNH                                              NUMBER
 FEFNT                                              NUMBER
 FEDUP                                              NUMBER
 FEURS                                              VARCHAR2(16)
 FEURT                                              VARCHAR2(20)
 FEOFS                                              VARCHAR2(16)
 FEONC_SCN                                          VARCHAR2(16)
 FEONC_TIM                                          VARCHAR2(20)
 FEONC_THR                                          NUMBER
 FEONC_RBA_SEQ                                      NUMBER
 FEONC_RBA_BNO                                      NUMBER
 FEONC_RBA_BOF                                      NUMBER
 FEONC_ETB                                          RAW(132)
 FEPOR                                              NUMBER
 FETSN                                              NUMBER
 FETSI                                              NUMBER
 FERFN                                              NUMBER
 FEPFT                                              NUMBER
 FEDOR                                              NUMBER
 FEPDI                                              NUMBER
 FEFDB                                              NUMBER
 FEPLG_SCN                                          VARCHAR2(16)
 FEPAX                                              NUMBER
 FEFLG                                              NUMBER
 
从这个x$table中oracle可以实现file和rfile的转换。
 
 
最后我们来看一个例子
 
SQL 10G>desc test
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 A                                                  NUMBER
 B                                                  VARCHAR2(32)
 
test是一张分区表
 
SQL 10G>select table_name,partition_name,tablespace_name from dba_tab_partitions where table_name='TEST';
TABLE_NAME                     PARTITION_NAME                 TABLESPACE_NAME
------------------------------ ------------------------------ ------------------------------
TEST                           P1                             TESTROWID
TEST                           P2                             TESTROWID
 
 
这条记录所在的物理位置
 
SQL 10G>select dbms_rowid.ROWID_OBJECT(rowid) data_object_id#,dbms_rowid.ROWID_RELATIVE_FNO(rowid) rfile#,dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block#,dbms_rowid.ROWID_ROW_NUMBER(rowid) row# from test ;
DATA_OBJECT_ID#     RFILE#     BLOCK#       ROW#
--------------- ---------- ---------- ----------
          63665         28         15          0
 
 
 
 
我们对它创建全局索引
 
SQL 10G>create index ind_test on test(a);  
Index created.
 
再创建本地索引
 
SQL 10G>create index ind_test_local on test(b) local;  
Index created.
 
 
SQL 10G>select dump(rowid,16) rid from test;
RID
----------------------------------------------------------------------------
Typ=69 Len=10: 0,0,f8,b1,7,0,0,f,0,0
 
去看看全局索引和本地索引中rowid entry的区别
 
SQL 10G>select header_block,header_file from dba_segments where segment_name='IND_TEST';
HEADER_BLOCK HEADER_FILE
------------ -----------
        1403           4
 
 
SQL 10G>select header_block,header_file from dba_segments where segment_name='IND_TEST_LOCAL';
HEADER_BLOCK HEADER_FILE
------------ -----------
          11          33
          11          34
 
 
SQL 10G>ALTER SYSTEM DUMP DATAFILE 4 BLOCK 1404;
System altered.
 
 
SQL 10G>ALTER SYSTEM DUMP DATAFILE 33 BLOCK 12;
System altered.
 
 
全局索引ind_test的rowid entry

col 1; len 10; (10):  00 00 f8 b1 07 00 00 0f 00 00
 
 
本地索引ind_test_local的rowid entry
 
col 1; len 6; (6):  07 00 00 0f 00 00
 
可以看出本地索引存储了6bytes rowid,全局索引存储了10bytes rowid
 
再来看一下00 00 f8 b1 07 00 00 0f 00 00
转换成bit就是
00000000 00000000 11111000 10110001 00000111 00000000 00000000 00001111 00000000 00000000
 
32bit的 data_object_id#,10 bit 的 rfile#,22bit 的 block#,16 bit 的 row#.
 
00000000 00000000 11111000 10110001  data_object_id#
转换成10进制就是
 
2^15+2^14+2^13+2^12+2^11+2^7+2^5+2^4+2^0=63665
 
00000111 00 rfile#
 
2^4+2^3+2^2=28 
0000000000000000001111  block#
 
2^3+2^2+2^1+2^0=15
 

0000000000000000 rowi#
 
0
 
SQL 10G>select dbms_rowid.ROWID_OBJECT(rowid) data_object_id#,dbms_rowid.ROWID_RELATIVE_FNO(rowid) rfile#,dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block#,dbms_rowid.ROWID_ROW_NUMBER(rowid) row# from test ;
DATA_OBJECT_ID#     RFILE#     BLOCK#       ROW#
--------------- ---------- ---------- ----------
          63665         28         15          0
 
验证通过
 
 
再来看一下如果file#超过1023后oracle会怎么处理
 
SQL 10G>select file#,ts#,rfile# from v$datafile  where file#>1023;
     FILE#        TS#     RFILE#
---------- ---------- ----------
      1024         14          1
      1025         14          3
      1026         14          4
      1027         14          5
      1028         14          6
      1029         14          7
      1030         14          8
      1031         14          9
      1032         14         10
      1033         14         11
      1034         14         12
     FILE#        TS#     RFILE#
---------- ---------- ----------
      1035         14         13
      1036         14         14
      1037         15         14
可以看到在一个tablespace里面rfile#从1开始到1023
 
 
SQL 10G>select file#,rfile# from v$datafile  where ts#=14 order by file#;
     FILE#     RFILE#
---------- ----------
         2          2
        15         15
        16         16
        17         17
        18         18
        19         19
        20         20
        21         21
        22         22
        23         23
        24         24
       。。。。。。
    FILE#     RFILE#
---------- ----------
      1015       1015
      1016       1016
      1017       1017
      1018       1018
      1019       1019
      1020       1020
      1021       1021
      1022       1022
      1023       1023
      1024          1
      1025          3
     FILE#     RFILE#
---------- ----------
      1026          4
      1027          5
      1028          6
      1029          7
      1030          8
      1031          9
      1032         10
      1033         11
      1034         12
      1035         13
      1036         14
 
January 16

what is data_object_id

前些天有人问data_object_id究竟是个什么东东,既然有了object_id那还要data_object_id干吗?
 
其实object_id和data_object_id同样是表示数据库对象的一个唯一标志,但是object_id表示的是逻辑id,data_object_id表示的是物理id。如果一些object没有物理属性的话那它就不存在data_object_id,例如procedure,function,package,data type,db link,mv定义,view定义,临时表,分区表定义等等这些object都是没有对应着某个segment,因此它们的data_object_id都为空。
 
当表刚创建的时候它的object_id和data_object_id都是相等的,但是如果表经过move或truncate后那么data_object_id将会有变化。
 

SQL 10G>select object_id,data_object_id from user_objects where object_name='T';
 OBJECT_ID DATA_OBJECT_ID
---------- --------------
     63053          63053
 
刚开始创建表时object_id=data_object_id
 
SQL 10G>SELECT HEADER_FILE,HEADER_BLOCK,BLOCKS FROM DBA_SEGMENTS WHERE SEGMENT_NAME='T' AND OWNER='TEST';
HEADER_FILE HEADER_BLOCK     BLOCKS
----------- ------------ ----------
          4         2019          8
SQL 10G>alter table t move;
Table altered.
 
SQL 10G>select object_id,data_object_id from user_objects where object_name='T';
 OBJECT_ID DATA_OBJECT_ID
---------- --------------
     63053          63463
在move以后可以看到data_object_id发生变化了
 
SQL 10G>SELECT HEADER_FILE,HEADER_BLOCK,BLOCKS FROM DBA_SEGMENTS WHERE SEGMENT_NAME='T' AND OWNER='TEST';
HEADER_FILE HEADER_BLOCK     BLOCKS
----------- ------------ ----------
          4          467          8
 
SQL 10G>truncate table t;
Table truncated.
 
SQL 10G>select object_id,data_object_id from user_objects where object_name='T';
 OBJECT_ID DATA_OBJECT_ID
---------- --------------
     63053          63464
SQL 10G>SELECT HEADER_FILE,HEADER_BLOCK,BLOCKS FROM DBA_SEGMENTS WHERE SEGMENT_NAME='T' AND OWNER='TEST';
HEADER_FILE HEADER_BLOCK     BLOCKS
----------- ------------ ----------
          4          467          8
 
truncate之后虽然segment的位置没有移动,但是data_object_id还是发生变化了。
 
 
SQL 10G>alter table t add(b number);
Table altered.
 
SQL 10G>select object_id,data_object_id from user_objects where object_name='T';
 OBJECT_ID DATA_OBJECT_ID
---------- --------------
     63053          63464
 
我们对t表加一个字段,发现data_object_id没有发生变化,我们可以认为只有当segment发生变化时data_object_id才会随之变化。我们来看看data_object_id从哪里来的
 
 
SQL 10G> select text from dba_views where view_name='DBA_OBJECTS';
 
select u.name, o.name, o.subname, o.obj#, o.dataobj#,
       decode(o.type#, 0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE', 3, 'CLUSTER',
                      4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE',
                      7, 'PROCEDURE', 8, 'FUNCTION', 9, 'PACKAGE',
                      11, 'PACKAGE BODY', 12, 'TRIGGER',
                      13, 'TYPE', 14, 'TYPE BODY',
                      19, 'TABLE PARTITION', 20, 'INDEX PARTITION', 21, 'LOB',
                      22, 'LIBRARY', 23, 'DIRECTORY', 24, 'QUEUE',
                      28, 'JAVA SOURCE', 29, 'JAVA CLASS', 30, 'JAVA RESOURCE',
                      32, 'INDEXTYPE', 33, 'OPERATOR',
                      34, 'TABLE SUBPARTITION', 35, 'INDEX SUBPARTITION',
                      40, 'LOB PARTITION', 41, 'LOB SUBPARTITION',
                      42, NVL((SELECT distinct 'REWRITE EQUIVALENCE'
                               FROM sum$ s
                               WHERE s.obj#=o.obj#
                                     and bitand(s.xpflags, 8388608) = 8388608),
                              'MATERIALIZED VIEW'),
                      43, 'DIMENSION',
                      44, 'CONTEXT', 46, 'RULE SET', 47, 'RESOURCE PLAN',
                      48, 'CONSUMER GROUP',
                      51, 'SUBSCRIPTION', 52, 'LOCATION',
                      55, 'XML SCHEMA', 56, 'JAVA DATA',
                      57, 'SECURITY PROFILE', 59, 'RULE',
                      60, 'CAPTURE', 61, 'APPLY',
                      62, 'EVALUATION CONTEXT',
                      66, 'JOB', 67, 'PROGRAM', 68, 'JOB CLASS', 69, 'WINDOW',
                      72, 'WINDOW GROUP', 74, 'SCHEDULE', 79, 'CHAIN',
                      81, 'FILE GROUP',
                     'UNDEFINED'),
       o.ctime, o.mtime,
       to_char(o.stime, 'YYYY-MM-DD:HH24:MI:SS'),
       decode(o.status, 0, 'N/A', 1, 'VALID', 'INVALID'),
       decode(bitand(o.flags, 2), 0, 'N', 2, 'Y', 'N'),
       decode(bitand(o.flags, 4), 0, 'N', 4, 'Y', 'N'),
       decode(bitand(o.flags, 16), 0, 'N', 16, 'Y', 'N')
from sys.obj$ o, sys.user$ u
where o.owner# = u.user#
  and o.linkname is null
  and (o.type# not in (1  /* INDEX - handled below */,
                      10 /* NON-EXISTENT */)
       or
       (o.type# = 1 and 1 = (select 1
                              from sys.ind$ i
                             where i.obj# = o.obj#
                               and i.type# in (1, 2, 3, 4, 6, 7, 9))))
  and o.name != '_NEXT_OBJECT'
  and o.name != '_default_auditing_options_'
union all
select u.name, l.name, NULL, to_number(null), to_number(null),
       'DATABASE LINK',
       l.ctime, to_date(null), NULL, 'VALID','N','N', 'N'
from sys.link$ l, sys.user$ u
where l.owner# = u.user#
 
可以看到data_object_id是从obj$.dataobj#来的,而obj$.dataobj#又对应着seg$.HWMINCR
 
 
SQL 10G>select max(HWMINCR) from sys.seg$;

MAX(HWMINCR)
------------
       63464
这里的HWMINCR就是data_object_id的来源,每次seg$里生成新的一条记录都会增加HWMINCR这个值,同时obj$.dataobj#也会跟着变化。
 
 
 
December 28

dynamic sampling

dynamic sampling对于我们来说其实不算太陌生,从9i r2开始,dynamic sampling其实已经不动声色地融入到数据库中了。
我们经常会碰到由于一些表没有分析导致执行计划错误的情况,但是dynamic sampling的出现一定程度的减少了错误的产生。dynamic sampling针对没有分析过的表可以采样估计表的选择性,对于生产正确的执行计划有一定的帮助。
 
dynamic sampling分为10个级别,从0-10,由参数optimizer_dynamic_sampling控制
 
下面是取自"Performance Tuning Guide and Reference "的对10种级别的定义
 
  • Level 0: Do not use dynamic sampling.
  • Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of dynamic sampling blocks (32).
  • Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is the default number of dynamic sampling blocks.
  • Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks.
  • Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns. The number of blocks sampled is the default number of dynamic sampling blocks.
  • Level 5: Apply dynamic sampling to all tables that meet the Level 4 criteria using 2 times the default number of dynamic sampling blocks.
  • Level 6: Apply dynamic sampling to all tables that meet the Level 5 criteria using 4 times the default number of dynamic sampling blocks.
  • Level 7: Apply dynamic sampling to all tables that meet the Level 6 criteria using 8 times the default number of dynamic sampling blocks.
  • Level 8: Apply dynamic sampling to all tables that meet the Level 7 criteria using 32 times the default number of dynamic sampling blocks.
  • Level 9: Apply dynamic sampling to all tables that meet the Level 8 criteria using 128 times the default number of dynamic sampling blocks.
  • Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.
dynamic_sampling hint的级别定义如下
 
  • Level 0: Do not use dynamic sampling.
  • Level 1: The number of blocks sampled is the default number of dynamic sampling blocks (32).
  • Level 2: The number of blocks sampled is 2 times the default number of dynamic sampling blocks.
  • Level 3: The number of blocks sampled is 4 times the default number of dynamic sampling blocks.
  • Level 4: The number of blocks sampled is 8 times the default number of dynamic sampling blocks.
  • Level 5: The number of blocks sampled is 16 times the default number of dynamic sampling blocks.
  • Level 6: The number of blocks sampled is 32 times the default number of dynamic sampling blocks.
  • Level 7: The number of blocks sampled is 64 times the default number of dynamic sampling blocks.
  • Level 8: The number of blocks sampled is 128 times the default number of dynamic sampling blocks.
  • Level 9: The number of blocks sampled is 256 times the default number of dynamic sampling blocks.
  • Level 10: Read all blocks in the table.

 

下面做一些测试

 

SQL> create table test as select object_id a from dba_objects;

Table created.

 

SQL>  show parameter optimizer_dynamic_sampling

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling           integer     1


SQL> set autotrace trace explain;


SQL> select count(*) from test;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'TEST'

 

SQL> alter session set optimizer_dynamic_sampling=10;

Session altered.

 

SQL> select count(*) from test;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'TEST'


dynamic sampling没有起作用,因为在9i里面dynamic sampling需要在多表关联的语句里面才会起作用。在10g里面optimizer_dynamic_sampling在2及以上级别时单表就能dynamic sampling。

 

SQL 10G>alter session set optimizer_dynamic_sampling=2;

Session altered.

 

SQL 10G>select count(*) from test;

 

Execution Plan
----------------------------------------------------------
Plan hash value: 1950795681

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    21   (5)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST | 48065 |    21   (5)| 00:00:01 |
-------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

 

 

看看关联查询的情况

 

SQL> create table test1 as select * from test;

Table created.

 

SQL> select count(*) from test a,test1 b where a.a=b.a;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     MERGE JOIN
   3    2       SORT (JOIN)
   4    3         TABLE ACCESS (FULL) OF 'TEST1'
   5    2       SORT (JOIN)
   6    5         TABLE ACCESS (FULL) OF 'TEST'

 

如果两个表都没分析,那么dynamic sampling不起作用,分析其中一个表

 

SQL> analyze table test1 compute statistics;

Table analyzed.

 

SQL> select count(*) from test a,test1 b where a.a=b.a;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=19 Card=1 Bytes=17)
   1    0   SORT (AGGREGATE)
   2    1     HASH JOIN (Cost=19 Card=23706 Bytes=403002)
   3    2       TABLE ACCESS (FULL) OF 'TEST1' (Cost=6 Card=23706 Byte
          s=94824)

   4    2       TABLE ACCESS (FULL) OF 'TEST' (Cost=6 Card=23706 Bytes
          =308178)

 

dynamic sampling起作用了。

 

在10g里面有点区别,即使2个表都没分析过,dynamic sampling也可以起作用

 

SQL 10G>alter session set optimizer_dynamic_sampling=1;

Session altered.

 

SQL 10G>select count(*) from test a,test1 b where a.a=b.a;

Execution Plan
----------------------------------------------------------
Plan hash value: 2909046986

--------------------------------------------------------------------------------
-----

| Id  | Operation           | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time
    |

--------------------------------------------------------------------------------
-----

|   0 | SELECT STATEMENT    |       |     1 |    26 |       |   151   (4)| 00:00
:02 |

|   1 |  SORT AGGREGATE     |       |     1 |    26 |       |            |
    |

|*  2 |   HASH JOIN         |       | 43244 |  1097K|  1056K|   151   (4)| 00:00
:02 |

|   3 |    TABLE ACCESS FULL| TEST1 | 43235 |   548K|       |    21   (5)| 00:00
:01 |

|   4 |    TABLE ACCESS FULL| TEST  | 45439 |   576K|       |    21   (5)| 00:00
:01 |

--------------------------------------------------------------------------------
-----


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."A"="B"."A")

 

Note
-----
   - dynamic sampling used for this statement

 

其实归根到底是由于10g没有了基于规则的优化器,在9i里面如果一个查询中的所有表没有统计数据,那么他将选择基于规则的优化器而忽略dynamic sampling,而在10g因为这个原因所以dynamic sampling生效。

 

 

December 20

oracle block cleanout

上一篇文章正好写到oracle lock mechanism internal,这次顺理成章得写一下block cleanout,因为cleanout与transaction,lock都有密切关系。
 
cleanout分为2钟,一种是fast commit cleanout,另一种是delayed block cleanout.
 
oracle有一个modified block list结构,用来记录每个transaction更改过的block,每个transaction大约可以记录10%buffer cache这多的modified block。这部分block就是当发生commit的时候,oracle可以根据modified block list定位到那些块并做fast commit cleanout。如果一个transaction修改的块超过10%
buffer cache,那么超过的块就执行delayed block cleanout。当做fast commit cleanout时,oracle不会清理 Row locks lb标志位,ITL lck标志位。
 
另一种情况是delayed block cleanout,当transaction还未commit或rollback时modified block已经被写回磁盘,当发生commit时oracle并不会把block重新读入做cleanout,而是把cleanout留到下一次对此块的dml或select。当delayed cleanout时候如果undo segment header的transaction table slot还没有被覆盖,那么可以找回该事务递交的exact scn,如果slot已经被覆盖,那么将会使用undo segment header中的control scn来做为upper bound scn。
 
 
下面有一些例子可以阐述几种类型的block cleanout
 
创建一张表,为了简单起见,每个block限制为一行数据
 
SQL 10G>create table test
  2  pctfree 99
  3  as
  4     select rownum n1, rpad(rownum,200) v1
  5     from all_objects
  6     where rownum <= 1000
  7  ;
Table created.
 
 
看一下第一行的file number,block number
 
SQL 10G>select dbms_rowid.ROWID_RELATIVE_FNO(rowid),dbms_rowid.ROWID_BLOCK_NUMBER(rowid) from test where rownum=1;
DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------------ ------------------------------------
                                   4                                  468
 
 
更新第一行并递交,这时候发生的是fast commit cleanout
 
SQL 10G>update test set n1=n1 where rownum=1;
1 row updated.
 
SQL 10G>commit;
Commit complete.
 
SQL 10G>alter system dump datafile 4 block 468;
System altered.
 
 
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x02   0x0003.019.00000d40  0x008009d3.24dd.1a  --U-    1  fsc 0x0000.102cdb84

tl: 207 fb: --H-FL-- lb: 0x2  cc: 2
  
 
fast commit cleanout并不清除lck,lb标志。
 
 
如果出现事务递交前modified block就被flush回硬盘,那么将发生delayed block cleanout。
 
更新记录:
SQL 10G>update test set n1=n1+1;
1000 rows updated.
 
flush buffer:
SQL 10G>ALTER SESSION SET EVENTS 'immediate trace name flush_cache';
Session altered.
 
递交,这时不回去修改data block,只会修改undo segment header slot
SQL 10G>commit;
Commit complete.
 
SQL 10G>alter system dump datafile 4 block 468;
System altered.
 
通过查询来实现delayed block cleanout
SQL 10G>select count(*) from test where rownum=1;
  COUNT(*)
----------
         1
 
 
SQL 10G>alter system dump datafile 4 block 468;
System altered.
在做cleanout以前
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0004.027.00000d00  0x008000c0.208c.3c  ----    1  fsc 0x0000.00000000
 
tl: 207 fb: --H-FL-- lb: 0x1  cc: 2
 
在做cleanout以后
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0004.027.00000d00  0x008000c0.208c.3c  C---    scn 0x0005.102ce398
 
tl: 207 fb: --H-FL-- lb: 0x0  cc: 2
 
 
做了cleanout后lck,lb标志为都被清零,scn也是从undo segment header transaction table slot里面得到。如果slot被覆盖了,那么会把control scn拿来当作upperbound scn。
 
 
 
实验做到这里的时候产生一个疑问,如果在做delayed block cleanout之前undo tablespace被删除了怎么办,oracle从哪里找undo segment header呢?是不是cleanout会报错?带着疑问继续做实验。
 
 
SQL 10G>update test set n1=n1-1;
1000 rows updated.
 
 
SQL 10G>ALTER SESSION SET EVENTS 'immediate trace name flush_cache';
Session altered.
 
 
 
SQL 10G>show parameter undo_tablespace
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_tablespace                      string      UNDOTBS1

SQL 10G>commit;
Commit complete.
 
切换undo tablespace
SQL 10G>alter system set undo_tablespace=undotbs2;
System altered.
 
删除旧的undo tablespace
SQL 10G>drop tablespace undotbs1 including contents;
Tablespace dropped.
 
SQL 10G>!rm /opt/oracle/oradata/dbtest/undotbs01.dbf
 
 
SQL 10G>alter system dump datafile 4 block 468;
System altered.
 
cleanout 并没有出错
 
SQL 10G>select count(*) from test;
  COUNT(*)
----------
      1000
 
SQL 10G>alter system dump datafile 4 block 468;
System altered.
 
发生cleanout以前
 
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x02   0x000a.015.00001906  0x008000b4.359d.2e  ----    1  fsc 0x0000.00000000
tl: 207 fb: --H-FL-- lb: 0x2  cc: 2
 
发生cleanout以后
 
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x02   0x000a.015.00001906  0x008000b4.359d.2e  C-U-    0  scn 0x0005.102ce8f9

tl: 207 fb: --H-FL-- lb: 0x0  cc: 2
 
发现还是能取到upper bound scn,十分奇怪,试想是不是有基表保存已经被删除的undo segment信息,仔细一查,undo$出现了!!!
 
因为xid指向了0x000a的segment header,所以我们查询_SYSSMU10$
 
SQL 10G>select name,SCNBAS from undo$ where name='_SYSSMU10$';
NAME                               SCNBAS
------------------------------ ----------
_SYSSMU10$                      271378681
 
 
 
来转换一下upper bound scn
 
SQL 10G>select to_number('102ce8f9','xxxxxxxx') from dual;
TO_NUMBER('102CE8F9','XXXXXXXX')
--------------------------------
                       271378681
 
 
正是这个scn保存在undo$中被拿来当做upper bound scn.
 
 
来看一下undo$
 
SQL 10G>select header_file,header_block,tablespace_name from dba_segments where segment_name='UNDO$';
HEADER_FILE HEADER_BLOCK TABLESPACE_NAME
----------- ------------ ------------------------------
          1          105 SYSTEM
 
果然是一个系统基表,oracle使用undo$来保证undo tablespace被删除后的delayed block cleanout。
 
 
 
December 14

oracle lock mechanism internal

谈到oracle的锁机制,这是oracle和其他数据库区别比较大的地方,为了更好的解释其中的原来,我们在这篇文章中主要介绍tx lock,所以以下提到的lock均代表tx lock。
 
在很多其他数据库中,lock实际上是通过一个in-meory lock list来实现的,当有session请求一个lock时,它会锁定lock list,然后去搜索lock list看是否这条记录上有别的lock,如果没有就创建一个lock list entry,然后unlock lock list。
 
在oracle中,我们先会定位到修改的记录在哪个block,哪条记录,如果有别的active transaction也是修改这条记录,那么会在enqueue lock fixed array里面创建一个对象(按请求lock的时间顺序),排队等待前一个事务commit或rollback,同时设置timeout时间,如果发生timeout则再去检查请求的lock是否已经可用。如果没有别的active transaction占有lock,那么它会在enqueue resource fixed array里面创建一个对象,并修改block的itll Lck标志位,修改记录lb标志位指向事务所在的itl。如果事务结束,将会去检查enqueue lock array,enqueue conversion array,并通知等待最久的那个事务可以请求lock。
关于enqueue lock,enqueue resource,enqueue resource以及和这些结构相关的一些初始化参数请详见steve adams的《oracle8i internal services for waits, latches, locks》。
 
可以看到oracle其实是把row-level lock直接在block里面实现了,不像其他数据库要为每一条需要修改的记录创建一个lock list对象,oracle只需要针对每个transaction来创建一些结构。所以在oracle里面,lock并不是惜缺资源。
 
 
下面来看一下block内lock的处理
 
SQL 9I>select * from test;
A
---------
14-DEC-05
 
SQL 9I>select dbms_rowid.ROWID_RELATIVE_FNO(rowid),dbms_rowid.ROWID_BLOCK_NUMBER(rowid) from test;
DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------------ ------------------------------------
                                  13                                   13
 
SQL 9I>alter system dump datafile 13 block 13;
System altered.
 
 
Start dump data blocks tsn: 10 file#: 13 minblk 13 maxblk 13
buffer tsn: 10 rdba: 0x0340000d (13/13)
scn: 0x0005.102623f0 seq: 0x01 flg: 0x00 tail: 0x23f00601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump:  0x0340000d
 Object id on Block? Y
 seg/obj: 0x12b22  csc: 0x05.102623f0  itc: 3  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x3400009 ver: 0x01
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0004.018.0001ce00  0x00000000.0000.00  C---    0  scn 0x0005.102623e5
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
 
data_block_dump,data header at 0xad7ec7c
===============
tsiz: 0x1f80
hsiz: 0x14
pbl: 0x0ad7ec7c
bdba: 0x0340000d
     76543210
flag=--------
ntab=1
nrow=1
frre=-1
fsbo=0x14
fseo=0x1f75
avsp=0x1f61
tosp=0x1f61
0xe:pti[0]      nrow=1  offs=0
0x12:pri[0]     offs=0x1f75
block_row_dump:
tab 0, row 0, @0x1f75
tl: 11 fb: --H-FL-- lb: 0x0  cc: 1   --lb指向0x0的itl,表示这条记录没有被修改过,所以指向一个空itl
col  0: [ 7]  78 69 0c 0e 11 28 2b
end_of_block_dump
End dump data blocks tsn: 10 file#: 13 minblk 13 maxblk 13
 
SQL 9I>update test set a=sysdate;
1 row updated.
 
SQL 9I>alter system dump datafile 13 block 13;
System altered.
 
update了一下在dump block来看
 
Start dump data blocks tsn: 10 file#: 13 minblk 13 maxblk 13
buffer tsn: 10 rdba: 0x0340000d (13/13)
scn: 0x0005.1026243b seq: 0x01 flg: 0x00 tail: 0x243b0601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump:  0x0340000d
 Object id on Block? Y
 seg/obj: 0x12b22  csc: 0x05.102623f0  itc: 3  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x3400009 ver: 0x01
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0004.018.0001ce00  0x00000000.0000.00  C---    0  scn 0x0005.102623e5
0x02   0x0008.023.0001c861  0x0080007c.1dfa.02  ----    1  fsc 0x0000.00000000
0x02的itl的lck标志位为1,表示锁定了一条记录,flag表示是未递交的。
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
 
data_block_dump,data header at 0xad7ec7c
===============
tsiz: 0x1f80
hsiz: 0x14
pbl: 0x0ad7ec7c
bdba: 0x0340000d
     76543210
flag=--------
ntab=1
nrow=1
frre=-1
fsbo=0x14
fseo=0x1f75
avsp=0x1f61
tosp=0x1f61
0xe:pti[0]      nrow=1  offs=0
0x12:pri[0]     offs=0x1f75
block_row_dump:
tab 0, row 0, @0x1f75
tl: 11 fb: --H-FL-- lb: 0x2  cc: 1 --指向了0x2的itl
col  0: [ 7]  78 69 0c 0e 11 2c 2c
end_of_block_dump
End dump data blocks tsn: 10 file#: 13 minblk 13 maxblk 13
 
 
SQL 9I>commit;
Commit complete.
 
SQL 9I>alter system dump datafile 13 block 13;
System altered.
 
 
commit后我们来dump block
 
 
Start dump data blocks tsn: 10 file#: 13 minblk 13 maxblk 13
buffer tsn: 10 rdba: 0x0340000d (13/13)
scn: 0x0005.10262467 seq: 0x01 flg: 0x02 tail: 0x24670601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump:  0x0340000d
 Object id on Block? Y
 seg/obj: 0x12b22  csc: 0x05.102623f0  itc: 3  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x3400009 ver: 0x01
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0004.018.0001ce00  0x00000000.0000.00  C---    0  scn 0x0005.102623e5
0x02   0x0008.023.0001c861  0x0080007c.1dfa.02  --U-    1  fsc 0x0000.10262467
0x02的lck标志依然是1,但是flag已经是U,表示事务已经递交,lock已经被释放
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
 
data_block_dump,data header at 0xad7ec7c
===============
tsiz: 0x1f80
hsiz: 0x14
pbl: 0x0ad7ec7c
bdba: 0x0340000d
     76543210
flag=--------
ntab=1
nrow=1
frre=-1
fsbo=0x14
fseo=0x1f75
avsp=0x1f61
tosp=0x1f61
0xe:pti[0]      nrow=1  offs=0
0x12:pri[0]     offs=0x1f75
block_row_dump:
tab 0, row 0, @0x1f75
tl: 11 fb: --H-FL-- lb: 0x2  cc: 1
col  0: [ 7]  78 69 0c 0e 11 2c 2c
end_of_block_dump
End dump data blocks tsn: 10 file#: 13 minblk 13 maxblk 13
 
就像上面所提到的,当一个session请求lock时,它先去观察block内记录的lb标志,然后回到itl判断有没有未递交事务,如果有它就开始等待,如果没有就去更新lb,itl。
 
另外,等待lock的进程将会产生更多的consistent gets,db block gets
 
 
session 1:
 
SQL 9I>set autotrace trace;
SQL 9I>update test set a=sysdate;
1 row updated.

Execution Plan
----------------------------------------------------------
   0      UPDATE STATEMENT Optimizer=CHOOSE
   1    0   UPDATE OF 'TEST'
   2    1     TABLE ACCESS (FULL) OF 'TEST'
 

Statistics
----------------------------------------------------------
          0  recursive calls
          2  db block gets
          3  consistent gets

          0  physical reads
        376  redo size
        619  bytes sent via SQL*Net to client
        525  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
 
session 2:
 

SQL 9I>set autotrace trace;
SQL 9I>update test set a=sysdate;
waiting......
 
 
 
session 1:
 
SQL 9I>commit;
Commit complete.
 
session 2:
 
SQL 9I>update test set a=sysdate;
1 row updated.

Execution Plan
----------------------------------------------------------
   0      UPDATE STATEMENT Optimizer=CHOOSE
   1    0   UPDATE OF 'TEST'
   2    1     TABLE ACCESS (FULL) OF 'TEST'
 

Statistics
----------------------------------------------------------
          0  recursive calls
          3  db block gets
          6  consistent gets

          0  physical reads
        492  redo size
        613  bytes sent via SQL*Net to client
        525  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

这是因为等待的进程需要做两次全表扫描,第一次读的时候发现被其他进程锁定,其他进程释放锁后又重新读了一次,db block gets也是同样道理。
 
 
还有一个实验能证明enqueue是按请求时的时间顺序排列的
 
session 1:
 
SQL 9I>lock table test in share mode;
Table(s) Locked.
 
 
session 2:
SQL 9I>lock table test in exclusive mode;
waiting......
 
 
session 3:
SQL 9I>lock table test in share mode;
waiting......
 
因为session 1和session 3 lock mode并不排斥,但是由于session 3请求时间比session 2晚,而session 2和session 1的lock mode冲突,所以导致session 3也不能获得lock;
 
 
oracle lock 机制还是挺有意思的一个东西,tom和steve分别从不同角度很好的展示了oracle的lock机制,希望对这方面有兴趣的oracle fans好好看看tom的expert one-on-one和steve的oracle8i  internal services for waits, latches, locks.
December 07

bulk fetch limit

bulk fetch limit可以限制每次bulk collect into的行数,这在fetch一个大cursor时非常有必要,因为它能减少pga的内存使用量。
 
我们来看一下加了limit子句和不加limit的pga使用率的区别
 
首先看一下不执行语句时的pga使用率
 
[oracle@csdbc oracle]$ sqlplus taobao/taobao

SQL*Plus: Release 9.2.0.4.0 - Production on Wed Dec 7 13:59:40 2005
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
exit
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
 
 
SQL 9I>select a.name,b.value from v$statname a,v$mystat b where a.statistic#=b.statistic# and (a.name like '%pga%' or a.name like '%uga%');
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
session uga memory                                                     142468
session uga memory max                                               142468
session pga memory                                                      286044
session pga memory max                                               286044
 
 
当限制每次fetch 10万条时的pga使用率

[oracle@csdbc oracle]$ sqlplus taobao/taobao
SQL*Plus: Release 9.2.0.4.0 - Production on Wed Dec 7 14:00:53 2005
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
SQL 9I>declare
  2  cursor c1 is select nick from bmw_users;
a dbms_sql.varchar2s;
  3    4  begin
  5  open c1;
  6    loop    
  7          fetch c1 bulk collect into a limit 100000;
  8          exit when c1%notfound;
  9    end loop;
 10  close c1;
end;
 11   12  /
PL/SQL procedure successfully completed.
 

select a.name,b.value from v$statname a,v$mystat b where a.statistic#=b.statistic# and (a.name like '%pga%' or a.name like '%uga%');
exit
PL/SQL procedure successfully completed.
SQL 9I>SQL 9I>
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
session uga memory                                                    77004
session uga memory max                                               142468
session pga memory                                                  5502408
session pga memory max                                              5502408
 
 
当不限制条数时的pga使用率
 
[oracle@csdbc oracle]$ sqlplus taobao/taobao
SQL*Plus: Release 9.2.0.4.0 - Production on Wed Dec 7 14:00:53 2005
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL 9I>declare
cursor c1 is select nick from bmw_users;
  2    3  a dbms_sql.varchar2s;
  4  begin
  5  open c1;
  loop    
        fetch c1 bulk collect into a;
  6    7    8          exit when c1%notfound;
  9    end loop;
 10  close c1;
 11  end;
 12  /
select a.name,b.value from v$statname a,v$mystat b where a.statistic#=b.statistic# and (a.name like '%pga%' or a.name like '%uga%');
exit
PL/SQL procedure successfully completed.
SQL 9I>SQL 9I>
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
session uga memory                                                    77004
session uga memory max                                               142468
session pga memory                                                117558392
session pga memory max                                            117558392
 
 
最后来看一下每次fetch一千条的pga使用率
 
[oracle@csdbc oracle]$ sqlplus taobao/taobao
SQL*Plus: Release 9.2.0.4.0 - Production on Wed Dec 7 14:00:53 2005
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
SQL 9I>declare
  2  cursor c1 is select nick from bmw_users;
a dbms_sql.varchar2s;
  3    4  begin
  5  open c1;
  6    loop    
  7          fetch c1 bulk collect into a limit 1000;
  8          exit when c1%notfound;
  9    end loop;
 10  close c1;
end;
 11   12  /
PL/SQL procedure successfully completed.
SQL 9I>select a.name,b.value from v$statname a,v$mystat b where a.statistic#=b.statistic# and (a.name like '%pga%' or a.name like '%uga%');

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
session uga memory                                                   142468
session uga memory max                                               142468
session pga memory                                                   360268
session pga memory max                                               360268
 
 
December 05

10g r2之log error

10g r2的log error功能有点接近于以前的exception表,不过相比exception表它更灵活,内容更丰富。
SQL 10G>desc test
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                              VARCHAR2(30)
 OBJECT_NAME                                        VARCHAR2(128)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                          NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(19)
 CREATED                                            DATE
 LAST_DDL_TIME                                      DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)

首先我们的创建error表
SQL 10G>exec dbms_errlog.create_error_log('TEST' );
PL/SQL procedure successfully completed.
创建的error表以err$_开头,加上表名,包含了test表的所有列
SQL 10G>desc err$_test
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ORA_ERR_NUMBER$                                    NUMBER
 ORA_ERR_MESG$                                      VARCHAR2(2000)
 ORA_ERR_ROWID$                                     ROWID
 ORA_ERR_OPTYP$                                     VARCHAR2(2)
 ORA_ERR_TAG$                                       VARCHAR2(2000)
 OWNER                                              VARCHAR2(4000)
 OBJECT_NAME                                        VARCHAR2(4000)
 SUBOBJECT_NAME                                     VARCHAR2(4000)
 OBJECT_ID                                          VARCHAR2(4000)
 DATA_OBJECT_ID                                     VARCHAR2(4000)
 OBJECT_TYPE                                        VARCHAR2(4000)
 CREATED                                            VARCHAR2(4000)
 LAST_DDL_TIME                                      VARCHAR2(4000)
 TIMESTAMP                                          VARCHAR2(4000)
 STATUS                                             VARCHAR2(4000)
 TEMPORARY                                          VARCHAR2(4000)
 GENERATED                                          VARCHAR2(4000)
 SECONDARY                                          VARCHAR2(4000)
让我们测试一下log error功能,首先创建一个pk
SQL 10G>alter table test add constraint test_pk primary key(object_id);
Table altered.

插入重复的数据,这时候不会报错,但是插入的记录数是0
SQL 10G>insert into test select * from test where rownum<2
  2  LOG ERRORS REJECT LIMIT UNLIMITED;
0 rows created.

再看error表的内容,记下来一条,dml type是I也就是Insert,error number是1
SQL 10G>select * from err$_test;
              1
ORA-00001: unique constraint (TEST.TEST_PK) violated
I
SYS
ICOL$
20
2
TABLE
30-JUN-05
30-JUN-05
2005-06-30:19:10:16
VALID
N
N
N

对刚才的dml做rollback,发现并不影响error表的内容
SQL 10G>rollback; 
Rollback complete.

SQL 10G>select * from err$_test;
              1
ORA-00001: unique constraint (TEST.TEST_PK) violated
I
SYS
ICOL$
20
2
TABLE
30-JUN-05
30-JUN-05
2005-06-30:19:10:16
VALID
N
N
N

truncate也不影响error表的内容
SQL 10G>truncate table test;
Table truncated.

SQL 10G>select * from err$_test;
              1
ORA-00001: unique constraint (TEST.TEST_PK) violated
I
SYS
ICOL$
20
2
TABLE
30-JUN-05
30-JUN-05
2005-06-30:19:10:16
VALID
N
N
N

drop table同样对error表没有影响
SQL 10G>drop table test;
Table dropped.

SQL 10G>select * from err$_test;
              1
ORA-00001: unique constraint (TEST.TEST_PK) violated
I
SYS
ICOL$
20
2
TABLE
30-JUN-05
30-JUN-05
2005-06-30:19:10:16
VALID
N
N
N

log error在这里有一个问题,如果删除test表后重建一张结构不一样的test表,那么error表将不会包含新test表的列
,有可能出现混乱。
SQL 10G>create table test(a number);
Table created.

SQL 10G>alter table test add constraint test_pk primary key(a);
Table altered.

SQL 10G>insert into test values(1);
1 row created.

SQL 10G>insert into test values(1) LOG ERRORS REJECT LIMIT UNLIMITED;
0 rows created.

SQL 10G>commit;
Commit complete.

SQL 10G>select * from err$_test;
              1
ORA-00001: unique constraint (TEST.TEST_PK) violated
I
SYS
ICOL$
20
2
TABLE
30-JUN-05
30-JUN-05
2005-06-30:19:10:16
VALID
N
N
N

              1
ORA-00001: unique constraint (TEST.TEST_PK) violated
I

可以看到上面的第2条记录是新的test表的异常记录,不包含列信息。

下面来看一下log error最多被使用的情况,通常我们在大批量的dml时如果其中一条
记录不符合条件那么整个事务将会被回滚,所以我们有时候不得不使用循环来实现,
但是使用循环会增加undo和redo,并且我们需要记录当前事务执行到哪一条记录避免
失败后从头开始,有了log error功能,我们可以使用一句语句来完成。
SQL 10G>truncate table err$_test;
Table truncated.

SQL 10G>drop table test;
Table dropped.

SQL 10G>create table test as select * from dba_objects where 1=0;
Table created.

SQL 10G>alter table test add constraint test_pk primary key(object_id);
Table altered.

SQL 10G>insert into test select * from dba_objects;
insert into test select * from dba_objects
                        *
ERROR at line 1:
ORA-01400: cannot insert NULL into ("TEST"."TEST"."OBJECT_ID")
 
SQL 10G>select count(*) from test;
  COUNT(*)
----------
         0
 
SQL 10G>insert into test select * from dba_objects
  2  LOG ERRORS REJECT LIMIT UNLIMITED;
50189 rows created.
SQL 10G>commit;
Commit complete.

SQL 10G>select count(*) from err$_test;
  COUNT(*)
----------
         1
SQL 10G>select * from err$_test;
           1400
ORA-01400: cannot insert NULL into ("TEST"."TEST"."OBJECT_ID")
I
DICTMGR
LNK_DBC.REGRESS.RDBMS.DEV.US.ORACLE.COM
 
DATABASE LINK
28-NOV-05

VALID
N
N
N

log error是不错的东西,大家在批量导数据的时候可以考虑一下哦。
November 25

trigger如何实现级联更新

这个问题在采用fk constraint的数据库中比较常见,当对父表进行更新时,我们希望能级联更新子表,oracle并没有提供update cascade的功能,所以这个功能我们自己编码来实现。
 
通常当使用after each row trigger来实现单条记录的级联更新,但是after each row trigger会在多条记录级联更新时产生错误的结果。
 
我们来创建2张表
 
SQL 10G>create table p (p1 number constraint ppk primary key);
Table created.
 
SQL 10G>create table f (f1 number constraint ffk references p deferrable);
Table created.
 
 
SQL 10G>insert into p values(1);
1 row created.
SQL 10G>insert into p values(2);
1 row created.
SQL 10G>insert into p values(3);
1 row created.
SQL 10G>
SQL 10G>insert into f values(1);
1 row created.
SQL 10G>insert into f values(2);
1 row created.
SQL 10G>insert into f values(3);
1 row created.
SQL 10G>commit;
Commit complete.
 
创建trigger
 
SQL 10G>create or replace trigger pt_after_each after update on p for each row
  2  begin
  3  update f set f1=:new.p1 where f1=:old.p1;
  4  end;
  5  /
Trigger created.
 
 
更新单条记录
 
SQL 10G>update p set p1=4 where p1=1;
1 row updated.
SQL 10G>select *from p;
        P1
----------
         2
         3
         4
SQL 10G>select * from f;
        F1
----------
         4
         2
         3
 
结果没错,after each row trigger实现了级联更新的功能并给出了正确的结果
 
 
我们来更新多条记录
 
SQL 10G>rollback;
Rollback complete.
 
SQL 10G>update p set p1=p1+1;
3 rows updated.
 
SQL 10G>select *from p;
        P1
----------
         2
         3
         4
SQL 10G>select * from f;
        F1
----------
         4
         4
         4
 
可以看到f表的记录变成了三4,显然这不是我们想要的结果,单父表这边把1更新成2,子表也相应把1变成2,这时候子表就有2个2了,然后父表把2更新成3,子表更新两个2成3,这时候子表就有三个3了,当父表把3更新成4,子表会把三个3都更新成4,也就是最后我们看到的结果,这显然是不符合需求的。我们除了限制父表的多条记录更新外还有什么办法呢?
 
我们可以改写trigger,使用before,before row,after trigger联合实现这个功能
 
 
首先我们得创建一个包,定义我们需要的数据结构
 

SQL 10G>create or replace package state_pkg
  2  as
  3      type myArray is table of f.f1%type       ---一个number类型的index table
  4                index by binary_integer;
  5      type vararray is table of varchar2(100)  ---一个varchar类型的index table
  6                index by binary_integer;
  7      type rowidArray is table of vararray       ---一个vararray类型的index table
  8                index by binary_integer;
  9      todo  myArray;        ---存储需要更新的值的table
 10      rid   rowidArray;      ---存储需要更新的f表的rowid的table
 11      empty myArray;      ---空的myarray结构,用来初始化
 12      emptyrid rowidArray; ---空的rowidArray结构,用来初始化
 13  end;
 14  /
Package created.
 
 
 
创建before trigger
 
SQL 10G>create or replace trigger pt_before before update on p
  2  begin
  3  state_pkg.todo:=state_pkg.empty;
  4  state_pkg.rid:=state_pkg.emptyrid;
  5  end;
  6  /
Trigger created.
 
这个trigger的功能就是当每次更新父表前先把package中的index table值清空
 
 
创建before row trigger
 
SQL 10G>create or replace trigger pt_before_row before update on p for each row
  2  declare
  3  begin
  4  state_pkg.todo(state_pkg.todo.count+1) := :new.p1;
  5  select rowid bulk collect into state_pkg.rid(state_pkg.rid.count+1) from f where f1=:old.p1;
  6  end;
  7  /
Trigger created.
 
这个trigger的功能是保留要更新到子表的值到table,同时查询出子表要更新的记录的rowid并保存在state_pkg.rid结构中,如果父表和子表的记录是1对1的关系,那么state_pkg.rid只需要定义成vararray  type而不需要定义成rowidArray type。
 
 
最后创建after trigger
 
SQL 10G>create or replace trigger p_after
  2  after update on p
  3  declare
  4  begin
  5          for i in 1 .. state_pkg.todo.count loop
  6                  for  j in state_pkg.rid(i).first..state_pkg.rid(i).last loop
  7                  update f set f1=state_pkg.todo(i) where rowid=state_pkg.rid(i)(j);
  8                  end loop;
  9          end loop;
 10  end;
 11  /
 
这个trigger就是为了实现最终的级联更新,由于after row trigger会带来错误的结果,所以需要使用after table trigger来更新。
 
 
离得到正确的结果还有一步
 
SQL 10G>select * from p;   
        P1
----------
         1
         2
         3
 

SQL 10G>select * from f;
        F1
----------
         1
         2
         3
 
 
 
SQL 10G>drop trigger pt_after_each;
Trigger dropped.
 
SQL 10G>update p set p1=p1+1;
update p set p1=p1+1
*
ERROR at line 1:
ORA-02292: integrity constraint (TEST.FFK) violated - child record found
 
由于是after table trigger,所以有fk关联的时候会产生02292的错误,我们需要把fk改成deferred
 
 
SQL 10G>alter table f modify constraint ffk initially deferred;
Table altered.
 
SQL 10G>update p set p1=p1+1;
3 rows updated.
 

SQL 10G>select * from p;
        P1
----------
         2
         3
         4
SQL 10G>select * from f;
        F1
----------
         2
         3
         4
SQL 10G>commit;
Commit complete.
 
SQL 10G>update p set p1=p1-1;
3 rows updated.
 
SQL 10G>select * from p;
        P1
----------
         1
         2
         3
 
SQL 10G>select * from f;
        F1
----------
         1
         2
         3
再来看一下1对多的情况
 
SQL 10G>insert into f values(1);
1 row created.
 
SQL 10G>commit;
Commit complete.
 
SQL 10G>select * from f;
        F1
----------
         1
         1
         2
         3
 
 
SQL 10G>update p set p1=p1+1;
3 rows updated.
 
SQL 10G>select * from p;
        P1
----------
         2
         3
         4
 
SQL 10G>select * from f;
        F1
----------
         2
         2
         3
         4
 
 
这样就实现了1对多的级联更新。
 
 
不过使用这种trigger方法来实现级联更新效率比较低,特别是更新一大批记录的情况,我们应该在应用程序里面保证完整性,通过deferred fk,使用代码更新才是效率最高的。
 
 

Oracle & Starcraft

Feed

The owner hasn't specified a feed for this module yet.