hai's profileOracle & StarcraftBlogLists Tools Help

Blog


    April 23

    本blog停止更新

    新的更新全部会在www.orawh.com
    January 24

    hash join 与 10104 event

    OPTIMAL MODE:

     Optimal方式的hash join相对来说非常简单,在这个例子(optimal.doc)里面,一共需要2partition,这也是hash join中最小的partition分配数,随着work area的增大,partition的数目也会增多,partition的分配规律遵守2n次方这样的规则分配。在经过build tablescan以后,hash table被建立,每条记录被hash后对应的hash bucket也被填充,与此相对应的bitmap vectorbit位也被设置,这个时候开始probe tablescan,首先通过同样的hash functionjoin key进行运算,然后对比bitmap vector中的bit是否被标志,如果没被标志那么表示这条记录不符合关联的要求直接被丢弃。如果bit位被设置,那么表示这个值可能与hash bucket中保存的值一致,因为hash算法会有碰撞,所以一个bucket中会有不同的值存在,所以将会对比这个值和hash bucket中保存的值是不是相同,如果相同那么输出给客户端,等probe table扫描完毕,所有结果都输出给客户端。

     

    optimal.doc  点击下载trace 文件

    ONEPASS MODE:


    当我们修改了hash_area_size2m后,hash join的运行模式变成了onepass状态,从上面的trace(onepass.doc)文件中可以发现,partition变成了4

    ### Partition Distribution ###

    Partition:0    rows:49933      clusters:5      slots:1      kept=0

    Partition:1    rows:49674      clusters:5      slots:1      kept=0

    Partition:2    rows:49943      clusters:5      slots:4      kept=0

    Partition:3    rows:50451      clusters:5      slots:5      kept=1


    其中partition 3可以被放在work area中,其他几个partition不可避免的要写出到磁盘上。接下来开始probe tablescan,在扫描的过程中经过bitmap vector的过滤,不符合的记录被丢弃,因为bitmapVector中也保存了每个hash bucket是否在内存还是在磁盘,所以符合的记录就会去查到底这个hash bucket位于哪里,如果在内存,直接比较确切值,丢弃或者输出给客户端(从trace文件中看不出这个过程,trace文件看起来是dump所有不能放入内存的partitition,然后读入partition pairhash join)。如果在磁盘,那么probe table的这条记录也写出到磁盘,形成一个partition pair
    partition 3的数据join完毕,接下来是partition 2的所有部分被读入内存,同时probe table相对的partition也被读入内存进行hash join,然后是partition 1,partition 0。可以看到这里都只读了一次probe table partition,所以我们称之为onepass hash join.

    onepass.doc  点击下载trace 文件

    MULTIPASS MODE:


     这个就是multipass hash join的过程,trace文件异常的长,首先分配了2partitions,没有一个partition能被完全放在work area里面

    ### Partition Distribution ###

     Partition:0    rows:99876      clusters:10     slots:1      kept=0

    Partition:1    rows:100125     clusters:10     slots:4      kept=0

    首先是build tablescan Work area里面容纳了partition 14slotpartition 0的一个slot,另外还有一个slot是为了i/o用的。每当一个slot(可能是一个block)满后将会被写出到磁盘,等build table scan完毕,磁盘上保存了2partition的内容,并且bitmap vector也被建立了。这时开始了probe tablescan,系统为probe table的每个partition分配了一个slot,还分配一个slot作为读入probe table的缓冲,通过bitmap vector的过滤,符合条件的记录被填充到probe partitionslot中并与build table partitionslot来比较确切值。这个步骤完了以后work area里面应该存在一个build table partition 0slot,一个build table partition 1slot,一个写出缓冲的slot,一个读入缓冲的slot,一个probe table partition 0slot,一个probe table partition 1slot,一共6slot,刚好填满所有可分配的slot。这个过程中间可能会有在内存中匹配的行返回,然后开始从磁盘读取partition pair开始join,经过10probe partition的读取。partition 0也是经过了10probe partition的读取,这就带来了很多i/o,导致hash join的性能急剧下降。Oracle在进行multipass的过程中会有2hash function的存在。Oracle将会读入磁盘上的build table partition再进行一次hash生成一些subpartition,这样的话每次读入probe table subpartition即可,而不用多次读入probe partition导致过大的i/o.不过在这个trace文件中并没有发现这个过程。的个和的一个另外还有一个是为了用的。每当一个(可能是一个)满后将会被写出到磁盘,等完毕,磁盘上保存了个的内容并且也被建立了。这时开始了的,系统为的每个分配了一个,还分配一个作为读入的缓冲,通过的过滤,符合条件的记录被填充到的中并与的来比较确切值。这个步骤完了以后里面应该存在一个的,一个的,一个写出缓冲的一个读入缓冲的,一个的,一个的,一共个,刚好填满所有可分配的。这个过程中间可能会有在内存中匹配的行返回,然后开始从磁盘读取开始,经过次的读取。也是经过了次的读取,这就带来了很多,导致的性能急剧下降。在进行的过程中会有次的存在。将会读入磁盘上的再进行一次生成一些,这样的话每次读入即可,而不用多次读入导致过大的不过在这个文件中并没有发现这个过程。里面容纳了的个和的一个另外还有一个是为了用的。每当一个(可能是一个)满后将会被写出到磁盘,等完毕,磁盘上保存了个的内容并且也被建立了。这时开始了的,系统为的每个分配了一个,还分配一个作为读入的缓冲,通过的过滤,符合条件的记录被填充到的中并与的来比较确切值。这个步骤完了以后里面应该存在一个的,一个的,一个写出缓冲的一个读入缓冲的,一个的,一个的,一共个,刚好填满所有可分配的。这个过程中间可能会有在内存中匹配的行返回,然后开始从磁盘读取开始,经过次的读取。也是经过了次的读取,这就带来了很多,导致的性能急剧下降。在进行的过程中会有次的存在。将会读入磁盘上的再进行一次生成一些,这样的话每次读入即可,而不用多次读入导致过大的不过在这个文件中并没有发现这个过程。另外还有一个trace文件反映了ROLE REVERSALoracle在进行join时会去评估build partitionprobe partition的大小,如果发行probe partition小于build partition,那么会对换两者的角色,对原来的probe partition建立hash table,拿原来的build partition来匹配,这种方法也在一定程度上减少了i/o,提高了效率。

    *** HASH JOIN GET FLUSHED PARTITIONS (PHASE 2) ***

     Getting a pair of flushed partions.  

    BUILD PARTION: nrows:24948 size=(3 slots, 384K) 

    PROBE PARTION: nrows:12410 size=(2 slots, 256K)    

    ROLE REVERSAL OCCURRED


    ### Hash table overall statistics ###
    Total buckets: 262144 Empty buckets: 250012 Non-empty buckets: 12132
    Total number of rows: 12410
    Maximum number of rows in a bucket: 3
    Average number of rows in non-empty buckets: 1.022915

    multipass.doc  点击下载trace 文件

    上面这些只是本人基于trace文件对hash join运行模式的推断,不一定能反正hash join真实的运行情况,所以欢迎大家来讨论!  

    November 03

    运用transport tablespace进行快速oracle版本升级(9i-10g)

    大家可能都知道在进行oracle数据库版本升级的时候会有2种方式

    1.通过dbua(database upgrade assistant)

    2.exp/imp

    通过dbua来升级的话由于不涉及到数据文件的改变,所以速度会比较快,但是如果dbua一旦在升级的过程中出现问题可能会导致原来的库不可用。

    而通过exp/imp虽然对原来的库不会有影响,但是如果数据库比较大的话那么升级的时间将会是不可接受的(尤其对24*7)的应用来说。

     针对这个问题,这次的2006 oracle openworld法国的amadeus公司提供了一个非常有创意的点子,就是利用dataguard和transport tablespace功能来实现最短时间内的安全升级。

     首先让我们来了解一下amadeus公司

     

    艾玛迪斯全球旅游分销系统公司(Amadeus Global Travel Distribution SA)是全球领先的旅游行业技术及分销供应商。1987年艾玛迪斯总部建立于西班牙马德里。在 Sophia Antipolis(法国尼斯附近)和美国波士顿设立有市场及开发部门。公司的数据中心位于德国慕尼黑附近的Erding。公司提供各种先进的旅游行业技术解决方案,至今已成为成长最快并被最广泛使用的全球分销系统(GDS)。

    作为卓越的技术合作伙伴,艾玛迪斯把最先进的信息技术带入旅游行业,使众多的旅游供应商、休闲及商务旅游服务商从中获益。通过设立服务于当地市场的national marketing companies(NMCs),艾玛迪斯用其庞大的信息技术资源向全世界200个国家和地区提供优质的技术解决方案。

    我们再来看一下跟它们的数据库相关的信息

    他们的业务系统达到99.99%的可用率,每秒钟有30万次的数据库请求,每天有2亿8千万次transaction,这是一个相当大的数据库系统,如果用dbua或者exp/imp他们都不能接受升级的风险,于是他们的技术人员就想出了用dataguard和transport tablespace功能来实现最短时间内的安全升级。

     具体的实现方法是这样的

    1.先为主库建立一个dataguard数据库(可以在线做)

    2.在dataguard库上安装10g软件(可以在线做)

    3.整理一些不能通过transport tablespace搞定的东西,比如sequence,synonyms,grants......

    4.停止主库这边所有write的应用,提供read的服务(写入停止,提供查询)

    5.强制归档主库redo log并传到dataguard恢复(写入停止,提供查询)

    6.利用transport tablespace来转换数据库版本,并创建sequencee,synonyms,grants等(写入停止,提供查询)。

    7.验证新环境的过程,在验证过程中如果发现有问题,则可以切换会原来的系统(写入停止,提供查询)。

    8.切换应用到10g数据库(提供服务)

    amadeus在演习时做到10分钟内完成4,5,6,7并成功切换了系统,考虑到他们的数据库繁忙程度和数据库容量非常大,这真是一项伟大的成就。我们可以在以后的数据库版本的升级过程中借鉴他们的方法。

     

    我们再从技术上验证一下transport tablespace可以运用在版本升级

    在9i的库上创建一个test tablespace 

    create tablespace test
    datafile '/opt/oracle/test.dbf' size 10m
    extent management local autoallocate;

    创建一张表在test表空间上 

    create table test1(a number) tablespace test;

    insert into test1 values(1);
    commit;

    SQL 9i>select * from test1;

             A
    ----------
             1

     

    把test表空间置为read only模式


    alter tablespace test read only;

     到处test tablespace的metadata

    exp \'sys/sys as sysdba\'  transport_tablespace=y tablespaces=(TEST)  file=test.dmp log=test.log

    传输dmp文件和数据文件(在amadeus的案例里面由于10g的库和9i的库在同一台机器上,所以避免了拷贝数据文件的时间,这也是整个方案的重点之一)到远程

    scp test.dmp oracle@10.0.100.115:/opt/oracle/

    scp /opt/oracle/test.dbf oracle@10.0.100.115:/opt/oracle/

     

    在目标库上导入metadata数据

    imp \'sys/sys as sysdba\' transport_tablespace=y tablespaces=(TEST) file='/opt/oracle/test.dmp' datafiles=

    ('/opt/oracle/test.dbf') tts_owners=test fromuser=test touser=test log=tts_i.log

     

    查看test1表,发现数据一致


    SQL 10G>select * from test1;

             A
    ----------
             1

    把test表空间置为read write模式

    alter tablespace test read write;

    insert into test1 values(2);


    SQL 10G>select * from test1;

             A
    ----------
             1
             2

    一切正常,测试完毕

     

    这个测试简单的模仿了transport tablespace升级数据库的可能性,当然在实际过程中我们要校验是否自包含表空间,是否需要创建sequence等,但是总体来说这种方案能提供最短时间内的数据库版本升级。

    October 16

    itl deadlock

    昨天有同事碰到itl竞争导致的deadlock,这在以前比较少见,那么在这里我们来作个实验重现一下,
    从理论上解释为什么itl竞争会导致deadlock。
    create table test(a number);
    insert into test values(1);
    SQL 10G>insert into test select * from test;
    1 row created.
    SQL 10G>/
    2 rows created.
    SQL 10G>/
    4 rows created.
    SQL 10G>/
    8 rows created.
    SQL 10G>/
    16 rows created.
    SQL 10G>/
    32 rows created.
    SQL 10G>/
    64 rows created.
    SQL 10G>/
    128 rows created.
    SQL 10G>/
    256 rows created.
    SQL 10G>/
    512 rows created.
    SQL 10G>/
    1024 rows created.
    SQL 10G>/
    2048 rows created.
    SQL 10G>/
    4096 rows created.
    SQL 10G>/
    8192 rows created.
    SQL 10G>/
    16384 rows created.
    SQL 10G>commit;
    Commit complete.
    SQL 10G>alter table test move pctfree 0;
    Table altered.
    move表,缩小pctfree为0使表不能创建多余的itl
    SQL 10G>select ini_trans from  dba_tables where owner='TEST' and  table_name='TEST';
     INI_TRANS
    ----------
             1
     Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
    0x01   0x0010.023.0000229f  0x01c01177.05f5.1c  C---    0  scn 0x0005.e88ed5fb
    0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
    0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
    这里可以看到虽然dba_tables里面的ini_trans为1,但是实际上block里面默认的itl是3个,从下面的实验中可以发现。

    SQL 10G>select distinct dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block# from test order by block#;
        BLOCK#
    ----------
         61612
         61613
         61614
         61615
         61616
         61618
         61619
         61620
         61621
         61622
         61623
        BLOCK#
    ----------
         61624
         61626
         61627
         61628
         61629
         61630
         61631
         61632
         61634
         61635
         72057
        BLOCK#
    ----------
         72058
         72059
         72060
         72061
         72062
         72063
         72064
         72065
         72066
         72067
         72068
        BLOCK#
    ----------
         72069
         72070
         72071
         72072
         72817
         72818
         72819
         72820
         72821
         72822
         72823
        BLOCK#
    ----------
         72824
    45 rows selected.
    我们挑2个block来做实验,block 61612,block 61613

    session 1:
    update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=1;
    update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=1;
    session1更新分别更新61612,61613的第2行,分别占据61612,61613的一个itl entry
    session 2:
    update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=2;
    update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=2;
    session2更新分别更新61612,61613的第3行,分别占据61612,61613的另一个itl entry
    session 3:
    update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=0;
    session3 更新61612的第1行,占据61612的第3个itl entry

    session 4:
    update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=3;
    session4 更新61613的第4行,占据61613的第3个itl entry,这个时候2个block的itl entry都被用完,
    每个block itl都是3个并且不能扩展

    session 4:
    update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=3;
    session4 更新61612的第4行,这时候出现"enq: TX - allocate ITL entry"等待
    session 3:
    update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=0;
    session3 更新61613的第1行,出现"enq: TX - allocate ITL entry"等待
    session 4:
    SQL 10G>update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=3;
    update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61612 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=3
    *
    ERROR at line 1:
    ORA-00060: deadlock detected while waiting for resource
    而当session3 更新61613的第1行时,session4这边报出了deadlock,看下图
    Deadlock graph:
                           ---------Blocker(s)--------  ---------Waiter(s)---------
    Resource Name          process session holds waits  process session holds waits
    TX-0013002e-0001ac43        18     154     X             19     143           S
    TX-0014002a-0001c990        19     143     X             18     154           S
    session 154: DID 0001-0012-0001B9E7     session 143: DID 0001-0013-00000DB7
    session 143: DID 0001-0013-00000DB7     session 154: DID 0001-0012-0001B9E7
    Rows waited on:
    Session 143: no row
    Session 154: no row
    Information on the OTHER waiting sessions:
    Session 143:
      pid=19 serial=60295 audsid=6783 user: 55/TEST
      O/S info: user: oracle, term: pts/1, ospid: 664, machine: csdba
                program: sqlplus@csdba (TNS V1-V3)
      application name: SQL*Plus, hash value=3669949024
      Current SQL Statement:
      update test set a=a where dbms_rowid.ROWID_BLOCK_NUMBER(rowid)=61613 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=0
    End of information on OTHER waiting sessions.
    September 25

    autotrace,explain plan与绑定变量

    create table fenbu as select 1 id,'Y' flag from dba_objects where rownum<100001;

    insert into fenbu values(1,'N');
    commit;

    create index IDX_FENBU_FLAG on fenbu(flag);

    analyze table fenbu compute statistics for table for all columns for all indexes;

    var a varchar2(32);
    exec :a:='N';

    SQL 10G>set autotrace trace exp;

    SQL 10G>alter session set events'10046 trace name context forever,level 12';  

    Session altered.

    SQL 10G>select * from fenbu    where   flag=:a;


    Execution Plan
    ----------------------------------------------------------

    ----------------------------------------------------------------
    | Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)|
    ----------------------------------------------------------------
    |   0 | SELECT STATEMENT  |       | 50001 |   244K|   202  (80)|
    |*  1 |  TABLE ACCESS FULL| FENBU | 50001 |   244K|   202  (80)|
    ----------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       1 - filter("FLAG"=:A)

    Note
    -----
       - 'PLAN_TABLE' is old version

     


    SQL 10G>alter session set events'10046 trace name context off';

    Session altered.

     

    很显然可以看到set autotrace的执行计划是错的,这是因为set auotrace,explain plan等操作
    并不会发生bind peeking,它并不会把绑定变量的值反映到执行计划里面,不会去看直方图的
    数据分布,所以它生成的计划并不可信,我们可以来看一下10046的真实计划。


    select *
    from
     fenbu    where   flag=:a


    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute      1      0.00       0.00          0          0          0           0
    Fetch        2      0.00       0.00          0          4          0           1
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        4      0.00       0.00          0          4          0           1

    Misses in library cache during parse: 1
    Optimizer mode: ALL_ROWS
    Parsing user id: 55 

    Rows     Row Source Operation
    -------  ---------------------------------------------------
          1  TABLE ACCESS BY INDEX ROWID FENBU (cr=4 pr=0 pw=0 time=102 us)
          1   INDEX RANGE SCAN IDX_FENBU_FLAG (cr=3 pr=0 pw=0 time=78 us)(object id 83455)

    September 01

    oracle compress table II

    上一篇文章提到压缩表发生update后会导致行迁移,但是在上篇文章里面没有做过多描述
    ,这次我们来仔细看一下update compressed table会发生什么事情。此外我们再来看看压
    缩表结构修改是怎么处理的。

    首先创建测试表

    create table test2(a varchar2(10),b varchar2(10),c varchar2(10));

    begin
    for i in 1000000000..1000100000 loop
    insert into test2 values(i,'1',to_char(mod(i,100)));
    commit;
    end loop;
    end;
    /
    SQL 10G>create table testcom4 compress as select * from test2 order by c;
    Table created.

    对压缩表添加一个列

    SQL 10G>SQL 10G>SQL 10G>

    alter table testcom4 add d number;

    Table altered.

    定位到一条记录,找出所在文件号,块号,文件号和rowid

    SQL 10G>select dbms_rowid.ROWID_RELATIVE_FNO(rowid) file#,
    dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block#,
    dbms_rowid.ROWID_ROW_NUMBER(rowid) row# from
    testcom4 where rownum&lt;2;

    FILE# BLOCK# ROW#
    ---------- ---------- ----------
    12 61364 0

    SQL 10G>select rowid from testcom4 where rownum&lt;2;

    ROWID
    ------------------
    AAAT9AAAMAAAO+0AAA

    更新这条记录

    SQL 10G>update testcom4 set d=1 where rowid='AAAT9AAAMAAAO+0AAA';

    1 row updated.

    SQL 10G>commit;

    Commit complete.

    SQL 10G>select dbms_rowid.ROWID_RELATIVE_FNO(rowid) file#,
    dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block#,
    dbms_rowid.ROWID_ROW_NUMBER(rowid) row# from testcom4
    where rowid='AAAT9AAAMAAAO+0AAA';

    FILE# BLOCK# ROW#
    ---------- ---------- ----------
    12 61364 0
    dump这个block看看行迁移是怎么发生的

    SQL 10G>alter system dump datafile 12 block 61364;

    System altered.

    perm_9ir2[3]={ 2 0 1 }
    ...
    block_row_dump:
    tab 0, row 0, @0x1f79
    tl: 7 fb: --H-FL-- lb: 0x0 cc: 2
    col 0: [ 1] 31
    col 1: [ 1] 30
    bindmp: 01 bc 02 c9 31 c9 30
    tab 1, row 0, @0x1f69
    tl: 9 fb: --H----- lb: 0x2 cc: 0
    nrid: 0x0300f085.0 这里指向了新的数据块
    bindmp: 20 02 00 03 00 f0 85 00 00
    定位新的块

    SQL 10G>select dbms_utility.DATA_BLOCK_ADDRESS_FILE(to_number('300f085','xxxxxxxxxx'))
    file#,dbms_utility.DATA_BLOCK_ADDRESS_BLOCK(to_number('300f085','xxxxxxxxxx'))
    block# from dual;

    FILE# BLOCK#
    ---------- ----------
    12 61573

    dump新的block

    SQL 10G>alter system dump datafile 12 block 61573;

    System altered.

    block_row_dump:
    tab 0, row 0, @0x1f65
    tl: 27 fb: ----FL-- lb: 0x1 cc: 4
    hrid: 0x0300efb4.0
    col 0: [10] 31 30 30 30 30 39 33 30 30 30
    col 1: [ 1] 31
    col 2: [ 1] 30
    col 3: [ 2] c1 02

    可以看到新的block里面已经是非压缩的数据格式了,从这里可以看出对压缩表的更新确实是会导致
    压缩失效。
    那么能不能删除新加的列呢?试一下

    SQL 10G>alter table testcom4 drop column d;
    alter table testcom4 drop column d
    *
    ERROR at line 1:
    ORA-39726: unsupported add/drop column operation on compressed tables

    报错了,提示“unsupported add/drop column operation on compressed tables”
    metalink上说这是oracle的一个bug,在10g修复,但是在我的10g r2的版本上还是
    不通过。9i的版本更加离谱,连add column都不行。

    ---------------------------
    SQL 9I> alter table testcom4 add d number;
    alter table testcom4 add d number
    *
    ERROR at line 1:
    ORA-22856: cannot add columns to object tables
    ---------------------------
    10g可以进行set unused的操作

    SQL 10G>alter table testcom4 set unused column d;

    Table altered.
    但是drop unused columns依然报错,依然是一个bug

    SQL 10G>alter table testcom4 drop unused columns;
    alter table testcom4 drop unused columns
    *
    ERROR at line 1:
    ORA-12996: cannot drop system-generated virtual column

    希望下次下载一个patch可以解决这些问题。

    August 29

    library cache lock wait event

     
    说起library cache lock wait event就必须先说一下library cache lock是什么东西,library lock是oracle用来对library cache object进行并发控制的两种数据结构之一,另外一种数据结构是pin,lock将会在pin之前获得,并被加载到library cache handle上。library cache lock有3种形式,share,null,exclusive,具体的library cache lock的分析请看我的另一篇帖子
     
    通常我们举的关于library cache lock wait event的例子都是用procedure来模拟的,eygle发表过这方面的文章,今天我举的是另一个例子,也是一次真实的事故,事故发生的原因就是因为ddl导致cursor invalidation,但是ddl持续运行了很长时间,导致后来的session需要parse sql的时候不能获得library cache lock,进一步导致app服务器的connection pool满,导致应用crash。下面我们用一个简单的例子模拟一下这个场景。
     
    session 1:
    SQL 10G>alter table test add constraint uk_test unique(a,b,c,d,e) using index online compute statistics;
    Table altered.
     
     
    session 2:
     
    SQL 10G>select sid from v$mystat where rownum<2;
           SID
    ----------
           142

    SQL 10G>select count(*) from test;
     

    session 3:
    SQL 10G> select event from v$session_wait where sid=142;
    EVENT
    --------------------------------------------------------------------------------
    library cache lock
    当test表过大,创建uk constraint时间过长时会有越来越多的session等待library cache lock从而进一步导致应用crash,这样提醒我们当对大表进行ddl时一定要分解ddl操作,不要一句语句搞定所有操作,象alter table test add constraint uk_test unique(a,b,c,d,e) using index online compute statistics;这种语句虽然非常便捷,但是不一定适合在大并发的时候去执行。我们应该把这个操作分解成几步来做,
    1. create index ind_test on test(a,b,c,d,e) online compute statistics;
    2.  alter table test add constraint uk_test unique(a,b,c,d,e) using index ind_test novalidate;
    3.  校验表里面的数据是否符合唯一约束
    4.  alter table test modify constraint uk_test validate;
     
    这里的要点就是要把每一个ddl分解成小的操作来执行。值得指出的一点是alter table test modify constraint uk_test validate;这个步骤虽然会可能会耗时非常久,但是这个步骤是不会加载library cache exclusive lock的,所以它不会导致其他session等待library cache lock。但是constraint的状态从disable到enable的转换则还是会加载library cache exclusive lock,导致其他进程等待library cache lock。oracle可能在这个地方做了不同的处理,针对disable->enable和novalidate->validate采取了不同的lock类型,显然我们也可以知道disable->enable和novalidate->validate对library cache object的影响是不一样的,一个是可以说是从无到有,一个是只是改变了一下状态,我的猜测是disable->enable的话oracle依然加载了exclusive lock,而novalidate->validate的话oracle加载了null or share lock,有心的人可以dump一下sga来测试一下。
     
     
     
    August 04

    dbms_stats和leaf_blocks

    dbms_stats是oracle用来代替原有的analyze功能的一个包,与analyze相比dbms_stats具有很多优势,比如并行,比如分区信息统计等,但是dbms_stats再分析index的时候处理方式并不是太理想,dbms_stats分析index时将会只统计leaf_blocks为当前有数据的leaf block,而analyze则会统计为所有曾经被使用过的leaf block number,很显然dbms_stats的统计结果会使index fast full scan的成本被严重低估,在某些情况下会错误得选择index fast full scan做为执行路径。下面来看一个例子:
     
    先清空原来的表
    SQL 10G>truncate table t1;
    Table truncated.
     
    插入数据
    SQL 10G>insert into t1 select
      2  rownum id,
      3  trunc(100 * dbms_random.normal) val,
      4  rpad('x',100) padding
      5  from
      6  all_objects
      7  where
      8  rownum <= 10000
      9  ;
    10000 rows created.
     
    SQL 10G>commit;
    Commit complete.
     
    创建索引,并限制pctfree为99,模拟大索引的产生
    SQL 10G>create index ind_t1 on t1(id) pctfree 99;
    Index created.
     
    使用analyze分析索引
    SQL 10G>ANALYZE INDEX IND_T1 COMPUTE STATISTICS;
    Index analyzed.
     
    SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM  user_indexes where table_name='T1';
    INDEX_NAME                         BLEVEL LEAF_BLOCKS
    ------------------------------ ---------- -----------
    IND_T1                                  2       10000
     
    再使用dbms_stats分析,可以看到在这个时候基本统计数据是相同的
    SQL 10G>EXEC DBMS_STATS.GATHER_INDEX_STATS('TEST','IND_T1');
    PL/SQL procedure successfully completed.
     
    SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM  user_indexes where table_name='T1';
    INDEX_NAME                         BLEVEL LEAF_BLOCKS
    ------------------------------ ---------- -----------
    IND_T1                                  2       10010
     
    删除数据使表中只保留一条记录
    SQL 10G>delete from t1 where rownum<10000;
    9999 rows deleted.
     
    SQL 10G>commit;
    Commit complete.
     
    再用analyze分析索引,可以发现leaf_blocks依然是10000
    SQL 10G>ANALYZE INDEX IND_T1 COMPUTE STATISTICS;
    Index analyzed.
     
    SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM  user_indexes where table_name='T1';
    INDEX_NAME                         BLEVEL LEAF_BLOCKS
    ------------------------------ ---------- -----------
    IND_T1                                  2       10000
     
    看cost这一项显示index fast full scan的成本为2679,这是正确的
    SQL 10G>set autotrace trace exp;

    SQL 10G>select/*+ index_ffs(t1,ind_t1)*/ count(*) from t1;
    Execution Plan
    -------------------------------------------------------------
    | Id  | Operation             | Name   | Rows  | Cost (%CPU)|
    -------------------------------------------------------------
    |   0 | SELECT STATEMENT      |        |     1 |  2679  (19)|
    |   1 |  SORT AGGREGATE       |        |     1 |            |
    |   2 |   INDEX FAST FULL SCAN| IND_T1 |     1 |  2679  (19)|
    -------------------------------------------------------------
     
    使用dbms_stats分析索引,leaf_blocks被统计为1,只统计了当前在用的leaf block
    SQL 10G>EXEC DBMS_STATS.GATHER_INDEX_STATS('TEST','IND_T1');
    PL/SQL procedure successfully completed.
     
    SQL 10G>set autotrace off;

    SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM  user_indexes where table_name='T1';
    INDEX_NAME                         BLEVEL LEAF_BLOCKS
    ------------------------------ ---------- -----------
    IND_T1                                  2           1
     
    看cost这一项显示index fast full scan的成本为1,这显然是出现了错误
    SQL 10G>set autotrace trace;
    SQL 10G>select/*+ index_ffs(t1,ind_t1)*/ count(*) from t1;
    Execution Plan
    -------------------------------------------------------------
    | Id  | Operation             | Name   | Rows  | Cost (%CPU)|
    -------------------------------------------------------------
    |   0 | SELECT STATEMENT      |        |     1 |     1   (0)|
    |   1 |  SORT AGGREGATE       |        |     1 |            |
    |   2 |   INDEX FAST FULL SCAN| IND_T1 |     1 |     1   (0)|
    -------------------------------------------------------------
     
    再来看看它究竟需要读取多少个块,是不是cost=1就够了
     
    SQL 10G>   ALTER SESSION SET EVENTS 'immediate trace name flush_cache';
    Session altered.
    SQL 10G>select/*+ index_ffs(t1,ind_t1)*/ count(*) from t1;

    Execution Plan
    ----------------------------------------------------------
    -------------------------------------------------------------
    | Id  | Operation             | Name   | Rows  | Cost (%CPU)|
    -------------------------------------------------------------
    |   0 | SELECT STATEMENT      |        |     1 |     1   (0)|
    |   1 |  SORT AGGREGATE       |        |     1 |            |
    |   2 |   INDEX FAST FULL SCAN| IND_T1 |     1 |     1   (0)|
    -------------------------------------------------------------
     

    Statistics
    ----------------------------------------------------------
              0  recursive calls
              0  db block gets
          10035  consistent gets
          10016  physical reads
              0  redo size
            411  bytes sent via SQL*Net to client
            385  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
              1  rows processed
    显然,这里发生了10016个physical reads,cost=1是远远不够的。不知道oracle会不会就这个问题有改进方案,大家拭目以待。
     
     
    July 24

    10g r2 db_file_multiblock_read_count在cbo中的变化

    10g r2和前几个版本比起来对db_file_multiblock_read_count在cbo成本计算中的公式做了调整,看一下下面我实验后的对照表。
     
    我的测试环境
     
    [oracle@csdba ~]$ uname -a
    Linux csdba 2.6.9-11.ELsmp #1 SMP Fri May 20 18:26:27 EDT 2005 i686 i686 i386 GNU/Linux
     

    SQL*Plus: Release 10.2.0.1.0 - Production on Mon Jul 24 17:07:42 2006
    Copyright (c) 1982, 2005, Oracle.  All rights reserved.

    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
    With the Partitioning and Data Mining Scoring Engine options
     

    分别测试了两个版本下1000行,10000行,100000行记录的6个对比

    exec dbms_stats.SET_TABLE_STATS(OWNNAME=>'TEST',TABNAME=>'T1',NUMBLKS=>1000);


    exec dbms_stats.SET_TABLE_STATS(OWNNAME=>'TEST',TABNAME=>'T1',NUMBLKS=>10000);


    exec dbms_stats.SET_TABLE_STATS(OWNNAME=>'TEST',TABNAME=>'T1',NUMBLKS=>100000);

     

    10g r2版本

    MBRC BLOCKS COST ADJUSTED_MBRC
    4 1000 377 2.652519894
    8 1000 273 3.663003663
    16 1000 221 4.524886878
    32 1000 195 5.128205128
    64 1000 182 5.494505495
    128 1000 176 5.681818182
    MBRC BLOCKS COST ADJUSTED_MBRC
    4 10000 3763 2.657454159
    8 10000 2722 3.673769287
    16 10000 2201 4.543389368
    32 10000 1941 5.151983514
    64 10000 1811 5.521811154
    128 10000 1745 5.730659026
    MBRC BLOCKS COST ADJUSTED_MBRC
    4 100000 37615 2.658513891
    8 100000 27199 3.676605758
    16 100000 21990 4.547521601
    32 100000 19386 5.158361704
    64 100000 18084 5.529750055
    128 100000 17433 5.736247347


               
    9i r2

    MBRC BLOCKS COST ADJUSTED_MBRC
    4 1000 241 4.149377593
    8 1000 153 6.535947712
    16 1000 98 10.20408163
    32 1000 63 15.87301587
    64 1000 40 25
    128 1000 26 38.46153846
    MBRC BLOCKS COST ADJUSTED_MBRC
    4 10000 2397 4.171881519
    8 10000 1519 6.583278473
    16 10000 963 10.38421599
    32 10000 611 16.36661211
    64 10000 388 25.77319588
    128 10000 246 40.6504065
    MBRC BLOCKS COST ADJUSTED_MBRC
    4 100000 23953 4.1748424
    8 100000 15179 6.588049279
    16 100000 9619 10.39609107
    32 100000 6096 16.40419948
    64 100000 3863 25.88661662
    128 100000 2449 40.83299306


    最后得出的测试结果是在10g r2里面,db_file_multiblock_read_count
    对cost计算的影响明显变小,看起来oracle对db_file_multiblock_read_count
    采取了更谨慎的态度,这样一来不会因为设置了
    db_file_multiblock_read_count为一个较大的值而导致数据库倾向于全表扫描。

     

     

    July 01

    今夜,阿根廷与世界杯再见

     
    悲情的阿根廷的主教练,他继承了阿根廷的悲情的传统。巴西莱、帕萨雷拉、贝尔萨、在这一刻灵魂附体,佩克尔曼一个人他代表了阿根廷足球悲情的传统,在这一刻他不是一个人在流泪,他不是一个人!
     
    坎比亚索,面对这个点球。他面对的全世界阿根廷球迷的目光和期待。

      莱曼曾经在之前扑出阿亚拉点球,坎比亚索应该深知这一点,他还能够微笑着面对他面前的这个人吗?10秒钟以后他会是怎样的表情?

      球被扑了!比赛结束了!阿根廷队失败了。他们再一次倒在法西斯的球队面前,悲情的阿根廷的主教练!佩克尔曼今天倒下!阿根廷别再为我哭泣!

      这个点球是一个绝对理论上的臭脚。绝对的臭脚,阿根廷队淘汰出了四强!

      这个失败属于阿根廷,属于阿亚拉,属于弗朗哥,属于佩克尔曼,属于所有热爱阿根廷足球的人!

      阿根廷队真的会后悔的,佩克尔曼在下半时他们领先一球的情况下打得太保守、太沉稳了,他失去了自己在小组赛的那种勇气,面对德国悠久的历史,他失去了他在小组赛中那种痛打落水狗的作风,他终于自食其果。阿根廷队该回家了,也许他们不用回遥远的阿根廷,他们不用回家,因为他们大多数人都在欧洲生活,再见!

    June 30

    first_rows_n和all_rows

    first_rows_n和all_rows都是oracle optimizer_mode的选项,他们有什么区别呢,会对优化器产生怎么样的影响呢?让我们一起来解开迷题.

    all_rows模式:
    all_rows是oracle优化器默认的模式,它将选择一种在最短时间内返回所有数据的执行计划,它将基于整体成本的考虑.

    first_rows_n模式:
    first_rows_n是从9i开始引入来代替以前的first_rows模式,虽然first_rows模式仍然存在,但是oracle已经不推荐使用.因为它基本上是基于oracle可执行文件硬编码的很多规则实现,比如它会尝试彻底去避免hash join或者merge join除非nest loop的非驱动表会进行全表扫描,first_rows也会偏向于使用索引而不是全表扫描,这在某些情况下也会带来反面的效果.所以oracle引入first_rows_n来代替first_rows,first_rows_n是根据成本而不是基于硬编码的规则来选择执行计划.n可以是1,10,100,1000或者直接用first_rows(n) hint指定任意正数.这里的n是我们想获取结果集的前n条记录,举个例子,如果n为1,那么oracle会选择一个最快速度返回结果集第一条记录的执行计划而不管是否它获取结果集的所有记录的执行成本是不是最优.这种需求在很多分页语句的需求中会碰到.


    那么oracle是怎么判断first_rows_n的成本并作出选择的呢,10053跟踪事件能给我们答案


    create table t as select * from dba_objects;
    create table t1 as select * from t;
    create index ind_object_id on t(object_id) compute statistics;
    create index ind_t1_object_id on t1(object_id) compute statistics;
    analyze table t compute statistics for table for all columns;
    analyze table t1 compute statistics for table for all columns;

    准备好测试表和索引后来看看测试脚本

    all_rows模式:
    alter session set events'10053 trace name context forever,level 1';
    alter session set optimizer_mode=all_rows;
    select t.owner from t,t1 where t.object_id  =  t1.object_id;
    alter session set events'10053 trace name context off';

    first_rows_1模式:
    alter session set events'10053 trace name context forever,level 1';
    alter session set optimizer_mode=first_rows_1;
    select t.owner from t,t1 where t.object_id   =  t1.object_id;
    alter session set events'10053 trace name context off';

    first_rows_10模式:
    alter session set events'10053 trace name context forever,level 1';
    alter session set optimizer_mode=first_rows_10;
    select t.owner from t,t1 where t.object_id   =  t1.object_id;
    alter session set events'10053 trace name context off';

    first_rows_100模式:
    alter session set events'10053 trace name context forever,level 1';
    alter session set optimizer_mode=first_rows_100;
    select t.owner from t,t1 where t.object_id   =  t1.object_id;
    alter session set events'10053 trace name context off';

    由于篇幅太长,所以把10053的trace文件简化了一下,只留下join这一部分的内容,并把merge join的部分去除了
    测试环境是10g r2

    all_rows:

    **************************
    GENERAL PLANS
    **************************
    Considering cardinality-based initial join order.
    ***********************
    Join order[1]:   T[T]#0  T1[T1]#1
    ***************
    Now joining: T1[T1]#1
    ***************
    NL Join
      Outer table: Card: 51986.00  Cost: 164.59  Resp: 164.59  Degree: 1  Bytes: 9
      Inner table: T1  Alias: T1
      Access Path: TableScan
        NL Join:  Cost: 8493121.71  Resp: 8493121.71  Degree: 0
          Cost_io: 8358538.00  Cost_cpu: 839658589661
          Resp_io: 8358538.00  Resp_cpu: 839658589661
      Access Path: index (index (FFS))
        Index: IND_T1_OBJECT_ID
        resc_io: 25.16  resc_cpu: 7056806
        ix_sel: 0.0000e+00  ix_sel_with_filters: 1
      Inner table: T1  Alias: T1
      Access Path: index (FFS)
        NL Join:  Cost: 1366740.53  Resp: 1366740.53  Degree: 0
          Cost_io: 1307937.00  Cost_cpu: 366871247240
          Resp_io: 1307937.00  Resp_cpu: 366871247240
      Access Path: index (AllEqJoinGuess)
        Index: IND_T1_OBJECT_ID
        resc_io: 1.00  resc_cpu: 8371
        ix_sel: 1.9239e-05  ix_sel_with_filters: 1.9239e-05
        NL Join: Cost: 52220.34  Resp: 52220.34  Degree: 1
          Cost_io: 52148.00  Cost_cpu: 451348998
          Resp_io: 52148.00  Resp_cpu: 451348998
      Best NL cost: 52220.34
              resc: 52220.34 resc_io: 52148.00 resc_cpu: 451348998
              resp: 52220.34 resp_io: 52148.00 resp_cpu: 451348998
    Join Card:  51982.00 = outer (51986.00) * inner (51986.00) * sel (1.9234e-05)
    Join Card - Rounded: 51982 Computed: 51982.00

    HA Join
      Outer table:
        resc: 164.59  card 51986.00  bytes: 9  deg: 1  resp: 164.59
      Inner table: T1  Alias: T1
        resc: 28.13  card: 51986.00  bytes: 4  deg: 1  resp: 28.13
        using dmeth: 2  #groups: 1
        Cost per ptn: 2.58  #ptns: 1
        hash_area: 0 (max=0)   Hash join: Resc: 195.30  Resp: 195.30  [multiMatchCost=0.00]
    HA Join (swap)
      Outer table:
        resc: 28.13  card 51986.00  bytes: 4  deg: 1  resp: 28.13
      Inner table:  T  Alias: T
        resc: 164.59  card: 51986.00  bytes: 9  deg: 1  resp: 164.59
        using dmeth: 2  #groups: 1
        Cost per ptn: 2.58  #ptns: 1
        hash_area: 0 (max=0)   Hash join: Resc: 195.30  Resp: 195.30  [multiMatchCost=0.00]
      HA cost: 195.30
         resc: 195.30 resc_io: 189.00 resc_cpu: 39324090
         resp: 195.30 resp_io: 189.00 resp_cpu: 39324090
    Best:: JoinMethod: Hash
           Cost: 195.30  Degree: 1  Resp: 195.30  Card: 51982.00  Bytes: 13
    ***********************
    Best so far: Table#: 0  cost: 164.5888  card: 51986.0000  bytes: 467874
                 Table#: 1  cost: 195.3030  card: 51982.0000  bytes: 675766
                
    计算第一种join顺序的成本值,T做驱动表,T1做内部表,
    Best:: JoinMethod: Hash
           Cost: 195.30  Degree: 1  Resp: 195.30  Card: 51982.00  Bytes: 13
    在这里可以看到最优join方式是hash join,
    最终的成本是195.30,返回结果集记录数是51982


    ***********************
    Join order[2]:  T1[T1]#1   T[T]#0
    ***************
    Now joining:  T[T]#0
    ***************
    NL Join
      Outer table: Card: 51986.00  Cost: 28.13  Resp: 28.13  Degree: 1  Bytes: 4
      Inner table:  T  Alias: T
      Access Path: TableScan
        NL Join:  Cost: 8492985.25  Resp: 8492985.25  Degree: 0
          Cost_io: 8358403.00  Cost_cpu: 839649495148
          Resp_io: 8358403.00  Resp_cpu: 839649495148
      Access Path: index (AllEqJoinGuess)
        Index: IND_OBJECT_ID
        resc_io: 2.00  resc_cpu: 15913
        ix_sel: 1.9239e-05  ix_sel_with_filters: 1.9239e-05
        NL Join (ordered): Cost: 104132.73  Resp: 104132.73  Degree: 1
          Cost_io: 103999.00  Cost_cpu: 834303785
          Resp_io: 103999.00  Resp_cpu: 834303785
      Best NL cost: 104132.73
              resc: 104132.73 resc_io: 103999.00 resc_cpu: 834303785
              resp: 104132.73 resp_io: 103999.00 resp_cpu: 834303785
    Join Card:  51982.00 = outer (51986.00) * inner (51986.00) * sel (1.9234e-05)
    Join Card - Rounded: 51982 Computed: 51982.00
    HA Join
      Outer table:
        resc: 28.13  card 51986.00  bytes: 4  deg: 1  resp: 28.13
      Inner table:  T  Alias: T
        resc: 164.59  card: 51986.00  bytes: 9  deg: 1  resp: 164.59
        using dmeth: 2  #groups: 1
        Cost per ptn: 2.58  #ptns: 1
        hash_area: 0 (max=0)   Hash join: Resc: 195.30  Resp: 195.30  [multiMatchCost=0.00]
      HA cost: 195.30
         resc: 195.30 resc_io: 189.00 resc_cpu: 39324090
         resp: 195.30 resp_io: 189.00 resp_cpu: 39324090
    Join order aborted: cost > best plan cost
    计算第二种join顺序的成本值,T1做驱动表,T做内部表,
    Join order aborted: cost > best plan cost
    第二种join顺序被放弃,因为成本大于已经第一种join顺序的最优成本


    ***********************
    (newjo-stop-1) k:0, spcnt:0, perm:2, maxperm:2000
    *********************************
    Number of join permutations tried: 2
    *********************************
    (newjo-save)    [1 0 ]
    Final - All Rows Plan:  Best join order: 1
      Cost: 195.3030  Degree: 1  Card: 51982.0000  Bytes: 675766
      Resc: 195.3030  Resc_io: 189.0000  Resc_cpu: 39324090
      Resp: 195.3030  Resp_io: 189.0000  Resc_cpu: 39324090

    在All Rows模式下最终优化器选择了Best join order: 1,Cost: 195.3030,
    尝试了2种join 顺序(Number of join permutations tried: 2)

     

     


    first_rows_1模式:


    ***************************************
    GENERAL PLANS
    ***************************************
    Considering cardinality-based initial join order.
    ***********************
    Join order[1]:   T[T]#0  T1[T1]#1
    ***************
    Now joining: T1[T1]#1
    ***************
    NL Join
      Outer table: Card: 51986.00  Cost: 164.59  Resp: 164.59  Degree: 1  Bytes: 9
      Inner table: T1  Alias: T1
      Access Path: TableScan
        NL Join:  Cost: 8493121.71  Resp: 8493121.71  Degree: 0
          Cost_io: 8358538.00  Cost_cpu: 839658589661
          Resp_io: 8358538.00  Resp_cpu: 839658589661
      Access Path: index (index (FFS))
        Index: IND_T1_OBJECT_ID
        resc_io: 25.16  resc_cpu: 7056806
        ix_sel: 0.0000e+00  ix_sel_with_filters: 1
      Inner table: T1  Alias: T1
      Access Path: index (FFS)
        NL Join:  Cost: 1366740.53  Resp: 1366740.53  Degree: 0
          Cost_io: 1307937.00  Cost_cpu: 366871247240
          Resp_io: 1307937.00  Resp_cpu: 366871247240
      Access Path: index (AllEqJoinGuess)
        Index: IND_T1_OBJECT_ID
        resc_io: 1.00  resc_cpu: 8371
        ix_sel: 1.9239e-05  ix_sel_with_filters: 1.9239e-05
        NL Join: Cost: 52220.34  Resp: 52220.34  Degree: 1
          Cost_io: 52148.00  Cost_cpu: 451348998
          Resp_io: 52148.00  Resp_cpu: 451348998
      Best NL cost: 52220.34
              resc: 52220.34 resc_io: 52148.00 resc_cpu: 451348998
              resp: 52220.34 resp_io: 52148.00 resp_cpu: 451348998
    Join Card:  51982.00 = outer (51986.00) * inner (51986.00) * sel (1.9234e-05)
    Join Card - Rounded: 51982 Computed: 51982.00
    HA Join
      Outer table:
        resc: 164.59  card 51986.00  bytes: 9  deg: 1  resp: 164.59
      Inner table: T1  Alias: T1
        resc: 28.13  card: 51986.00  bytes: 4  deg: 1  resp: 28.13
        using dmeth: 2  #groups: 1
        Cost per ptn: 2.58  #ptns: 1
        hash_area: 0 (max=0)   Hash join: Resc: 195.30  Resp: 195.30  [multiMatchCost=0.00]
    HA Join (swap)
      Outer table:
        resc: 28.13  card 51986.00  bytes: 4  deg: 1  resp: 28.13
      Inner table:  T  Alias: T
        resc: 164.59  card: 51986.00  bytes: 9  deg: 1  resp: 164.59
        using dmeth: 2  #groups: 1
        Cost per ptn: 2.58  #ptns: 1
        hash_area: 0 (max=0)   Hash join: Resc: 195.30  Resp: 195.30  [multiMatchCost=0.00]
      HA cost: 195.30
         resc: 195.30 resc_io: 189.00 resc_cpu: 39324090
         resp: 195.30 resp_io: 189.00 resp_cpu: 39324090
    Best:: JoinMethod: Hash
           Cost: 195.30  Degree: 1  Resp: 195.30  Card: 51982.00  Bytes: 13
    ***********************
    Best so far: Table#: 0  cost: 164.5888  card: 51986.0000  bytes: 467874
                 Table#: 1  cost: 195.3030  card: 51982.0000  bytes: 675766
    *********************************
    Number of join permutations tried: 1
    *********************************
    (newjo-save)    [1 0 ]
    Final - All Rows Plan:  Best join order: 1
      Cost: 195.3030  Degree: 1  Card: 51982.0000  Bytes: 675766
      Resc: 195.3030  Resc_io: 189.0000  Resc_cpu: 39324090
      Resp: 195.3030  Resp_io: 189.0000  Resc_cpu: 39324090
    kkoipt: Query block SEL$1 (#0)
    ******* UNPARSED QUERY IS *******
    SELECT /*+ NO_STAR_TRANSFORMATION NO_EXPAND */ "T"."OWNER" "OWNER" FROM "TEST"."T" "T","TEST"."T1" "T1" WHERE "T"."OBJECT_ID"="T1"."OBJECT_ID"
    kkoqbc-end
              : call(in-use=32712, alloc=49112), compile(in-use=35284, alloc=36696)
    First K Rows: K/N ratio = 0.000019237428341, qbc=0x905f2620
    First K Rows: Setup end
    ***********************
     
    在FIRST_ROWS_1模式下,oracle会先按ALL_ROWS模式计算一种join顺序(Number of join permutations tried: 1)
    ,得到返回结果集的大小,
    从而计算出FIRST_ROWS_1中的1条记录和所有结果集记录的一个比率值,
    Join Card - Rounded: 51982 Computed: 51982.00
    First K Rows: K/N ratio = 1/51982=0.000019237428341
    通过这个K/N ratio,oracle会重新计算join cost


    SINGLE TABLE ACCESS PATH (First K Rows)
     
    Table:  T  Alias: T    
        Card: Original: 2  Rounded: 2  Computed: 2.00  Non Adjusted: 2.00
      Access Path: TableScan
        Cost:  2.00  Resp: 2.00  Degree: 0
          Cost_io: 2.00  Cost_cpu: 7541
          Resp_io: 2.00  Resp_cpu: 7541
      Best:: AccessPath: TableScan
             Cost: 2.00  Degree: 1  Resp: 2.00  Card: 2.00  Bytes: 9
    ***************************************
    SINGLE TABLE ACCESS PATH (First K Rows)
      Table: T1  Alias: T1    
        Card: Original: 25996  Rounded: 25996  Computed: 25996.00  Non Adjusted: 25996.00
      Access Path: TableScan
        Cost:  83.30  Resp: 83.30  Degree: 0
          Cost_io: 82.00  Cost_cpu: 8079850
          Resp_io: 82.00  Resp_cpu: 8079850
      Access Path: index (index (FFS))
        Index: IND_T1_OBJECT_ID
        resc_io: 14.00  resc_cpu: 3532204
        ix_sel: 0.0000e+00  ix_sel_with_filters: 1
      Access Path: index (FFS)
        Cost:  14.57  Resp: 14.57  Degree: 1
          Cost_io: 14.00  Cost_cpu: 3532204
          Resp_io: 14.00  Resp_cpu: 3532204
      Access Path: index (FullScan)
        Index: IND_T1_OBJECT_ID
        resc_io: 59.00  resc_cpu: 5618765
        ix_sel: 1  ix_sel_with_filters: 1
        Cost: 59.90  Resp: 59.90  Degree: 1
      Best:: AccessPath: IndexFFS  Index: IND_T1_OBJECT_ID
             Cost: 14.57  Degree: 1  Resp: 14.57  Card: 25996.00  Bytes: 4
    First K Rows: unchanged join prefix len = 1

    ***********************
    Join order[1]:   T[T]#0  T1[T1]#1
    ***************
    Now joining: T1[T1]#1
    ***************
    NL Join
      Outer table: Card: 2.00  Cost: 2.00  Resp: 2.00  Degree: 1  Bytes: 9
      Inner table: T1  Alias: T1
      Access Path: TableScan
        NL Join:  Cost: 166.59  Resp: 166.59  Degree: 0
          Cost_io: 164.00  Cost_cpu: 16167241
          Resp_io: 164.00  Resp_cpu: 16167241
      Access Path: index (index (FFS))
        Index: IND_T1_OBJECT_ID
        resc_io: 13.50  resc_cpu: 3532204
        ix_sel: 0.0000e+00  ix_sel_with_filters: 1
      Inner table: T1  Alias: T1
      Access Path: index (FFS)
        NL Join:  Cost: 30.13  Resp: 30.13  Degree: 0
          Cost_io: 29.00  Cost_cpu: 7071948
          Resp_io: 29.00  Resp_cpu: 7071948
      Access Path: index (AllEqJoinGuess)
        Index: IND_T1_OBJECT_ID
        resc_io: 1.00  resc_cpu: 8371
        ix_sel: 3.8475e-05  ix_sel_with_filters: 3.8475e-05
        NL Join: Cost: 4.00  Resp: 4.00  Degree: 1
          Cost_io: 4.00  Cost_cpu: 24284
          Resp_io: 4.00  Resp_cpu: 24284
      Best NL cost: 4.00
              resc: 4.00 resc_io: 4.00 resc_cpu: 24284
              resp: 4.00 resp_io: 4.00 resp_cpu: 24284
    Join Card:  1.00 = outer (2.00) * inner (25996.00) * sel (1.9234e-05)
    Join Card - Rounded: 1 Computed: 1.00

    HA Join
      Outer table:
        resc: 164.59  card 51986.00  bytes: 9  deg: 1  resp: 164.59
      Inner table: T1  Alias: T1
        resc: 14.57  card: 25996.00  bytes: 4  deg: 1  resp: 14.57
        using dmeth: 2  #groups: 1
        Cost per ptn: 2.17  #ptns: 1
        hash_area: 0 (max=0)   Hash join: Resc: 181.32  Resp: 181.32  [multiMatchCost=0.00]
    HA Join (swap)
      Outer table:
        resc: 28.13  card 51986.00  bytes: 4  deg: 1  resp: 28.13
      Inner table:  T  Alias: T
        resc: 2.00  card: 2.00  bytes: 9  deg: 1  resp: 2.00
        using dmeth: 2  #groups: 1
        Cost per ptn: 1.75  #ptns: 1
        hash_area: 0 (max=0)   Hash join: Resc: 31.88  Resp: 31.88  [multiMatchCost=0.00]
      HA cost: 31.88
         resc: 31.88 resc_io: 29.00 resc_cpu: 17981913
         resp: 31.88 resp_io: 29.00 resp_cpu: 17981913
    Best:: JoinMethod: NestedLoop
           Cost: 4.00  Degree: 1  Resp: 4.00  Card: 1.00  Bytes: 13
    ***********************
    Best so far: Table#: 0  cost: 2.0012  card: 2.0000  bytes: 18
                 Table#: 1  cost: 4.0039  card: 1.0000  bytes: 13
    ***********************

    经过重新计算后,
    计算第一种join顺序的成本值,T做驱动表,T1做内部表,
    Best:: JoinMethod: NestedLoop
           Cost: 4.00  Degree: 1  Resp: 4.00  Card: 1.00  Bytes: 13
    在这里可以看到最优join方式是nest loop,这和ALL_ROWS下选择有了区别
    最终的成本是4.00,返回结果集记录数是1(Join Card - Rounded: 1)

    ***********************
    Join order[2]:  T1[T1]#1   T[T]#0
    ***************
    Now joining:  T[T]#0
    ***************
    NL Join
      Outer table: Card: 2.00  Cost: 2.00  Resp: 2.00  Degree: 1  Bytes: 4
      Inner table:  T  Alias: T
      Access Path: TableScan
        NL Join:  Cost: 166.59  Resp: 166.59  Degree: 0
          Cost_io: 164.00  Cost_cpu: 16167061
          Resp_io: 164.00  Resp_cpu: 16167061
      Access Path: index (AllEqJoinGuess)
        Index: IND_OBJECT_ID
        resc_io: 2.00  resc_cpu: 15913
        ix_sel: 3.8475e-05  ix_sel_with_filters: 3.8475e-05
        NL Join (ordered): Cost: 5.01  Resp: 5.01  Degree: 1
          Cost_io: 5.00  Cost_cpu: 31647
          Resp_io: 5.00  Resp_cpu: 31647
      Best NL cost: 5.01
              resc: 5.01 resc_io: 5.00 resc_cpu: 31647
              resp: 5.01 resp_io: 5.00 resp_cpu: 31647
    Join Card:  1.00 = outer (2.00) * inner (25996.00) * sel (1.9234e-05)
    Join Card - Rounded: 1 Computed: 1.00
    HA Join
      Outer table:
        resc: 28.13  card 51986.00  bytes: 4  deg: 1  resp: 28.13
      Inner table:  T  Alias: T
        resc: 83.30  card: 25996.00  bytes: 9  deg: 1  resp: 83.30
        using dmeth: 2  #groups: 1
        Cost per ptn: 2.17  #ptns: 1
        hash_area: 0 (max=0)   Hash join: Resc: 113.59  Resp: 113.59  [multiMatchCost=0.00]
    HA Join (swap)
      Outer table:
        resc: 164.59  card 51986.00  bytes: 9  deg: 1  resp: 164.59
      Inner table: T1  Alias: T1
        resc: 2.00  card: 2.00  bytes: 4  deg: 1  resp: 2.00
        using dmeth: 2  #groups: 1
        Cost per ptn: 1.75  #ptns: 1
        hash_area: 0 (max=0)   Hash join: Resc: 168.34  Resp: 168.34  [multiMatchCost=0.00]
      HA cost: 168.34
         resc: 168.34 resc_io: 164.00 resc_cpu: 27076246
         resp: 168.34 resp_io: 164.00 resp_cpu: 27076246
    Join order aborted: cost > best plan cost

    计算第二种join顺序的成本值,T1做驱动表,T做内部表,
    Join order aborted: cost > best plan cost
    第二种join顺序被放弃,因为成本大于已经第一种join顺序的最优成本

    ***********************
    (newjo-stop-1) k:0, spcnt:0, perm:2, maxperm:1000
    *********************************
    Number of join permutations tried: 2
    *********************************
    (newjo-save)    [1 0 ]
    Final - First K Rows Plan:  Best join order: 1
      Cost: 4.0039  Degree: 1  Card: 1.0000  Bytes: 13
      Resc: 4.0039  Resc_io: 4.0000  Resc_cpu: 24284
      Resp: 4.0039  Resp_io: 4.0000  Resc_cpu: 24284
    kkoipt: Query block SEL$1 (#0)

    在FIRST_Rows_1模式下最终优化器选择了Best join order: 1,Cost: 4.0039,
    尝试了2种join 顺序(Number of join permutations tried: 2)
    实际上是3种,包括了一次在ALL_ROWS模式下的计算


    另外再看一下
    FIRST_Rows_10
    FIRST_Rows_100
    最终的执行计划选择和成本计算

    FIRST_Rows_10:

    Final - First K Rows Plan:  Best join order: 1
      Cost: 13.0163  Degree: 1  Card: 10.0000  Bytes: 130
      Resc: 13.0163  Resc_io: 13.0000  Resc_cpu: 101517
      Resp: 13.0163  Resp_io: 13.0000  Resc_cpu: 101517

    FIRST_Rows_100:

    Final - First K Rows Plan:  Best join order: 1
      Cost: 31.8883  Degree: 1  Card: 51982.0000  Bytes: 1143604
      Resc: 31.8883  Resc_io: 29.0000  Resc_cpu: 18019724
      Resp: 31.8883  Resp_io: 29.0000  Resc_cpu: 18019724

    值得注意,FIRST_Rows_100选择了hash


    再看一下执行计划

    ALL_ROWS:
    -------------------------------------------------------------------------------
    | Id  | Operation             | Name             | Rows  | Bytes | Cost (%CPU)|
    -------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT      |                  | 51982 |   659K|   195   (4)|
    |*  1 |  HASH JOIN            |                  | 51982 |   659K|   195   (4)|
    |   2 |   INDEX FAST FULL SCAN| IND_T1_OBJECT_ID | 51986 |   203K|    28   (4)|
    |   3 |   TABLE ACCESS FULL   | T                | 51986 |   456K|   165   (2)|
    -------------------------------------------------------------------------------

    FIRST_Rows_1:
    ----------------------------------------------------------------------------
    | Id  | Operation          | Name             | Rows  | Bytes | Cost (%CPU)|
    ----------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |                  |     1 |    13 |     4   (0)|
    |   1 |  NESTED LOOPS      |                  |     1 |    13 |     4   (0)|
    |   2 |   TABLE ACCESS FULL| T                | 25996 |   228K|     2   (0)|
    |*  3 |   INDEX RANGE SCAN | IND_T1_OBJECT_ID |     1 |     4 |     1   (0)|
    ----------------------------------------------------------------------------

    FIRST_Rows_10:
    ----------------------------------------------------------------------------
    | Id  | Operation          | Name             | Rows  | Bytes | Cost (%CPU)|
    ----------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |                  |    10 |   130 |    13   (0)|
    |   1 |  NESTED LOOPS      |                  |    10 |   130 |    13   (0)|
    |   2 |   TABLE ACCESS FULL| T                | 47264 |   415K|     2   (0)|
    |*  3 |   INDEX RANGE SCAN | IND_T1_OBJECT_ID |     1 |     4 |     1   (0)|
    ----------------------------------------------------------------------------

    FIRST_Rows_100:
    -------------------------------------------------------------------------------
    | Id  | Operation             | Name             | Rows  | Bytes | Cost (%CPU)|
    -------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT      |                  | 51982 |  1116K|    32  (10)|
    |*  1 |  HASH JOIN            |                  | 51982 |  1116K|    32  (10)|
    |   2 |   INDEX FAST FULL SCAN| IND_T1_OBJECT_ID | 51986 |   203K|    28   (4)|
    |   3 |   TABLE ACCESS FULL   | T                | 51986 |   456K|     2   (0)|
    -------------------------------------------------------------------------------

    总结来说,first_rows_n基于成本计算,根据优先返回行数N重新计算各个对象的访问成本,
    从而生成最快返回前N条记录的执行计划.

    June 23

    and_equal,index_join,index_combine

    and_equal,index_join,index_combine这三种都是oracle利用索引关联获得数据的方法,三者的目的都是为了最大限度的利用索引,减少回表的代价.但是三者的实现方法是有区别的,下面一一来分析.
    and_equal:
    这种方式需要查询条件里面包括所有索引列,然后取得每个索引中得到的rowid列表,然后对这些列表做merge join,过滤出相同的rowid后再去表中获取数据或者直接从索引中获得数据.and_equal有一些限制,比如它只对单列索引有效,只对非唯一索引有效,使用到的索引不能超过5个,查询条件只能是"=".在10g中,and_equal已经被废弃了,只能通过hint才能生效.
    create table test as select * from dba_objects;
    create index ind_test_owner on test(owner);
    create index ind_test_object_name on  test(object_name);

    SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test' and object_name='test';
    Execution Plan
    ----------------------------------------------------------
    -------------------------------------------------------------------------------
    | Id  | Operation         | Name                 | Rows  | Bytes | Cost (%CPU)|
    -------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |                      |     1 |    29 |     2   (0)|
    |*  1 |  AND-EQUAL        |                      |       |       |            |
    |*  2 |   INDEX RANGE SCAN| IND_TEST_OWNER       |     1 |       |     1   (0)|
    |*  3 |   INDEX RANGE SCAN| IND_TEST_OBJECT_NAME |     2 |       |     1   (0)|
    -------------------------------------------------------------------------------
    如果查询条件只包含owner

    SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test';
    Execution Plan
    ----------------------------------------------------------
    -----------------------------------------------------------------------------------
    | Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)|
    -----------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |                |     1 |    29 |     2   (0)|
    |   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |     1 |    29 |     2   (0)|
    |*  2 |   INDEX RANGE SCAN          | IND_TEST_OWNER |     1 |       |     1   (0)|
    -----------------------------------------------------------------------------------
    修改owner,object_name为非空
    alter table test modify(owner not null);
    alter table test modify(object_name not null);

    SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test';
    Execution Plan
    ----------------------------------------------------------
    -----------------------------------------------------------------------------------
    | Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)|
    -----------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |                |     1 |    29 |     2   (0)|
    |   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |     1 |    29 |     2   (0)|
    |*  2 |   INDEX RANGE SCAN          | IND_TEST_OWNER |     1 |       |     1   (0)|
    -----------------------------------------------------------------------------------
    效果一样
    查询条件是">"的情况
    SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner>'test' and object_name='test';
    Execution Plan
    ----------------------------------------------------------
    -----------------------------------------------------------------------------------------
    | Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)|
    -----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |                      |     1 |    40 |     2   (0)|
    |*  1 |  TABLE ACCESS BY INDEX ROWID| TEST                 |     1 |    40 |     2   (0)|
    |*  2 |   INDEX RANGE SCAN          | IND_TEST_OBJECT_NAME |     2 |       |     1   (0)|
    -----------------------------------------------------------------------------------------
    查询条件是in的情况
    SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner in('test','dba') and object_name='test';
    Execution Plan
    ----------------------------------------------------------
    -----------------------------------------------------------------------------------------
    | Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)|
    -----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |                      |     1 |    40 |     2   (0)|
    |*  1 |  TABLE ACCESS BY INDEX ROWID| TEST                 |     1 |    40 |     2   (0)|
    |*  2 |   INDEX RANGE SCAN          | IND_TEST_OBJECT_NAME |     2 |       |     1   (0)|
    -----------------------------------------------------------------------------------------

    再来看一下回表的情况
    SQL 10G>select/*+ and_equal(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner='test' and object_name='test';
    Execution Plan
    ----------------------------------------------------------
    -----------------------------------------------------------------------------------------
    | Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)|
    -----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |                      |     1 |    40 |     2   (0)|
    |*  1 |  TABLE ACCESS BY INDEX ROWID| TEST                 |     1 |    40 |     2   (0)|
    |   2 |   AND-EQUAL                 |                      |       |       |            |
    |*  3 |    INDEX RANGE SCAN         | IND_TEST_OWNER       |     1 |       |     1   (0)|
    |*  4 |    INDEX RANGE SCAN         | IND_TEST_OBJECT_NAME |     2 |       |     1   (0)|
    -----------------------------------------------------------------------------------------
    先通过and_equal取得rowid列表,然后从表中返回数据.
     
    index_join:
    index join顾名思义是对index进行关联,oracle通过hash index join的方式实现了避免对表的访问.所有的数据都从索引中直接获得.它不受查询条件影响,可以是唯一索引,也可以是多列索引.
    SQL 10G>select/*+ index_join(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test' and object_name='test';
    Execution Plan
    ----------------------------------------------------------
    --------------------------------------------------------------------------------
    | Id  | Operation          | Name                 | Rows  | Bytes | Cost (%CPU)|
    --------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |                      |     1 |    29 |     3  (34)|
    |*  1 |  VIEW              | index$_join$_001     |     1 |    29 |     3  (34)|
    |*  2 |   HASH JOIN        |                      |       |       |            |
    |*  3 |    INDEX RANGE SCAN| IND_TEST_OWNER       |     1 |    29 |     1   (0)|
    |*  4 |    INDEX RANGE SCAN| IND_TEST_OBJECT_NAME |     1 |    29 |     1   (0)|
    --------------------------------------------------------------------------------
    可以不带查询条件,只不过由index range scan变成了index fast full scan
    SQL 10G>select/*+ index_join(test ind_test_owner ind_test_object_name)*/ owner,object_name from test
      2  ;
    Execution Plan
    ----------------------------------------------------------
    ------------------------------------------------------------------------------------
    | Id  | Operation              | Name                 | Rows  | Bytes | Cost (%CPU)|
    ------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT       |                      | 51984 |  1472K|   478   (2)|
    |   1 |  VIEW                  | index$_join$_001     | 51984 |  1472K|   478   (2)|
    |*  2 |   HASH JOIN            |                      |       |       |            |
    |   3 |    INDEX FAST FULL SCAN| IND_TEST_OWNER       | 51984 |  1472K|   153   (2)|
    |   4 |    INDEX FAST FULL SCAN| IND_TEST_OBJECT_NAME | 51984 |  1472K|   322   (1)|
    ------------------------------------------------------------------------------------

    如果不是所有数据都能从索引获得,那么将不会使用index join
    SQL 10G>select/*+ index_join(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner='test' and object_name='test';
    Execution Plan
    ----------------------------------------------------------
    -----------------------------------------------------------------------------------
    | Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)|
    -----------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |                |     1 |    40 |     2   (0)|
    |*  1 |  TABLE ACCESS BY INDEX ROWID| TEST           |     1 |    40 |     2   (0)|
    |*  2 |   INDEX RANGE SCAN          | IND_TEST_OWNER |     1 |       |     1   (0)|
    -----------------------------------------------------------------------------------

    index_combine:
    index combine最早是用在bitmap index上的,在9i开始oracle默认可以使用在btree索引上,这是由_b_tree_bitmap_plans参数来控制的.oracle将btree索引中获得的rowid信息通过BITMAP CONVERSION FROM ROWIDS的步骤转换成bitmap进行匹配,然后匹配完成后通过BITMAP CONVERSION TO ROWIDS再转换出rowid获得数据或者回表获得数据.
     
    SQL 10G>select/*+ index_combine(test ind_test_owner ind_test_object_name)*/ owner,object_name from test where owner='test' and object_name='test';
    Execution Plan
    ----------------------------------------------------------
    ---------------------------------------------------------------------------------------------
    | Id  | Operation                       | Name                 | Rows  | Bytes | Cost (%CPU)|
    ---------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                |                      |     1 |    29 |     2   (0)|
    |   1 |  BITMAP CONVERSION TO ROWIDS    |                      |     1 |    29 |     2   (0)|
    |   2 |   BITMAP AND                    |                      |       |       |            |
    |   3 |    BITMAP CONVERSION FROM ROWIDS|                      |       |       |            |
    |*  4 |     INDEX RANGE SCAN            | IND_TEST_OWNER       |       |       |     1   (0)|
    |   5 |    BITMAP CONVERSION FROM ROWIDS|                      |       |       |            |
    |*  6 |     INDEX RANGE SCAN            | IND_TEST_OBJECT_NAME |       |       |     1   (0)|
    ---------------------------------------------------------------------------------------------
    回表取数据的情况
    SQL 10G>select/*+ index_combine(test ind_test_owner ind_test_object_name)*/ owner,object_name,object_type from test where owner='test' and object_name='test';
    Execution Plan
    ----------------------------------------------------------
    ----------------------------------------------------------------------------------------------
    | Id  | Operation                        | Name                 | Rows  | Bytes | Cost (%CPU)|
    ----------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                 |                      |     1 |    40 |     2   (0)|
    |   1 |  TABLE ACCESS BY INDEX ROWID     | TEST                 |     1 |    40 |     2   (0)|
    |   2 |   BITMAP CONVERSION TO ROWIDS    |                      |       |       |            |
    |   3 |    BITMAP AND                    |                      |       |       |            |
    |   4 |     BITMAP CONVERSION FROM ROWIDS|                      |       |       |            |
    |*  5 |      INDEX RANGE SCAN            | IND_TEST_OWNER       |       |       |     1   (0)|
    |   6 |     BITMAP CONVERSION FROM ROWIDS|                      |       |       |            |
    |*  7 |      INDEX RANGE SCAN            | IND_TEST_OBJECT_NAME |       |       |     1   (0)|
    ----------------------------------------------------------------------------------------------
    不带查询条件的情况,index combine将不被使用
    SQL 10G>select/*+ index_combine(test ind_test_owner ind_test_object_name)*/ owner,object_name from test
      2  ;
    Execution Plan
    ----------------------------------------------------------
    -----------------------------------------------------------------------------------
    | Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)|
    -----------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |                | 51984 |  1472K|  1480   (1)|
    |   1 |  TABLE ACCESS BY INDEX ROWID| TEST           | 51984 |  1472K|  1480   (1)|
    |   2 |   INDEX FULL SCAN           | IND_TEST_OWNER | 51984 |       |   123   (2)|
    -----------------------------------------------------------------------------------
    index_combine会是and_equal的很好的替代者,随着and_equal的退出,index_combine将更多得被我们看到.
    April 06

    转载小惠惠的文章

    原文链接请看http://spaces.msn.com/roujiaweize/ 

    替小惠惠的blog做一下广告哈

    oracle cursor 游标(二)

     

    关于 ORA-01000: maximum open cursors exceeded 这个问题的相关东西,我搜了一些文章贴一下。
     
    ORA-01000 maximum open cursors exceeded

    Cause: A host language program attempted to open too many cursors. The initialization parameter OPEN_CURSORS determines the maximum number of cursors per user.

    Action: Modify the program to use fewer cursors. If this error occurs often, shut down Oracle, increase the value of OPEN_CURSORS, and then restart Oracle.

     
     
    关于cursor 的参数有这么几个:
    SQL> show parameter cursor
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    cursor_sharing                       string      EXACT
    cursor_space_for_time                boolean     FALSE
    open_cursors                         integer     800
    session_cached_cursors               integer     0
     
    这里,cursor_sharing跟open_cursors的数量没有关系。open_cursors的数量,它包括了oracle服务器端session_cached_cursors的数量,以及应用服务器端cursor cache size的数量。
     
    关于session_cached_cursors这个参数,可以看 汪海 写的2篇文章:
     
     
    这个问题的根源,除了上面3个参数,主要在于程序的问题,在itpub 上我们可以看到:
     

    很多朋友在Java开发中,使用Oracle数据库的时候,经常会碰到有ORA-01000: maximum open cursors exceeded.的错误。

    实际上,这个错误的原因,主要还是代码问题引起的。
    ora-01000: maximum open cursors exceeded.
    表示已经达到一个进程打开的最大游标数。

    这样的错误很容易出现在Java代码中的主要原因是:Java代码在执行conn.createStatement()和conn.prepareStatement()的时候,实际上都是相当与在数据库中打开了一个cursor。尤其是,如果你的createStatement和prepareStatement是在一个循环里面的话,就会非常容易出现这个问题。因为游标一直在不停的打开,而且没有关闭。

    一般来说,我们在写Java代码的时候,createStatement和prepareStatement都应该要放在循环外面,而且使用了这些Statment后,及时关闭。最好是在执行了一次executeQuery、executeUpdate等之后,如果不需要使用结果集(ResultSet)的数据,就马上将Statment关闭。

    对于出现ORA-01000错误这种情况,单纯的加大open_cursors并不是好办法,那只是治标不治本。实际上,代码中的隐患并没有解除。
    而且,绝大部分情况下,open_cursors只需要设置一个比较小的值,就足够使用了,除非有非常特别的要求。

    March 22

    pctversion in lob segment

    在oracle的官方文档上有这么一段话来解释pctversion
     
    PCTVERSION integer

    Specify the maximum percentage of overall LOB storage space used for maintaining old versions of the LOB. The default value is 10, meaning that older versions of the LOB data are not overwritten until they consume 10% of the overall LOB storage space.

    You can specify the PCTVERSION parameter whether the database is running in manual or automatic undo mode. PCTVERSION is the default in manual undo mode. RETENTION is the default in automatic undo mode.

     
     
    由于lob segment采用了自己特殊的一致性读的实现,不是使用undo tablespace来保留前映象,而是当发生update时在lob segment内分配一个chunk去插入一条新的记录。这样的话,如果一条记录修改了多次,那么它就存在多个版本,对于很大的lob对象来说,这是十分浪费空间的。所以oracle需要有一个办法来控制这个保留前映象的空间的大小,pctversion就是为了实现这个功能的。pctversion是一个百分比的值,就是当前所有lob空间中用来存放前映象的百分比,如果前映象的空间大于pctversion了,那么oracle将会重用这些前映象空间而不去扩展。
     
    对比一下不同pctversion对空间占用的影响
     
    SQL 10G>select TABLE_NAME,SEGMENT_NAME,CHUNK,PCTVERSION,RETENTION from user_lobs where segment_name='TEXT_LOB';
    TABLE_NAME                     SEGMENT_NAME                        CHUNK PCTVERSION  RETENTION
    ------------------------------ ------------------------------ ---------- ---------- ----------
    TESTLOB                        TEXT_LOB                             8192          1
     
    SQL 10G>truncate table testlob;
    Table truncated.
     
    SQL 10G>select segment_name,bytes/1024/1024||'M' from user_segments where segment_name='TEXT_LOB';
    SEGMENT_NAME                                                                      BYTES/1024/1024||'M'
    --------------------------------------------------------------------------------- -----------------------------------------
    TEXT_LOB                                                                          .0625M
     

    SQL 10G>insert into testlob values(1,rpad('a',4000));
    1 row created.
     
    SQL 10G>commit;
    Commit complete.
     
    SQL 10G>select dbms_lob.getlength(text) from testlob;
    DBMS_LOB.GETLENGTH(TEXT)
    ------------------------
                        4000
     
    SQL 10G>begin        
      2  for i in 1..10 loop
      3  update testlob set text=text||text;
      4  commit;
      5  end loop;
      6  end;
      7  /
    PL/SQL procedure successfully completed.
     
    SQL 10G>select segment_name,bytes/1024/1024||'M' from user_segments where segment_name='TEXT_LOB';
    SEGMENT_NAME                                                                      BYTES/1024/1024||'M'
    --------------------------------------------------------------------------------- -----------------------------------------
    TEXT_LOB                                                                          13M
     
    SQL 10G>truncate table testlob;
    Table truncated.
     
    SQL 10G>alter table testlob modify lob(text) (pctversion 99);
    Table altered.
     
    SQL 10G>insert into testlob values(1,rpad('a',4000));
    1 row created.
     
    SQL 10G>commit;
    Commit complete.
     
    SQL 10G>select segment_name,bytes/1024/1024||'M' from user_segments where segment_name='TEXT_LOB';
    SEGMENT_NAME                                                                      BYTES/1024/1024||'M'
    --------------------------------------------------------------------------------- -----------------------------------------
    TEXT_LOB                                                                          .0625M
     
    SQL 10G>begin
      2  for i in 1..10 loop
      3  update testlob set text=text||text;
      4  commit;
      5  end loop;
      6  end;
      7  /
    PL/SQL procedure successfully completed.
     
    SQL 10G>select segment_name,bytes/1024/1024||'M' from user_segments where segment_name='TEXT_LOB';
    SEGMENT_NAME                                                                      BYTES/1024/1024||'M'
    --------------------------------------------------------------------------------- -----------------------------------------
    TEXT_LOB                                                                          17M
     
    SQL 10G>select dbms_lob.getlength(text) from testlob;
    DBMS_LOB.GETLENGTH(TEXT)
    ------------------------
                     4096000
     
     
     
    可以看到pctversion的设置对lob空间大小还是有影响的,在很大的lob segment中这种影响会表现的十分明显,有兴趣的人可以对大lob段做测试。
    March 16

    parallel query in partition table

    上一篇文章里面提到parallel query相关的checkpoint,也了解了direct read会对系统性能带来的潜在影响,这次让我们看一下parallel query在partition table上的表现。
     
     
    创建2种分区表
     
    CREATE TABLE TESTHASH(A NUMBER)
    PARTITION BY HASH (A)
    (PARTITION P1,
    PARTITION P2,
    PARTITION P3,
    PARTITION P4)
     
     
    CREATE TABLE TESTLIST(A NUMBER)
    PARTITION BY LIST (A)
    (PARTITION P1 VALUES(1),
    PARTITION P2 VALUES(2),
    PARTITION P3 VALUES(3),
    PARTITION P4 VALUES(4))
     
    SQL 10G>select distinct a from testhash partition (p1);
             6
    SQL 10G>select distinct a from testhash partition (p2);
             9
    SQL 10G>select distinct a from testhash partition (p3);
             2
    SQL 10G>select distinct a from testhash partition (p4);
             1
     
    SQL 10G>select count(*) from testhash partition (p1);
       1048623
    SQL 10G>select count(*) from testhash partition (p2);
       1048623
    SQL 10G>select count(*) from testhash partition (p3);
       1048623
    SQL 10G>select count(*) from testhash partition (p4);
       1048623
     
     
     
    SQL 10G>select distinct a from testlist partition (p1);
             1
    SQL 10G>select distinct a from testlist partition (p2);
             2
    SQL 10G>select distinct a from testlist partition (p3);
             3
    SQL 10G>select distinct a from testlist partition (p4);
             4
     
    SQL 10G>select count(*) from testlist partition (p1);
       1048623
    SQL 10G>select count(*) from testlist partition (p2);
       1048623
    SQL 10G>select count(*) from testlist partition (p3);
       1048623
    SQL 10G>select count(*) from testlist partition (p4);
       1048623
     
    两张分区表都是4个分区,每个分区包含1048623条记录
     
     
    另外还有几个考查direct read与否的指标
     
    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
    NAME                                                                  VALUE
    ----------------------------------------------------------------          ----------
    consistent gets from cache                                           137327
    consistent gets direct                                                  574
    physical reads direct                                                     574
    table scans (direct read)                                                 0
     
    我们运行几个sql来看parallel query在分区表上的表现
     
     
    SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist;
      COUNT(*)
    ----------
       4194492
     
    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
    NAME                                                                  VALUE
    ---------------------------------------------------------------- ----------
    consistent gets from cache                                           137449
    consistent gets direct                                                 6934
    physical reads direct                                                  6934
    table scans (direct read)
     
     
    SQL 10G>select 137449-137327 "consistent gets from cache",6934-574 "consistent gets direct",6934-574 "physical reads direct" from dual;
     
    consistent gets from cache consistent gets direct physical reads direct
    -------------------------- ---------------------- ---------------------
                           122                   6360                  6360
     
    可以看到当parallel query包含所有分区的时候将会去做direct read
     
    看看一个分区的情况
     
    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
    NAME                                                                  VALUE
    ---------------------------------------------------------------- ----------
    consistent gets from cache                                           137449
    consistent gets direct                                                 6934
    physical reads direct                                                  6934
    table scans (direct read)                                                55
     
    SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist where a=1;
      COUNT(*)
    ----------
       1048623
     
    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
    NAME                                                                  VALUE
    ---------------------------------------------------------------- ----------
    consistent gets from cache                                           139307
    consistent gets direct                                                 6934
    physical reads direct                                                  6934
    table scans (direct read)                                                55
     
    SQL 10G>select 139307-137449 "consistent gets from cache",6934-6934 "consistent gets direct",6934-6934 "physical reads direct" from dual;
     
    consistent gets from cache consistent gets direct physical reads direct
    -------------------------- ---------------------- ---------------------
                          1858                      0                     0
     
     
    当查询一个分区时可以看到并没有采用direct read,而是采用普通的consistent get
     
     
    两个分区的情况
     
    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
    NAME                                                                  VALUE
    ---------------------------------------------------------------- ----------
    consistent gets from cache                                           139307
    consistent gets direct                                                 6934
    physical reads direct                                                  6934
    table scans (direct read)                                                55
     
    SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist where a in(1,2);
     
      COUNT(*)
    ----------
       2097246
     
     
    SQL 10G>SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
    NAME                                                                  VALUE
    ---------------------------------------------------------------- ----------
    consistent gets from cache                                           142788
    consistent gets direct                                                 6934
    physical reads direct                                                  6934
    table scans (direct read)                                                55
     
    SQL 10G>select 142788-139307"consistent gets from cache",6934-6934 "consistent gets direct",6934-6934 "physical reads direct" from dual;
     
    consistent gets from cache consistent gets direct physical reads direct
    -------------------------- ---------------------- ---------------------
                          3481                      0                     0
     
    同样没有发生direct read
     
     
    三个分区呢?
     
    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
    NAME                                                                  VALUE
    ---------------------------------------------------------------- ----------
    consistent gets from cache                                           142788
    consistent gets direct                                                 6934
    physical reads direct                                                  6934
    table scans (direct read)                                                55
     
    SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist where a in(1,2,3);
      COUNT(*)
    ----------
       3145869
     
    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
    NAME                                                                  VALUE
    ---------------------------------------------------------------- ----------
    consistent gets from cache                                           142907
    consistent gets direct                                                11704
    physical reads direct                                                 11704
    table scans (direct read)                                               110
     
     
    SQL 10G>select 142907-142788 "consistent gets from cache",11704-6934 "consistent gets direct",11704-6934 "physical reads direct" from dual;
     
    consistent gets from cache consistent gets direct physical reads direct
    -------------------------- ---------------------- ---------------------
                           119                   4770                  4770
     
    可以看到发生了3个分区的direct read
     
    如果我们查询不存在的分区值呢
     
    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
    NAME                                                                  VALUE
    ---------------------------------------------------------------- ----------
    consistent gets from cache                                           142907
    consistent gets direct                                                11704
    physical reads direct                                                 11704
    table scans (direct read)                                               110
     
    SQL 10G>select /*+ parallel(testlist,4)*/count(*) from testlist where a in(5,6,7);
      COUNT(*)
    ----------
             0
    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(56,248,53,51) and a.statistic#=b.statistic#;
    NAME                                                                  VALUE
    ---------------------------------------------------------------- ----------
    consistent gets from cache                                           142907
    consistent gets direct                                                11704
    physical reads direct                                                 11704
    table scans (direct read)                                               110
    SQL 10G>select 142907-142907 "consistent gets from cache",11704-11704"consistent gets direct",11704-11704"physical reads direct" from dual;
     
    consistent gets from cache consistent gets direct physical reads direct
    -------------------------- ---------------------- ---------------------
                             0                      0                     0
     
    很好,没有读testlist表的任何block
     
    再来看一下hash partition table的表现
     
    select /*+ parallel(testhash,4)*/count(*) from testhash;
     
    SQL 10G>select 149410-149287 "consistent gets from cache",18064-11704 "consistent gets direct",18064-11704 "physical reads direct" from dual;
     
    consistent gets from cache consistent gets direct physical reads direct
    -------------------------- ---------------------- ---------------------
                           123                   6360                  6360
                          
    select /*+ parallel(testhash,4)*/count(*) from testhash where a in(1);
     
    SQL 10G>select 151269-149410 "consistent gets from cache",18064-18064 "consistent gets direct",18064-18064 "physical reads direct" from dual;
     
    consistent gets from cache consistent gets direct physical reads direct
    -------------------------- ---------------------- ---------------------
                          1859                      0                     0
                         
    select /*+ parallel(testhash,4)*/count(*) from testhash where a in(1,2);
     
    SQL 10G>select 154751-151269 "consistent gets from cache",18064-18064 "consistent gets direct",18064-18064 "physical reads direct" from dual;
     
    consistent gets from cache consistent gets direct physical reads direct
    -------------------------- ---------------------- ---------------------
                          3482                      0                     0
     
    select /*+ parallel(testhash,4)*/count(*) from testhash where a in(1,2,6);
     
    SQL 10G>select 154871-154751 "consistent gets from cache",22834-18064 "consistent gets direct",22834-18064 "physical reads direct" from dual;
     
    consistent gets from cache consistent gets direct physical reads direct
    -------------------------- ---------------------- ---------------------
                           120                   4770                  4770
                          
    select /*+ parallel(testhash,4)*/count(*) from testhash where a in(3);  
                       
    SQL 10G>select 160572-158713 "consistent gets from cache",37144-37144 "consistent gets direct",37144-37144 "physical reads direct" from dual;
     
    consistent gets from cache consistent gets direct physical reads direct
    -------------------------- ---------------------- ---------------------
                          1859                      0                     0
     
    可以看到hash partition table在其他情况都和list partition table一样,但是在查询不存在的分区值时还是会去读取testhash表的block,当然这是由于hash partition的机制决定的。
     
    总结来说,parallel query在partition table上并不是时时在做direct read,它是和查询包含的partition number有关的,至于如何利用好parallel query的优势那又是另外的话题了,大家看完这2篇文章后可以自己去找寻正确的结论。
     
    March 15

    DBWR parallel query checkpoint buffers written

    DBWR parallel query checkpoint buffers written是oracle的一种特殊的checkpoint的产物,这种checkpoint在oracle8 以前称作Extent-based Checkpoint,当发生parallel query时oracle必须做checkpoint把脏数据写入磁盘,因为parallel query走的时direct read,直接从文件读入pga,如果有脏数据没有被写回磁盘,那么读出来的结果将会不一致。在oracle8以前,如果发生direct read,那么当每一个extent被读入的时候oracle会先去检查data buffer中有没和这个extent相关的dirty block,如果有就会驱动dbwr来写出,同时增加一次checkpoint的次数。这样的话如果这个表有N个extent都存在dirty block,那么将会发生N次checkpoint,这样的话将会使parallel query的执行时间大大超过normal query。在oracle8中这种情况有了改进,Extent-based Checkpoint变成了object-based checkpoint,不再会对单独的extent做checkpoint。
     
    让我们再回到主题,DBWR parallel query checkpoint buffers written
     
    首先来看一下v$statname
     
    SQL 10G>select statistic#,name from v$statname where name like '%DBWR%';
    STATISTIC# NAME
    ---------- ----------------------------------------------------------------
            69 DBWR checkpoint buffers written
            70 DBWR thread checkpoint buffers written
            71 DBWR tablespace checkpoint buffers written
            72 DBWR parallel query checkpoint buffers written
            73 DBWR object drop buffers written
            74 DBWR transaction table writes
            75 DBWR undo block writes
            76 DBWR revisited being-written buffer
            77 DBWR make free requests
            78 DBWR lru scans
            79 DBWR checkpoints
    STATISTIC# NAME
    ---------- ----------------------------------------------------------------
            80 DBWR fusion writes
     
    再来创建一张测试表
     
    create table test(a number)
     
     
    SQL 10G>select count(*) from test;
      COUNT(*)
    ----------
       4194496
     
    SQL 10G>select distinct a from test;
             A
    ----------
             1
     
    这个测试表包含了4194496条值为1的记录
     
    先刷新data buffer
     
    SQL 10G>ALTER SESSION SET EVENTS 'immediate trace name flush_cache';
    Session altered.
     
    我们来更新其中一条记录
     
    SQL 10G>update test set a=2 where rownum<2;
    1 row updated.
     
    SQL 10G>commit;
    Commit complete.
     
    执行parallel query
     
    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(72,79) and a.statistic#=b.statistic#;
    NAME                                                                         VALUE
    ----------------------------------------------------------------                ----------
    DBWR parallel query checkpoint buffers written                 28615
    DBWR checkpoints                                                        552
     
    SQL 10G>select/*+ parallel(test,4)*/distinct a from test;
             A
    ----------
             2
             1

    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(72,79) and a.statistic#=b.statistic#;
    NAME                                                                        VALUE
    ----------------------------------------------------------------                ----------
    DBWR parallel query checkpoint buffers written                 28616
    DBWR checkpoints                                                        553

    checkpoint和buffer written都增加1
     
    再来更新分布在多个extent上的block来证实一下是不是现在的oracle版本用的是object-based checkpoint
     
    SQL 10G>update test set a=10 where mod(dbms_rowid.ROWID_BLOCK_NUMBER(rowid),1000)=0 and dbms_rowid.ROWID_ROW_NUMBER(rowid)=0;
    6 rows updated.
     
    SQL 10G>commit;
    Commit complete.
     
    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(72,79) and a.statistic#=b.statistic#;
    NAME                                                                         VALUE
    ----------------------------------------------------------------                 ----------
    DBWR parallel query checkpoint buffers written                 28616
    DBWR checkpoints                                                        553
     
    SQL 10G>select/*+ parallel(test,4)*/distinct a from test;
             A
    ----------
            10
             2
             1
     
    SQL 10G>select b.name,a.value from v$sysstat a,v$statname b where a.statistic# in(72,79) and a.statistic#=b.statistic#;
    NAME                                                                         VALUE
    ----------------------------------------------------------------                 ----------
    DBWR parallel query checkpoint buffers written                 28622
    DBWR checkpoints                                                        554
     
     
    可以看到写出了6个块,但是只发生一次checkpoint,所以我们可以知道oracle已经采用了object-based checkpoint。
     
     
    总结来说,DBWR parallel query checkpoint buffers written就是当发生parallel query时导致checkpoint而写出的块数。
     
     
     
    February 14

    ora_rowscn

    10g里面有了一项新功能,我们可以查看某个表的某一行最后一次改动的scn,这个就是由ora_rowscn来提供。
     
    SQL 10G>create table test (a number);
    Table created.
     
    SQL 10G>select ora_rowscn from test;
    no rows selected
     
    SQL 10G>insert into test values(1);
    1 row created.
     
    SQL 10G>column ora_rowscn format 999999999999999999999
     
    insert 完成后我们可以查询到ora_rowscn

    SQL 10G>select ora_rowscn from test;
                ORA_ROWSCN
    ----------------------
               21749185488
     
    SQL 10G>commit;
    Commit complete.
     
    当事务递交,ora_rowscn发生变化
     
    SQL 10G>select ora_rowscn from test;
                ORA_ROWSCN
    ----------------------
               21749185507
     
    SQL 10G>insert into test values(2);
    1 row created.
     
    再插入一条记录,查看ora_rowscn,发现两条记录的ora_rowscn相同
     
    SQL 10G>select ora_rowscn from test;
                ORA_ROWSCN
    ----------------------
               21749185507
               21749185507
     
    SQL 10G>commit;
    Commit complete.
     
    事务递交后再次发现ora_rowscn发生变化
     
    SQL 10G>select ora_rowscn from test;
                ORA_ROWSCN
    ----------------------
               21749186216
               21749186216
     
    SQL 10G>select dbms_rowid.ROWID_RELATIVE_FNO(rowid),dbms_rowid.ROWID_BLOCK_NUMBER(rowid) from test;

    DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
    ------------------------------------ ------------------------------------
                                       1                                69650
                                       1                                69650
     
    SQL 10G>alter system dump datafile 1 block 69650;
    System altered.
     
    SQL 10G>exit
     
     
    Start dump data blocks tsn: 0 file#: 1 minblk 69650 maxblk 69650
    buffer tsn: 0 rdba: 0x00411012 (1/69650)
    scn: 0x0005.105a3ea8 seq: 0x02 flg: 0x02 tail: 0x3ea80602
    frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
    Hex dump of block: st=0, typ_found=1
    Block header dump:  0x00411012
     Object id on Block? Y
     seg/obj: 0xff72  csc: 0x05.105a3bd0  itc: 2  flg: O  typ: 1 - DATA
         fsl: 0  fnx: 0x0 ver: 0x01
     
     Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
    0x01   0x000d.026.000004da  0x01c01470.0184.2c  --U-    1  fsc 0x0000.105a3be3
    0x02   0x000e.008.000004e7  0x01c018a1.017d.17  --U-    1  fsc 0x0000.105a3ea8
     
    data_block_dump,data header at 0xb72f845c
    ===============
    tsiz: 0x1fa0
    hsiz: 0x16
    pbl: 0xb72f845c
    bdba: 0x00411012
         76543210
    flag=--------
    ntab=1
    nrow=2
    frre=-1
    fsbo=0x16
    fseo=0x1f94
    avsp=0x1f78
    tosp=0x1f78
    0xe:pti[0]      nrow=2  offs=0
    0x12:pri[0]     offs=0x1f9a
    0x14:pri[1]     offs=0x1f94
    block_row_dump:
    tab 0, row 0, @0x1f9a
    tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
    col  0: [ 2]  c1 02
    tab 0, row 1, @0x1f94
    tl: 6 fb: --H-FL-- lb: 0x2  cc: 1
    col  0: [ 2]  c1 03
    end_of_block_dump
     
     
     
    SQL 10G>select to_number('5105a3ea8','xxxxxxxxxxxxxxxxxx') ora_rowscn from dual;
            ORA_ROWSCN
    ------------------
           21749186216
    从上面的实验中可以发现,当表以常规方式创建的时候,ora_rowscn取自data block header的scn,而每行数据并没保存自己的rowscn,下面来看一下以rowdependencies创建的表是什么情况。
     
     
    SQL 10G>create table test (a number) rowdependencies;
    Table created.
     
    SQL 10G>select ora_rowscn from test;
    no rows selected
     
    SQL 10G>insert into test values(1);
    1 row created.
     
    insert后查询ora_rowscn ,发现ora_rowscn为空
     
    SQL 10G>select ora_rowscn from test;
            ORA_ROWSCN
    ------------------

    SQL 10G>commit;
    Commit complete.
     
    事务递交后,可以查询到ora_rowscn
     
     
    SQL 10G>select ora_rowscn from test;
            ORA_ROWSCN
    ------------------
           21749198001
     
    SQL 10G>insert into test values(2);
    1 row created.
     
    插入第2条数据,依然发现有一个ora_rowscn为空
     
    SQL 10G>select ora_rowscn from test;
            ORA_ROWSCN
    ------------------
           21749198001

    SQL 10G>commit;
    Commit complete.
     
    递交后查询ora_rowscn,发现2条记录的ora_rowscn并不一样,这和上面的情况不一样,我们知道这是因为ora_rowscn被保存在行内的缘故
     
    SQL 10G>select ora_rowscn from test;
            ORA_ROWSCN
    ------------------
           21749198001
           21749198017
     
     
     
    SQL 10G>alter system dump datafile 1 block 69650;
    System altered.
     
    SQL 10G>exit
     
    Start dump data blocks tsn: 0 file#: 1 minblk 69650 maxblk 69650
    buffer tsn: 0 rdba: 0x00411012 (1/69650)
    scn: 0x0005.105a6cc1 seq: 0x02 flg: 0x02 tail: 0x6cc10602
    frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
    Hex dump of block: st=0, typ_found=1
    seg/obj: 0xff73  csc: 0x05.105a687c  itc: 2  flg: O  typ: 1 - DATA
         fsl: 0  fnx: 0x0 ver: 0x01
     
     Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
    0x01   0x0010.005.000004e5  0x01c02e78.01c5.1e  --U-    1  fsc 0x0000.105a6cb1
    0x02   0x000c.02c.000004d4  0x01c00feb.01a5.24  --U-    1  fsc 0x0000.105a6cc1
     
    data_block_dump,data header at 0xc70e45c
    ===============
    tsiz: 0x1fa0
    hsiz: 0x16
    pbl: 0x0c70e45c
    bdba: 0x00411012
         76543210
    flag=--R-----
    ntab=1
    nrow=2
    frre=-1
    fsbo=0x16
    fseo=0x1f88
    avsp=0x1f6c
    tosp=0x1f6c
    0xe:pti[0]      nrow=2  offs=0
    0x12:pri[0]     offs=0x1f94
    0x14:pri[1]     offs=0x1f88
    block_row_dump:
    tab 0, row 0, @0x1f94
    tl: 12 fb: --H-FL-- lb: 0x1  cc: 1
    dscn 0x0000.00000000
    col  0: [ 2]  c1 02
    tab 0, row 1, @0x1f88
    tl: 12 fb: --H-FL-- lb: 0x2  cc: 1
    dscn 0x0000.00000000
    col  0: [ 2]  c1 03
    end_of_block_dump 
     
     
    dump block后发现,现在dscn还是0,我们查询出来的ora_rowscn实际上是从Scn/Fsc
    中获得的,真正ora_rowscn被保存在行内是在itl发生cleanout时会把Scn/Fsc刷到dscn
     
    SQL 10G>update test set a=1 where a=1;
    1 row updated.
     
    SQL 10G>commit;
    Commit complete.
     
     
    SQL 10G>update test set a=1 where a=1;
    1 row updated.
     
    SQL 10G>commit;
    Commit complete.
     
    SQL 10G>alter system dump datafile 1 block 69650;
    System altered.
     
    SQL 10G>exit
     
    Start dump data blocks tsn: 0 file#: 1 minblk 69650 maxblk 69650
    buffer tsn: 0 rdba: 0x00411012 (1/69650)
    scn: 0x0005.105a6dff seq: 0x02 flg: 0x02 tail: 0x6dff0602
    frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
    Hex dump of block: st=0, typ_found=1
     Object id on Block? Y
     seg/obj: 0xff73  csc: 0x05.105a6dfd  itc: 2  flg: O  typ: 1 - DATA
         fsl: 0  fnx: 0x0 ver: 0x01
     
     Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
    0x01   0x0012.024.000004db  0x01c024fd.016a.08  C---    0  scn 0x0005.105a6df8
    0x02   0x000d.00f.000004dc  0x01c031b3.0194.28  --U-    1  fsc 0x0000.105a6dff
     
    data_block_dump,data header at 0xc70e45c
    ===============
    tsiz: 0x1fa0
    hsiz: 0x16
    pbl: 0x0c70e45c
    bdba: 0x00411012
         76543210
    flag=--R-----
    ntab=1
    nrow=2
    frre=-1
    fsbo=0x16
    fseo=0x1f88
    avsp=0x1f6c
    tosp=0x1f6c
    0xe:pti[0]      nrow=2  offs=0
    0x12:pri[0]     offs=0x1f94
    0x14:pri[1]     offs=0x1f88
    block_row_dump:
    tab 0, row 0, @0x1f94
    tl: 12 fb: --H-FL-- lb: 0x2  cc: 1
    dscn 0x0005.105a6df8
    col  0: [ 2]  c1 02
    tab 0, row 1, @0x1f88
    tl: 12 fb: --H-FL-- lb: 0x0  cc: 1
    dscn 0x0005.105a6cc1
    col  0: [ 2]  c1 03
    end_of_block_dump
     
     
    另外,oracle还提供了ora_rowscn到timestamp的转换
     
     
    SQL 10G>select scn_to_timestamp(ora_rowscn) from test;
    SCN_TO_TIMESTAMP(ORA_ROWSCN)
    ---------------------------------------------------------------------------
    14-FEB-06 03.55.05.000000000 PM
    14-FEB-06 03.44.05.000000000 PM
     
    更多ora_rowscn的用途请各位自己去摸索吧。
    January 19

    oracle rowid

    搞oracle的人都很清楚rowid这个东西,rowid就是唯一标志记录物理位置的一个id,在oracle 8版本以前,rowid由file#+block#+row#组成,占用6个bytes的空间,10 bit 的 file# ,22bit 的 block# ,16 bit 的 row#。
     
    从oracle 8开始rowid变成了extend rowid,由data_object_id#+rfile#+block#+row#组成,占用10个bytes的空间, 32bit的 data_object_id#,10 bit 的 rfile#,22bit 的 block#,16 bit 的 row#.由于rowid的组成从file#变成了rfile#,所以数据文件数的限制也从整个库不能超过1023个变成了每个表空间不能超过1023个数据文件。
     
     
    说了rowid的组成,那么我们再来看看rowid在索引里面占用的字节数又是什么样子的。在oracle 8以前索引中存储的rowid占用字节数也是6bytes,在oracle8之后,虽然oracle使用了extend rowid,但是在普通索引里面依然存储了bytes的rowid,只有在global index中存储的是10bytes的extend rowid,而extend rowid也是global index出现的一个必要条件,下面我们会解释原因。
     
     
    为什么golbal index需要把data_object_id#也包含在index rowid entry中呢?如果不包含会这么样?首先我们需要知道index的rowid entry的存在是为了能根据它找到表的这条记录存在哪个具体的物理位置,我们需要知道它在哪个数据文件,在哪个block,在那一行,普通的索引oracle根据rfile#,block#,row#就可以知道了,但是partition table可以分布在多个表空间,也就是可以分布在多个数据文件,当我们建立local index时,index rowid entry并不包含data_object_id#,因为oracle可以知道这个index对应的是哪一个table分区,并可以得到table分区的ts#(tablespace号),那么oracle根据ts#和rfile#就可以找到具体的数据文件。但是如果换成是golbal index,如果不包含data_object_id#,那么我们并不能知道这个索引对应着哪个表分区,也自然不能知道它的rfile#和file#的转换关系,所以它将找不到所对应的记录。包含data_object_id#后,oracle可以根据data_object_id#实现rfile#和file#的转换然后找到记录对应的物理位置。需要注意的是要理解以上概念我们还是需要了解file#和rfile#的区别。
     
    关于file#和rfile#的区别可以参考biti_rainy的一篇blog
     
    继续上面的话题,我们猜想oracle实现rfile#和file#的转换是不是由一些递归sql来实现,所以我们做了一个测试来看是否oracle从file$等基表中实现呢?我们做了一个10046 trace,发现并不存在相应的递归sql,那oracle怎么实现呢?可能是直接取一些x$table中的数据了吧,那哪个x$table保留这些信息呢?
     
    x$kccfe!!!
     
     
    SQL 10G>desc x$kccfe
     Name                                      Null?    Type
     ----------------------------------------- -------- ----------------------------
     ADDR                                               RAW(4)
     INDX                                               NUMBER
     INST_ID                                            NUMBER
     FENUM                                              NUMBER
     FECSZ                                              NUMBER
     FEBSZ                                              NUMBER
     FESTA                                              NUMBER
     FECRC_SCN                                          VARCHAR2(16)
     FECRC_TIM                                          VARCHAR2(20)
     FECRC_THR                                          NUMBER
     FECRC_RBA_SEQ                                      NUMBER
     FECRC_RBA_BNO                                      NUMBER
     FECRC_RBA_BOF                                      NUMBER
     FECRC_ETB                                          RAW(132)
     FECPS                                              VARCHAR2(16)
     FECPT                                              VARCHAR2(20)
     FECPC                                              NUMBER
     FESTS                                              VARCHAR2(16)
     FESTT                                              VARCHAR2(20)
     FEBSC                                              VARCHAR2(16)
     FEFNH                                              NUMBER
     FEFNT                                              NUMBER
     FEDUP                                              NUMBER
     FEURS                                              VARCHAR2(16)
     FEURT                                              VARCHAR2(20)
     FEOFS                                              VARCHAR2(16)
     FEONC_SCN                                          VARCHAR2(16)
     FEONC_TIM                                          VARCHAR2(20)
     FEONC_THR                                          NUMBER
     FEONC_RBA_SEQ                                      NUMBER
     FEONC_RBA_BNO                                      NUMBER
     FEONC_RBA_BOF                                      NUMBER
     FEONC_ETB                                          RAW(132)
     FEPOR                                              NUMBER
     FETSN                                              NUMBER
     FETSI                                              NUMBER
     FERFN                                              NUMBER
     FEPFT                                              NUMBER
     FEDOR                                              NUMBER
     FEPDI                                              NUMBER
     FEFDB                                              NUMBER
     FEPLG_SCN                                          VARCHAR2(16)
     FEPAX                                              NUMBER
     FEFLG                                              NUMBER
     
    从这个x$table中oracle可以实现file和rfile的转换。
     
     
    最后我们来看一个例子
     
    SQL 10G>desc test
     Name                                      Null?    Type
     ----------------------------------------- -------- ----------------------------
     A                                                  NUMBER
     B                                                  VARCHAR2(32)
     
    test是一张分区表
     
    SQL 10G>select table_name,partition_name,tablespace_name from dba_tab_partitions where table_name='TEST';
    TABLE_NAME                     PARTITION_NAME                 TABLESPACE_NAME
    ------------------------------ ------------------------------ ------------------------------
    TEST                           P1                             TESTROWID
    TEST                           P2                             TESTROWID
     
     
    这条记录所在的物理位置
     
    SQL 10G>select dbms_rowid.ROWID_OBJECT(rowid) data_object_id#,dbms_rowid.ROWID_RELATIVE_FNO(rowid) rfile#,dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block#,dbms_rowid.ROWID_ROW_NUMBER(rowid) row# from test ;
    DATA_OBJECT_ID#     RFILE#     BLOCK#       ROW#
    --------------- ---------- ---------- ----------
              63665         28         15          0
     
     
     
     
    我们对它创建全局索引
     
    SQL 10G>create index ind_test on test(a);  
    Index created.
     
    再创建本地索引
     
    SQL 10G>create index ind_test_local on test(b) local;  
    Index created.
     
     
    SQL 10G>select dump(rowid,16) rid from test;
    RID
    ----------------------------------------------------------------------------
    Typ=69 Len=10: 0,0,f8,b1,7,0,0,f,0,0
     
    去看看全局索引和本地索引中rowid entry的区别
     
    SQL 10G>select header_block,header_file from dba_segments where segment_name='IND_TEST';
    HEADER_BLOCK HEADER_FILE
    ------------ -----------
            1403           4
     
     
    SQL 10G>select header_block,header_file from dba_segments where segment_name='IND_TEST_LOCAL';
    HEADER_BLOCK HEADER_FILE
    ------------ -----------
              11          33
              11          34
     
     
    SQL 10G>ALTER SYSTEM DUMP DATAFILE 4 BLOCK 1404;
    System altered.
     
     
    SQL 10G>ALTER SYSTEM DUMP DATAFILE 33 BLOCK 12;
    System altered.
     
     
    全局索引ind_test的rowid entry

    col 1; len 10; (10):  00 00 f8 b1 07 00 00 0f 00 00
     
     
    本地索引ind_test_local的rowid entry
     
    col 1; len 6; (6):  07 00 00 0f 00 00
     
    可以看出本地索引存储了6bytes rowid,全局索引存储了10bytes rowid
     
    再来看一下00 00 f8 b1 07 00 00 0f 00 00
    转换成bit就是
    00000000 00000000 11111000 10110001 00000111 00000000 00000000 00001111 00000000 00000000
     
    32bit的 data_object_id#,10 bit 的 rfile#,22bit 的 block#,16 bit 的 row#.
     
    00000000 00000000 11111000 10110001  data_object_id#
    转换成10进制就是
     
    2^15+2^14+2^13+2^12+2^11+2^7+2^5+2^4+2^0=63665
     
    00000111 00 rfile#
     
    2^4+2^3+2^2=28 
    0000000000000000001111  block#
     
    2^3+2^2+2^1+2^0=15
     

    0000000000000000 rowi#
     
    0
     
    SQL 10G>select dbms_rowid.ROWID_OBJECT(rowid) data_object_id#,dbms_rowid.ROWID_RELATIVE_FNO(rowid) rfile#,dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block#,dbms_rowid.ROWID_ROW_NUMBER(rowid) row# from test ;
    DATA_OBJECT_ID#     RFILE#     BLOCK#       ROW#
    --------------- ---------- ---------- ----------
              63665         28         15          0
     
    验证通过
     
     
    再来看一下如果file#超过1023后oracle会怎么处理
     
    SQL 10G>select file#,ts#,rfile# from v$datafile  where file#>1023;
         FILE#        TS#     RFILE#
    ---------- ---------- ----------
          1024         14          1
          1025         14          3
          1026         14          4
          1027         14          5
          1028         14          6
          1029         14          7
          1030         14          8
          1031         14          9
          1032         14         10
          1033         14         11
          1034         14         12
         FILE#        TS#     RFILE#
    ---------- ---------- ----------
          1035         14         13
          1036         14         14
          1037         15         14
    可以看到在一个tablespace里面rfile#从1开始到1023
     
     
    SQL 10G>select file#,rfile# from v$datafile  where ts#=14 order by file#;
         FILE#     RFILE#
    ---------- ----------
             2          2
            15         15
            16         16
            17         17
            18         18
            19         19
            20         20
            21         21
            22         22
            23         23
            24         24
           。。。。。。
        FILE#     RFILE#
    ---------- ----------
          1015       1015
          1016       1016
          1017       1017
          1018       1018
          1019       1019
          1020       1020
          1021       1021
          1022       1022
          1023       1023
          1024          1
          1025          3
         FILE#     RFILE#
    ---------- ----------
          1026          4
          1027          5
          1028          6
          1029          7
          1030          8
          1031          9
          1032         10
          1033         11
          1034         12
          1035         13
          1036         14
     
    January 16

    what is data_object_id

    前些天有人问data_object_id究竟是个什么东东,既然有了object_id那还要data_object_id干吗?
     
    其实object_id和data_object_id同样是表示数据库对象的一个唯一标志,但是object_id表示的是逻辑id,data_object_id表示的是物理id。如果一些object没有物理属性的话那它就不存在data_object_id,例如procedure,function,package,data type,db link,mv定义,view定义,临时表,分区表定义等等这些object都是没有对应着某个segment,因此它们的data_object_id都为空。
     
    当表刚创建的时候它的object_id和data_object_id都是相等的,但是如果表经过move或truncate后那么data_object_id将会有变化。
     

    SQL 10G>select object_id,data_object_id from user_objects where object_name='T';
     OBJECT_ID DATA_OBJECT_ID
    ---------- --------------
         63053          63053
     
    刚开始创建表时object_id=data_object_id
     
    SQL 10G>SELECT HEADER_FILE,HEADER_BLOCK,BLOCKS FROM DBA_SEGMENTS WHERE SEGMENT_NAME='T' AND OWNER='TEST';
    HEADER_FILE HEADER_BLOCK     BLOCKS
    ----------- ------------ ----------
              4         2019          8
    SQL 10G>alter table t move;
    Table altered.
     
    SQL 10G>select object_id,data_object_id from user_objects where object_name='T';
     OBJECT_ID DATA_OBJECT_ID
    ---------- --------------
         63053          63463
    在move以后可以看到data_object_id发生变化了
     
    SQL 10G>SELECT HEADER_FILE,HEADER_BLOCK,BLOCKS FROM DBA_SEGMENTS WHERE SEGMENT_NAME='T' AND OWNER='TEST';
    HEADER_FILE HEADER_BLOCK     BLOCKS
    ----------- ------------ ----------
              4          467          8
     
    SQL 10G>truncate table t;
    Table truncated.
     
    SQL 10G>select object_id,data_object_id from user_objects where object_name='T';
     OBJECT_ID DATA_OBJECT_ID
    ---------- --------------
         63053          63464
    SQL 10G>SELECT HEADER_FILE,HEADER_BLOCK,BLOCKS FROM DBA_SEGMENTS WHERE SEGMENT_NAME='T' AND OWNER='TEST';
    HEADER_FILE HEADER_BLOCK     BLOCKS
    ----------- ------------ ----------
              4          467          8
     
    truncate之后虽然segment的位置没有移动,但是data_object_id还是发生变化了。
     
     
    SQL 10G>alter table t add(b number);
    Table altered.
     
    SQL 10G>select object_id,data_object_id from user_objects where object_name='T';
     OBJECT_ID DATA_OBJECT_ID
    ---------- --------------
         63053          63464
     
    我们对t表加一个字段,发现data_object_id没有发生变化,我们可以认为只有当segment发生变化时data_object_id才会随之变化。我们来看看data_object_id从哪里来的
     
     
    SQL 10G> select text from dba_views where view_name='DBA_OBJECTS';
     
    select u.name, o.name, o.subname, o.obj#, o.dataobj#,
           decode(o.type#, 0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE', 3, 'CLUSTER',
                          4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE',
                          7, 'PROCEDURE', 8, 'FUNCTION', 9, 'PACKAGE',
                          11, 'PACKAGE BODY', 12, 'TRIGGER',
                          13, 'TYPE', 14, 'TYPE BODY',
                          19, 'TABLE PARTITION', 20, 'INDEX PARTITION', 21, 'LOB',
                          22, 'LIBRARY', 23, 'DIRECTORY', 24, 'QUEUE',
                          28, 'JAVA SOURCE', 29, 'JAVA CLASS', 30, 'JAVA RESOURCE',
                          32, 'INDEXTYPE', 33, 'OPERATOR',
                          34, 'TABLE SUBPARTITION', 35, 'INDEX SUBPARTITION',
                          40, 'LOB PARTITION', 41, 'LOB SUBPARTITION',
                          42, NVL((SELECT distinct 'REWRITE EQUIVALENCE'
                                   FROM sum$ s
                                   WHERE s.obj#=o.obj#
                                         and bitand(s.xpflags, 8388608) = 8388608),
                                  'MATERIALIZED VIEW'),
                          43, 'DIMENSION',
                          44, 'CONTEXT', 46, 'RULE SET', 47, 'RESOURCE PLAN',
                          48, 'CONSUMER GROUP',
                          51, 'SUBSCRIPTION', 52, 'LOCATION',
                          55, 'XML SCHEMA', 56, 'JAVA DATA',
                          57, 'SECURITY PROFILE', 59, 'RULE',
                          60, 'CAPTURE', 61, 'APPLY',
                          62, 'EVALUATION CONTEXT',
                          66, 'JOB', 67, 'PROGRAM', 68, 'JOB CLASS', 69, 'WINDOW',
                          72, 'WINDOW GROUP', 74, 'SCHEDULE', 79, 'CHAIN',
                          81, 'FILE GROUP',
                         'UNDEFINED'),
           o.ctime, o.mtime,
           to_char(o.stime, 'YYYY-MM-DD:HH24:MI:SS'),
           decode(o.status, 0, 'N/A', 1, 'VALID', 'INVALID'),
           decode(bitand(o.flags, 2), 0, 'N', 2, 'Y', 'N'),
           decode(bitand(o.flags, 4), 0, 'N', 4, 'Y', 'N'),
           decode(bitand(o.flags, 16), 0, 'N', 16, 'Y', 'N')
    from sys.obj$ o, sys.user$ u
    where o.owner# = u.user#
      and o.linkname is null
      and (o.type# not in (1  /* INDEX - handled below */,
                          10 /* NON-EXISTENT */)
           or
           (o.type# = 1 and 1 = (select 1
                                  from sys.ind$ i
                                 where i.obj# = o.obj#
                                   and i.type# in (1, 2, 3, 4, 6, 7, 9))))
      and o.name != '_NEXT_OBJECT'
      and o.name != '_default_auditing_options_'
    union all
    select u.name, l.name, NULL, to_number(null), to_number(null),
           'DATABASE LINK',
           l.ctime, to_date(null), NULL, 'VALID','N','N', 'N'
    from sys.link$ l, sys.user$ u
    where l.owner# = u.user#
     
    可以看到data_object_id是从obj$.dataobj#来的,而obj$.dataobj#又对应着seg$.HWMINCR
     
     
    SQL 10G>select max(HWMINCR) from sys.seg$;

    MAX(HWMINCR)
    ------------
           63464
    这里的HWMINCR就是data_object_id的来源,每次seg$里生成新的一条记录都会增加HWMINCR这个值,同时obj$.dataobj#也会跟着变化。
     
     
     
    December 28

    dynamic sampling

    dynamic sampling对于我们来说其实不算太陌生,从9i r2开始,dynamic sampling其实已经不动声色地融入到数据库中了。
    我们经常会碰到由于一些表没有分析导致执行计划错误的情况,但是dynamic sampling的出现一定程度的减少了错误的产生。dynamic sampling针对没有分析过的表可以采样估计表的选择性,对于生产正确的执行计划有一定的帮助。
     
    dynamic sampling分为10个级别,从0-10,由参数optimizer_dynamic_sampling控制
     
    下面是取自"Performance Tuning Guide and Reference "的对10种级别的定义
     
    • Level 0: Do not use dynamic sampling.
    • Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of dynamic sampling blocks (32).
    • Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is the default number of dynamic sampling blocks.
    • Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks.
    • Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns. The number of blocks sampled is the default number of dynamic sampling blocks.
    • Level 5: Apply dynamic sampling to all tables that meet the Level 4 criteria using 2 times the default number of dynamic sampling blocks.
    • Level 6: Apply dynamic sampling to all tables that meet the Level 5 criteria using 4 times the default number of dynamic sampling blocks.
    • Level 7: Apply dynamic sampling to all tables that meet the Level 6 criteria using 8 times the default number of dynamic sampling blocks.
    • Level 8: Apply dynamic sampling to all tables that meet the Level 7 criteria using 32 times the default number of dynamic sampling blocks.
    • Level 9: Apply dynamic sampling to all tables that meet the Level 8 criteria using 128 times the default number of dynamic sampling blocks.
    • Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.
    dynamic_sampling hint的级别定义如下
     
    • Level 0: Do not use dynamic sampling.
    • Level 1: The number of blocks sampled is the default number of dynamic sampling blocks (32).
    • Level 2: The number of blocks sampled is 2 times the default number of dynamic sampling blocks.
    • Level 3: The number of blocks sampled is 4 times the default number of dynamic sampling blocks.
    • Level 4: The number of blocks sampled is 8 times the default number of dynamic sampling blocks.
    • Level 5: The number of blocks sampled is 16 times the default number of dynamic sampling blocks.
    • Level 6: The number of blocks sampled is 32 times the default number of dynamic sampling blocks.
    • Level 7: The number of blocks sampled is 64 times the default number of dynamic sampling blocks.
    • Level 8: The number of blocks sampled is 128 times the default number of dynamic sampling blocks.
    • Level 9: The number of blocks sampled is 256 times the default number of dynamic sampling blocks.
    • Level 10: Read all blocks in the table.

     

    下面做一些测试

     

    SQL> create table test as select object_id a from dba_objects;

    Table created.

     

    SQL>  show parameter optimizer_dynamic_sampling

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    optimizer_dynamic_sampling           integer     1


    SQL> set autotrace trace explain;


    SQL> select count(*) from test;

    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE
       1    0   SORT (AGGREGATE)
       2    1     TABLE ACCESS (FULL) OF 'TEST'

     

    SQL> alter session set optimizer_dynamic_sampling=10;

    Session altered.

     

    SQL> select count(*) from test;

    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE
       1    0   SORT (AGGREGATE)
       2    1     TABLE ACCESS (FULL) OF 'TEST'


    dynamic sampling没有起作用,因为在9i里面dynamic sampling需要在多表关联的语句里面才会起作用。在10g里面optimizer_dynamic_sampling在2及以上级别时单表就能dynamic sampling。

     

    SQL 10G>alter session set optimizer_dynamic_sampling=2;

    Session altered.

     

    SQL 10G>select count(*) from test;

     

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1950795681

    -------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
    -------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |    21   (5)| 00:00:01 |
    |   1 |  SORT AGGREGATE    |      |     1 |            |          |
    |   2 |   TABLE ACCESS FULL| TEST | 48065 |    21   (5)| 00:00:01 |
    -------------------------------------------------------------------

    Note
    -----
       - dynamic sampling used for this statement

     

     

    看看关联查询的情况

     

    SQL> create table test1 as select * from test;

    Table created.

     

    SQL> select count(*) from test a,test1 b where a.a=b.a;

    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE
       1    0   SORT (AGGREGATE)
       2    1     MERGE JOIN
       3    2       SORT (JOIN)
       4    3         TABLE ACCESS (FULL) OF 'TEST1'
       5    2       SORT (JOIN)
       6    5         TABLE ACCESS (FULL) OF 'TEST'

     

    如果两个表都没分析,那么dynamic sampling不起作用,分析其中一个表

     

    SQL> analyze table test1 compute statistics;

    Table analyzed.

     

    SQL> select count(*) from test a,test1 b where a.a=b.a;

    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=19 Card=1 Bytes=17)
       1    0   SORT (AGGREGATE)
       2    1     HASH JOIN (Cost=19 Card=23706 Bytes=403002)
       3    2       TABLE ACCESS (FULL) OF 'TEST1' (Cost=6 Card=23706 Byte
              s=94824)

       4    2       TABLE ACCESS (FULL) OF 'TEST' (Cost=6 Card=23706 Bytes
              =308178)

     

    dynamic sampling起作用了。

     

    在10g里面有点区别,即使2个表都没分析过,dynamic sampling也可以起作用

     

    SQL 10G>alter session set optimizer_dynamic_sampling=1;

    Session altered.

     

    SQL 10G>select count(*) from test a,test1 b where a.a=b.a;

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 2909046986

    --------------------------------------------------------------------------------
    -----

    | Id  | Operation           | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time
        |

    --------------------------------------------------------------------------------
    -----

    |   0 | SELECT STATEMENT    |       |     1 |    26 |       |   151   (4)| 00:00
    :02 |

    |   1 |  SORT AGGREGATE     |       |     1 |    26 |       |            |
        |

    |*  2 |   HASH JOIN         |       | 43244 |  1097K|  1056K|   151   (4)| 00:00
    :02 |

    |   3 |    TABLE ACCESS FULL| TEST1 | 43235 |   548K|       |    21   (5)| 00:00
    :01 |

    |   4 |    TABLE ACCESS FULL| TEST  | 45439 |   576K|       |    21   (5)| 00:00
    :01 |

    --------------------------------------------------------------------------------
    -----


    Predicate Information (identified by operation id):
    ---------------------------------------------------

       2 - access("A"."A"="B"."A")

     

    Note
    -----
       - dynamic sampling used for this statement

     

    其实归根到底是由于10g没有了基于规则的优化器,在9i里面如果一个查询中的所有表没有统计数据,那么他将选择基于规则的优化器而忽略dynamic sampling,而在10g因为这个原因所以dynamic sampling生效。