合 Oracle中的基数(Cardinality)和可选择率(Selectivity)
 
简介
基数(Cardinality)是Oracle预估的返回行数,即对目标SQL的某个具体执行步骤的执行结果所包含记录数的估算值。如果是针对整个目标SQL,那么此时的Cardinality就表示该SQL最终执行结果所包含记录数的估算值。例如,一张表T有1000行数据,列COL1上没有直方图,没有空值,并且不重复的值(Distinct Value)有500个。那么,在使用条件“WHERE COL1=
可选择率(Selectivity)是指施加指定谓词条件后返回结果集的记录数占未施加任何谓词条件的原始结果集的记录数的比率。可选择率的取值范围显然是0~1,它的值越小,就表明可选择性越好。当可选择率为1时的可选择性是最差的。CBO就是用可选择率来估算对应结果集的Cardinality的,可选择率和Cardinality之间的关系如下所示:
1 | cardinality=NUM_ROWS*selectivity |
其中,NUM_ROWS表示表的总行数。
在Oracle数据库中,Oracle会默认认为SQL语句的WHERE条件中出现的各列彼此之间是独立的,是没有关联关系的。所以,如果目标SQL语句各列之间是以AND来组合的话,那么该SQL语句整个WHERE条件的组合可选择率就等于各个列各自施加查询条件后可选择率的乘积。在得到了SQL语句整个WHERE条件的组合可选择率后,Oracle会用它来估算整个SQL语句返回结果集的Cardinality,估算的方法就是用目标表的总记录数(NUM_ROWS)乘组合可选择率。但Oracle默认认为的各列之间是独立的、没有关联关系的前提条件并不总是正确的,在实际的应用中各列之间有关联关系的情况实际上并不罕见。在这种情况下如果还用上述计算方法来计算目标SQL语句整个WHERE条件的组合可选择率并用它来估算返回结果集的Cardinality的话,那么估算结果可能就会与实际结果有较大的偏差,进而可能导致CBO选错执行计划,所以Oracle又引入了动态采样和多列统计信息。
下表给出了一些常见的可选择率计算公式:
示例
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 | 下面给出一个示例: DROP TABLE T_ROWS_20170605_LHR; CREATE TABLE T_ROWS_20170605_LHR AS SELECT ROWNUM ID,'NAME1' SAL FROM DUAL CONNECT BY LEVEL<=10000; UPDATE T_ROWS_20170605_LHR T SET T.ID='' WHERE T.ID<=100; EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_ROWS_20170605_LHR',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',estimate_percent => 100); LHR@orclasm > COL LOW_VALUE FORMAT A20 LHR@orclasm > COL HIGH_VALUE FORMAT A20 LHR@orclasm > SELECT D.LOW_VALUE,D.HIGH_VALUE,UTL_RAW.CAST_TO_NUMBER(D.LOW_VALUE) LOW_VALUE2,UTL_RAW.CAST_TO_NUMBER(D.HIGH_VALUE) HIGH_VALUE2, D.NUM_DISTINCT,D.NUM_NULLS FROM DBA_TAB_COL_STATISTICS D WHERE D.TABLE_NAME = 'T_ROWS_20170605_LHR' AND D.OWNER = 'LHR' AND D.COLUMN_NAME='ID'; LOW_VALUE HIGH_VALUE LOW_VALUE2 HIGH_VALUE2 NUM_DISTINCT NUM_NULLS -------------------- -------------------- ---------- ----------- ------------ ---------- C20202 C302 101 10000 9900 100 LHR@orclasm > SELECT MIN(T.ID),DUMP(MIN(T.ID),16) LOW_VALUE,MAX(T.ID),DUMP(MAX(T.ID),16) HIGH_VALUE FROM T_ROWS_20170605_LHR T; MIN(T.ID) LOW_VALUE MAX(T.ID) HIGH_VALUE ---------- -------------------- ---------- -------------------- 101 Typ=2 Len=3: c2,2,2 10000 Typ=2 Len=2: c3,2 下面分别执行如下4条SQL语句并获取执行计划: SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID=1000;--1 SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID>1000; --9000 SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID>=1000; --9001 SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID BETWEEN 1000 AND 1100; --101 LHR@orclasm > set autot on exp LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID=1000; COUNT(1) ---------- 1 Execution Plan ---------------------------------------------------------- Plan hash value: 612708570 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 4 | 9 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 4 | | | |* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 1 | 4 | 9 (0)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("T"."ID"=1000) -- ROUND(NUM_ROWS*(1/NUM_DISTINCT)*((NUM_ROWS-NUM_NULLS)/NUM_ROWS)) LHR@orclasm > SELECT ROUND(10000*1/9900*((10000-100)/10000)) VALUE FROM DUAL; VALUE ---------- 1 LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID>1000; COUNT(1) ---------- 9000 Execution Plan ---------------------------------------------------------- Plan hash value: 612708570 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 4 | 9 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 4 | | | |* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 9001 | 36004 | 9 (0)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("T"."ID">1000) --ROUND(NUM_ROWS*((HIGH_VALUE-VAL)/(HIGH_VALUE-LOW_VALUE))*((NUM_ROWS-NUM_NULLS)/NUM_ROWS)) LHR@orclasm > SELECT ROUND(10000*((10000-1000)/(10000-101))*((10000-100)/10000)) VALUE FROM DUAL; VALUE ---------- 9001 LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID>=1000; COUNT(1) ---------- 9001 Execution Plan ---------------------------------------------------------- Plan hash value: 612708570 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 4 | 9 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 4 | | | |* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 9002 | 36008 | 9 (0)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("T"."ID">=1000) --ROUND(NUM_ROWS*((HIGH_VALUE-VAL)/(HIGH_VALUE-LOW_VALUE)+1/NUM_DISTINCT)*((NUM_ROWS-NUM_NULLS)/NUM_ROWS)) LHR@orclasm > SELECT ROUND(10000*((10000-1000)/(10000-101)+1/9900)*((10000-100)/10000)) VALUE FROM DUAL; VALUE ---------- 9002 LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID BETWEEN 1000 AND 1100; COUNT(1) ---------- 101 Execution Plan ---------------------------------------------------------- Plan hash value: 612708570 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 4 | 9 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 4 | | | |* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 102 | 408 | 9 (0)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("T"."ID"<=1100 AND "T"."ID">=1000) LHR@orclasm > --ROUND(NUM_ROWS*(((VAL2-VAL1)/(HIGH_VALUE-LOW_VALUE)+2/NUM_DISTINCT)*((NUM_ROWS-NUM_NULLS)/NUM_ROWS))) LHR@orclasm > SELECT ROUND(10000*(((1100-1000)/(10000-101)+2/9900)*((10000-100)/10000))) VALUE FROM DUAL; VALUE ---------- 102 可见预估行数和用公式计算出来的结果相吻合。 下面再查看有频率直方图的时候基数的计算。 DROP TABLE T_ROWS_20170605_LHR; CREATE TABLE T_ROWS_20170605_LHR AS SELECT ROWNUM ID,'NAME1' SAL FROM DUAL CONNECT BY LEVEL<=10000; UPDATE T_ROWS_20170605_LHR T SET T.ID='' WHERE T.ID<=100; UPDATE T_ROWS_20170605_LHR SET ID=2 WHERE ID BETWEEN 101 AND 200; UPDATE T_ROWS_20170605_LHR SET ID=3 WHERE ID BETWEEN 200 AND 3000; UPDATE T_ROWS_20170605_LHR SET ID=9 WHERE ID BETWEEN 3000 AND 9999; SELECT T.ID,COUNT(*) FROM T_ROWS_20170605_LHR T GROUP BY T.ID; 查看数据分布: LHR@orclasm > SELECT T.ID,COUNT(*) FROM T_ROWS_20170605_LHR T GROUP BY T.ID; ID COUNT(*) ---------- ---------- 100 10000 1 2 100 3 2800 9 6999 收集频率直方图: LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T_ROWS_20170605_LHR',CASCADE=>TRUE,METHOD_OPT=>'FOR COLUMNS ID SIZE 6',estimate_percent => 100); PL/SQL procedure successfully completed. LHR@orclasm > SELECT D.COLUMN_NAME,D.NUM_DISTINCT,D.NUM_NULLS,D.NUM_BUCKETS,D.HISTOGRAM,D.DENSITY FROM DBA_TAB_COLUMNS D WHERE D.TABLE_NAME = 'T_ROWS_20170605_LHR' AND D.COLUMN_NAME='ID'; COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS HISTOGRAM DENSITY ------------------------------ ------------ ---------- ----------- --------------- ---------- ID 4 100 4 FREQUENCY .000050505 LHR@orclasm > COL COLUMN_NAME FORMAT A6 LHR@orclasm > SELECT TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE,NVL((ENDPOINT_NUMBER-(LAG(ENDPOINT_NUMBER) OVER (ORDER BY ENDPOINT_VALUE))),ENDPOINT_NUMBER) COUNTS FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ROWS_20170605_LHR' AND COLUMN_NAME='ID'; TABLE_NAME COLUMN ENDPOINT_NUMBER ENDPOINT_VALUE COUNTS ------------------------------ ------ --------------- -------------- ---------- T_ROWS_20170605_LHR ID 100 2 100 T_ROWS_20170605_LHR ID 2900 3 2800 T_ROWS_20170605_LHR ID 9899 9 6999 T_ROWS_20170605_LHR ID 9900 10000 1 当目标列有频率直方图并且对目标列施加等值查询条件时,如果查询条件的输入值等于目标列的某个Bucket的ENDPOINT_VALUE,那么cardinality=Current_ENDPOINT_NUMBER-Previous_ENDPOINT_NUMBER: LHR@orclasm > SET AUTOT ON LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID=3; COUNT(1) ---------- 2800 Execution Plan ---------------------------------------------------------- Plan hash value: 612708570 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 3 | 9 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 2800 | 8400 | 9 (0)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("T"."ID"=3) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 23 consistent gets 0 physical reads 0 redo size 526 bytes sent via SQL*Net to client 519 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed 可见,预估行数为2800,和直方图中存储的值吻合(2900-100)。 当目标列有频率直方图并且对目标列施加等值查询条件时,如果查询条件的输入值不等于目标列的任意一个Bucket的ENDPOINT_VALUE,那么cardinality=MIN(Current_ENDPOINT_NUMBER-Previous_ENDPOINT_NUMBER)/2: LHR@orclasm > SELECT COUNT(1) FROM T_ROWS_20170605_LHR T WHERE T.ID=4; COUNT(1) ---------- 0 Execution Plan ---------------------------------------------------------- Plan hash value: 612708570 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 3 | 9 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL| T_ROWS_20170605_LHR | 1 | 3 | 9 (0)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("T"."ID"=4) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 23 consistent gets 0 physical reads 0 redo size 525 bytes sent via SQL*Net to client 519 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed LHR@orclasm > select round(1/2) from dual; ROUND(1/2) ---------- 1 |
在直方图中,由于MIN(Current_ENDPOINT_NUMBER-Previous_ENDPOINT_NUMBER)=1,所以,ROUND(1/2)=1,和执行计划中的预估行数相吻合。
相关文章
- 【MOS】收集统计信息导致游标失效ROLL_INVALID_MISMATCH Rolling Cursor Invalidations with DBMS_STATS.AUTO_INVALIDATE (Doc ID 557661.1)
- Oracle如何锁住或解锁统计信息?
- 在Oracle中,对表执行TRUNCATE操作会将表的统计信息也清除掉吗
- Oracle如何查询表的DML操作数据变化量
- 在Oracle中,当收集表的统计信息时应该注意哪些问题
- Oracle如何查询表和索引的历史统计信息
- Oracle中的统计信息之直方图(Histogram)系列
- 在Oracle中,什么是待定的统计信息(Pending Statistic)
- 什么是Oracle的基数反馈(Cardinality Feedback)?
- Oracle统计信息之动态采样(Dynamic Sampling)
- Oracle中的统计信息介绍
- Oracle优化器统计信息顾问任务AUTO_STATS_ADVISOR_TASK
- Oracle中的自动收集统计信息
- 在Oracle中,在新建或重建索引后统计信息是否自动收集
- Oracle多列统计信息
- Oracle收集统计信息之NO_INVALIDATE参数
- 统计信息不准确导致执行计划走了笛卡尔积
- Oracle收集内部对象和数据字典的统计信息
- Oracle并行和并发收集统计信息(含常用SQL)
- Oracle中如何找出统计信息过期的表



