USING OPTDIAG SIMULATE STATISTICS MODE

By Eric Miner

Optimizer Engineering

Sybase, Inc.

 

WHAT IS OPTDIAG SIMULATE STATISTICS?

Have you ever wondered how a query or set of queries would optimize if your dataset were different? Have you been in the situation where Technical Support asked for your dataset in order to reproduce an optimizer problem you experienced? Optdiag simulate allows both to be done quickly and easily without changing your actual dataset or sending it to Technical Support.

Optdiag's simulate statistics mode is designed to perform two functions: to allow Technical Support to reproduce optimizer cases for users, and to allow users to perform what-if analysis. In both cases the actual data from the database is not required, which saves a great deal of time and resources. In this mode optdiag outputs a file containing the same information as the other optdiag modes, along with additional information about some server level resources.

The optdiag simulate statistics output file format differs in one significant way from other optdiag modes: it contains an actual and a simulated line for each statistics value that can be simulated. When the file is read back into a target server via the -i option, the simulated values are written to sysstatistics in a special format that can then be used by the optimizer to estimate the costs of queries using values other than the actual values. Below is an example. The actual value is preceded with a # and thus will not be written to the system table; only the simulated line will be written. Only values in rows labeled '(simulated)' can be edited; when changing statistics, change these values and no other information. The simulated values can be changed using a text editor.

Data page count: 44308.0000000000000 (simulated)

# Data page count: 44308.0000000000000 (actual)
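For example, to simulate a table roughly ten times larger you would edit only the simulated line, leaving the commented actual line untouched:

Data page count: 443080.0000000000000 (simulated)

# Data page count: 44308.0000000000000 (actual)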

Optdiag can read and write statistics in two formats, default (ASCII) format and binary, and the two can be combined. The simplest way to view statistics is to use the parameter statistics alone on the optdiag command line; this outputs statistics in the ASCII format. The parameter statistics must appear on the optdiag command line. Binary mode should be used if there is an issue of precision with floating point values; see "New Functionality in ASE 11.9.2" chapter 8 for more information on when to use binary mode. The binary and simulate parameters are both optional, and they can be used in combination. For example, if you are simulating on a different type of computer than the one hosting the source database, you may want to use binary together with simulate on the command line. Below is an example of simulated binary statistics; notice that the editable value is in binary format.

Data page count: 0x0000000080a2e540 (simulated)

# Data page count: 44308.0000000000000 (simulated)

# Data page count: 44308.0000000000000 (actual)
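A file like this can be produced by combining the two parameters on the command line, for example (user, password and server names are placeholders):

optdiag binary simulate statistics db_name -Uuser -Ppw -Sserver -o output_filename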

 

TERMS AND SYNTAX USED IN THIS PAPER:

See New Functionality in ASE 11.9.2 for a detailed discussion of optdiag command line parameters.

Test server: This is the ASE server that you are using to perform the simulation.

Source server: This is the ASE server from which optdiag simulate statistics files were obtained. In many cases this will be a production dataset.

Dataset: The entire database, or a set of tables in the database.

Table/Index level statistics: These are the statistics that are used to describe various aspects of a table or index.

Column level statistics: These are the statistics that are used to describe a column to the optimizer. They contain the histogram (distribution information) and the density values, along with the last time these statistics were modified.

Shared statistics: These are statistics that are used by the optimizer to estimate the cost of a query but don't belong to a single table. They include cache and parallelism related statistics.

Optdiag syntax review:

See the document New Functionality in ASE 11.9.2 for a detailed description of optdiag syntax and other requirements. Below is the syntax used in simulation:

optdiag simulate statistics db_name -U -P -S -o output_filename

The above will output a single file containing simulated statistics for all tables in the database.

optdiag simulate statistics db_name..table_name -U -P -S -o output_filename

The above will output a single file containing simulated statistics for the specified table.

To input a file use the -i option followed by the input file name.
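For example, to read a previously edited file back into the test server (user, password and server names are placeholders):

optdiag simulate statistics db_name..table_name -Uuser -Ppw -Sserver -i input_filename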

WHAT CAN BE SIMULATED?

All table, index and column statistics can be simulated, along with cache sizes, configured degrees of maximum parallelism and largest partition size information. By manipulating these values you can test queries to see how variations affect query plans and thus performance. Column level statistics can also be changed using optdiag simulate; they will be written to the system table sysstatistics. You will need to take special steps to return column level statistics to their original values (see 'Simulating Column Level Statistics' and 'Removing Simulated Statistics' later in this paper).

The optdiag simulate statistics output files will contain all values that can be simulated. Most of these values are obtained from the two system tables that hold statistical values - sysstatistics (column statistics) and systabstats (table/index statistics). A handful of values are gathered from elsewhere within the server. These include values on available caches and their sizes, information on parallelism and the size of the largest partition.

SIMULATING VALUES THAT ARE NOT STORED IN THE SYSTEM TABLES

Steps should be taken to ensure that the cache, parallelism and partition values are properly simulated in the test server. The first two are referred to as shared statistics. The ability to change these values is unique to the optdiag simulate format.

Caches:

Caches that appear in the optdiag simulate output file must be present in the test server before the file is read in for simulation; if not, optdiag will fail and report an error. You will need to be sure that all named caches you wish to use in the simulation, or which are in the output file you obtained from the source server, are created in the test server. They can be created using the minimal amount of cache (512K). Also, if tables and/or indexes are bound to named caches in the source server, or if you want to simulate objects bound to caches, you must bind these objects to caches in the target server. If you are using any special cache strategies they must also be specified.
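As a sketch, a named cache of the minimal size can be created and a table bound to it with the standard system procedures (the cache name sim_cache is illustrative, and a newly created cache may require a server restart before objects can be bound to it):

sp_cacheconfig "sim_cache", "512K"

sp_bindcache "sim_cache", tpcd, orders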

Here we are simulating a default data cache with a 16K pool of 25MB, while the test server's actual default data cache has no 16K pool.

Size of 2K pool in Kb: 6902 (simulated)

# Size of 2K pool in Kb: 6902 (actual)

Size of 4K pool in Kb: 0 (simulated)

# Size of 4K pool in Kb: 0 (actual)

Size of 8K pool in Kb: 0 (simulated)

# Size of 8K pool in Kb: 0 (actual)

Size of 16K pool in Kb: 25600 (simulated)

# Size of 16K pool in Kb: 0 (actual)

Max Parallel Degree and Max Scan Degree:

The simulated values for these two settings can be changed as you wish for what-if analysis. If you are working with Technical Support on a case, please give your TS Engineer all information about any session level settings for these values that you may have used with the queries in question.

Max parallel degree: 12 (simulated)

# Max parallel degree: 1 (actual)

Max scan parallel degree: 12 (simulated)

# Max scan parallel degree: 1 (actual)
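If you used session level overrides for these values on the source server, they would have been set with commands like the following (the values shown are illustrative):

set parallel_degree 12

set scan_parallel_degree 12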

Partitions:

If a table is partitioned, the size of its largest partition will appear at the end of the 'Statistics for table' section of the optdiag output (or of the 'Statistics for index' section for the clustered index, if one exists). Changing this value will affect the costing of queries using parallelism. If a table is partitioned in the source server, or if you want to simulate with partitioned tables, the tables in the test server must be partitioned.

Pages in largest partition: 5730.000000000000000 (simulated)

# Pages in largest partition: 573.000000000000000 (actual)
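For example, a table in the test server can be partitioned, or returned to a single partition, with the following commands (the partition count is illustrative):

alter table orders partition 10

alter table orders unpartition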

SIMULATING COLUMN LEVEL STATISTICS

Modifying the column level statistics is done the same way in all optdiag modes: statistics, binary and simulate statistics. When you change column level statistics via optdiag simulate they are written directly to sysstatistics; thus they are not simulated as the other statistics are. However, changes to column level statistics will be used by the optimizer in the simulation. Keep in mind that to set column level statistics back to their original values you must either run update statistics on the table or column, or read in an optdiag file obtained before the column level statistics were modified. See the section of this paper titled 'Removing Simulated Statistics' for more information.

Changing the column level statistics will enable you to simulate different data distributions, densities and default selectivity values. These values will have a direct, and in many cases dramatic, effect on the optimizer's query plan choice.

See "New Functionality in ASE 11.9.2" section titled 'Changing Statistics with optdiag' for a very good detailed discussion of how to change column level statistics.

HOW MUCH OPTDIAG SIMULATE OUTPUT TO GET?

There are two approaches to obtaining the necessary optdiag simulate outputs. You can get a single simulate output file for the entire dataset:

optdiag simulate statistics database_name -Uuser -Ppw ...

Or a single file for each table you plan to use in the simulation:

optdiag simulate statistics database_name..table_name -Uuser -Ppw ...

When working with an output file for the entire database you will need to make sure that all tables and their indexes have been created in the test server. This is the best approach if you will be performing your simulation with queries that reference most or all of the tables in the database. If you plan to simulate using only a few tables in your queries you can get a single file for each table. You will then only need to create the tables you want, along with their indexes, and read in the files individually.

Technical Support may take the same approach. Verify with your Technical Support Engineer whether they want the outputs from the entire database or only from selected tables.

Note: You can obtain optdiag simulate output files from a database with tables that are empty (just the table and indexes created, but with no rows); however, you will not get any column level statistics, since no values were present when the indexes were created. It is advisable to get your optdiag simulate output files from an existing database; this will save you the time and effort of manually creating the column level statistics.

If need be, column level statistics can be cut and pasted into an optdiag simulate file that was obtained from an empty dataset. To do this the column must exist in the test server, and you must ensure that the entire section of statistics for the column is pasted in. Also, make sure to remove the column's name from the "No statistics for remaining columns:" list.

REMOVING SIMULATED STATISTICS

During your simulation you may want to remove the changes you've made to simulated statistics and return to the original values. For the 'shared' statistics (the cache and parallelism related statistics), use the following command to return to the original values:

delete shared statistics

This will remove the shared statistics from system tables in the master database.

There are two ways to return to the original values after using optdiag simulate:

TO RETURN TO ORIGINAL COLUMN LEVEL STATISTICS:

1.       Use the delete statistics table_name command for each table. This will remove all statistics for the table from sysstatistics. To get back to the original statistics values you will then need to run update statistics table_name to update statistics on all columns of the table that have statistics, or update statistics table_name column_name to update statistics on an individual column. This could take a while if the table is large, so you may want to consider the second approach listed below.

2.       Before beginning your simulation, get optdiag statistics files for each table. Do not change values in these files. After altering values and performing your simulation you can read these back into the server using the -i option of the optdiag command, as sketched below. This approach will take less time than updating the statistics on each table.
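As a sketch of the second approach, using the example table from this paper (the file name, user, password and server names are placeholders), first save the original statistics, then read them back in after the simulation:

optdiag statistics tpcd..orders -Uuser -Ppw -Sserver -o orders_original.opt

optdiag statistics tpcd..orders -Uuser -Ppw -Sserver -i orders_original.opt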

TO RETURN TO ORIGINAL TABLE OR INDEX LEVEL STATISTICS:

The only way to replace simulated table or index level statistics is to set them back to their original values with an optdiag statistics file. This makes step 2 above the best approach for replacing simulated statistics with their original values.

OPTDIAG SIMULATE IN 'WHAT IF' ANALYSIS

The major advantage of optdiag simulate is the ability to perform 'what if' analysis.

In 'what if' analysis you can run queries against various statistics in order to determine how the changed statistics will affect query plans. It is advisable to keep statistics as close as possible to the values you expect to see in the actual dataset. If you only have a maximum of 500MB of memory available to you, it may not be very useful to simulate the use of gigabytes of memory.

PREPARING THE SIMULATION

Empty vs. Populated datasets:

It may be tempting to simply create tables and indexes and then begin 'what-if' analysis on these empty tables. However, optdiag simulate output files from empty tables will not include column statistics. You will need to get optdiag output files either from an existing dataset or from a small version of the dataset you want to use in the simulation. You'll need to make sure that the statistics reflect what you expect the distribution of data to be.

Query outputs from an existing dataset:

If you will be simulating an existing dataset from a source server, using a set of queries that are run on that dataset, you may want to get optimizer outputs from runs of these queries on the source server. These output files will be essential for comparison against query plans optimized using simulated statistics. Get the following outputs from runs of your queries and/or procedures:

showplan

dbcc traceon (3604,302,310)
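For example, a session to capture these outputs might look like the following, with your query or procedure run afterwards (traceflag 3604 directs the trace output to your session):

dbcc traceon(3604)

dbcc traceon(302, 310)

set showplan on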

Testing queries against simulated statistics:

Once you've read in the simulated statistics via optdiag you'll want to see how they'll affect query plans. To do this you will need to issue the following set command in the session:

set statistics simulate on

This set command tells the optimizer to use the simulated statistics rather than the actual statistics.

If you are testing a stored procedure which has been previously compiled you will need to execute it with the recompile option and with set statistics simulate on; this allows the simulated statistics to be used when the procedure is compiled.
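For example (proc_name is a placeholder for your procedure):

set statistics simulate on

exec proc_name with recompile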

It is advisable to use traceons 302 and 310, along with showplan, to examine the optimizer's behavior when using simulated statistics. There are examples using these further on in this paper.

When using simulated statistics there is no need to use set statistics io on. If you are simulating on datasets that are empty, or different from the source, the I/O that does occur may differ greatly from I/O on the source dataset, which could result in misleading I/O outputs. For example, if the table you are working with has 100 pages and you decide to simulate 10,000 pages, the statistics io output will show a table scan costing 100 I/Os, not 10,000.

You can use set statistics time on to see any differences in parse and compile time and the corresponding server cpu time. However, the server elapsed time and total cpu time will change when the simulated values become the actual values.

Verifying that simulated statistics were used:

Both showplan and traceon 302 will report when simulated statistics are being used by the optimizer.

Traceon 302 example:

Statistics for this column have been edited.

Showplan example:

Optimized using simulated statistics.

Simulation check list:

In summary, the following must be done for a simulation to work (see page 8-33 of 'New Functionality in ASE 11.9.2' for details):

1.       Create all tables and their indexes in the test server.

2.       Create any named caches that appear in the optdiag simulate file (the minimal 512K is enough) and bind the appropriate objects to them.

3.       Partition any tables that are partitioned in the source server.

4.       Read the optdiag simulate statistics file into the test server using the -i option.

5.       Issue set statistics simulate on in the session before running your queries.

6.       Execute stored procedures with recompile so that the simulated statistics are used when they are compiled.

USING OPTDIAG SIMULATE IN TECHNICAL SUPPORT CASE REPRODUCTION

Under some circumstances Technical Support may ask that you provide them with your dataset to ensure accurate analysis of your case. Generally the actual data is not needed if you can provide the Technical Support Engineer with the optdiag simulate and other necessary output files; the behavior you're seeing can usually be reproduced using these files. Provide the Technical Support Engineer with the following in order to have your case reproduced via optdiag simulate:

Information to Provide to Technical Support:

- The ASE configuration file.

- Scripts containing create table and create index statements for all tables involved in the analysis. This will make it easier for the Technical Support Engineer to begin the reproduction/simulation.

- Scripts containing all create view, trigger and stored procedure statements, if involved.

- sp_help output for all tables involved.

- Information on all caches available in the server, including the default data cache, the parallelism used, and information about the partitioning, if any, of all tables and databases involved. Partitioning information is included in the sp_help output, but if you have a script that partitions tables, provide it. Also, include information on any non-default cache strategy used.

- Information about any object bound to a cache, including whether tempdb is bound to a cache or not.

- Optdiag simulate statistics outputs for all tables involved. This may be provided as a single optdiag simulate statistics file for the entire database.

- Information about any session level settings that were used. These include trace flags and parallel settings.

- All queries, stored procedures and cursors involved.

- If you have not already provided them, outputs of the problem queries and/or stored procedures (with recompilation) run with the following dbcc traceon and set commands:

  traceons 302 and 310

  set statistics io on and set statistics time on (if set noexec is not on)

  set showplan on

- If you used a 'force' (forceplan or force index), provide the above outputs from runs of the query with and without the use of the force(s).

- If you have obtained the above outputs using both actual and simulated statistics, provide both sets of outputs.

- If you have changed concurrency optimization from the default, please indicate its present value.

DESCRIPTION OF OPTDIAG SIMULATE STATISTICS OUTPUT -

This section takes a look at the various statistical values that can be changed using optdiag simulate. Each value is explained, its effect on query plans is discussed, and some examples are given.

OptDiag/11.9.2/1031/P/NT (IX86)/OS 3.51,4.0/OPT/Fri Aug 14 01:35:23 1998

Adaptive Server Enterprise/11.9.2/1031/P/NT (IX86)/OS 3.51,4.0/FBU/Fri Aug 14 01:42:41 1998

Server name: "no_name"

If the name of the source server is different from that of the test server, make sure to put the test server's name here.

Specified database: "tpcd"

You'll need to change this if the name of your simulated database is different than the original. It is not advisable to change object names.

Specified table owner: not specified

Generally no need to change this unless the owner of the source table is someone other than dbo.

Specified table: "orders"

Generally no need to change this unless the name of the table in the target database is different than in the source database.

Specified column: not specified

Generally no need to change unless the column name is different in the target dataset.

Note: if any of the above items were specified on the command line for optdiag make sure they match objects and users in the target database.

CONFIGURATION STATISTICS SECTION -

Keep in mind that the rows beginning with a # are seen as comments by optdiag and are skipped when reading in the file.

Configuration Parameters:

Last update of simulated: Feb 26 1999 10:01AM

There is no need to change the above value; it is useful, however, for keeping track of when you last changed the values for max parallel and max scan degrees.

Max parallel degree: 1 (simulated)

# Max parallel degree: 1 (actual)

Max scan parallel degree: 1 (simulated)

# Max scan parallel degree: 1 (actual)

See 'Simulating Values That Are Not Stored In The System Tables' previously in this paper for a description of the max parallel degree and max scan degree values above.

Configuration for cache: "default data cache"

Last update of simulated: Feb 26 1999 10:01AM

There is no need to change the above value; it is useful, however, for keeping track of when you last changed the cache values.

Size of 2K pool in Kb: 15992 (simulated)

# Size of 2K pool in Kb: 15992 (actual)

Size of 4K pool in Kb: 0 (simulated)

# Size of 4K pool in Kb: 0 (actual)

Size of 8K pool in Kb: 0 (simulated)

# Size of 8K pool in Kb: 0 (actual)

Size of 16K pool in Kb: 5120 (simulated)

# Size of 16K pool in Kb: 0 (actual)

See 'Simulating Values That Are Not Stored In The System Tables' previously in this paper for a description of the cache values above.

TABLE/CLUSTERED INDEX STATISTICS SECTION

The statistics that belong to the table and its clustered index (if one exists) are stored in the systabstats system table. Note that column level statistics are currently the only statistics optdiag writes to the system tables as actual statistics rather than in the simulated format; once written via optdiag simulate, column level statistics will remain until overwritten. See the section in this paper titled 'Removing Simulated Statistics' for more information.

Table owner: "dbo"

Generally no need to change. If you suspect there may be an issue with table ownership make sure to check this value in the source database.

Table name: "orders"

Generally no need to change. It is advisable not to change the names of objects if an optdiag simulate file was obtained from a source server.

Statistics for table: "orders"

If there is a clustered index on the table this line will read "Statistics for index: clustered index name". All table statistics will be listed under the clustered index. In this case there is no clustered index, so the "Statistics for table:" is used.

Last update of simulated: Feb 23 1999 3:31PM

There is no need to change this value. However, it will be useful if you want to verify the last time you changed statistics during your simulation.

Data page count: 9398.00000000000000 (simulated)

# Data page count: 9398.00000000000000 (actual)

Changing this value will change the number of pages the optimizer believes are in the table. If there is no clustered index on the table this value will be the number of datapages in the table. If there is a clustered index this value will represent the datapages in the leaf level of the clustered index.

Empty data page count: 1.0000000000000000 (simulated)

# Empty data page count: 1.0000000000000000 (actual)

This is the number of pages allocated to the table that are empty. This value can be changed to simulate fragmentation of the table.

Data row count: 150000.000000000000 (simulated)

# Data row count: 150000.000000000000 (actual)

This is the number of data rows in the table. This value can be changed to simulate a larger or smaller table. In many cases you may want to increase this value. Make sure you also increase the page count value accordingly.

Forwarded row count: 0.0000000000000000 (simulated)

# Forwarded row count: 0.0000000000000000 (actual)

This is the number of rows that have been moved off of their original pages to other pages due to a deferred update. This value can also be used to simulate table fragmentation.

Deleted row count: 0.0000000000000000 (simulated)

# Deleted row count: 0.0000000000000000 (actual)

This value is the number of rows that have been marked as deleted, but whose space has not been compacted by reorg, or taken by another row. This value can also be used to simulate fragmentation in the table. For more information on the reorg utility see 'New Functionality in ASE 11.9.2'.

Data page CR count: 0.0000000000000000 (simulated)

# Data page CR count: 0.0000000000000000 (actual)

This is the data page clustering count for this table. If there is a clustered index it will be the value for the clustered index. As this number increases, the value for 'Data page cluster ratio' below decreases. For all the derived statistics, the closer the value is to 1.000 the better the clustering is and the less expensive data access is. This count can be used to simulate various degrees of clustering within the table/index. All the CR counts will need to be 'tuned'. That is, if you change the page and row counts you will need to test values for the CR counts to make sure that the resulting derived statistics are reasonable. This will be easier if you are working with optdiag simulate output files from an existing dataset in a source server.

OAM + allocation page count: 41.0000000000000000 (simulated)

# OAM + allocation page count: 41.0000000000000000 (actual)

This value is used by the optimizer to estimate the cost of OAM scans on DOL (data only locked) tables. If your table is DOL and you expect it to grow large you may want to increase this value in simulation.

First extent data pages: 0.0000000000000000 (simulated)

# First extent data pages: 0.0000000000000000 (actual)

This is the number of pages that share the first extent of an allocation unit with the allocation page, on DOL tables only. This value is used in costing large I/O. In general there is no need to change it.

Data row size: 102.500000000000000 (simulated)

# Data row size: 102.500000000000000 (actual)

This is the average length of a data row. You may want to vary this value if you expect the average size of a row to be different than that shown.

Pages in largest partition: 11493.0000000000000 (simulated)

# Pages in largest partition: 11493.0000000000000 (actual)

If the table is partitioned, the size of the largest partition is printed in the line above. Changing this value may affect how queries are costed when using parallelism.

Derived statistics:

# Data page cluster ratio: 0.9999900000000001 (simulated)

# Data page cluster ratio: 0.9999900000000001 (actual)

See note for Derived statistics below.

TABLE/NON-CLUSTERED INDEX STATISTICS SECTION

The section below contains statistics for the non-clustered indexes on the table.

Statistics for index: "orders_ckey" (nonclustered)

Index column list: "o_custkey", "o_orderdate", "o_orderkey"

Last update of simulated: Feb 23 1999 3:31PM

Leaf count: 1579.00000000000000 (simulated)

# Leaf count: 1579.00000000000000 (actual)

This is the number of pages in the leaf level of this index. If you increase the number of rows and pages in the table you will need to increase this number too; as the number of pages in the table grows, so will the number of pages in an index. You'll need to make a guess as to the best value based on the index row length and how many rows fit on an index page. If you have optdiag simulate output files from an existing dataset you can use that information to estimate how index pages will grow along with datapages.

Empty leaf page count: 0.0000000000000000 (simulated)

# Empty leaf page count: 0.0000000000000000 (actual)

This is the number of empty pages in the leaf level of this index. This value can be used to simulate fragmentation of the leaf level of the index. Such fragmentation will affect the estimated cost of using the index.

Data page CR count: 149872.000000000000 (simulated)

# Data page CR count: 149872.000000000000 (actual)

This is the number of times a different extent must be accessed in order to read datapages sequentially from this index. A value of 0 indicates that the index is perfectly clustered in relation to the datapages. This value is used to obtain the data page cluster ratio in the derived statistics. The larger this number is, the more 'jumps' there are between extents to read datapages, and the lower the data page cluster ratio (in the derived statistics) becomes.

Index page CR count: 200.000000000000000 (simulated)

# Index page CR count: 200.000000000000000 (actual)

This is the number of times a different extent must be accessed in order to read the leaf level of this index sequentially. A value of 0 indicates that the index pages are perfectly clustered for a covered index scan. This value is used to estimate the cost of using large I/O for an index scan.

Data row CR count: 149989.000000000000 (simulated)

# Data row CR count: 149989.000000000000 (actual)

This value represents the number of different datapage reads that will have to be done to scan the datapages in the order of the leaf rows. This value indicates how well clustered the values on the index leaf are in relation to the data rows, and will be used to determine how efficient the use of the non-clustered index will be.

First extent leaf pages: 0.0000000000000000 (simulated)

# First extent leaf pages: 0.0000000000000000 (actual)

This is the number of index leaf pages that are stored within the first extent of an allocation unit. It is stored only for indexes on DOL tables. Such pages must be accessed using 2K I/O. This value is used in determining the I/O size to use.

Leaf row size: 19.0000000000000000 (simulated)

# Leaf row size: 19.0000000000000000 (actual)

This is the average size of the leaf level row.

Index height: 2.0000000000000000 (simulated)

# Index height: 2.0000000000000000 (actual)

This is the number of levels in the index B-tree, excluding the leaf level.

Derived statistics:

# Data page cluster ratio: 0.0007862163506122 (simulated)

# Data page cluster ratio: 0.0007862163506122 (actual)

# Index page cluster ratio: 0.9985517740767560 (simulated)

# Index page cluster ratio: 0.9985517740767560 (actual)

# Data row cluster ratio: 0.0000782344615691 (simulated)

# Data row cluster ratio: 0.0000782344615691 (actual)

The derived statistics are calculated using the three CR count values seen previously in the output: Data page CR count, Index page CR count and Data row CR count. The derived statistics cannot be directly changed via any optdiag mode. The closer the derived statistics are to 1, the better the data clustering is. As the CR count values increase, the values of the derived statistics decrease, indicating poorer data clustering.

COLUMN STATISTICS SECTION -

Column statistics include histograms, density values and the time of the most recent modification of column statistics. Each column in the specified table that has statistics will be listed. These values can be changed in optdiag simulate statistics mode the same way as in statistics mode. For details see 'New Functionality in ASE 11.9.2'.

Statistics for column: "o_custkey"

Last update of column statistics: Jan 28 1999 3:32:38:976PM

There is no need to change the two values above.

Range cell density: 0.0001066867555556

Total density: 0.0001066867555556

Range selectivity: default used (0.33)

In between selectivity: default used (0.25)

The above values can be changed. Changing them may have dramatic effects on query plans.

Range cell density:

This value represents the average number of duplicate values that fall into range cells. It does not include values that fall into frequency count cells. This value, along with a range cell's weight, is used to estimate the cost of search arguments (how many rows will qualify) whose value falls within a range cell. As the number of frequency count cells increases this value will decrease. If the column contains only frequency count cells this value will be 0. If you are simulating a column with all range cells there is no need to change this value. If you expect frequency count cells to appear in this column it is better to change the cell types than this value. Increasing the number of cells in the histogram via update statistics or create index is the easiest way to allow frequency cells to appear in the histogram. Cell types can also be changed via optdiag, but care must be taken when doing this; see "Editing Histograms" in 'New Functionality in ASE 11.9.2' for tips on how to best do this.
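For example, the number of histogram steps can be increased with update statistics, shown here for the column used in this paper's examples (the step count is illustrative):

update statistics orders (o_custkey) using 100 values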

Total density:

This value represents the average number of duplicate values in all cells of the column (both range and frequency count cells). It is used when costing join statements. It is also used for equi-sargs (search arguments using the '=' operator) and when the search value is not known at run time. This value replaces the old 'magic number'.
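As an illustrative calculation using the values shown in this paper's output: with a data row count of 150,000 and a total density of 0.0001066868, the optimizer estimates roughly 150,000 x 0.0001066868, or about 16 rows, for an equality search argument whose value is not known when the query is optimized.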

Changing this value will affect the estimated costs of joins. If you are changing the values of the cells you will need to take changes to this value into account; it will not automatically change when you change histogram values.

Range selectivity:

These values replace the 'magic numbers' used in earlier versions of ASE. When the optimizer does not know the value of a SARG when it is optimized the default selectivity value is used to establish selectivity of the SARG.

If a range SARG (<, <=, >, >=) is being optimized and the value is not known, the Range selectivity value will be used; the default is 0.33.

In between selectivity:

If a between SARG (between, or col >= and col <=) is being optimized and the value is not known, the In between selectivity value will be used; the default is 0.25.

When changing either of these values make sure to replace the text with a decimal value. For example, the line

Range selectivity: default used (0.33)

would become:

Range selectivity: 0.10

Histogram for column: "o_custkey"

Do not change this line.

Column datatype: integer

Do not change this value.

Requested step count: 20

Actual step count: 20

Step     Weight            Value

   1     0.00000000   <=   0
   2     0.05264667   <=   791
   3     0.05272667   <=   1579
   4     0.05268000   <=   2365
   5     0.05269333   <=   3151
   6     0.05264000   <=   3932
   7     0.05264667   <=   4730
   8     0.05266667   <=   5512
   9     0.05264667   <=   6302
  10     0.05268667   <=   7091
  11     0.05266667   <=   7882
  12     0.05263333   <=   8683
  13     0.05262667   <=   9466
  14     0.05268000   <=   10256
  15     0.05268667   <=   11059
  16     0.05267333   <=   11849
  17     0.05264000   <=   12632
  18     0.05269333   <=   13432
  19     0.05271333   <=   14207
  20     0.05195333   <=   14999

There are a few rules to keep in mind when editing the histogram.

The step numbers must increase monotonically. If they do not optdiag will fail with an error.

The weight of a cell must be between 0 and 1.

The sum of all the cell weights must be close to 1 (between 0.99 and 1.01).

See New Functionality in ASE 11.9.2 chapter 8 for more information on editing histograms.

No statistics for remaining columns:   "o_clerk"
(default values used)                  "o_comment"
                                       "o_orderdate"
                                       "o_orderkey"
                                       "o_orderpriority"
                                       "o_orderstatus"
                                       "o_shippriority"
                                       "o_totalprice"

There is no need to change the above items unless you have manually added statistics to a column that did not previously have them.

EFFECTS ON THE DERIVED STATISTICS OF CHANGING THE ROW, PAGE AND INDEX LEAF PAGE COUNTS -

NOTE: Some outputs have been removed for brevity. The values changed for the simulation are the simulated data page count, data row count and leaf count lines.

Before changing the row count, page count and non-clustered index leaf count:

Table name: "orders"

Statistics for table: "orders"

Last update of simulated: Mar 29 1999 10:25AM

Data page count: 9398.0000000000000000 (simulated)

# Data page count: 9398.0000000000000000 (actual)

Data row count: 150000.0000000000000000 (simulated)

# Data row count: 150000.0000000000000000 (actual)

Statistics for index: "orders_ckey" (nonclustered)

Index column list: "o_custkey", "o_orderdate", "o_orderkey"

Last update of simulated: Mar 29 1999 10:25AM

Leaf count: 1579.0000000000000000 (simulated)

# Leaf count: 1579.0000000000000000 (actual)

Derived statistics:

# Data page cluster ratio: 0.0007862163506122 (simulated)

# Data page cluster ratio: 0.0007862163506122 (actual)

# Index page cluster ratio: 0.9985517740767560 (simulated)

# Index page cluster ratio: 0.9985517740767560 (actual)

# Data row cluster ratio: 0.0000782344615691 (simulated)

# Data row cluster ratio: 0.0000782344615691 (actual)

After increasing the row and page counts and the non-clustered leaf page count by a factor of 10:

Table name: "orders"

Statistics for table: "orders"

Last update of simulated: Mar 29 1999 10:29AM

Data page count: 93980.0000000000000000 (simulated)

# Data page count: 9398.0000000000000000 (actual)

Data row count: 1500000.0000000000000000 (simulated)

# Data row count: 150000.0000000000000000 (actual)

Derived statistics:

# Data page cluster ratio: 0.9999900000000001 (simulated)

# Data page cluster ratio: 1.0000000000000000 (actual)

Statistics for index: "orders_ckey" (nonclustered)

Index column list: "o_custkey", "o_orderdate", "o_orderkey"

Last update of simulated: Mar 29 1999 10:22AM

Leaf count: 15790.0000000000000000 (simulated)

# Leaf count: 1579.0000000000000000 (actual)

Derived statistics:

# Data page cluster ratio: 0.0008463480443573 (simulated)

# Data page cluster ratio: 0.0007862163506122 (actual)

# Index page cluster ratio: 0.9999900000000001 (simulated)

# Index page cluster ratio: 0.9985517740767560 (actual)

# Data row cluster ratio: 0.9601641796246286 (simulated)

# Data row cluster ratio: 0.0000782344615691 (actual)

The effects of the changes are seen dramatically in the index's data row cluster ratio. This illustrates the need to check the derived statistics as you change other values for the table and/or index. In the above derived statistics the simulated data row cluster ratio is much closer to fully clustered than the actual value.

The effect on the table (or clustered index) statistics is minimal: the data page cluster ratio in the derived statistics changes only slightly when the page and row counts are increased by a factor of ten.


