Improving Performance in SQL Anywhere
So You Want to Improve Performance
Join the club. Performance is an issue on everyone's agenda, since faster and better mean more effective, and ultimately, more profitable. While Adaptive Server Anywhere is designed to give you great performance right out of the box with very little intervention on your part, there is always room to fine tune your database to give you even better results.
Performance Analysis Tools
Before you can get to where you want to be in terms of performance, you have to know where you are. A variety of tools are available to help you analyze the current performance of your Adaptive Server Anywhere database. With every release, improvements to these tools add new capabilities and make them more powerful and better integrated. Tools include request-level logging, procedure profiling, graphical plans, the Performance Monitor and timing utilities.
Request-level logging is a good starting point for performance analysis of a specific application when it is not obvious whether the server or the client is at fault. It is also useful in determining the specific request to the server that might be responsible for problems.
Request-level logging records the individual requests received from an application and the responses sent back to it. It is most useful for determining what the server is being asked to do by the application.
Logged information includes timestamps, connection ids, and request type, for example. You can use the -zr database server option to specify what type of information is logged. You can redirect the output to a file for further analysis using the -zo option.
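As an illustrative sketch, a server could be started with request-level logging enabled and redirected to a file like this (the executable name dbsrv9 is an assumption; it varies by version, and the file names are illustrative):

```
dbsrv9 -zr all -zo reqlog.txt mydatabase.db
```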
The sa_get_request_times ( [ request_log_filename [, connection_id ] ] ) stored procedure reads a request-level log and populates a global temporary table satmp_request_time with statements from the log and their execution times. The time recorded is straightforward for INSERT/UPDATE/DELETE statements. For queries, the time recorded is the total elapsed time from PREPARE to DROP (describe/open/fetch/close), so be aware that any cursors left open inflate the recorded time.
Analyze satmp_request_time for optimization candidates. Even statements that are individually cheap can represent performance problems if they are executed frequently enough.
You can use sa_get_request_profile( [ request_log_filename [, connection_id ] ] ) to call sa_get_request_times() and summarize the resulting satmp_request_time into another global temporary table satmp_request_profile. This procedure also groups statements together and provides the number of calls, execution times, and so on.
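For example, assuming a log file named reqlog.txt captured earlier, the following sketch summarizes the requests made on connection id 3 (the file name and connection id are illustrative):

```sql
-- Read the log and populate both temporary tables
CALL sa_get_request_profile( 'reqlog.txt', 3 );

-- Inspect the grouped summary of statements and their costs
SELECT * FROM satmp_request_profile;
```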
Procedure profiling shows you how long it takes your stored procedures, functions, events, and triggers to execute. You can also view the execution time for each line of a procedure. Using the database profiling information, you can determine which procedures can be fine-tuned to improve performance within your database.
Procedure profiling can help you analyze specific database procedures (including stored procedures, functions, events, and triggers) found to be expensive via request-level logging. It can also help you discover expensive hidden procedures, such as triggers, events, and nested stored procedure calls. As well, it can help pinpoint potential problem areas within the body of a procedure.
You can use stored procedures to view procedure profiling information that has been gathered by the server. The sa_procedure_profile_summary stored procedure provides information about all of the procedures within the database. You can use this procedure to view the profiling data for stored procedures, functions, events, and triggers within the same result set. However, a better way to examine this information is to use Sybase Central.
Profiling can be enabled/disabled dynamically and the data it generates is transient, stored in memory by the server. You can view it using the Profile tab in Sybase Central. Once profiling is enabled, the database gathers profiling information until you disable profiling or until the server is shut down. Profiling information is cumulative, and accurate to 1 ms.
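A minimal profiling session might look like the following sketch (DBA authority is assumed, and the workload in the middle stands in for your own statements):

```sql
-- Turn procedure profiling on for the server
CALL sa_server_option( 'procedure_profiling', 'YES' );

-- ... run the workload you want to analyze ...

-- View cumulative timings for procedures, functions, events, and triggers
CALL sa_procedure_profile_summary();

-- Turn profiling off again
CALL sa_server_option( 'procedure_profiling', 'NO' );
```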
The graphical plan feature in Interactive SQL displays the execution plan for a query. It is useful for diagnosing performance issues with specific queries. For example, the information in the plan may help you decide where to add an index to your database.
The graphical plan provides a great deal more information than the short or long plans. You can choose to see the graphical plan either with or without statistics. Both allow you to quickly view which parts of the plan have been estimated as the most expensive. The graphical plan with statistics, though more expensive to view, also provides the actual query execution statistics as monitored by the server when the query is executed, and permits direct comparison between the estimates used by the query optimizer in constructing the access plan with the actual statistics monitored during execution. Note, however, that the optimizer is often unable to precisely estimate a query's cost, so expect there to be differences. The graphical plan is the default format for access plans.
You can obtain detailed information about the nodes in the plan by clicking a node in the graphical diagram. The graphical plan with statistics shows you all the estimates that are provided with the graphical plan, but also shows actual runtime costs of executing the statement. To do this, the statement must actually be executed. This means that there may be a delay in accessing the plan for expensive queries. It also means that any data modifications in the statement, such as deletes or updates, are actually executed, although you can perform a rollback to undo these changes.
Use the graphical plan with statistics when you are having performance problems, and the estimated row count or run time differs from your expectations. The graphical plan with statistics provides estimates and actual statistics for you to compare. A large difference between actual and estimate is a warning sign that the optimizer might not have sufficient information to prepare correct estimates.
Following are some of the key statistics you can check in the graphical plan with statistics, and some possible remedies:
Row count measures the rows in the result set. If the estimated row count is significantly different from the actual row count, the selectivity of underlying predicates is probably incorrect.
Accurate selectivity estimates are critical for the proper operation of the query optimizer. For example, if the optimizer mistakenly estimates a predicate to be highly selective (with, say, a selectivity of 5%), but in reality, the predicate is much less selective (for example, 50%), then performance may suffer. In general, estimates may not be precise. However, a significantly large error does indicate a possible problem. If the predicate is over a base column for which there does not exist a histogram, executing a CREATE STATISTICS statement to create a histogram may correct the problem. If selectivity error remains a problem then, as a last resort, you may wish to consider specifying a user estimate of selectivity along with the predicate in the query text.
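For instance, assuming a Customers table whose city column lacks a histogram (the table, column, and 50% estimate are all illustrative):

```sql
-- Build a histogram so the optimizer can estimate selectivity itself
CREATE STATISTICS Customers ( city );

-- As a last resort, attach an explicit selectivity estimate (here 50%)
-- directly to the predicate in the query text
SELECT *
  FROM Customers
 WHERE ( city = 'Springfield', 50 );
```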
Runtime measures the time to execute the query. If the actual runtime of a table scan or index scan is much higher than expected, you may improve performance by executing the REORGANIZE TABLE statement. You can use the sa_table_fragmentation and sa_index_density stored procedures to determine whether the table or index is fragmented.
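A quick check and remedy might look like this sketch (the table name is illustrative; DBA authority is assumed for the procedures):

```sql
-- How fragmented are the table and its indexes?
CALL sa_table_fragmentation( 'Orders' );
CALL sa_index_density( 'Orders' );

-- Defragment the table without blocking other access
REORGANIZE TABLE Orders;
```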
When the source of the estimate is Guess, the optimizer has no information to use, which may indicate a problem. If the estimate source is Index and the selectivity estimate is incorrect, your problem may be that the index is skewed: you may benefit from defragmenting the index with the REORGANIZE INDEX statement.
If the number of cache reads and cache hits are exactly the same, then your entire database is in cache—an excellent thing. When reads are greater than hits, it means that the server is attempting to go to cache but failing, and that it must read from disk. In some cases, such as hash joins, this is expected. In other cases, such as nested loops joins, a poor cache-hit ratio may indicate a performance problem, and you may benefit from increasing your cache size.
The Performance Monitor is useful for tracking detailed information about database server actions, including disk and memory access.
With the Sybase Central Performance Monitor, you can graph a variety of statistics of any Adaptive Server Anywhere database server that you can connect to in Sybase Central. All statistics in Sybase Central are shown in the Statistics folder.
Features of the Performance Monitor include:
Real-time updates (at adjustable intervals)
A color-coded and resizable legend
Configurable appearance properties
When you're using the Sybase Central Performance Monitor, note that it uses actual queries against the server to gather its statistics, so the monitor itself affects some statistics (such as Cache Reads/sec). As a more precise alternative, you can graph server statistics using the Windows Performance Monitor.
The Windows monitor has two advantages:
It offers more performance statistics (mainly those concerned with network communications).
Unlike the Sybase Central monitor, the Windows monitor is non-intrusive. It uses a shared-memory scheme instead of performing queries against the server, so it does not affect the statistics themselves.
If you run multiple versions of Adaptive Server Anywhere simultaneously, you can also run multiple versions of the Performance Monitor at the same time.
Some performance testing utilities, including fetchtst, instest, and trantest, are available in the <installation dir>\samples\asa\ directory. Complete documentation can be found in the Readme.txt file in the same folder as the utility. These tools will give you more accurate timings than the graphical plan with statistics. These utilities can provide an indication of the best achievable performance (for example, throughput) for a given server and database configuration.
Fetchtst measures fetch rates for an arbitrary query. Instest determines the time required for rows to be inserted into a table. Trantest measures the load that can be handled by a given server configuration given a database design and a set of transactions.
When the database server processes a transaction, it can lock one or more rows of a table. The locks maintain the reliability of information stored in the database by preventing concurrent access by other transactions. They also improve the accuracy of query results by identifying information that is in the process of being updated.
The database server places these locks automatically and needs no explicit instruction. It holds all the locks acquired by a transaction until the transaction is completed. The transaction that has access to the row is said to hold the lock. Depending on the type of lock, other transactions may have limited access to the locked row, or none at all.
Performance can be compromised if a row or rows are frequently accessed by a number of users simultaneously. If you suspect locking problems, consider using the sa_locks procedure to obtain information on locks in the database. If lock issues are identified, information on the connection processes involved can be found using the AppInfo connection property.
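For example (the connection id 5 below is illustrative):

```sql
-- Show all locks currently held in the database
CALL sa_locks();

-- Find out which application owns a blocking connection
SELECT CONNECTION_PROPERTY( 'AppInfo', 5 );
```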
21 Tips to Improve Performance
Now that you have a better understanding of the current state of your database, you need to decide how to tweak things to get the most out of your system.
Ultimately, how well your database performs depends heavily on the design of its bones. And so, one of the most basic ways of improving performance is with good schema design. The database schema is the skeleton of your database, and includes definitions of such things as tables, views, triggers, and the relationships between them. Re-evaluate your database schema and make note of the following areas where small changes can offer impressive gains.
Tip 1: Be wary of wide tables
Tables with a large number of columns are known as wide tables. When the number of columns in a table causes the size of individual rows to exceed the database page size, each row is split across two or more database pages. The more pages a row takes up, the longer it takes to read each row. If you find performance lacking, and you know you have tables with many columns, consider normalizing your tables to reduce the number of columns. If that is not possible, a larger database page size may be helpful, especially if most tables are wide.
Tip 2: Pare down primary keys
Wide primary keys are composed of two or more columns. The more columns contained in your primary key, the more demand there is on the server. Reducing the number of columns in your primary keys can improve performance.
Tip 3: Declare constraints
Undeclared primary key-foreign key relationships exist between tables when there is an implied relationship between the values of columns in different tables. It is true that not declaring the relationship can save time on index maintenance; however, declaring the relationship can improve the performance of queries involving joins, because the cost model is able to do a better job of estimation.
Tip 4: Eliminate expensive and errant data types
Data types store information about specific sets of data, including ranges of values, the operations that can be performed on those values, and how the values are stored in memory. You can improve performance by using the appropriate data type for your data. For instance, avoid assigning a char or other string data type to values that contain only numeric data. And whenever possible, choose economical data types over the more expensive numeric and string types.
Tip 5: Apply AUTOINCREMENT
Primary key values must be unique. Although there are a variety of ways to create unique values for primary keys, the most efficient method is setting the default column value to be AUTOINCREMENT ON. You can use this default for any column in which you want to maintain unique values. Using the AUTOINCREMENT feature to generate primary key values is faster than other methods because the value is generated by the server.
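For example (table and column names are illustrative):

```sql
-- The server generates a unique order_id for each inserted row
CREATE TABLE Orders (
    order_id  INTEGER NOT NULL DEFAULT AUTOINCREMENT,
    placed_on DATE,
    PRIMARY KEY ( order_id )
);
```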
Tip 6: Target triggers
Evaluate the use of triggers to see if some of them could be replaced by features available in the server. For instance, triggers that update columns with the latest update time and user information can be replaced with the corresponding special values in the server. As well, using column default settings in place of existing triggers can also improve performance.
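For example, instead of a trigger that stamps each row with the modification time and user, the special values DEFAULT TIMESTAMP and DEFAULT LAST USER can do the work (table and column names are illustrative):

```sql
CREATE TABLE Documents (
    doc_id        INTEGER NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
    body          LONG VARCHAR,
    -- maintained automatically on every update
    last_modified TIMESTAMP DEFAULT TIMESTAMP,
    -- set to the user who last modified the row
    modified_by   VARCHAR(128) DEFAULT LAST USER
);
```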
Tip 7: Curtail cascades
Cascading referential actions are costly in terms of performance, because they cause updates to multiple tables for every transaction. For example, if the foreign key from employee to department were defined with ON UPDATE CASCADE, then updating the department ID would automatically update the employee table. While cascading referential actions are convenient, sometimes it might be more efficient to implement them in application logic.
Tip 8: First come, first served
Columns in a row are accessed in a sequential manner in the order of their creation. For example, in order to access columns at the end of a row, Adaptive Server Anywhere has to skip over any columns that appear earlier in the row. Primary key columns are always stored at the beginning of rows. For this reason, it is important to create tables such that small and/or frequently accessed columns are placed before seldom accessed columns in the table.
Tip 9: Upgrade and win
Adaptive Server Anywhere is constantly being evaluated and enhanced in terms of performance features and usability. With each subsequent release, you can take advantage of new features and behavior changes that will help you optimize performance.
While you can simply run a database created with an older version of Adaptive Server Anywhere on a newer release, many of the new features are only available if the database is upgraded. Running the Upgrade utility adds and modifies system tables, system procedures and database options to upgrade a database from an older version of Adaptive Server Anywhere to a newer version.
Tip 10: Ready, steady, rebuild
Rebuilding your database is the process of unloading and reloading your entire database. It is sometimes called upgrading your database file format.
Rebuilding removes all the information, including data and schema, and puts it all back in a uniform fashion, thus compacting the data and improving performance, much like defragmenting your disk drive. It also gives you the opportunity to change certain settings. In comparison, using the Upgrade utility is quicker than rebuilding, and it does not affect the storage of your files.
Rebuilding also gives you access to all the new features and performance enhancements in the latest version of the software.
Tip 11: Compare configurations
Go over your database/server configuration, and make sure that your hardware and file space are appropriate for your usage.
Tip 12: Apply proper page size
The page size you choose can affect the performance of your database. Adaptive Server Anywhere supports page sizes of (in bytes) 1024, 2048, 4096, 8192, 16384, or 32768, with 2048 being the default. There are advantages and disadvantages to whichever page size you choose.
While smaller pages hold less information and may force less efficient use of space, small page sizes allow Adaptive Server Anywhere to run with fewer resources because it can store more pages in a cache of the same size. Small pages are particularly useful if your database must run on small machines with limited memory. They can also help in situations when you use your database primarily to retrieve small pieces of information from random locations.
By contrast, a larger page size tends to benefit queries that perform sequential table scans. Larger page sizes also tend to benefit large databases. Often, the physical design of disks permits them to retrieve fewer large blocks more efficiently than many small ones. Other benefits of large page sizes include improving the fan-out of your indexes, thereby reducing the number of index levels, and allowing tables to include more columns.
Keep in mind that larger page sizes have additional memory requirements. And since the maximum number of rows stored on a page is 255, tables with small rows will not fill each page and therefore use space inefficiently. As well, extremely large page sizes (16 kb or 32 kb) are not recommended for most applications unless you can be sure that a large database server cache is always available. Investigate the effects of increased memory and disk space on performance characteristics before using 16 kb or 32 kb page sizes.
Tip 13: Fix file fragmentation
To eliminate operating system file fragmentation problems, periodically run one of the available disk defragmentation utilities. File fragmentation can have a detrimental impact on performance.
The database server determines the number of file fragments in the database file when you start a database on Windows NT/2000/XP, and displays the following information in the server message window when the number of fragments is greater than one:
Database file "mydatabase.db" consists of nnn fragments
You can also obtain the number of database file fragments using the DBFileFragments database property.
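For example:

```sql
-- Returns the number of fragments in the database file
SELECT DB_PROPERTY( 'DBFileFragments' );
```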
Tip 14: Diminish database fragmentation
Fragmentation occurs naturally as you make changes to your database. If fragmentation becomes excessive, performance can suffer. This becomes even more of an issue as your database increases in size.
If your database is highly fragmented, try reorganizing your tables to reduce database fragmentation and improve performance. In some cases, for example after extensive delete/update/insert activity on a number of tables, it might even be worthwhile to rebuild your database.
Indexes are designed to speed up searches on particular columns, but they can become fragmented if many DELETEs are performed on the indexed table. This may result in reduced performance if the index is accessed frequently and the cache is not large enough to hold all of the index.
The sa_index_density stored procedure provides information about the degree of fragmentation in a database's indexes. You must have DBA authority to run this procedure. The following statement calls the sa_index_density stored procedure:
CALL sa_index_density (['table_name'[,'owner_name']])
If your index is highly fragmented, you can run REORGANIZE TABLE. You can also drop the index and recreate it. However, if the index is a primary key, you will also have to drop and recreate the foreign key indexes.
As well, you can improve performance by creating a clustered index on a table. Clustered indexes cause table rows to be stored in approximately the same order as they appear in the index.
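A sketch, assuming an Orders table that is frequently scanned in date order (table, index, and column names are illustrative):

```sql
-- Keep rows stored in approximately order_date order
CREATE CLUSTERED INDEX orders_by_date ON Orders ( order_date );

-- Re-sort existing rows to match the clustered index
REORGANIZE TABLE Orders;
```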
Table fragmentation occurs when rows are not stored contiguously, or when rows are split between multiple pages. Performance decreases because these rows require additional page accesses.
Adaptive Server Anywhere reserves extra room on each page to allow rows to grow slightly. When an update causes a row to grow beyond the reserve space available on its current page, the row is split and the original row location contains a pointer to another page where the remainder of the row is stored. For example, filling empty rows with UPDATE statements or adding new columns to a table can lead to severe row splitting. As more rows are stored on separate pages, more time is required to access the additional pages.
You can use the sa_table_fragmentation stored procedure to obtain information about the degree of fragmentation of your database tables. The following statement calls the sa_table_fragmentation stored procedure:
CALL sa_table_fragmentation (['table_name' [,'owner_name']])
There are three ways to minimize table fragmentation:
You can specify the percentage of space in a table page that should be reserved for future updates. This PCTFREE specification can be set with CREATE TABLE, ALTER TABLE, DECLARE LOCAL TEMPORARY TABLE, or LOAD TABLE.
You can reorganize specific tables. To defragment particular tables or parts of tables, you can run REORGANIZE TABLE. Reorganizing tables does not disrupt database access.
You can rebuild the entire database. Rebuilding is more comprehensive in that it defragments all tables, including system tables. Additionally, it rearranges the table rows so they appear in the order specified by the clustered index and primary keys.
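The PCTFREE option from the first approach can be sketched as follows (the table name and the 20% figure are illustrative):

```sql
-- Reserve 20% of each page of this table for future growth of its rows
CREATE TABLE Notes (
    note_id INTEGER NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
    body    LONG VARCHAR
) PCTFREE 20;
```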
Tip 15: Acquire adequate hardware
When running on a PC, make sure your server meets Adaptive Server Anywhere's minimum CPU, memory and disk requirements:
One of Windows 95, Windows 98, Windows Me, Windows NT (v4.0 or later), Windows 2000, or Windows XP.
An Intel 486 or higher CPU, or the equivalent.
Adaptive Server Anywhere can run with as little as 4 Mb of memory. If you use Java in the database, Adaptive Server Anywhere requires 8 Mb of memory. If you are using the administration tools, Adaptive Server Anywhere requires at least 32 Mb of RAM. Your computer must have this much memory in addition to the requirements for the operating system.
Enough disk space to hold your database and log files.
Keep in mind that these are the minimums. If you are meeting only the minimum hardware requirements, and find that performance is suffering, consider upgrading some or all of your hardware. In general, evaluate the hardware configuration to see if it is adequate for the kind of work load being placed on the server.
Tip 16: Fiddle with file placement
Disk drives operate much more slowly than modern processors or RAM. Often, simply waiting for the disk to read or write pages is the reason that a database server is slow. You almost always improve database performance when you put different physical database files on different physical devices. For example, while one disk drive is busy swapping database pages to and from the cache, another device can be writing to the log file. To gain these benefits, the devices must be independent. A single disk, partitioned into smaller logical drives, is unlikely to yield benefits.
By placing the transaction log mirror file (if you use one) on a physically separate drive, you gain better protection against disk failure, and Adaptive Server Anywhere runs faster because it can efficiently write to the log and log mirror files. To specify the location of the transaction log and transaction log mirror files, use the dblog command line utility, or the Change Log File Settings utility in Sybase Central.
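For example, the following sketch renames the transaction log and adds a mirror on a physically separate drive (the paths are illustrative):

```
dblog -t c:\logs\mydatabase.log -m d:\mirror\mydatabase.mlg mydatabase.db
```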
Adaptive Server Anywhere may need more space than is available to it in the cache for such operations as sorting and forming unions. When it needs this space, it generally uses it intensively. The overall performance of your database becomes heavily dependent on the speed of the device containing the temporary file. If the temporary file is on a fast device, physically separate from the one holding the database file, Adaptive Server Anywhere will run faster. This is because many of the operations that necessitate using the temporary file also require retrieving a lot of information from the database. Placing the information on two separate disks allows the operations to take place simultaneously.
Choose the location of your temporary file carefully. On Windows, Adaptive Server Anywhere examines the following environment variables, in the order ASTMP, TMP, TMPDIR, and TEMP to determine the directory in which to place the temporary file. If none of these is defined, Adaptive Server Anywhere places its temporary file in the current directory—not a good location for the best performance.
You can also divide your database into multiple dbspaces, located on separate devices to improve performance. In such a case, group tables in the separate dbspaces so that common join operations read information from different files.
A similar strategy involves placing the temporary and database files on a RAID device or a Windows NT stripe set. Although such devices act as a logical drive, they dramatically improve performance by distributing files over many physical drives and accessing the information using multiple heads.
Tip 17: Pick quick or complete
The OPTIMIZATION_GOAL option controls whether Adaptive Server Anywhere optimizes SQL statements for response time (first-row) or for total resource consumption (all-rows). In simpler terms, you can pick whether to optimize query processing towards returning the first row quickly, or towards minimizing the cost of returning the complete result set.
If the option is set to first-row, Adaptive Server Anywhere chooses an access plan that is intended to reduce the time to fetch the first row of the query's result, possibly at the expense of total retrieval time. In particular, the Adaptive Server Anywhere optimizer will typically avoid, if possible, access plans that require the materialization of results in order to reduce the time to return the first row. With this setting, for example, the optimizer favors access plans that utilize an index to satisfy a query's ORDER BY clause, rather than plans that require an explicit sorting operation.
You can use the FASTFIRSTROW table hint in a query's FROM clause to set the optimization goal for a specific query to first-row, without having to change the OPTIMIZATION_GOAL setting.
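For example (the table and column names are illustrative):

```sql
-- Optimize all statements on this connection for time-to-first-row
SET TEMPORARY OPTION OPTIMIZATION_GOAL = 'first-row';

-- Or override the goal for a single query with a table hint
SELECT *
  FROM Employees WITH ( FASTFIRSTROW )
 ORDER BY emp_lname;
```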