MySQL 8.0 Reference Manual（读书笔记54节–Optimization and Indexes(1)）

The best way to improve the performance of SELECT operations is to create indexes on one or more of the columns that are tested in the query. The index entries act like pointers to the table rows, allowing the query to quickly determine which rows match a condition in the WHERE clause, and retrieve the other column values for those rows. All MySQL data types can be indexed.

Although it can be tempting【ˈtemptɪŋ 诱人的;吸引人的;有吸引力的;】 to create an indexes for every possible column used in a query, unnecessary indexes waste space and waste time for MySQL to determine which indexes to use. Indexes also add to the cost of inserts, updates, and deletes because each index must be updated. You must find the right balance to achieve fast queries using the optimal set of indexes.

1 How MySQL Uses Indexes

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant【ˈreləvənt 相关的;有意义的;有价值的;切题的;】 rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially【səˈkwɛntʃəli 按顺序;继续地，从而;】.

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. Exceptions: Indexes on spatial【ˈspeɪʃl 空间的;】 data types use R-trees; MEMORY tables also support hash indexes; InnoDB uses inverted【ɪnˈvɜːrtɪd 反向的，倒转的，颠倒的，（尤指）倒置的;】 lists for FULLTEXT indexes.

MySQL uses indexes for these operations:

• To find the rows matching a WHERE clause quickly.

• To eliminate【ɪˈlɪmɪneɪt 消除;排除;消灭，干掉(尤指敌人或对手);清除;(比赛中)淘汰;】 rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows (the most selective index).

• If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

• To retrieve rows from other tables when performing joins. MySQL can use indexes on columns more efficiently if they are declared as the same type and size. In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. For example, VARCHAR(10) and CHAR(10) are the same size, but VARCHAR(10) and CHAR(15) are not.

For comparisons between nonbinary string columns, both columns should use the same character set. For example, comparing a utf8mb4 column with a latin1 column precludes use of an index.

Comparison of dissimilar【dɪˈsɪmɪlər 不一样的;不相似的;不同的;】 columns (comparing a string column to a temporal【ˈtempərəl 时间的;颞的;世俗的;太阳穴的;现世的;世间的;】 or numeric column, for example) may prevent use of indexes if values cannot be compared directly without conversion. For a given value such as 1 in the numeric column, it might compare equal to any number of values in the string column such as '1', ' 1', '00001', or '01.e1'. This rules out use of any indexes for the string column.

• To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor【预处理器;预处理;预处理程序;前处理;前处理器;】 that checks whether you are using WHERE key_part_N = constant on all key parts that occur before key_col in the index. In this case, MySQL does a single key lookup for each MIN() or MAX() expression and replaces it with a constant. If all expressions are replaced with constants, the query returns at once. For example:

SELECT MIN(key_part2),MAX(key_part2)
 FROM tbl_name WHERE key_part1=10;

• To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable index (for example, ORDER BY key_part1, key_part2). If all key parts are followed by DESC, the key is read in reverse order. (Or, if the index is a descending index, the key is read in forward order.)

• In some cases, a query can be optimized to retrieve values without consulting【kənˈsʌltɪŋ 咨询，请教;查阅，查询;(与某人)商议，商量(以得到许可或帮助决策);参看;】 the data rows. (An index that provides all the necessary results for a query is called a covering index.) If a query uses from a table only columns that are included in some index, the selected values can be retrieved from the index tree for greater speed:

SELECT key_part3 FROM tbl_name
 WHERE key_part1=1

Indexes are less important for queries on small tables, or big tables where report queries process most or all of the rows. When a query needs to access most of the rows, reading sequentially is faster than working through an index. Sequential reads minimize disk seeks, even if not all the rows are needed for the query.

2.Primary Key Optimization

The primary key for a table represents【ˌreprɪˈzents 代表;意味着;相当于;等于;维护…的利益;作为…的代言人;】 the column or set of columns that you use in your most vital【ˈvaɪtl 至关重要的，必不可少的;生命的;生命统计的，生死统计的;生气勃勃的，充满生机的;致命的，生死攸关的;对…极重要的;维持生命所必需的;热情洋溢的;】 queries. It has an associated index, for fast query performance. Query performance benefits from the NOT NULL optimization, because it cannot include any NULL values. With the InnoDB storage engine, the table data is physically organized to do ultra-fast lookups and sorts based on the primary key column or columns.

If your table is big and important, but does not have an obvious【ˈɑːbviəs 明显的;显然的;当然的;公认的;平淡无奇的;易理解的;无创意的;因显而易见而不必要的;】 column or set of columns to use as a primary key, you might create a separate column with auto-increment values to use as the primary key. These unique IDs can serve as pointers to corresponding【ˌkɔːrəˈspɑːndɪŋ 相应的;相关的;符合的;】 rows in other tables when you join tables using foreign keys.

3.SPATIAL Index Optimization

MySQL permits creation of SPATIAL indexes on NOT NULL geometry-valued columns.The optimizer checks the SRID attribute for indexed columns to determine which spatial reference system (SRS) to use for comparisons, and uses calculations appropriate to the SRS. (Prior to MySQL 8.0, the optimizer performs comparisons of SPATIAL index values using Cartesian calculations; the results of such operations are undefined if the column contains values with non-Cartesian SRIDs.)

For comparisons to work properly, each column in a SPATIAL index must be SRID-restricted. That is, the column definition must include an explicit SRID attribute, and all column values must have the same SRID.

The optimizer considers SPATIAL indexes only for SRID-restricted columns:

• Indexes on columns restricted to a Cartesian SRID enable Cartesian bounding box computations.

• Indexes on columns restricted to a geographic【ˌdʒiəˈgræfɪk 地理（学）（上）的；地区（性）的;】 SRID enable geographic bounding box computations.

The optimizer ignores SPATIAL indexes on columns that have no SRID attribute (and thus are not SRIDrestricted). MySQL still maintains such indexes, as follows:

• They are updated for table modifications (INSERT, UPDATE, DELETE, and so forth). Updates occur as though the index was Cartesian, even though the column might contain a mix of Cartesian and geographical values.

• They exist only for backward compatibility (for example, the ability to perform a dump in MySQL 5.7 and restore in MySQL 8.0). Because SPATIAL indexes on columns that are not SRID-restricted are of no use to the optimizer, each such column should be modified:

• Verify that all values within the column have the same SRID. To determine the SRIDs contained in a geometry column col_name, use the following query:

SELECT DISTINCT ST_SRID(col_name) FROM tbl_name;

If the query returns more than one row, the column contains a mix of SRIDs. In that case, modify its contents so all values have the same SRID.

• Redefine the column to have an explicit SRID attribute.

• Recreate the SPATIAL index.

4 Foreign Key Optimization

If a table has many columns, and you query many different combinations of columns, it might be efficient to split the less-frequently used data into separate tables with a few columns each, and relate them back to the main table by duplicating the numeric ID column from the main table. That way, each small table can have a primary key for fast lookups of its data, and you can query just the set of columns that you need using a join operation. Depending on how the data is distributed, the queries might perform less I/O and take up less cache memory because the relevant columns are packed together on disk. (To maximize performance, queries try to read as few data blocks as possible from disk; tables with only a few columns can fit more rows in each data block.)

5.Column Indexes

The most common type of index involves a single column, storing copies of the values from that column in a data structure, allowing fast lookups for the rows with the corresponding column values. The B-tree data structure lets the index quickly find a specific value, a set of values, or a range of values, corresponding to operators such as =, >, ≤, BETWEEN, IN, and so on, in a WHERE clause.

The maximum number of indexes per table and the maximum index length is defined per storage engine.

All storage engines support at least 16 indexes per table and a total index length of at least 256 bytes. Most storage engines have higher limits.

5.1 Index Prefixes

With col_name(N) syntax in an index specification for a string column, you can create an index that uses only the first N characters of the column. Indexing only a prefix of column values in this way can make the index file much smaller. When you index a BLOB or TEXT column, you must specify a prefix length for the index. For example:

CREATE TABLE test (blob_col BLOB, INDEX(blob_col(10)));

Prefixes can be up to 767 bytes long for InnoDB tables that use the REDUNDANT or COMPACT row format. The prefix length limit is 3072 bytes for InnoDB tables that use the DYNAMIC or COMPRESSED row format. For MyISAM tables, the prefix length limit is 1000 bytes.

【Prefix limits are measured in bytes, whereas the prefix length in CREATE TABLE, ALTER TABLE, and CREATE INDEX statements is interpreted as number of characters for nonbinary string types (CHAR, VARCHAR, TEXT) and number of bytes for binary string types (BINARY, VARBINARY, BLOB). Take this into account when specifying a prefix length for a nonbinary string column that uses a multibyte character set.】

If a search term exceeds the index prefix length, the index is used to exclude non-matching rows, and the remaining rows are examined for possible matches.

5.2 FULLTEXT Indexes

FULLTEXT indexes are used for full-text searches. Only the InnoDB and MyISAM storage engines support FULLTEXT indexes and only for CHAR, VARCHAR, and TEXT columns. Indexing always takes place over the entire column and column prefix indexing is not supported.

Optimizations are applied to certain kinds of FULLTEXT queries against single InnoDB tables. Queries with these characteristics are particularly efficient:

• FULLTEXT queries that only return the document ID, or the document ID and the search rank.

• FULLTEXT queries that sort the matching rows in descending order of score and apply a LIMIT clause to take the top N matching rows. For this optimization to apply, there must be no WHERE clauses and only a single ORDER BY clause in descending order.

• FULLTEXT queries that retrieve only the COUNT(*) value of rows matching a search term, with no additional WHERE clauses. Code the WHERE clause as WHERE MATCH(text) AGAINST ('other_text'), without any > 0 comparison operator.

For queries that contain full-text expressions, MySQL evaluates those expressions during the optimization phase of query execution. The optimizer does not just look at full-text expressions and make estimates, it actually evaluates them in the process of developing an execution plan.

An implication【ˌɪmplɪˈkeɪʃn 可能的影响(或作用、结果);含意;(被)牵连，牵涉;暗指;】 of this behavior is that EXPLAIN for full-text queries is typically slower than for non-full-text queries for which no expression evaluation occurs during the optimization phase.

EXPLAIN for full-text queries may show Select tables optimized away in the Extra column due to matching occurring during optimization; in this case, no table access need occur during later execution.

5.3 Spatial Indexes

You can create indexes on spatial【ˈspeɪʃl 空间的;】 data types. MyISAM and InnoDB support R-tree indexes on spatial types. Other storage engines use B-trees for indexing spatial types (except for ARCHIVE, which does not support spatial type indexing).

5.4 Indexes in the MEMORY Storage Engine

The MEMORY storage engine uses HASH indexes by default, but also supports BTREE indexes.

6 Multiple-Column Indexes

MySQL can create composite【kəmˈpɑːzət 复合的;合成的;混成的;混合成的;】 indexes (that is, indexes on multiple columns). An index may consist of up to 16 columns. For certain data types, you can index a prefix of the column MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.

A multiple-column index can be considered a sorted array, the rows of which contain values that are created by concatenating the values of the indexed columns.

【As an alternative to a composite index, you can introduce a column that is “hashed” based on information from other columns. If this column is short, reasonably unique, and indexed, it might be faster than a “wide” index on many columns. In MySQL, it is very easy to use this extra column:

SELECT * FROM tbl_name
 WHERE hash_col=MD5(CONCAT(val1,val2))
 AND col1=val1 AND col2=val2;

】

Suppose that a table has the following specification:

CREATE TABLE test (
 id INT NOT NULL,
 last_name CHAR(30) NOT NULL,
 first_name CHAR(30) NOT NULL,
 PRIMARY KEY (id),
 INDEX name (last_name,first_name)
);

The name index is an index over the last_name and first_name columns. The index can be used for lookups in queries that specify values in a known range for combinations of last_name and first_name values. It can also be used for queries that specify just a last_name value because that column is a leftmost prefix of the index (as described later in this section). Therefore, the name index is used for lookups in the following queries:

SELECT * FROM test WHERE last_name='Jones';
SELECT * FROM test
 WHERE last_name='Jones' AND first_name='John';
SELECT * FROM test
 WHERE last_name='Jones'
 AND (first_name='John' OR first_name='Jon');
SELECT * FROM test
 WHERE last_name='Jones'
 AND first_name >='M' AND first_name < 'N';

However, the name index is not used for lookups in the following queries:

SELECT * FROM test WHERE first_name='John';
SELECT * FROM test
 WHERE last_name='Jones' OR first_name='John';

Suppose that you issue the following SELECT statement:

SELECT * FROM tbl_name
 WHERE col1=val1 AND col2=val2;

If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer attempts to use the Index Merge optimization, or attempts to find the most restrictive index by deciding which index excludes more rows and using that index to fetch the rows.

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL cannot use the index to perform lookups if the columns do not form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:

SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;

If an index exists on (col1, col2, col3), only the first two queries use the index. The third and fourth queries do involve indexed columns, but do not use an index to perform lookups because (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).

7 Verifying Index Usage

Always check whether all your queries really use the indexes that you have created in the tables. Use the EXPLAIN statement.

8 InnoDB and MyISAM Index Statistics Collection

Storage engines collect statistics about tables for use by the optimizer. Table statistics are based on value groups, where a value group is a set of rows with the same key prefix value. For optimizer purposes, an important statistic is the average value group size.

MySQL uses the average value group size in the following ways:

• To estimate how many rows must be read for each ref access

• To estimate how many rows a partial join produces, that is, the number of rows produced by an operation of the form

(...) JOIN tbl_name ON tbl_name.key = expr

As the average value group size for an index increases, the index is less useful for those two purposes because the average number of rows per lookup increases: For the index to be good for optimization purposes, it is best that each index value target a small number of rows in the table. When a given index value yields a large number of rows, the index is less useful and MySQL is less likely to use it.

The average value group size is related to table cardinality, which is the number of value groups. The SHOW INDEX statement displays a cardinality value based on N/S, where N is the number of rows in the table and S is the average value group size. That ratio yields an approximate number of value groups in the table.

For a join based on the <=> comparison operator, NULL is not treated differently from any other value: NULL <=> NULL, just as N <=> N for any other N.

However, for a join based on the = operator, NULL is different from non-NULL values: expr1 = expr2 is not true when expr1 or expr2 (or both) are NULL. This affects ref accesses for comparisons of the form tbl_name.key = expr: MySQL does not access the table if the current value of expr is NULL, because the comparison cannot be true.

For = comparisons, it does not matter how many NULL values are in the table. For optimization purposes, the relevant value is the average size of the non-NULL value groups. However, MySQL does not currently enable that average size to be collected or used.

For InnoDB and MyISAM tables, you have some control over collection of table statistics by means of the innodb_stats_method and myisam_stats_method system variables, respectively. These variables have three possible values, which differ as follows:

• When the variable is set to nulls_equal, all NULL values are treated as identical (that is, they all form a single value group).

If the NULL value group size is much higher than the average non-NULL value group size, this method skews the average value group size upward. This makes index appear to the optimizer to be less useful than it really is for joins that look for non-NULL values. Consequently, the nulls_equal method may cause the optimizer not to use the index for ref accesses when it should.

• When the variable is set to nulls_unequal, NULL values are not considered the same. Instead, each NULL value forms a separate value group of size 1.

If you have many NULL values, this method skews the average value group size downward. If the average non-NULL value group size is large, counting NULL values each as a group of size 1 causes the optimizer to overestimate the value of the index for joins that look for non-NULL values. Consequently, the nulls_unequal method may cause the optimizer to use this index for ref lookups when other methods may be better.

• When the variable is set to nulls_ignored, NULL values are ignored.

If you tend to use many joins that use <=> rather than =, NULL values are not special in comparisons and one NULL is equal to another. In this case, nulls_equal is the appropriate statistics method.

The innodb_stats_method system variable has a global value; the myisam_stats_method system variable has both global and session values. Setting the global value affects statistics collection for tables from the corresponding storage engine. Setting the session value affects statistics collection only for the current client connection. This means that you can force a table's statistics to be regenerated with a given method without affecting other clients by setting the session value of myisam_stats_method.

To regenerate MyISAM table statistics, you can use any of the following methods:

• Execute myisamchk --stats_method=method_name --analyze

• Change the table to cause its statistics to go out of date (for example, insert a row and then delete it), and then set myisam_stats_method and issue an ANALYZE TABLE statement

Some caveats regarding the use of innodb_stats_method and myisam_stats_method:

• You can force table statistics to be collected explicitly, as just described. However, MySQL may also collect statistics automatically. For example, if during the course of executing statements for a table, some of those statements modify the table, MySQL may collect statistics. (This may occur for bulk inserts or deletes, or some ALTER TABLE statements, for example.) If this happens, the statistics are collected using whatever value innodb_stats_method or myisam_stats_method has at the time. Thus, if you collect statistics using one method, but the system variable is set to the other method when a table's statistics are collected automatically later, the other method is used.

• There is no way to tell which method was used to generate statistics for a given table.

• These variables apply only to InnoDB and MyISAM tables. Other storage engines have only one method for collecting table statistics. Usually it is closer to the nulls_equal method.