Appendix D Restrictions and Limits

Table of Contents

D.1 Restrictions on Subqueries
D.2 Restrictions on Character Sets
D.3 Limits in MySQL
D.3.1 Limits of Joins
D.3.2 The Maximum Number of Columns Per Table
D.3.3 Windows Platform Limitations

The discussion here describes restrictions that apply to the use of MySQL features such as subqueries.

D.1 Restrictions on Subqueries

  • Known bug: If you compare a NULL value to a subquery using ALL, ANY, or SOME, and the subquery returns an empty result, the comparison might evaluate to the nonstandard result of NULL rather than to TRUE or FALSE. This is fixed in MySQL 5.0.36 and 5.1.16.

  • A subquery's outer statement can be any one of: SELECT, INSERT, UPDATE, DELETE, SET, or DO.

  • Subquery optimization for IN is not as effective as for the = operator or for the IN(value_list) operator.

    A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.

    The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery:

    SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
    

    The optimizer rewrites the statement to a correlated subquery:

    SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
    

    If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(M×N), rather than O(M+N) as it would be for an uncorrelated subquery.

    An implication is that an IN subquery can be much slower than a query written using an IN(value_list) operator that lists the same values that the subquery would return.

  • In general, you cannot modify a table and select from the same table in a subquery. For example, this limitation applies to statements of the following forms:

    DELETE FROM t WHERE ... (SELECT ... FROM t ...);
    UPDATE t ... WHERE col = (SELECT ... FROM t ...);
    {INSERT|REPLACE} INTO t (SELECT ... FROM t ...);
    

    Exception: The preceding prohibition does not apply if you are using a subquery for the modified table in the FROM clause. Example:

    UPDATE t ... WHERE col = (SELECT * FROM (SELECT ... FROM t...) AS _t ...);
    

    Here the result from the subquery in the FROM clause is stored as a temporary table, so the relevant rows in t have already been selected by the time the update to t takes place.

  • Row comparison operations are only partially supported:

    • For expr IN (subquery), expr can be an n-tuple (specified using row constructor syntax) and the subquery can return rows of n-tuples.

    • For expr op {ALL|ANY|SOME} (subquery), expr must be a scalar value and the subquery must be a column subquery; it cannot return multiple-column rows.

    In other words, for a subquery that returns rows of n-tuples, this is supported:

    (val_1, ..., val_n) IN (subquery)
    

    But this is not supported:

    (val_1, ..., val_n) op {ALL|ANY|SOME} (subquery)
    

    The reason for supporting row comparisons for IN but not for the others is that IN is implemented by rewriting it as a sequence of = comparisons and AND operations. This approach cannot be used for ALL, ANY, or SOME.

  • Row constructors are not well optimized. The following two expressions are equivalent, but only the second can be optimized:

    (col1, col2, ...) = (val1, val2, ...)
    col1 = val1 AND col2 = val2 AND ...
    
  • Subqueries in the FROM clause cannot be correlated subqueries. They are materialized (executed to produce a result set) before evaluating the outer query, so they cannot be evaluated per row of the outer query.

  • MySQL does not support LIMIT in subqueries for certain subquery operators:

    mysql> SELECT * FROM t1
        ->   WHERE s1 IN (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1);
    ERROR 1235 (42000): This version of MySQL doesn't yet support
     'LIMIT & IN/ALL/ANY/SOME subquery'
    
  • The optimizer is more mature for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as a join.

    An exception occurs for the case where an IN subquery can be rewritten as a SELECT DISTINCT join. Example:

    SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE condition);
    

    That statement can be rewritten as follows:

    SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col AND condition;
    

    But in this case, the join requires an extra DISTINCT operation and is not more efficient than the subquery.

  • MySQL permits a subquery to refer to a stored function that has data-modifying side effects such as inserting rows into a table. For example, if f() inserts rows, the following query can modify data:

    SELECT ... WHERE x IN (SELECT f() ...);
    

    This behavior is nonstandard (not permitted by the SQL standard). In MySQL, it can produce indeterminate results because f() might be executed a different number of times for different executions of a given query depending on how the optimizer chooses to handle it.

    For replication, one implication of this indeterminism is that such a query can produce different results on the master and its slaves.

  • Possible future optimization: MySQL does not rewrite the join order for subquery evaluation. In some cases, a subquery could be executed more efficiently if MySQL rewrote it as a join. This would give the optimizer a chance to choose between more execution plans. For example, it could decide whether to read one table or the other first.

    Example:

    SELECT a FROM outer_table AS ot
    WHERE a IN (SELECT a FROM inner_table AS it WHERE ot.b = it.b);
    

    For that query, MySQL always scans outer_table first and then executes the subquery on inner_table for each row. If outer_table has a lot of rows and inner_table has few rows, the query probably will not be as fast as it could be.

    The preceding query could be rewritten like this:

    SELECT a FROM outer_table AS ot, inner_table AS it
    WHERE ot.a = it.a AND ot.b = it.b;
    

    In this case, we can scan the small table (inner_table) and look up rows in outer_table, which will be fast if there is an index on (ot.a,ot.b).

  • Possible future optimization: A correlated subquery is evaluated for each row of the outer query. A better approach is that if the outer row values do not change from the previous row, do not evaluate the subquery again. Instead, use its previous result.

  • Possible future optimization: A subquery in the FROM clause is evaluated by materializing the result into a temporary table, and this table does not use indexes. This does not allow the use of indexes in comparison with other tables in the query, although that might be useful.

  • Possible future optimization: If a subquery in the FROM clause resembles a view to which the merge algorithm can be applied, rewrite the query and apply the merge algorithm so that indexes can be used. The following statement contains such a subquery:

    SELECT * FROM (SELECT * FROM t1 WHERE t1.t1_col) AS _t1, t2 WHERE t2.t2_col;
    

    The statement can be rewritten as a join like this:

    SELECT * FROM t1, t2 WHERE t1.t1_col AND t2.t2_col;
    

    This type of rewriting would provide two benefits:

    • It avoids the use of a temporary table for which no indexes can be used. In the rewritten query, the optimizer can use indexes on t1.

    • It gives the optimizer more freedom to choose between different execution plans. For example, rewriting the query as a join enables the optimizer to use t1 or t2 first.

  • Possible future optimization: For IN, = ANY, <> ANY, = ALL, and <> ALL with uncorrelated subqueries, use an in-memory hash for a result or a temporary table with an index for larger results. Example:

    SELECT a FROM big_table AS bt
    WHERE non_key_field IN (SELECT non_key_field FROM table WHERE condition)
    

    In this case, we could create a temporary table:

    CREATE TABLE t (key (non_key_field))
    (SELECT non_key_field FROM table WHERE condition)
    

    Then, for each row in big_table, do a key lookup in t based on bt.non_key_field.

D.2 Restrictions on Character Sets

  • Identifiers are stored in mysql database tables (user, db, and so forth) using utf8, but identifiers can contain only characters in the Basic Multilingual Plane (BMP). Supplementary characters are not permitted in identifiers.

  • The ucs2 character sets has the following restrictions:

    • It cannot be used as a client character set, which means that it does not work for SET NAMES or SET CHARACTER SET. (See Section 9.1.4, “Connection Character Sets and Collations”.)

    • It is currently not possible to use LOAD DATA INFILE to load data files that use this character set.

    • FULLTEXT indexes cannot be created on a column that this character set. However, you can perform IN BOOLEAN MODE searches on the column without an index.

  • The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.

D.3 Limits in MySQL

This section lists current limits in MySQL 4.1.

D.3.1 Limits of Joins

In MySQL 4.1, the maximum number of tables that can be referenced in a single join is 61. This also applies to the number of tables that can be referenced in the definition of a view.

D.3.2 The Maximum Number of Columns Per Table

There is a hard limit of 4096 columns per table, but the effective maximum may be less for a given table. The exact limit depends on several interacting factors, listed in the following discussion.

  • Every table has a maximum row size of 65,535 bytes. This maximum applies to all storage engines, but a given engine might have additional constraints that result in a lower effective maximum row size.

    The maximum row size constrains the number of columns because the total width of all columns cannot exceed this size. For example, utf8 characters require up to three bytes per character, so for a CHAR(255) CHARACTER SET utf8 column, the server must allocate 255 × 3 = 765 bytes per value. Consequently, a table cannot contain more than 65,535 / 765 = 85 such columns.

    Storage for variable-length columns includes length bytes, which are assessed against the row size. For example, a VARCHAR(255) CHARACTER SET utf8 column takes two bytes to store the length of the value, so each value can take up to 767 bytes.

    BLOB and TEXT columns count from one to four plus eight bytes each toward the row-size limit because their contents are stored separately.

    Declaring columns NULL can reduce the maximum number of columns permitted. NULL columns require additional space in the row to record whether their values are NULL.

    For MyISAM and ISAM tables, each NULL column takes one bit extra, rounded up to the nearest byte. The maximum row length in bytes can be calculated as follows:

    row length = 1
                 + (sum of column lengths)
                 + (number of NULL columns + delete_flag + 7)/8
                 + (number of variable-length columns)
    

    delete_flag is 1 for tables with static row format. Static tables use a bit in the row record for a flag that indicates whether the row has been deleted. delete_flag is 0 for dynamic tables because the flag is stored in the dynamic row header.

    These calculations do not apply for InnoDB tables, for which storage size is no different for NULL columns than for NOT NULL columns.

  • Each table has an .frm file that contains the table definition. The server uses the following expression to check some of the table information stored in the file against an upper limit of 64KB:

    if (info_length+(ulong) create_fields.elements*FCOMP+288+
        n_length+int_length+com_length > 65535L || int_count > 255)
    

    The portion of the information stored in the .frm file that is checked against the expression cannot grow beyond the 64KB limit, so if the table definition reaches this size, no more columns can be added.

    The relevant factors in the expression are:

    • info_length is space needed for screens. This is related to MySQL's Unireg heritage.

    • create_fields.elements is the number of columns.

    • FCOMP is 17.

    • n_length is the total length of all column names, including one byte per name as a separator.

    • int_length is related to the list of values for ENUM and SET columns.

    • com_length is the total length of column comments.

    Thus, using long column names can reduce the maximum number of columns, as can the inclusion of ENUM or SET columns, or use of column comments.

  • Individual storage engines might impose additional restrictions that limit table column count. Examples:

    • InnoDB permits no more than 1000 columns.

    • InnoDB restricts row size to something less than half a database page (approximately 8000 bytes), not including VARBINARY, VARCHAR, BLOB, or TEXT columns.

D.3.3 Windows Platform Limitations

The following limitations apply to use of MySQL on the Windows platform:

  • Number of file descriptors

    The number of open file descriptors on Windows is limited to a maximum of 2048, which may limit the ability to open a large number of tables simultaneously. This limit is due not to Windows but to C runtime library compatibility functions used to open files on Windows that use the POSIX compatibility layer.

    This limitation will also cause problems if you try to set open_files_limit to a value greater than the 2048 file limit.

  • Process memory

    On Windows 32-bit platforms it is not possible by default to use more than 2GB of RAM within a single process, including MySQL. This is because the physical address limit on Windows 32-bit is 4GB and the default setting within Windows is to split the RAM between kernel (2GB) and user/applications (2GB).

    You can increase this limit to 3GB by specifying the /3GB option in the boot.ini file. This changes the kernel/application memory split to 1GB and 3GB respectively. This boot option is available on Windows XP, Windows Server 2003, and Windows Server 2008.

    Some versions of Windows have a boot time setting to enable larger applications by reducing the kernel application. Alternatively, to use more than 2GB, use a 64-bit version of Windows.

  • File system aliases

    When using MyISAM tables, you cannot use aliases within Windows link to the data files on another volume and then link back to the main MySQL datadir location.

    This facility is often used to move the data and index files to a RAID or other fast solution, while retaining the main .frm files in the default data directory configured with the datadir option.

  • Limited number of ports

    Windows systems have about 4,000 ports available for client connections, and after a connection on a port closes, it takes two to four minutes before the port can be reused. In situations where clients connect to and disconnect from the server at a high rate, it is possible for all available ports to be used up before closed ports become available again. If this happens, the MySQL server appears to be unresponsive even though it is running. Note that ports may be used by other applications running on the machine as well, in which case the number of ports available to MySQL is lower.

    For more information about this problem, see http://support.microsoft.com/default.aspx?scid=kb;en-us;196271.

  • Concurrent reads

    MySQL depends on the pread() and pwrite() calls to be able to mix INSERT and SELECT. Currently, we use mutexes to emulate pread()/pwrite(). We will, in the long run, replace the file level interface with a virtual interface so that we can use the readfile()/writefile() interface on NT, 2000, and XP to get more speed. The current implementation limits the number of open files that MySQL can use to 2,048 (1,024 before MySQL 4.0.19), which means that you cannot run as many concurrent threads on NT, 2000, XP, and 2003 as on Unix.

    This problem is fixed in MySQL 5.5.

  • Blocking read

    MySQL uses a blocking read for each connection. That has the following implications if named-pipe connections are enabled:

    • A connection is not disconnected automatically after eight hours, as happens with the Unix version of MySQL.

    • If a connection hangs, it is impossible to break it without killing MySQL.

    • mysqladmin kill does not work on a sleeping connection.

    • mysqladmin shutdown cannot abort as long as there are sleeping connections.

    These problems are fixed in MySQL 5.1. (Bug #31621)

  • ALTER TABLE

    While you are executing an ALTER TABLE statement, the table is locked from being used by other threads. This has to do with the fact that on Windows, you cannot delete a file that is in use by another thread. In the future, we may find some way to work around this problem.

  • DROP TABLE

    DROP TABLE on a table that is in use by a MERGE table does not work on Windows because the MERGE handler does the table mapping hidden from the upper layer of MySQL. Because Windows does not permit you to drop files that are open, you first must flush all MERGE tables (with FLUSH TABLES) or drop the MERGE table before dropping the table. We will fix this at the same time we introduce views.

  • DATA DIRECTORY and INDEX DIRECTORY

    The DATA DIRECTORY and INDEX DIRECTORY options for CREATE TABLE are ignored on Windows, because MySQL does not support Windows symbolic links. These options also are ignored on systems that have a nonfunctional realpath() call.

  • DROP DATABASE

    You cannot drop a database that is in use by another session.

  • Case-insensitive names

    File names are not case sensitive on Windows, so MySQL database and table names are also not case sensitive on Windows. The only restriction is that database and table names must be specified using the same case throughout a given statement. See Section 8.2.2, “Identifier Case Sensitivity”.

  • Directory and file names

    On Windows, MySQL Server supports only directory and file names that are compatible with the current ANSI code pages. For example, the following Japanese directory name will not work in the Western locale (code page 1252):

    datadir="C:/私たちのプロジェクトのデータ"
    

    The same limitation applies to directory and file names referred to in SQL statements, such as the data file path name in LOAD DATA INFILE.

  • The \ path name separator character

    Path name components in Windows are separated by the \ character, which is also the escape character in MySQL. If you are using LOAD DATA INFILE or SELECT ... INTO OUTFILE, use Unix-style file names with / characters:

    mysql> LOAD DATA INFILE 'C:/tmp/skr.txt' INTO TABLE skr;
    mysql> SELECT * INTO OUTFILE 'C:/tmp/skr.txt' FROM skr;
    

    Alternatively, you must double the \ character:

    mysql> LOAD DATA INFILE 'C:\\tmp\\skr.txt' INTO TABLE skr;
    mysql> SELECT * INTO OUTFILE 'C:\\tmp\\skr.txt' FROM skr;
    
  • Problems with pipes

    Pipes do not work reliably from the Windows command-line prompt. If the pipe includes the character ^Z / CHAR(24), Windows thinks that it has encountered end-of-file and aborts the program.

    This is a problem mainly when you try to apply a binary log as follows:

    shell> mysqlbinlog binary_log_file | mysql --user=root
    

    If you have a problem applying the log and suspect that it is because of a ^Z / CHAR(24) character, you can use the following workaround:

    shell> mysqlbinlog binary_log_file --result-file=/tmp/bin.sql
    shell> mysql --user=root --execute "source /tmp/bin.sql"
    

    The latter command also can be used to reliably read in any SQL file that may contain binary data.

  • Killing MySQL from the Task Manager

    On Windows 95, you cannot kill MySQL from the Task Manager or with the shutdown utility. You must stop it with mysqladmin shutdown or the NET STOP ... command.

  • Windows 95 and threads

    Windows 95 leaks about 200 bytes of main memory for each thread creation. Each connection in MySQL creates a new thread, so you should not run mysqld for an extended time on Windows 95 if your server handles many connections! Newer versions of Windows do not suffer from this bug.