MySQL中字符串查询效率大比拼 |
您所在的位置:网站首页 › mysql查询空字符串 › MySQL中字符串查询效率大比拼 |
背景
最近有个同事对字符串加索引,加完后,发现多了个奇奇怪怪的数字执行的SQL如下: alter table string_index_test add index `idx_name` (`name`) USING BTREE;这个奇怪数字就是191,它很是疑惑,也没指定索引的长度 InnoDB has a maximum index length of 767 bytes for tables that use COMPACT or REDUNDANT row format, so for utf8mb3 or utf8mb4 columns, you can index a maximum of 255 or 191 characters, respectively. If you currently have utf8mb3 columns with indexes longer than 191 characters, you must index a smaller number of characters. In an InnoDB table that uses COMPACT or REDUNDANT row format, these column and index definitions are legal:col1 VARCHAR(500) CHARACTER SET utf8, INDEX (col1(255)) To use utf8mb4 instead, the index must be smaller:col1 VARCHAR(500) CHARACTER SET utf8mb4, INDEX (col1(191)) 大概意思就是InnoDB最大索引长度为 767 字节数,用的编码是utf8mb4,则可以存储191个字符(767/4 约等于 191),编码字段长度超出最大索引长度后MySQL 默认在普通索引追加了191 思考 1、MySQL中如何提高字符串查询效率?对字符串加索引?一般情况下,是不建议在字符串加索引,占空间如果一定要加,建议可以指定长度,前提是字符串前面部分区分度好的话,此时这类索引就叫前缀索引 2、前缀索引有什么问题?区分度不好的话,很容易发生碰撞,进而引发一系列问题我们再通过执行计划来分析一波 准备了单表100W的数据进行测试使用性能压力测试工具mysqlslap性能测试脚本 mysqlslap -uroot -p --concurrency=100,200 --iterations=1 --number-of-queries=1 --create-schema=test --query=C:\xxx\query.sql–concurrency=100,200 测试并发的线程数/客户端数,第一次100,第二次200–iterations=1 指定测试重复次数1次–number-of-queries=1 指定每个线程执行的 SQL 语句数量上限(不精确)–create-schema=test 指定查询的数据库test 1、不加索引查询的SQL:SELECT SQL_NO_CACHE * FROM string_index_test WHERE name=‘forlan’; Benchmark Average number of seconds to run all queries: 8.328 seconds Minimum number of seconds to run all queries: 8.328 seconds Maximum number of seconds to run all queries: 8.328 seconds Number of clients running queries: 100 Average number of queries per client: 0 Benchmark Average number of seconds to run all queries: 18.078 seconds Minimum number of seconds to run all queries: 18.078 seconds Maximum number of seconds to run all queries: 18.078 seconds Number of clients running queries: 200 Average number of queries per client: 02、加字符串索引alter table string_index_test add index idx_name (name) USING BTREE;查询的SQL:SELECT SQL_NO_CACHE * FROM string_index_test WHERE name=‘forlan’; Benchmark Average number of seconds to run all queries: 0.250 seconds Minimum number of seconds to run all queries: 0.250 seconds Maximum number of seconds to run all queries: 0.250 seconds Number of clients running queries: 100 Average number of queries per client: 0 Benchmark Average number of seconds to run all queries: 1.438 seconds Minimum number of seconds to run all queries: 1.438 seconds Maximum number of seconds to run all queries: 1.438 seconds Number of clients running queries: 200 Average number of queries per client: 03、使用CRC32创建索引 CRC全称为Cyclic Redundancy Check,又叫循环冗余校验。CRC32是CRC算法的一种,返回值的范围0~2^32-1,使用bigint存储 加一个name_crc32列,创建这个列的所有,索引空间小很多,利用整型加速查询加索引:alter table string_index_test add index idx_nam_crc32 (name_crc32) USING BTREE;查询的SQL:SELECT SQL_NO_CACHE * FROM string_index_test WHERE name_crc32=CRC32(‘forlan’) and name=‘forlan’;因为CRC32存在发生碰撞,所以加上name条件,才能筛选出正确的数据 Benchmark Average number of seconds to run all queries: 0.266 seconds Minimum number of seconds to run all queries: 0.266 seconds Maximum number of seconds to run all queries: 0.266 seconds Number of clients running queries: 100 Average number of queries per client: 0 Benchmark Average number of seconds to run all queries: 0.390 seconds Minimum number of seconds to run all queries: 0.390 seconds Maximum number of seconds to run all queries: 0.390 seconds Number of clients running queries: 200 Average number of queries per client: 0 总结 通过对字符串加索引,可以提高查询效率,但需要注意指定长度,无法使用覆盖索引 通过使用CRC32,需要额外存一个字段,将字符串转为整数存储,节省空间,效率提升并不是很大,但存在碰撞问题,可以加多字符串筛选条件 -对于CRC32存在碰撞问题,可以使用CRC64减少碰撞,但需要安装 common_schema database函数库 |
今日新闻 |
推荐新闻 |
CopyRight 2018-2019 办公设备维修网 版权所有 豫ICP备15022753号-3 |