mysql 亿级数据量 ( sum ,group by )的优化

您所在的位置:网站首页 上亿条数据查询优化 mysql 亿级数据量 ( sum ,group by )的优化

mysql 亿级数据量 ( sum ,group by )的优化

2024-07-10 21:02| 来源: 网络整理| 查看: 265

        今天开发提出需求,让统计数据,一询问才得知表中的数据量已达亿级以上。具体的sql如下:

SELECT id_province_code,gender,age,COUNT(1),SUM(zy_days),SUM(zf),SUM(ybnje) FROM medicare2017 WHERE zy_enter_date BETWEEN '2017-01-01 00:00:00' AND '2017-12-31 12:59:59' GROUP BY id_province_code,age,gender;

然后查看该sql的执行计划

mysql> explain SELECT id_province_code,gender,age,COUNT(1),SUM(zy_days),SUM(zf),SUM(ybnje) FROM medicare2017 WHERE zy_enter_date BETWEEN '2017-01-01 00:00:00' AND '2017-12-31 12:59:59' GROUP BY id_province_code,age,gender; +----+-------------+--------------+------------+-------+-------------------------------------------+-------------------+---------+------+---------+----------+--------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+--------------+------------+-------+-------------------------------------------+-------------------+---------+------+---------+----------+--------------------------------------------------------+ | 1 | SIMPLE | medicare2017 | NULL | range | idx_zy_enter_date,idx_province_age_gender | idx_zy_enter_date | 4 | NULL | 4836248 | 100.00 | Using index condition; Using temporary; Using filesort |

执行计划中“Extra”中竟然出现了“Using temporary; Using filesort ”,看到这种情况我们就得进行优化了,虽然“type”列出现了“range”。出现这种情况是因为sql语句中使用了“group by”或者是“order by ”,然后进行了文件排序。

接着,我们就需要给group by后面的字段建立索引了,mysql索引原则是最左匹配前缀原则,我们给“id_province_code,age,gender”三字段添加一个复合索引(按照最左匹配原则):

alter table medicare2017 add index idx_ipc_age_gener(id_province_code,age,gender);

但一查看表结构

KEY `idx_province_age_gender` (`id_province_code`,`gender`,`age`)

早期已经建立好了,但是,怎么还会出现“Using temporary; Using filesort”,查看官档发现,group by 默认是要排序的,所以即使我们添加了索引,还是会引起文件排序。这样,我们的解决方案是:强制关闭排序:order by null

最后我们根据官方文档的建议,进行了sql的最终优化:

mysql> mysql> explain SELECT id_province_code,gender,age,COUNT(1),SUM(zy_days),SUMM(ybnje) FROM medicare2017 WHERE zy_enter_date BETWEEN '2017-01-01 00:00:00' AND '2017-12-31 12:59:59' GROUP BY id_province_code,age,gender order by null; +----+-------------+--------------+------------+-------+-------------------------------------------+-------------------+---------+------+---------+----------+----------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+--------------+------------+-------+-------------------------------------------+-------------------+---------+------+---------+----------+----------------------------------------+ | 1 | SIMPLE | medicare2017 | NULL | range | idx_zy_enter_date,idx_province_age_gender | idx_zy_enter_date | 4 | NULL | 4836248 | 100.00 | Using index condition; Using temporary | +----+-------------+--------------+------------+-------+-------------------------------------------+-------------------+---------+------+---------+----------+----------------------------------------+

查看执行计划,发现没有文件排序了,但是还是有“using tempory”,别着急,这是不重要的,只要执行sql的性能提升了就可以了

最终,运行最终优化后的sql发现比没有强制关闭排序的sql,整整快了将近4个小时(表中数据将近3亿,没关闭排序前的sql我运行了将近4个小时,还在跑,但优化后只跑了几十秒)看来优化的综合评估是很重要的。

 



【本文地址】


今日新闻


推荐新闻


CopyRight 2018-2019 办公设备维修网 版权所有 豫ICP备15022753号-3