group by分组排序(组内排序)

您所在的位置:网站首页 如何将数据分组排序 group by分组排序(组内排序)

group by分组排序(组内排序)

2024-07-15 20:53| 来源: 网络整理| 查看: 265

一、项目场景:

有张调用记录表有多个客户端的调用记录,每条调用记录有当次调用记录的详细信息 分页列表需要查出 每个客户端的调用次数、以及最新一次调用的信息。

这里举个栗子来模拟一下业务,创建一张评论表,里面存了每个用户的多个评论信息。SQL如下

CREATE TABLE `user_comment` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `user_id` int(10) unsigned NOT NULL, `contents` varchar(200) NOT NULL, `create_time` datetime NOT NULL, `update_time` datetime NOT NULL, PRIMARY KEY (`id`), KEY `user_id` (`user_id`), KEY `addtime` (`create_time`), KEY `uid_addtime` (`user_id`,`create_time`) ) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=utf8;

插入数据

INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (1, 1, '评论1', '2017-05-17 00:00:00', '2017-05-17 00:00:00'); INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (2, 1, '评论2', '2017-05-17 00:00:01', '2017-05-17 00:00:01'); INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (3, 2, '评论1', '2017-05-17 00:00:02', '2017-05-17 00:00:02'); INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (4, 2, '评论2', '2017-05-17 00:00:03', '2017-05-17 00:00:03'); INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (5, 3, '评论1', '2017-05-17 00:00:04', '2017-05-17 00:00:04'); INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (6, 1, '评论3', '2017-05-17 00:00:05', '2017-05-17 00:00:05'); INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (7, 4, '评论1', '2017-05-17 00:00:06', '2017-05-17 00:00:06'); INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (8, 4, '评论2', '2017-05-17 00:00:07', '2017-05-17 00:00:07'); INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (9, 4, '评论3', '2017-05-17 00:00:08', '2017-05-17 00:00:08'); INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (10, 4, '评论4', '2017-05-17 00:00:09', '2017-05-17 00:00:09'); INSERT INTO `test`.`user_comment`(`id`, `user_id`, `contents`, `create_time`, `update_time`) VALUES (11, 3, '评论2', '2017-05-17 00:00:10', '2017-05-17 00:00:10');

查看数据结果

mysql> select * from user_comment; +----+---------+----------+---------------------+---------------------+ | id | user_id | contents | create_time | update_time | +----+---------+----------+---------------------+---------------------+ | 1 | 1 | 评论1 | 2017-05-17 00:00:00 | 2017-05-17 00:00:00 | | 2 | 1 | 评论2 | 2017-05-17 00:00:01 | 2017-05-17 00:00:01 | | 3 | 2 | 评论1 | 2017-05-17 00:00:02 | 2017-05-17 00:00:02 | | 4 | 2 | 评论2 | 2017-05-17 00:00:03 | 2017-05-17 00:00:03 | | 5 | 3 | 评论1 | 2017-05-17 00:00:04 | 2017-05-17 00:00:04 | | 6 | 1 | 评论3 | 2017-05-17 00:00:05 | 2017-05-17 00:00:05 | | 7 | 4 | 评论1 | 2017-05-17 00:00:06 | 2017-05-17 00:00:06 | | 8 | 4 | 评论2 | 2017-05-17 00:00:07 | 2017-05-17 00:00:07 | | 9 | 4 | 评论3 | 2017-05-17 00:00:08 | 2017-05-17 00:00:08 | | 10 | 4 | 评论4 | 2017-05-17 00:00:09 | 2017-05-17 00:00:09 | | 11 | 3 | 评论2 | 2017-05-17 00:00:10 | 2017-05-17 00:00:10 | +----+---------+----------+---------------------+---------------------+ 11 rows in set (0.05 sec) 二、问题描述:

现在要查询的是,每个用户的最新一条评论信息、以及统计用户总共评论了多少次。

这里很容易想到用 group by 去对 user_id 进行分组,得出总共有多少个用户评论,但是此时,非 group by字段的选择是怎么样的呢?

我的理解是,group by 对非分组的列,选择是最先遇到的一条记录。即每个分组里 id 最小的。所以,如果我们要拿到每个分组里最新的一条评论信息,则在分组前对评论进行排序即可,

SQL如下

SELECT c.*, count( c.user_id ) AS comment_count FROM ( SELECT * FROM user_comment ORDER BY create_time DESC ) c GROUP BY c.user_id;

在 MYSQL 5.6 版本下执行的结果

mysql> select c.*, count(c.user_id) as comment_count from (select * from user_comment order by create_time desc )c group by c.user_id; +----+---------+----------+---------------------+---------------------+---------------+ | id | user_id | contents | create_time | update_time | comment_count | +----+---------+----------+---------------------+---------------------+---------------+ | 6 | 1 | 评论3 | 2017-05-17 00:00:05 | 2017-05-17 00:00:05 | 3 | | 4 | 2 | 评论2 | 2017-05-17 00:00:03 | 2017-05-17 00:00:03 | 2 | | 11 | 3 | 评论2 | 2017-05-17 00:00:10 | 2017-05-17 00:00:10 | 2 | | 10 | 4 | 评论4 | 2017-05-17 00:00:09 | 2017-05-17 00:00:09 | 4 | +----+---------+----------+---------------------+---------------------+---------------+ 4 rows in set (0.08 sec)

在 MYSQL 5.7 版本下执行的结果

mysql> select c.*, count(c.user_id) as comment_count from (select * from user_comment order by create_time desc )c group by c.user_id; +----+---------+----------+---------------------+---------------------+---------------+ | id | user_id | contents | create_time | update_time | comment_count | +----+---------+----------+---------------------+---------------------+---------------+ | 1 | 1 | 评论1 | 2017-05-17 00:00:00 | 2017-05-17 00:00:00 | 3 | | 3 | 2 | 评论1 | 2017-05-17 00:00:02 | 2017-05-17 00:00:02 | 2 | | 5 | 3 | 评论1 | 2017-05-17 00:00:04 | 2017-05-17 00:00:04 | 2 | | 7 | 4 | 评论1 | 2017-05-17 00:00:06 | 2017-05-17 00:00:06 | 4 | +----+---------+----------+---------------------+---------------------+---------------+ 4 rows in set (0.03 sec)

这里就踩了个坑,在 GROUP BY 实现组内排序的时候,发现同一条SQL 在不同的 MYSQL版本下会查询出不同的数据。本次测试版本为 MYSQL 5.6 和 MYSQL 5 .7,结果刚好相反。在5.6下是可以拿到正确结果的,而更高版本不行,所以要换一种实现方式。

三、原因分析:

MYSQL认为 group by是不稳定的,所以要消除这种不稳定性,要明确聚合分组的列。在MYSQL 5.7 版本对 group by 做了调整。详情这里不阐述,移步百度自行搜索。

四、解决方案:

注:网上有网友提出,在上面sql的基础上,在内循环加上 limit 即可解决, 我个人认为这样有点鸡肋,因为不知道表的数据有多少。 网友这种方式实现如下:

SELECT c.*, count( c.user_id ) AS comment_count FROM ( SELECT * FROM user_comment ORDER BY create_time DESC LIMIT 1000 ) c GROUP BY c.user_id; mysql> select c.*, count(c.user_id) as comment_count from (select * from user_comment order by create_time desc limit 1000 )c group by c.user_id; +----+---------+----------+---------------------+---------------------+---------------+ | id | user_id | contents | create_time | update_time | comment_count | +----+---------+----------+---------------------+---------------------+---------------+ | 6 | 1 | 评论3 | 2017-05-17 00:00:05 | 2017-05-17 00:00:05 | 3 | | 4 | 2 | 评论2 | 2017-05-17 00:00:03 | 2017-05-17 00:00:03 | 2 | | 11 | 3 | 评论2 | 2017-05-17 00:00:10 | 2017-05-17 00:00:10 | 2 | | 10 | 4 | 评论4 | 2017-05-17 00:00:09 | 2017-05-17 00:00:09 | 4 | +----+---------+----------+---------------------+---------------------+---------------+ 4 rows in set (0.04 sec)

如上可以看到得到了我们想要的结果,也和MYSQL 5.7执行的一致。但是这种方式不太行,我们另寻途径。

1)组内排序的另一种实现方式

如果是自增id的话,id最大,则记录最新。如果不是自增,则可以用创建时间来区分。巧用 Max(xxx)函数来实现组内排序。

SQL如下

SELECT * FROM user_comment WHERE id IN ( SELECT MAX( id ) FROM user_comment GROUP BY user_id ) ORDER BY user_id

效果和上述方式一致,也可以达到组内排序的效果。

+----+---------+----------+---------------------+---------------------+ | id | user_id | contents | create_time | update_time | +----+---------+----------+---------------------+---------------------+ | 6 | 1 | 评论3 | 2017-05-17 00:00:05 | 2017-05-17 00:00:05 | | 4 | 2 | 评论2 | 2017-05-17 00:00:03 | 2017-05-17 00:00:03 | | 11 | 3 | 评论2 | 2017-05-17 00:00:10 | 2017-05-17 00:00:10 | | 10 | 4 | 评论4 | 2017-05-17 00:00:09 | 2017-05-17 00:00:09 | +----+---------+----------+---------------------+---------------------+ 4 rows in set (0.04 sec)

注意: 写到这里就达到组内排序的目标了。但是还没达到我的要求。因为就是这样处理后,没法再做统计汇总了,所以要达到目的还是要改写。

最终SQL如下:

SELECT c2.*, COUNT( c1.user_id ) AS comment_count FROM user_comment c1 RIGHT JOIN ( SELECT * FROM user_comment WHERE id IN ( SELECT MAX( id ) FROM user_comment GROUP BY user_id ) ) c2 ON c1.user_id = c2.user_id GROUP BY c2.user_id

查询结果

mysql> SELECT c2.*, COUNT( c1.user_id ) AS comment_count FROM user_comment c1 RIGHT JOIN ( SELECT * FROM user_comment WHERE id IN ( SELECT MAX( id ) FROM user_comment GROUP BY user_id ) ) c2 ON c1.user_id = c2.user_id GROUP BY c2.user_id; +----+---------+----------+---------------------+---------------------+---------------+ | id | user_id | contents | create_time | update_time | comment_count | +----+---------+----------+---------------------+---------------------+---------------+ | 6 | 1 | 评论3 | 2017-05-17 00:00:05 | 2017-05-17 00:00:05 | 3 | | 4 | 2 | 评论2 | 2017-05-17 00:00:03 | 2017-05-17 00:00:03 | 2 | | 11 | 3 | 评论2 | 2017-05-17 00:00:10 | 2017-05-17 00:00:10 | 2 | | 10 | 4 | 评论4 | 2017-05-17 00:00:09 | 2017-05-17 00:00:09 | 4 | +----+---------+----------+---------------------+---------------------+---------------+ 4 rows in set (0.07 sec)

码字不易,如有帮助点个赞趴~~



【本文地址】


今日新闻


推荐新闻


CopyRight 2018-2019 办公设备维修网 版权所有 豫ICP备15022753号-3