[ClickHouse pitfall notes] ClickHouse log analysis: <Error> void DB::StorageKafka::threadFunc()


Background

Today I happened to notice that one of our ClickHouse clusters kept writing error entries to the ClickHouse log directory. Getting rid of the errors for good took a fair amount of digging.

The error message:

2021.11.30 23:49:41.774027 [ 44 ] {} void DB::StorageKafka::threadFunc(): Code: 73, e.displayText() = DB::Exception: Unknown format JSONAsString, Stack trace:
0. 0x3512b60 StackTrace::StackTrace() /usr/bin/clickhouse
1. 0x351cdaf DB::Exception::Exception(std::string const&, int) /usr/bin/clickhouse
2. 0x657b9ee DB::FormatFactory::getCreators(std::string const&) const /usr/bin/clickhouse
3. 0x657c85b DB::FormatFactory::getInput(std::string const&, DB::ReadBuffer&, DB::Block const&, DB::Context const&, unsigned long, std::function) const /usr/bin/clickhouse
4. 0x69da7a3 DB::KafkaBlockInputStream::readImpl() /usr/bin/clickhouse
5. 0x6073008 DB::IBlockInputStream::read() /usr/bin/clickhouse
6. 0x61e8861 DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>::thread(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long) /usr/bin/clickhouse
7. 0x61e91eb ThreadFromGlobalPool::ThreadFromGlobalPool(void (DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>::&&)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>&&, std::shared_ptr<DB::ThreadGroupStatus>&&, unsigned long&)::{lambda()#1}::operator()() const /usr/bin/clickhouse
8. 0x3554f13 ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>) /usr/bin/clickhouse
9. 0x791d69f ? /usr/bin/clickhouse
10. 0x7fd3412c0e65 start_thread /usr/lib64/libpthread-2.17.so
11. 0x7fd340de588d __clone /usr/lib64/libc-2.17.so (version 19.17.4.11)

Analysis

The error log by itself doesn't reveal much. At my level I could only pick out two things:

Code: 73, e.displayText() = DB::Exception: Unknown format JSONAsString: something is going wrong with JSON parsing.
KafkaBlockInputStream: this is Kafka-related, so my initial guess was that a Kafka engine table was failing to parse the incoming data.

With a rough direction established, the next step was to track down the Kafka-engine table; fix that and the problem should go away. But when I checked system.tables, I couldn't find any table using the Kafka engine.

Note: the engine column in system.tables records each table's engine.
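For reference, a query along these lines lists any Kafka-engine tables on the node you are connected to (system.tables only covers the local node, which becomes important below):

    SELECT database, name, engine
    FROM system.tables
    WHERE engine = 'Kafka';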

A side note: I was careless here. The error log only ever appeared on node A, which I failed to notice for a long time, and I ran the system.tables query on node B. As it happened, the Kafka-engine table had been created only on node A, as a local table, which is why it never turned up in system.tables. (All of this only became clear to me later.)
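To avoid that trap, reasonably recent ClickHouse versions let you check every node in one go with the clusterAllReplicas table function; the cluster name below is a placeholder, and on older versions you simply run the plain system.tables query on each node instead:

    SELECT hostName() AS host, database, name, engine
    FROM clusterAllReplicas('my_cluster', system.tables)
    WHERE engine = 'Kafka';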

At this point I was stumped and started to doubt my initial judgement. Another theory came to mind: maybe some other program consuming Kafka data and writing into ClickHouse was causing the error. Blindly restarting nodes and other such attempts didn't help either.

After going around in circles, the word "log" caught my eye and I realised I could change the ClickHouse configuration to emit trace-level logs. So I edited /etc/clickhouse-server/config.xml and changed the log level from error to trace, roughly as follows:

    <logger>
        <level>trace</level>
        <log>/u/logs/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/u/logs/clickhouse-server/clickhouse-server.err.log</errorlog>
        <size>1000M</size>
        <count>10</count>
    </logger>

After the change, the error log still didn't offer much, but the regular log became far more informative (excerpt below). StorageKafka (kafka_user_behavior_src_test) was a new piece of information, although at this point I couldn't tell whether kafka_user_behavior_src_test was a table, a consumer group, or a topic. I also dug up the relevant source file in the ClickHouse repository, StorageKafka.cpp, but unfortunately I don't know C++; readers who do may find it worth a look.

2021.12.01 00:22:24.712135 [ 26 ] {} StorageKafka (kafka_user_behavior_src_test): Execution took 501 ms.
2021.12.01 00:22:25.212277 [ 43 ] {} StorageKafka (kafka_user_behavior_src_test): Started streaming to 1 attached views
2021.12.01 00:22:25.212868 [ 70 ] {} StorageKafka (kafka_user_behavior_src_test): Already subscribed to topics:
2021.12.01 00:22:25.212873 [ 52 ] {} StorageKafka (kafka_user_behavior_src_test): Already subscribed to topics:
2021.12.01 00:22:25.212962 [ 70 ] {} StorageKafka (kafka_user_behavior_src_test): Already assigned to topics:
2021.12.01 00:22:25.212998 [ 52 ] {} StorageKafka (kafka_user_behavior_src_test): Already assigned to topics:
2021.12.01 00:22:25.713303 [ 70 ] {} StorageKafka (kafka_user_behavior_src_test): Stalled
2021.12.01 00:22:25.713311 [ 52 ] {} StorageKafka (kafka_user_behavior_src_test): Stalled
2021.12.01 00:22:25.713694 [ 43 ] {} UnionBlockInputStream: Waiting for threads to finish
2021.12.01 00:22:25.713729 [ 43 ] {} UnionBlockInputStream: Waited for threads to finish
2021.12.01 00:22:25.713810 [ 43 ] {} virtual DB::UnionBlockInputStream::~UnionBlockInputStream(): Code: 73, e.displayText() = DB::Exception: Unknown format JSONAsString, Stack trace:
0. 0x3512b60 StackTrace::StackTrace() /usr/bin/clickhouse
1. 0x351cdaf DB::Exception::Exception(std::string const&, int) /usr/bin/clickhouse
2. 0x657b9ee DB::FormatFactory::getCreators(std::string const&) const /usr/bin/clickhouse
3. 0x657c85b DB::FormatFactory::getInput(std::string const&, DB::ReadBuffer&, DB::Block const&, DB::Context const&, unsigned long, std::function) const /usr/bin/clickhouse
4. 0x69da7a3 DB::KafkaBlockInputStream::readImpl() /usr/bin/clickhouse
5. 0x6073008 DB::IBlockInputStream::read() /usr/bin/clickhouse
6. 0x61e8861 DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>::thread(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long) /usr/bin/clickhouse
7. 0x61e91eb ThreadFromGlobalPool::ThreadFromGlobalPool(void (DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>::&&)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>&&, std::shared_ptr<DB::ThreadGroupStatus>&&, unsigned long&)::{lambda()#1}::operator()() const /usr/bin/clickhouse
8. 0x3554f13 ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>) /usr/bin/clickhouse
9. 0x791d69f ? /usr/bin/clickhouse
10. 0x7f4154b86e65 start_thread /usr/lib64/libpthread-2.17.so
11. 0x7f41546ab88d __clone /usr/lib64/libc-2.17.so (version 19.17.4.11)

Later, when I finally noticed that node B had no error log at all, it clicked: perhaps the table had only been created on one node, as a local table. A query against system.tables on node A's ClickHouse confirmed exactly that. The earlier question also resolved itself: kafka_user_behavior_src_test is not a consumer group id but the name of a Kafka-engine table, and it can indeed be found on node A.
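For context, a Kafka-engine table declares its input format in its DDL, which is presumably where the JSONAsString in this error comes from. A minimal sketch of what such a table might look like (only the table name comes from the log; the column, broker, topic, and group values below are made-up placeholders):

    CREATE TABLE kafka_user_behavior_src_test
    (
        raw String   -- placeholder column; the real schema is not shown in this post
    )
    ENGINE = Kafka
    SETTINGS
        kafka_broker_list = 'broker1:9092',       -- placeholder
        kafka_topic_list  = 'user_behavior',      -- placeholder
        kafka_group_name  = 'ch_consumer_test',   -- placeholder
        kafka_format      = 'JSONAsString';       -- the format the error message complains about

On the node that actually has the table, SHOW CREATE TABLE kafka_user_behavior_src_test displays the real definition, including its kafka_format setting.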

Before that, I had actually also tried searching the various Kafka clusters for the consumer group, and found nothing; I had been looking in entirely the wrong place.

After the test table in question was dealt with, the errors were gone.
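If the table really is just a disposable test table, detaching or dropping it on node A is one way to stop the background Kafka consumer; a sketch, assuming the table is no longer needed:

    -- run on node A, only if the table is genuinely disposable
    DETACH TABLE kafka_user_behavior_src_test;   -- stops it but keeps the metadata on disk
    -- or remove it completely:
    -- DROP TABLE IF EXISTS kafka_user_behavior_src_test;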

Summary

If the logs aren't enough to pin down a problem, consider editing the ClickHouse configuration file to change the log level from error to trace, then read the info-level log alongside the error log. It can narrow the problem down by another notch. Genuinely useful!


