Linux之random子系统问题解决分享 |
您所在的位置:网站首页 › shrd指令 › Linux之random子系统问题解决分享 |
文章目录
@[toc]问题信息问题描述
问题排查测试一、Hicore卡顿业务定位结论
测试二、系统资源信息分析结论
测试三、random相关分析结论一结论二总结
问题拓展random模块基本概念接口说明使用关注
hw_random模块基本概念接口说明使用说明
问题解决crng_ready方案trust_cpu方案hw_random方案random加速方案haveged工具
备注术语与缩写调试命令
问题信息
问题描述
T673 V1.1项目在集成测试Build 1期间, 使用T673设备升级系统测试Build1整机包测试启动耗时, 发现设备启动至进入应用用时1分钟左右, 耗时较其他项目(T671B等)时间长, 用户体验差,需要修复。 与应用同事沟通收集信息: T673 V1.1项目在今年四月份异常结项,未有关注启动耗时优化;T673 V1.1项目在今年九月份重新开启,至十月下旬的系统测试Build1期间未有关注启动耗时优化;在项目重启至系统测试B1之前有合入以下内容: 触摸屏新物料支持;PHY新物料支持;韦根驱动从基线同步;应用程序支持硬解密;其他; 问题排查 测试一、Hicore卡顿业务定位应用同事在Hicore启动过程中增加时间戳信息,以初步定位耗时位置。调试整机固件, ACS_673_F1plus_CN_STD_V1.0.0_build221020.zip 测试日志 [10-20 09:33:14.152][E][anymouse]|acs_config_db_set_version|206|timetj exec start // 卡顿起始位置 [10-20 09:33:24.009][W][anymouse]|hardWatchDog|172|Feed dog is working properly. [10-20 09:33:34.025][W][anymouse]|hardWatchDog|172|Feed dog is working properly. [10-20 09:33:34.420][E][anymouse]|acs_config_db_set_version|212|timetj exec end // 卡顿消失位置 结论 应用同事初步确认:卡顿长位置位于acs_config_db_set_version,用时约20s。 测试二、系统资源信息分析top信息 Mem: 123584K used, 321324K free, 19752K shrd, 1088K buff, 70360K cached CPU: 52.3% usr 0.0% sys 0.0% nic 47.6% idle 0.0% io 0.0% irq 0.0% sirq Load average: 1.85 0.40 0.13 2/112 1000 PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND 955 951 root S 120m 27.7 0 52.3 [uicore] // uicore处于休眠状态,CPU占用较高 952 1 root S 178m 40.9 1 0.0 [hicore] 859 856 root S 37896 8.5 0 0.0 [log4j]Uicore的CPU占用较高,做移除测试。 移除Uicore后,再次捕捉卡顿时间点的top信息 [01-25 01:14:25.143][E][anymouse]|acs_config_db_set_version|206|timetj exec start Enter Debug Mode. [root@dvrdvs ] # TMOUT=0 [root@dvrdvs ] # top -n1 Mem: 116800K used, 328108K free, 18524K shrd, 1076K buff, 64360K cached CPU: 0.0% usr 4.5% sys 0.0% nic 95.4% idle 0.0% io 0.0% irq 0.0% sirq Load average: 2.35 0.52 0.17 1/94 969 PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND 927 1 root S 178m 40.9 1 0.0 [hicore] 858 856 root S 28676 6.4 0 0.0 [log4j] 750 1 root S 14404 3.2 0 0.0 [bsp_log4j] 。。。 [01-25 01:14:51.871][E][anymouse]|acs_config_db_set_version|212|timetj exec endperf分析文件,截图: 经测试,每次进入(timetj exec start)、退出(timetj exec end)卡顿时都有随机妖数相关打印信息: Hicore向随机随机数发生器读取256节点随机数:random: hicore: uninitialized urandom read (256 bytes read) ; 随机数发生器初始化完成:random: crng init done; 源码查找 gaoyang3513@General:~/Source_code/01-Nt9852x/04-T673/01-IBPPrj/SDK_T673_V1.1$ grep -rn "urandom read" . ./kernel/drivers/char/random.c:1896: "urandom read (%zd bytes read)\n", gaoyang3513@General:~/Source_code/01-Nt9852x/04-T673/01-IBPPrj/SDK_T673_V1.1$ grep -rn "crng init done" kernel/ kernel/drivers/char/random.c:962: pr_notice("random: crng init done\n");栈回溯确认调用路径 测试补丁: Index: kernel/drivers/char/random.c =================================================================== --- kernel/drivers/char/random.c (revision 523917) +++ kernel/drivers/char/random.c (working copy) @@ -238,6 +238,7 @@ * Eastlake, Steve Crocker, and Jeff Schiller. */ +#include "linux/printk.h" #include #include #include @@ -960,6 +961,7 @@ process_random_ready_list(); wake_up_interruptible(&crng_init_wait); pr_notice("random: crng init done\n"); + dump_stack(); if (unseeded_warning.missed) { pr_notice("random: %d get_random_xx warning(s) missed " "due to ratelimiting\n", @@ -1824,6 +1826,9 @@ urandom_warning.interval = 0; unseeded_warning.interval = 0; } + + printk("[%s|%u] -- GY\n", __FUNCTION__, __LINE__); + return 0; } early_initcall(rand_initialize); @@ -1891,10 +1896,12 @@ if (!crng_ready() && maxwarn > 0) { maxwarn--; - if (__ratelimit(&urandom_warning)) + if (__ratelimit(&urandom_warning)) { printk(KERN_NOTICE "random: %s: uninitialized " "urandom read (%zd bytes read)\n", current->comm, nbytes); + dump_stack(); + } spin_lock_irqsave(&primary_crng.lock, flags); crng_init_cnt = 0; spin_unlock_irqrestore(&primary_crng.lock, flags); 使用内核dump_stack实现栈回溯打印;在random模块初始化位置,增加打印信息;在打印random: hicore: uninitialized urandom read (256 bytes read) 之后,打印栈回溯信息;在打印random: crng init done之后,打印栈回溯信息;打印信息如下: [ 1.063632] nvt_rng f0680000.rng: Register nvt_rng_probe successfully // 硬件随机数驱动加载成功 ... [ 8.726708] random: fast init done // fast_pool初始化成功,crng_init=1; ... /home/config/config.db init ok...[ 9.267438] random: hicore: uninitialized urandom read (256 bytes read) // 获取random [ 9.276544] CPU: 0 PID: 946 Comm: hicore Tainted: P O 4.19.91 #2-ga8b08bf6-dirty [ 9.285224] Hardware name: Novatek Video Platform [ 9.289913] Backtrace: [ 9.292362] [] (dump_backtrace) from [] (show_stack+0x18/0x1c) [ 9.299914] r7:7ee32760 r6:60030013 r5:00000000 r4:80c890bc [ 9.305563] [] (show_stack) from [] (dump_stack+0x90/0xac) [ 9.312775] [] (dump_stack) from [] (urandom_read+0x84/0x2e4) [ 9.320239] r7:7ee32760 r6:00000100 r5:00000100 r4:80ccc784 [ 9.325889] [] (urandom_read) from [] (__vfs_read+0x40/0x15c) // vfs_read调用字符设备的read实际:urandom_read, [ 9.333357] r10:00000003 r9:00000100 r8:c10d7f60 r7:80c24508 r6:00000100 r5:8043b3d0 [ 9.341165] r4:c2a91200 [ 9.343692] [] (__vfs_read) from [] (vfs_read+0x98/0x104) [ 9.350812] r9:00000100 r8:c10d7f60 r7:7ee32760 r6:00000100 r5:00000000 r4:c2a91200 [ 9.358539] [] (vfs_read) from [] (ksys_read+0x64/0xc0) [ 9.365485] r9:00000100 r8:7ee32760 r7:80c24508 r6:c2a91201 r5:7ee32760 r4:c2a91200 [ 9.373211] [] (ksys_read) from [] (sys_read+0x10/0x14) [ 9.380157] r9:c10d6000 r8:80101204 r7:00000003 r6:00000015 r5:7ee32760 r4:00000100 [ 9.387885] [] (sys_read) from [] (ret_fast_syscall+0x0/0x58) [ 9.395347] Exception stack(0xc10d7fa8 to 0xc10d7ff0) [ 9.400385] 7fa0: 00000100 7ee32760 00000015 7ee32760 00000100 00000000 [ 9.408545] 7fc0: 00000100 7ee32760 00000015 00000003 00fcd90c 00000000 76f25000 7ee3273c [ 9.416702] 7fe0: 745ff4d0 7ee32700 00000000 74c07a24 BusyBox v1.31.1 (2022-11-20 20:01:27 CST) built-in shell (ash) Enter 'help' for a list of built-in commands. BusyBox v1.2.1 Protect Shell (psh svn326548) Build Time: Mar 17 2021:10:03:53 Enter 'help' for a list system commands. Random number is:18050941 # [ 24.626902] random: crng init done // crng初始化完成 [ 24.630304] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O 4.19.91 #2-ga8b08bf6-dirty [ 24.639067] Hardware name: Novatek Video Platform [ 24.643755] Backtrace: [ 24.646204] [] (dump_backtrace) from [] (show_stack+0x18/0x1c) [ 24.653756] r7:80c5bc68 r6:600e0193 r5:00000000 r4:80c890bc [ 24.659404] [] (show_stack) from [] (dump_stack+0x90/0xac) [ 24.666616] [] (dump_stack) from [] (crng_reseed.constprop.21+0x11c/0x1e8) [ 24.675208] r7:80c5bc68 r6:80c5bc68 r5:80c24508 r4:80c5bb2c [ 24.680858] [] (crng_reseed.constprop.21) from [] (credit_entropy_bits+0x178/0x31c) [ 24.690230] r9:0000004c r8:809bb505 r7:00000001 r6:80c5bb2c r5:80c5bb5c r4:00000080 [ 24.697958] [] (credit_entropy_bits) from [] (add_interrupt_randomness+0x1c0/0x1e4) [ 24.707333] r10:80c01ef0 r9:80169c44 r8:436c1000 r7:80c5bb80 r6:fffebd4b r5:80c24508 [ 24.715140] r4:c4232388 [ 24.717672] [] (add_interrupt_randomness) from [] (handle_irq_event_percpu+0x40/0x84) // 硬件随机数据生成器,在random中注册的回调; [ 24.727218] r10:80b5ca38 r9:80c00000 r8:c3c24000 r7:00000030 r6:80c08500 r5:00000001 [ 24.735027] r4:80c24508 [ 24.737555] [] (handle_irq_event_percpu) from [] (handle_irq_event+0x40/0x64) [ 24.746405] r6:80c08568 r5:80c08568 r4:80c08500 [ 24.751013] [] (handle_irq_event) from [] (handle_fasteoi_irq+0xc4/0x138) // CPU在每次处理完中断后发出EOI(end of interrupt), 由GIC自实现中断流控; [ 24.759518] r7:00000030 r6:80c08568 r5:80c24a78 r4:80c08500 [ 24.765166] [] (handle_fasteoi_irq) from [] (generic_handle_irq+0x20/0x30) [ 24.773757] r7:00000030 r6:80b71988 r5:00000000 r4:00000000 [ 24.779404] [] (generic_handle_irq) from [] (__handle_domain_irq+0xa8/0xbc) [ 24.788087] [] (__handle_domain_irq) from [] (gic_handle_irq+0x54/0x80) [ 24.796420] r9:80c00000 r8:80c24508 r7:c4803100 r6:80c01ef0 r5:80c24a78 r4:c4802100 [ 24.804147] [] (gic_handle_irq) from [] (__irq_svc+0x6c/0xa8) // 中断回调, [ 24.811609] Exception stack(0x80c01ef0 to 0x80c01f38) [ 24.816647] 1ee0: 00000000 00058e50 c422f48c 80117dc0 [ 24.824807] 1f00: 00000001 80c00000 80c24530 80c24500 80c24508 c45fcb00 80b5ca38 80c01f4c [ 24.832966] 1f20: 80c01f50 80c01f40 80108be8 80108bd8 600e0013 ffffffff [ 24.839564] r7:80c01f24 r6:ffffffff r5:600e0013 r4:80108bd8 [ 24.845215] [] (arch_cpu_idle) from [] (default_idle_call+0x30/0x34) [ 24.853290] [] (default_idle_call) from [] (do_idle+0xd8/0x128) [ 24.860928] [] (do_idle) from [] (cpu_startup_entry+0x20/0x28) [ 24.868479] r7:80c24500 r6:ffffffff r5:00000002 r4:000000c7 [ 24.874127] [] (cpu_startup_entry) from [] (rest_init+0xbc/0xdc) [ 24.881858] [] (rest_init) from [] (start_kernel+0x478/0x518) [ 24.889320] r5:80c9e48c r4:00000000 [ 24.892884] [] (start_kernel) from [] ( (null))源码跟踪urandom_read # kernel/drivers/char/random.c static ssize_t urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) if (!crng_ready() && maxwarn > 0) { // crng未初始化完成 maxwarn--; if (__ratelimit(&urandom_warning)) // 在多次(10次)尝试仍失败时,问题打印 printk(KERN_NOTICE "random: %s: uninitialized " "urandom read (%zd bytes read)\n", current->comm, nbytes); nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3)); extract_crng_user(buf, nbytes); int large_request = (nbytes > 256); while (nbytes) { if (large_request && need_resched()) { // 单次获取超过256字节时,主动调度释放CPU; schedule(); extract_crng(tmp); crng = &primary_crng; _extract_crng(crng, out); if (crng_ready() && // crng初始化完成,且超过老化时间时,重新写入熵值; (time_after(crng_global_init_time, crng->init_time) || time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL))) crng_reseed(crng, crng == &primary_crng ? &input_pool : NULL); if (crng == &primary_crng && crng_init .read = urandom_read, // read为字符/dev/urandom的读实现 .write = random_write, .unlocked_ioctl = random_ioctl, }; # kernel/drivers/char/mem.c static const struct memdev devlist[] = { ... [8] = { "random", 0666, &random_fops, 0 }, [9] = { "urandom", 0666, &urandom_fops, 0 }, }; static int __init chr_dev_init(void) register_chrdev(MEM_MAJOR, "mem", &memory_fops); // 注册字符设备mem, 主设备 mem_class = class_create(THIS_MODULE, "mem"); // 创建设备类 for (minor = 1; minor int ret; if (flags & ~(GRND_NONBLOCK|GRND_RANDOM)) return -EINVAL; if (count > INT_MAX) // 限制最大读取长度(int值长度) count = INT_MAX; if (flags & GRND_RANDOM) // 如果指定使用random,替代urandom return _random_read(flags & GRND_NONBLOCK, buf, count); if (!crng_ready()) { // 如果crng未准备完成,等待 if (flags & GRND_NONBLOCK) // 如果传入的flags指定不阻塞,直接返回 return -EAGAIN; ret = wait_for_random_bytes(); // 否则挂起等待 if (likely(crng_ready())) // 如果crng准备完成,退出 return 0; return wait_event_interruptible(crng_init_wait, crng_ready()); // 提交至等待对列crng_init_wait, 条件为crng_ready() if (unlikely(ret)) return ret; } return urandom_read(NULL, buf, count, NULL); // 读取随机数 }第二处进入条件为: 直接调用系统接口getrandom。 其中wait_event_interruptible为同步阻塞,所以系统调用getrandom历可能将线程挂起直至条件crng_ready()满足; 结合问题分析:在随机字节池未填满的前提下(crng_ready()不满足),Hicore陷入内核态后直接被挂起,不可能打印random: hicore: uninitialized urandom read (256 bytes read)(该打印在urandom_read中执行, 在唤醒后被调用)。 结论一综上,有以下结论: 1. `/dev/urandom`不会引起进程阻塞,另外Hicore有调用`read("dev/urandom", xxx, 256)`,进而有的内核打印信息`random: hicore: uninitialized urandom read (256 bytes read)`; 2. `getrandom`可能会引起进程阻塞,如果进程被挂起,可能是调用了系统接口`getrandom`;结合问题,random: hicore: uninitialized urandom read (256 bytes read)由用户使用urandom调用read(“dev/urandom”, xxx, 256)而有的内核打印信息。该流程无阻塞发生完整结束。 源码跟踪crng_reseed # kernel/drivers/char/random.c #define crng_ready() (likely(crng_init > 1)) static void crng_reseed(struct crng_state *crng, struct entropy_store *r) ... if (crng == &primary_crng && crng_init unsigned int anfrac = min(pnfrac, pool_size/2); unsigned int add = ((pool_size - entropy_count)*anfrac*3) >> s; entropy_count += add; pnfrac -= anfrac; } while (unlikely(entropy_count crng_reseed(&primary_crng, r);调用条件为: 用户主动调用ioctl(“/dev/random”, RNDADDTOENTCNT(或RNDADDENTROPY), xxx); RNDADDTOENTCNT, 仅增加熵池数量;RNDADDENTROPY, 向熵池增加若干随机字(单位:字节) 1; 熵池输入源一:add_timer_randomness; input_handle_event, input子系统event事件: 键盘、触摸等输入事件; 熵池输入源二:add_disk_randomness; blk_update_bidi_request, 块设备的的BIO请求:读、写磁盘(包括eMMC)动作; 熵池输入源三:add_interrupt_randomness; handle_irq_event_percpu, 所有中断的必经之路:所有硬中断; 熵池输入源三:add_hwgenerator_randomness; hwrng_fillfn, 内核线程"hwrng": 持续写入,最大延时1s; 其他: xfer_secondary_pool, 从主熵池推送到次熵池;push_to_pool, 从‘主熵池’推送到‘输出熵池’,工作对列回调;由上可知credit_entropy_bits做为熵池填充的入口,与crng初始化完成有直接关系; 第二处:_extract_crng, static void _extract_crng(struct crng_state *crng, __u8 out[CHACHA20_BLOCK_SIZE]) if (crng_ready() && (time_after(crng_global_init_time, crng->init_time) || time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL))) crng_reseed(crng, crng == &primary_crng ? &input_pool : NULL);必要条件 crng_ready(),crng初始化完成;调用条件为 crng_reseed, 当入参entropy_store(熵池抽象类),即未指定熵池对象,递归一次;extract_crng, 当入参entropy_store(熵池抽象类)为input_pool,即指定主crng; crng_backtrack_protect, 规避chacha20-key回溯保护机制;urandom_read, 用户主动调用read(“/dev/urandom”, …);get_random_(bytes/u32/u64), 导出符号:涉及模块过多;以上调用条件都基于必要条件:crng_ready()。 第三处:random_ioctl, /dev/randomr操作 static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg) case RNDRESEEDCRNG: if (crng_init |
CopyRight 2018-2019 办公设备维修网 版权所有 豫ICP备15022753号-3 |