Linux Performance Analysis Tools



2024-07-14 17:24 | Source: compiled from the web

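Contents
Overview: perf concepts; the perf tool set; perf events (viewed with perf list)
Common perf tools: perf stat (run a command and count events during its execution); perf top (rank functions or instructions by how hot they are for a given event); perf record/report (record collects performance data into a file, report displays it); perf annotate (instruction-level analysis of the record file for precise localization); perf bench (a tool for generating load; to be updated)
Debugging tips
References

Note: this article was tested in two environments. The output shown comes mainly from an Ubuntu virtual machine, supplemented by a Raspberry Pi. Many events turned out to be unsupported in the VM, presumably because hardware events are only partially emulated in a virtualized system, so Raspberry Pi test results were added later.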

perf version 4.4.177 (environment: 64-bit Ubuntu virtual machine, Linux ubuntu 4.4.0-148-generic #174~14.04.1-Ubuntu SMP Thu May 9 08:17:37 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux)
perf version 4.9.82 (environment: Raspberry Pi 4B, Linux raspberrypi 4.19.127-v7l #1 SMP Sun Aug 9 00:56:42 PDT 2020 armv7l GNU/Linux)

Overview

perf concepts

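perf is a powerful Linux performance analysis tool built on the perf_event interface provided by the Linux kernel. It can count many kinds of hardware events, such as instructions executed, cache misses, and branch mispredictions, to find an application's hot spots. These hardware events, as well as software events, can be collected for a single task, a single CPU, or a single workload.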

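perf works from events. perf stat counts events over a whole run, while perf top and perf record sample them: a sample is taken every N occurrences of an event (or at a target frequency) rather than tracing every clock cycle. Which events are measured, and how, depends on the perf subcommand used.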

The perf tool set

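perf provides a large number of subcommands for collecting and analyzing performance data and trace information (each subcommand has its own help, e.g. perf stat -h). The most commonly used are:
perf stat: counts various events while a command runs.
perf top: ranks hot spots for a given event, down to function or instruction level.
perf record/report/annotate: a set of commands for detailed analysis. record writes performance data to a file, report prints a basic report from that file, and annotate maps the results onto the code.
There are also more specialized tools, such as lock for locks, sched for scheduling, kmem for slab allocator performance, and probe for defining custom probe points.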

The full list of subcommands:

```
goodboy@ubuntu:~$ perf -h

 usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

 The most commonly used perf commands are:
   annotate        Read perf.data (created by perf record) and display annotated code
   archive         Create archive with object files with build-ids found in perf.data file
   bench           General framework for benchmark suites
   buildid-cache   Manage build-id cache.
   buildid-list    List the buildids in a perf.data file
   data            Data file related processing
   diff            Read perf.data files and display the differential profile
   evlist          List the event names in a perf.data file
   inject          Filter to augment the events stream with additional information
   kmem            Tool to trace/measure kernel memory properties
   kvm             Tool to trace/measure kvm guest os
   list            List all symbolic event types
   lock            Analyze lock events
   mem             Profile memory accesses
   record          Run a command and record its profile into perf.data
   report          Read perf.data (created by perf record) and display the profile
   sched           Tool to trace/measure scheduler properties (latencies)
   script          Read perf.data (created by perf record) and display trace output
   stat            Run a command and gather performance counter statistics
   test            Runs sanity tests.
   timechart       Tool to visualize total system behavior during a workload
   top             System profiling tool.
   trace           strace inspired tool
   probe           Define new dynamic tracepoints

 See 'perf help COMMAND' for more information on a specific command.
```

perf events: viewing them with perf list

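As mentioned above, perf is built on the kernel's event-counting infrastructure. The available events, listed by perf list, fall into three main kinds:
1. Software events: performance events counted by the kernel that relate to the operating system, such as context switches, task migrations, and page faults.
2. Hardware events from the Performance Monitoring Unit (PMU), such as CPU cycle counts.
3. Hardware cache events, such as L1 cache misses.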

```
List of pre-defined events (to be used in -e):
  alignment-faults                                   [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  page-faults OR faults                              [Software event]
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  cycles-ct OR cpu/cycles-ct/                        [Kernel PMU event]
  ...
```

Commonly used perf tools

perf stat: run a command and count events during its execution

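perf stat counts the supported events while a program runs and prints a simple summary to the screen. Run it as perf stat cmd: cmd is executed, and when it finishes perf prints the count of each event. For example, read from /dev/zero and write to /dev/null, copying 1,000,000 blocks: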

```
root@ubuntu:/sys/kernel/debug/tracing# perf stat -B dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 1.03206 s, 496 MB/s

 Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

       1020.507771      task-clock (msec)         #    0.987 CPUs utilized
                48      context-switches          #    0.047 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                67      page-faults               #    0.066 K/sec
                        cycles
                        stalled-cycles-frontend
                        stalled-cycles-backend
                        instructions
                        branches
                        branch-misses

       1.033620037 seconds time elapsed
```
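The throughput line printed by dd follows from its default block size of 512 bytes (used when bs is not given), which is a quick sanity check on the run:

```latex
\text{bytes} = 1\,000\,000 \times 512\ \text{B} = 512\,000\,000\ \text{B} = 512\ \text{MB}
\qquad
\text{throughput} = \frac{512\,000\,000\ \text{B}}{1.03206\ \text{s}} \approx 496\ \text{MB/s}
```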

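[Figure: perf stat output on the Raspberry Pi]

The default events are:
(1) task-clock: processor time actually used by the task, in ms (CPU utilization = task-clock / time elapsed).
(2) context-switches: number of context switches.
(3) cpu-migrations: number of processor migrations. To keep load balanced across processors, the scheduler may, under certain conditions, move a task to another CPU.
(4) page-faults: number of page-fault exceptions. A page fault is raised when the page an application accesses has not been set up yet, is not resident in memory, or is in memory but has no virtual-to-physical mapping yet. Access-permission mismatches (for example, writing to a read-only page) also raise page faults. A TLB miss by itself does not on architectures such as x86 and ARM: the hardware walks the page tables, and a fault occurs only if no valid mapping is found.
(5) cycles: number of processor cycles consumed.
(6) instructions: number of instructions executed. IPC (instructions per cycle) is instructions / cycles, the average number of instructions completed per CPU cycle.
(7) branches: number of branch instructions encountered; branch-misses is the number of mispredicted branches.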

Common options (run perf stat -h for the full list):

```
-a, --all-cpus        system-wide collection from all CPUs                  // count on all CPUs
-C, --cpu             list of cpus to monitor in system-wide                // count only on the given CPUs
-d, --detailed        detailed run - start a lot of events                  // print more detailed statistics
-e, --event           event selector. use 'perf list' to list available events  // events to count; separate several with commas
-I, --interval-print  print counts at regular interval in ms (>= 10)        // print counts every n ms
-p, --pid             stat events on existing process id                    // attach to an existing process by pid
-r, --repeat          repeat command and print average + stddev (max: 100, forever: 0)  // run the command repeatedly
-t, --tid             stat events on existing thread id                     // attach to an existing thread by tid
```

perf top: rank functions or instructions by how hot they are for a given event

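perf top works much like the Linux top command: it continuously shows sampled functions ranked by a chosen event, in descending order. The default event is cycles (processor cycles consumed); in the VM, where hardware events are unavailable, perf falls back to cpu-clock, as the output below shows. perf top covers both user-space and kernel functions, on all CPUs by default, and can also be restricted to particular CPUs.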

Output on Ubuntu:

```
Samples: 1K of event 'cpu-clock', Event count (approx.): 301398414
Overhead  Shared Object     Symbol
   8.80%  [kernel]          [k] _raw_spin_unlock_irqrestore
   6.32%  [kernel]          [k] clear_page_orig
   5.91%  [kernel]          [k] kallsyms_expand_symbol.constprop.1
   4.58%  [kernel]          [k] mpt_put_msg_frame
   4.00%  [kernel]          [k] update_iter
   3.75%  [kernel]          [k] trace_graph_entry
   2.83%  [kernel]          [k] ftrace_graph_caller
   1.77%  [kernel]          [k] prepare_ftrace_return
   1.75%  [kernel]          [k] finish_task_switch
   1.75%  libelf-0.158.so   [.] gelf_getsym
   1.67%  perf              [.] d_demangle_callback
   1.30%  libc-2.19.so      [.] _int_malloc
   1.02%  perf              [.] rb_next
```

[Figure: perf top output on the Raspberry Pi]

Common options (run perf top -h for the full list):

```
-a, --all-cpus               system-wide collection from all CPUs     // sample all CPUs
-c, --count                  event period to sample                   // set the sampling period
-C, --cpu                    list of cpus to monitor                  // monitor only the given CPUs
-e, --event                  event selector. use 'perf list' to list available events  // events to sample
-K, --hide_kernel_symbols    hide kernel symbols                      // hide kernel functions
-U, --hide_user_symbols      hide user symbols                        // hide user-space functions
-p, --pid                    profile events on existing process id    // profile only the target process and the threads it creates
-t, --tid                    profile events on existing thread id     // profile only the target thread
-g                           enables call-graph recording and display // show call chains (move with the arrow keys, press Enter to expand)
```

perf record/report: record collects performance data into a file, report displays it

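perf record cmd profiles the command cmd: it collects performance events over a period of time into a file, perf.data by default, which you then analyze with perf report. It can profile a single thread, a process, or a CPU. The default event is again cycles (processor cycles consumed), and the default average sampling frequency is 1000 samples per second, i.e. 1000 Hz.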

Example: sample events on all CPUs at 1000 Hz while sleep 5 runs:

```
root@ubuntu:/home/wy/study/str# perf record -a -F 1000 sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.898 MB perf.data (4997 samples) ]
root@ubuntu:/home/uy/study/str# ls
perf.data
root@ubuntu:/home/wuya/study/str# perf report
// part of the output after running it:
Samples: 4K of event 'cpu-clock', Event count (approx.): 4997000000
Overhead  Command      Shared Object                 Symbol
  98.96%  swapper      [kernel.kallsyms]             [k] native_safe_halt
   0.14%  swapper      [kernel.kallsyms]             [k] _raw_spin_unlock_irqrestore
   0.08%  tpvmlp       [kernel.kallsyms]             [k] _raw_spin_unlock_irqrestore
   0.06%  Xorg         [kernel.kallsyms]             [k] prepare_ftrace_return
   0.04%  Xorg         [kernel.kallsyms]             [k] _raw_spin_unlock_irqrestore
   0.04%  Xorg         [kernel.kallsyms]             [k] trace_graph_entry
   0.04%  compiz       [kernel.kallsyms]             [k] trace_graph_entry
   0.04%  compiz       [vdso]                        [.] __vdso_clock_gettime
   0.04%  compiz       libcompiz_core.so.0.9.11.3    [.] CompScreen::handleEvent
   0.04%  kworker/0:2  [kernel.kallsyms]             [k] _raw_spin_unlock_irqrestore
```
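The numbers in this output can be cross-checked with a little arithmetic. This assumes, as the sample count suggests, that the VM has a single CPU; cpu-clock counts in nanoseconds, so at -F 1000 each sample stands for about 1 ms:

```latex
N_{\text{samples}} \approx F \cdot T \cdot N_{\text{cpu}} = 1000\ \text{Hz} \times 5\ \text{s} \times 1 = 5000 \quad (\text{observed: } 4997)

\text{Event count} \approx N_{\text{samples}} \cdot \frac{10^{9}\ \text{ns}}{F} = 4997 \times 10^{6}\ \text{ns} = 4\,997\,000\,000

98.96\% \times 4997 \approx 4945\ \text{samples in native\_safe\_halt}
```

Almost all samples land in the idle loop, which is what one would expect when the only workload is sleep.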

Common options (run perf record -h for the full list):

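```
-a, --all-cpus        system-wide collection from all CPUs     // record on all CPUs
-c, --count           event period to sample                   // set the sampling period (one sample every n events)
-C, --cpu             list of cpus to monitor                  // record only on the given CPUs
-e, --event           event selector. use 'perf list' to list available events  // events to record
-F, --freq            profile at this frequency                // sampling frequency, n samples per second
-g                    enables call-graph recording             // record call chains, so costs can be attributed to callers and callees
-o, --output          output file name                         // output file name
-P, --period          Record the sample period                 // store each sample's period in the data
-p, --pid             record events on existing process id     // record an existing process by pid
```

perf annotate: instruction-level analysis of the record file for precise localization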

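perf annotate maps the samples in the record file down to individual instructions. For a binary compiled with debug information (-g), it can show the source code alongside the assembly. Note that annotate cannot resolve symbols from a compressed kernel image: to resolve kernel symbols you must pass it an uncompressed kernel image, for example: perf annotate -k /tmp/vmlinux -d symbol

Example: main.c contains the following: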

```c
#include <...>
#include <...>

void func_a()
{
    unsigned int num = 1;
    int i;
    for (i = 0;i
```
