YOLOv5-Lite on the Raspberry Pi 4B with MNN


[Preface] The v5Lite repository has accumulated quite a few unresolved issues; the biggest one is that life got busy after graduation and I could not find time to keep it updated. That is the background for this post. Let me state the conclusion first: with the v5lite-e model on a Raspberry Pi 4B (4 GB RAM), three test setups give three results:

1. Static-image inference: up to 15 FPS (14.5 in practice). The FPS figure covers five stages: image decode, pre-processing, forward inference, post-processing, and saving the result image.
2. Live camera inference without a display window: up to 15 FPS (a solid 15). The FPS figure covers five stages: frame grab, frame decode, pre-processing, forward inference, and post-processing.
3. Live camera inference with a display window: up to 13 FPS. The FPS figure covers six stages: frame grab, frame decode, pre-processing, forward inference, post-processing, and window display.
All of the numbers above were measured after letting the Raspberry Pi warm up for 3 minutes.
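For reference, here is a minimal sketch of how such a whole-pipeline FPS number can be measured. The preprocess, infer and postprocess callables and the file names are placeholders, not code from the repository:

import time
import cv2

def measure_fps(image_path, preprocess, infer, postprocess, warmup=10, runs=100):
    # Warm-up runs so the Pi's CPU governor and caches settle before timing.
    for _ in range(warmup):
        postprocess(infer(preprocess(cv2.imread(image_path))))

    start = time.perf_counter()
    for i in range(runs):
        img = cv2.imread(image_path)            # image decode
        blob = preprocess(img)                  # resize / normalize
        out = infer(blob)                       # forward inference
        boxes = postprocess(out)                # NMS etc.
        cv2.imwrite(f"result_{i}.jpg", img)     # save the (annotated) result
    return runs / (time.perf_counter() - start)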

So how do we get there? Follow me.

1. A 64-bit OS for the Raspberry Pi

This is an old topic; I pointed this detail out to readers a long time ago: http://downloads.raspberrypi.org/raspios_arm64/images/raspios_arm64-2020-08-24/

https://link.zhihu.com/?target=https%3A//shumeipai.nxez.com/download%23os

The first link is the older image and the second is the newer one. Both are 64-bit systems built for the AArch64 execution state and the A64 instruction set of the ARMv8-A architecture. If you want stability, use the older image; if you want the latest performance, use the newer one. This article uses the older 64-bit image by default.

As for how to flash the image, there are plenty of tutorials online, so I will not spend space on it here.
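Once the image is flashed, a quick way to confirm you are really on a 64-bit userland (a tiny check, not tied to anything in the repository):

import platform

# On the 64-bit Raspberry Pi OS image this prints 'aarch64';
# 'armv7l' means you are still running a 32-bit system.
print(platform.machine())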

2. Building the MNN framework

Yes, the inference framework used in this post is Alibaba's MNN (version 2.7.0 here). For an ordinary build there are already plenty of tutorials online, but as it happens, today's protagonist is the Raspberry Pi, so we have to do a bit of extra work.

My Raspberry Pi has only 4 GB of RAM, so resources are tight. If you are chasing speed, an fp32 model is simply not going to cut it on this board, which leaves fp16 or bf16 inference. Digging through the CMakeLists in the MNN source tree, there are two build options that matter here: MNN_SUPPORT_BF16 and MNN_ARM82. As everyone knows, the Raspberry Pi 4B is built on ARMv8; to exploit its compute capabilities I flashed an aarch64 system, and on top of that, bf16 half-precision inference gives a substantial boost.

Without further ado, let's build the libraries needed for inference with the following commands:

$ cmake .. -DMNN_BUILD_CONVERTER=ON -DMNN_BUILD_TOOL=ON -DMNN_BUILD_QUANTOOLS=ON -DMNN_EVALUATION=ON -DMNN_SUPPORT_BF16=ON -DMNN_ARM82=ON
$ make -j

At this point the build throws an error, as follows:

[ 15%] Building ASM object CMakeFiles/MNNARM64.dir/source/backend/cpu/arm/arm64/bf16/ARMV86_MNNPackedMatMulRemain_BF16.S.o
/root/mnn/MNN/source/backend/cpu/arm/arm64/bf16/ARMV86_MNNPackedMatMulRemain_BF16.S: Assembler messages:
/root/mnn/MNN/source/backend/cpu/arm/arm64/bf16/ARMV86_MNNPackedMatMulRemain_BF16.S:158: Fatal error: macros nested too deeply
make[2]: *** [CMakeFiles/MNNARM64.dir/build.make:383: CMakeFiles/MNNARM64.dir/source/backend/cpu/arm/arm64/bf16/ARMV86_MNNPackedMatMulRemain_BF16.S.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:430: CMakeFiles/MNNARM64.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

This happens because, with bf16 enabled, the macros in the assembly source are nested too deeply for the assembler to expand at build time. The fix is to unroll every nested invocation of the FMAX and FMIN macros, that is, replace each FMAX/FMIN call inside the outer macros with the plain instructions it expands to.

Once the build finishes, we will be using three of the resulting shared libraries: libMNN.so, which contains the half-precision execution path; libMNNOpenCV.so, used for image processing; and libMNNExpress.so.

3. Changing the export mode

At this point I provide three new export modes; choose whichever fits your situation.

The first mode bakes the anchor and grid matching into the exported graph. In the chat group I am frequently asked why the output is a dense wall of detection boxes, and in most cases the post-processing is at fault. To avoid that class of problems altogether, this mode exports all the anchor-related machinery as part of the model and saves you the trouble:

def mnnd_forward(self, x):
    z = []  # inference output
    for i in range(self.nl):
        x[i] = self.m[i](x[i])
        bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
        x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
        if self.grid[i].shape[2:4] != x[i].shape[2:4]:
            self.grid[i] = self._make_grid(nx, ny).to(x[i].device)
        y = x[i].sigmoid()
        xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
        wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2)  # wh
        y = torch.cat((xy, wh, y[..., 4:]), -1)
        z.append(y.view(bs, -1, self.no))
    return torch.cat(z, 1)

The second mode is a pseudo end-to-end export. It still differs from true end-to-end: MNN does not yet provide an operator like torchvision.ops.nms that processes a tensor and returns the kept indices (there is an official nms_, but it is not quite what we need here), so all we can do is slim down the usual end-to-end approach and fix the shape of the output head, which lets the model convert to MNN without trouble:

def mnne_forward(self, x):
    z = []  # inference output
    for i in range(self.nl):
        x[i] = self.m[i](x[i])
        bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
        x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
        if self.grid[i].shape[2:4] != x[i].shape[2:4]:
            self.grid[i] = self._make_grid(nx, ny).to(x[i].device)
        y = x[i].sigmoid()
        xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
        wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2)  # wh
        y = torch.cat((xy, wh, y[..., 4:]), -1)
        z.append(y.view(bs, -1, self.no))
    prediction = torch.cat(z, 1)
    min_wh, max_wh = 2, 4096
    output = [torch.zeros((0, 6), device=prediction.device)] * prediction.shape[0]
    for index, pre in enumerate(prediction):
        obj_conf = pre[:, 4:5]
        cls_conf = pre[:, 5:]
        cls_conf = obj_conf * cls_conf
        box = xywh2xyxy(pre[:, :4])
        conf, j = cls_conf.max(1, keepdim=True)
        pre = torch.cat((box, conf, j.float()), 1)
        output[index] = pre.view(-1, 6)
    return output

The third mode is the usual end-to-end export everyone is familiar with. It is very simple and there is plenty of reference code online, but few third-party inference libraries support it at the moment; onnxruntime handles it best:

def end2end_forward(self, x):
    import torchvision
    z = []
    conf_thres = 0.25
    iou_thres = 0.50
    # inference output
    for i in range(self.nl):
        x[i] = self.m[i](x[i])
        bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
        x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
        if self.grid[i].shape[2:4] != x[i].shape[2:4]:
            self.grid[i] = self._make_grid(nx, ny).to(x[i].device)
        y = x[i].sigmoid()
        xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
        wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2)  # wh
        y = torch.cat((xy, wh, y[..., 4:]), -1)
        z.append(y.view(bs, -1, self.no))
    prediction = torch.cat(z, 1)
    min_wh, max_wh = 2, 4096
    xc = prediction[..., 4] > conf_thres  # candidates
    output = [torch.zeros((0, 6), device=prediction.device)] * prediction.shape[0]
    for index, pre in enumerate(prediction):
        pre = pre[xc[index]]  # confidence
        obj_conf = pre[:, 4:5]
        cls_conf = pre[:, 5:]
        cls_conf = obj_conf * cls_conf
        box = xywh2xyxy(pre[:, :4])
        conf, j = cls_conf.max(1, keepdim=True)
        pre = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]
        boxes = pre[:, :4]
        scores = pre[:, 4]
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        output[index] = pre[i]
    return output

If you like running inference with MNN but are new to it, I recommend the second mode; this article uses the first mode by default.

4. Model export and accuracy checking

This really deserves a proper discussion. People convert a model to ONNX without verifying that the ONNX model actually works and that its accuracy matches the original; then, when something looks wrong after deploying with a third-party inference framework, they assume the converted model must be fine, and newcomers end up stuck in a dead end of second-guessing themselves.

Taking the first export mode as an example, the command to convert to ONNX is:

$ python export.py --mnnd --weight weights/v5lite-e.pt

After you get the .onnx file, run it through onnxsim once to shake out the glue ops:

$ python -m onnxsim weights/v5lite-e-mnnd.onnx weights/v5lite-e-mnnd_sim.onnx

The other export modes work the same way. The detection head of a model exported with the mnnd mode looks like this:

The detection head of a model exported with the mnne mode looks like this:

With the converted ONNX model in hand, let's verify that its accuracy is aligned with the original. Just run the check script: python check.py

The comparison looks like this: the original .pt model on the left, the exported ONNX model on the right. At this point we can be confident the ONNX model is indeed fine.
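If you want to roll your own check, the idea is simply to feed the same input to the .pt model and to the ONNX model and compare the outputs. The sketch below assumes a 320x320 input and the usual YOLOv5-style checkpoint layout; it is an illustration, not the repository's actual check.py:

import numpy as np
import torch
import onnxruntime as ort

# Load the PyTorch model (YOLOv5-style checkpoint; adjust to your own weights layout).
ckpt = torch.load("weights/v5lite-e.pt", map_location="cpu")
model = ckpt["model"].float().eval()

sess = ort.InferenceSession("weights/v5lite-e-mnnd_sim.onnx")

x = torch.rand(1, 3, 320, 320)                  # assumed input size
with torch.no_grad():
    ref = model(x)
    ref = ref[0] if isinstance(ref, (tuple, list)) else ref
    ref = ref.numpy().reshape(-1)

out = sess.run(None, {sess.get_inputs()[0].name: x.numpy()})[0].reshape(-1)

cos = float(np.dot(ref, out) / (np.linalg.norm(ref) * np.linalg.norm(out)))
print("cosine similarity:", cos)                # should be very close to 1.0
print("max abs diff:", float(np.abs(ref - out).max()))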

5. Third-party model conversion and quantization

Converting for the third-party library requires the MNNConvert tool. As long as the previous steps were done properly, this is usually a one-shot pass. Before converting, let's first verify that the ONNX model contains no operators MNN does not support:

$ python ./tools/script/testMNNFromOnnx.py YOLOv5-Lite/v5Lite-e-mnnd_sim.onnx

If no errors are reported, move on to the conversion itself:

./build/MNNConvert -f ONNX --modelFile YOLOv5-Lite/mnnd/v5lite-e-mnnd_sim.onnx --MNNModel YOLOv5-Lite/mnnd/v5lite-e-mnnd.mnn --optimizeLevel 1 --optimizePrefer 2 --bizCode MNN --saveStaticModel --testdir val_test

A few of the more important parameters:

--optimizeLevel arg   Graph optimization level, default 1:
  0: no graph optimization (only for the case where the source model is already MNN);
  1: optimizations guaranteed to be correct for any input;
  2: optimizations guaranteed to be correct for common inputs; some inputs may produce wrong results.
--optimizePrefer arg  Graph optimization preference, default 0:
  0: normal optimization;
  1: make the optimized model as small as possible;
  2: make the optimized model as fast as possible.

To convert an fp16 model, simply append the --fp16 flag to the command above. The detection head of the converted MNN model looks like this:

Converting to an int8 quantized model takes somewhat longer:

root@ubuntu:/home/xx/MNN# ./build/quantized.out YOLOv5-Lite/mnnd/v5lite-e.mnn YOLOv5-Lite/mnnd/v5lite-e-i8.mnn YOLOv5-Lite/v5Lite-e.json
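The last argument is the quantizer's JSON preprocessing config. As a rough sketch of its shape, generated from Python; the calibration image folder, input size and normalization values here are assumptions and must match your own preprocessing:

import json

config = {
    "format": "RGB",
    "mean": [0.0, 0.0, 0.0],
    "normal": [0.003921, 0.003921, 0.003921],   # 1/255
    "width": 320,                               # assumed input size
    "height": 320,
    "path": "val_images/",                      # folder of calibration images
    "used_image_num": 5000,
    "feature_quantize_method": "KL",
    "weight_quantize_method": "MAX_ABS",
}

with open("YOLOv5-Lite/v5Lite-e.json", "w") as f:
    json.dump(config, f, indent=4)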

This process takes about 10 minutes; if the conversion succeeds, it prints output like the following:

[04:46:39] /home/xx/MNN/tools/quantization/quantized.cpp:23: >>> modelFile: YOLOv5-Lite/v5lite-e-mnnd.mnn [04:46:39] /home/xx/MNN/tools/quantization/quantized.cpp:24: >>> preTreatConfig: YOLOv5-Lite/v5Lite-e-mnnd.json [04:46:39] /home/xx/MNN/tools/quantization/quantized.cpp:25: >>> dstFile: YOLOv5-Lite/v5lite-e-mnnd-i8.mnn [04:46:39] /home/xx/MNN/tools/quantization/quantized.cpp:53: Calibrate the feature and quantize model... [04:46:39] /home/xx/MNN/tools/quantization/calibration.cpp:158: Use feature quantization method: KL [04:46:39] /home/xx/MNN/tools/quantization/calibration.cpp:159: Use weight quantization method: MAX_ABS [04:46:39] /home/xx/MNN/tools/quantization/calibration.cpp:179: feature_clamp_value: 127 [04:46:39] /home/xx/MNN/tools/quantization/calibration.cpp:180: weight_clamp_value: 127 [04:46:39] /home/xx/MNN/tools/quantization/calibration.cpp:193: skip quant op name: The device support i8sdot:1, support fp16:1, support i8mm: 1 [04:46:39] /home/xx/MNN/tools/quantization/Helper.cpp:111: used image num: 5000 [04:46:39] /home/xx/MNN/tools/quantization/calibration.cpp:665: fake quant weights done. ComputeFeatureRange: 100.00 % CollectFeatureDistribution: 100.00 % computeDistance: 100.00 % Debug info: 191 input_tensor_0: cos distance: 0.999921, overflow ratio: 0.000082 191 output_tensor_0: cos distance: 0.997810, overflow ratio: 0.000652 192 output_tensor_0: cos distance: 0.998180, overflow ratio: 0.000004 194 output_tensor_0: cos distance: 0.998478, overflow ratio: 0.000049 196 output_tensor_0: cos distance: 0.993538, overflow ratio: 0.000316 197 output_tensor_0: cos distance: 0.997754, overflow ratio: 0.000001 219 input_tensor_0: cos distance: 0.998476, overflow ratio: 0.000000 219 output_tensor_0: cos distance: 0.997667, overflow ratio: 0.000007 221 output_tensor_0: cos distance: 0.987334, overflow ratio: 0.000064 222 output_tensor_0: cos distance: 0.995765, overflow ratio: 0.000005 244 input_tensor_0: cos distance: 0.995262, overflow ratio: 0.000000 244 output_tensor_0: cos distance: 0.994176, overflow ratio: 0.000011 246 output_tensor_0: cos distance: 0.980969, overflow ratio: 0.000172 247 output_tensor_0: cos distance: 0.991131, overflow ratio: 0.000001 269 input_tensor_0: cos distance: 0.993674, overflow ratio: 0.000000 269 output_tensor_0: cos distance: 0.995367, overflow ratio: 0.000009 271 output_tensor_0: cos distance: 0.981729, overflow ratio: 0.000007 272 output_tensor_0: cos distance: 0.994556, overflow ratio: 0.000001 280 input_tensor_0: cos distance: 0.995445, overflow ratio: 0.000001 280 output_tensor_0: cos distance: 0.988656, overflow ratio: 0.000009 281 output_tensor_0: cos distance: 0.994097, overflow ratio: 0.000001 283 output_tensor_0: cos distance: 0.986572, overflow ratio: 0.000001 285 output_tensor_0: cos distance: 0.984723, overflow ratio: 0.000215 286 output_tensor_0: cos distance: 0.995883, overflow ratio: 0.000002 308 input_tensor_0: cos distance: 0.991681, overflow ratio: 0.000002 308 output_tensor_0: cos distance: 0.994107, overflow ratio: 0.000003 310 output_tensor_0: cos distance: 0.983974, overflow ratio: 0.000366 311 output_tensor_0: cos distance: 0.993694, overflow ratio: 0.000000 333 input_tensor_0: cos distance: 0.993330, overflow ratio: 0.000001 333 output_tensor_0: cos distance: 0.992975, overflow ratio: 0.000003 335 output_tensor_0: cos distance: 0.985026, overflow ratio: 0.000022 336 output_tensor_0: cos distance: 0.994901, overflow ratio: 0.000003 358 input_tensor_0: cos distance: 0.993783, overflow ratio: 0.000000 358 output_tensor_0: 
cos distance: 0.993021, overflow ratio: 0.000002 360 output_tensor_0: cos distance: 0.984969, overflow ratio: 0.000094 361 output_tensor_0: cos distance: 0.993611, overflow ratio: 0.000006 383 input_tensor_0: cos distance: 0.994139, overflow ratio: 0.000007 383 output_tensor_0: cos distance: 0.992251, overflow ratio: 0.000001 385 output_tensor_0: cos distance: 0.984115, overflow ratio: 0.000020 386 output_tensor_0: cos distance: 0.993809, overflow ratio: 0.000006 408 input_tensor_0: cos distance: 0.995448, overflow ratio: 0.000005 408 output_tensor_0: cos distance: 0.992664, overflow ratio: 0.000002 410 output_tensor_0: cos distance: 0.984901, overflow ratio: 0.000113 411 output_tensor_0: cos distance: 0.994208, overflow ratio: 0.000005 433 input_tensor_0: cos distance: 0.994892, overflow ratio: 0.000001 433 output_tensor_0: cos distance: 0.993541, overflow ratio: 0.000002 435 output_tensor_0: cos distance: 0.985944, overflow ratio: 0.000043 436 output_tensor_0: cos distance: 0.995184, overflow ratio: 0.000004 458 input_tensor_0: cos distance: 0.995215, overflow ratio: 0.000003 458 output_tensor_0: cos distance: 0.993813, overflow ratio: 0.000001 460 output_tensor_0: cos distance: 0.985733, overflow ratio: 0.000841 461 output_tensor_0: cos distance: 0.993928, overflow ratio: 0.000001 469 input_tensor_0: cos distance: 0.996452, overflow ratio: 0.000002 469 output_tensor_0: cos distance: 0.988663, overflow ratio: 0.000096 470 output_tensor_0: cos distance: 0.992134, overflow ratio: 0.000001 472 output_tensor_0: cos distance: 0.989637, overflow ratio: 0.000000 474 output_tensor_0: cos distance: 0.968360, overflow ratio: 0.001480 475 output_tensor_0: cos distance: 0.975570, overflow ratio: 0.000000 497 input_tensor_0: cos distance: 0.987168, overflow ratio: 0.000000 497 output_tensor_0: cos distance: 0.980892, overflow ratio: 0.000000 499 output_tensor_0: cos distance: 0.971500, overflow ratio: 0.000056 500 output_tensor_0: cos distance: 0.973503, overflow ratio: 0.000002 508 input_tensor_0: cos distance: 0.972393, overflow ratio: 0.000000 508 output_tensor_0: cos distance: 0.991235, overflow ratio: 0.000002 510 output_tensor_0: cos distance: 0.993184, overflow ratio: 0.000170 523 output_tensor_0: cos distance: 0.996924, overflow ratio: 0.000000 525 output_tensor_0: cos distance: 0.996898, overflow ratio: 0.000100 544 output_tensor_0: cos distance: 0.994193, overflow ratio: 0.000002 557 output_tensor_0: cos distance: 0.989506, overflow ratio: 0.000000 564 output_tensor_0: cos distance: 0.998756, overflow ratio: 0.000002 617 input_tensor_0: cos distance: 0.997003, overflow ratio: 0.000001 617 input_tensor_1: cos distance: 1.000000, overflow ratio: 1.000000 617 output_tensor_0: cos distance: 0.997003, overflow ratio: 0.000001 619 input_tensor_1: cos distance: 1.000000, overflow ratio: 1.000000 619 output_tensor_0: cos distance: 0.989684, overflow ratio: 0.000001 620 input_tensor_1: cos distance: 0.970060, overflow ratio: 0.500000 620 output_tensor_0: cos distance: 0.970980, overflow ratio: 0.000000 622 input_tensor_1: cos distance: 1.000000, overflow ratio: 1.000000 622 output_tensor_0: cos distance: 0.970980, overflow ratio: 0.000000 629 input_tensor_0: cos distance: 0.997438, overflow ratio: 0.000004 629 output_tensor_0: cos distance: 0.997438, overflow ratio: 0.000004 631 output_tensor_0: cos distance: 0.989391, overflow ratio: 0.000004 633 input_tensor_1: cos distance: 0.999999, overflow ratio: 0.166672 633 output_tensor_0: cos distance: 0.989400, overflow ratio: 0.000009 647 
output_tensor_0: cos distance: 0.998979, overflow ratio: 0.000001 700 input_tensor_0: cos distance: 0.997080, overflow ratio: 0.000001 700 output_tensor_0: cos distance: 0.997080, overflow ratio: 0.000001 702 output_tensor_0: cos distance: 0.990428, overflow ratio: 0.000000 703 input_tensor_1: cos distance: 0.966962, overflow ratio: 0.550027 703 output_tensor_0: cos distance: 0.968651, overflow ratio: 0.000000 705 input_tensor_1: cos distance: 1.000000, overflow ratio: 1.000000 705 output_tensor_0: cos distance: 0.968651, overflow ratio: 0.000000 712 input_tensor_0: cos distance: 0.997000, overflow ratio: 0.000045 712 output_tensor_0: cos distance: 0.997000, overflow ratio: 0.000045 714 output_tensor_0: cos distance: 0.987340, overflow ratio: 0.000045 716 input_tensor_1: cos distance: 0.953653, overflow ratio: 0.333344 716 output_tensor_0: cos distance: 0.961789, overflow ratio: 0.000000 730 output_tensor_0: cos distance: 0.999155, overflow ratio: 0.000003 783 input_tensor_0: cos distance: 0.996334, overflow ratio: 0.000000 783 output_tensor_0: cos distance: 0.996334, overflow ratio: 0.000000 785 output_tensor_0: cos distance: 0.988708, overflow ratio: 0.000000 786 input_tensor_1: cos distance: 0.918444, overflow ratio: 0.699968 786 output_tensor_0: cos distance: 0.909960, overflow ratio: 0.000000 788 input_tensor_1: cos distance: 1.000000, overflow ratio: 1.000000 788 output_tensor_0: cos distance: 0.909960, overflow ratio: 0.000000 795 input_tensor_0: cos distance: 0.998407, overflow ratio: 0.000055 795 output_tensor_0: cos distance: 0.998407, overflow ratio: 0.000055 797 output_tensor_0: cos distance: 0.993247, overflow ratio: 0.000055 799 input_tensor_1: cos distance: 1.000000, overflow ratio: 0.166672 799 output_tensor_0: cos distance: 0.995202, overflow ratio: 0.000004 814 input_tensor_0: cos distance: 1.000004, overflow ratio: 0.005462 814 output_tensor_0: cos distance: 0.999964, overflow ratio: 0.000023 817 input_tensor_0: cos distance: 0.995616, overflow ratio: 0.000054 817 output_tensor_0: cos distance: 0.990109, overflow ratio: 0.000026 820 output_tensor_0: cos distance: 0.990457, overflow ratio: 0.000000 823 input_tensor_0: cos distance: 0.996000, overflow ratio: 0.000011 823 output_tensor_0: cos distance: 0.991474, overflow ratio: 0.000004 826 output_tensor_0: cos distance: 0.981345, overflow ratio: 0.000002 829 output_tensor_0: cos distance: 0.994768, overflow ratio: 0.003006 832 output_tensor_0: cos distance: 0.988064, overflow ratio: 0.000000 835 output_tensor_0: cos distance: 0.992485, overflow ratio: 0.000000 838 output_tensor_0: cos distance: 0.983760, overflow ratio: 0.000004 841 output_tensor_0: cos distance: 0.997717, overflow ratio: 0.000567 844 output_tensor_0: cos distance: 0.987962, overflow ratio: 0.000000 847 output_tensor_0: cos distance: 0.992817, overflow ratio: 0.000002 850 output_tensor_0: cos distance: 0.984299, overflow ratio: 0.000002 Geometry_UnaryOp39 output_tensor_0: cos distance: 0.998984, overflow ratio: 0.000020 Geometry_UnaryOp41 output_tensor_0: cos distance: 0.999521, overflow ratio: 0.000100 Geometry_UnaryOp44 input_tensor_0: cos distance: 0.998648, overflow ratio: 0.000010 Geometry_UnaryOp44 output_tensor_0: cos distance: 0.999641, overflow ratio: 0.003844 Geometry_UnaryOp50 input_tensor_0: cos distance: 0.998955, overflow ratio: 0.000024 Geometry_UnaryOp50 output_tensor_0: cos distance: 0.999617, overflow ratio: 0.002672 Geometry_UnaryOp56 input_tensor_0: cos distance: 0.999096, overflow ratio: 0.000000 Geometry_UnaryOp56 output_tensor_0: cos 
distance: 0.999682, overflow ratio: 0.001699 [04:57:36] /home/xx/MNN/tools/quantization/quantized.cpp:58: Quantize model done!

Overall the int8 quantization result is quite good: the reported cos distance between fp32 and int8 is around 0.99 for essentially every layer.

After conversion, the sizes of all the models are as follows:

If you are obsessive about model size and do not mind losing a few points of accuracy, I strongly recommend the int8 model: it is only a bit over 900 KB, and on ARMv8 it also runs roughly 17% faster than the fp16 model.
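Before moving on to the C++ side, a converted .mnn file can be sanity-checked on a desktop with MNN's Python API (pip install MNN). This is only a sketch; the model path and the 320x320 input size are assumptions:

import numpy as np
import MNN

interpreter = MNN.Interpreter("YOLOv5-Lite/mnnd/v5lite-e-mnnd.mnn")
session = interpreter.createSession()
input_tensor = interpreter.getSessionInput(session)

# Feed a dummy NCHW float input.
data = np.random.rand(1, 3, 320, 320).astype(np.float32)
tmp_in = MNN.Tensor((1, 3, 320, 320), MNN.Halide_Type_Float,
                    data, MNN.Tensor_DimensionType_Caffe)
input_tensor.copyFrom(tmp_in)

interpreter.runSession(session)
output_tensor = interpreter.getSessionOutput(session)
# For the mnnd export the output should be (1, num_boxes, 5 + num_classes).
print("output shape:", output_tensor.getShape())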

6. Inference and results

There is honestly not much to explain here; just look at the code I provide in the repository. I will only touch on two details. First, the NMS path for the mnnd model: no anchor alignment or grid matching is needed, because that was already handled at export time.

Second, the NMS path for the mnne model, which goes even further: the last element of each output row is already the class_id we need, which makes it very friendly for beginners who would rather not spend time wrestling with C++.
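To make the mnnd post-processing concrete, here is a minimal numpy sketch (not the repository's C++ implementation) of what still has to happen on the (1, N, 5 + num_classes) output: confidence filtering, xywh-to-xyxy conversion, and a plain NMS. The thresholds are illustrative:

import numpy as np

def nms(boxes, scores, iou_thres=0.5):
    # Plain NMS over xyxy boxes; returns the indices of the kept boxes.
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        order = order[1:][iou <= iou_thres]
    return keep

def postprocess_mnnd(pred, conf_thres=0.25, iou_thres=0.5):
    # pred: (N, 5 + num_classes) array, already decoded by the exported mnnd head.
    pred = pred[pred[:, 4] > conf_thres]              # objectness filter
    scores = pred[:, 4:5] * pred[:, 5:]               # obj_conf * cls_conf
    cls_id = scores.argmax(1)
    conf = scores.max(1)
    xy, wh = pred[:, 0:2], pred[:, 2:4]
    boxes = np.concatenate([xy - wh / 2, xy + wh / 2], 1)  # xywh -> xyxy
    keep = nms(boxes, conf, iou_thres)
    return boxes[keep], conf[keep], cls_id[keep]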

Let's look at the mnnd inference time and results, starting with static images:

Hmm... wait, something looks off. The scores are wrong. What happened?

Time for another round of debugging. Fortunately we have already confirmed the ONNX model is correct, so where is the problem?

After a few minutes of digging, it turned out to be the backendConfig.precision parameter. Going back to the official manual, we find the following explanation:

If we want to go back to normal precision, we need to change this parameter:

backendConfig.precision = MNN::BackendConfig::Precision_Normal;    // full precision
backendConfig.precision = MNN::BackendConfig::Precision_Low_BF16;  // half precision (bf16)

In practice, with full precision the Raspberry Pi only manages about 12 FPS on static images, while with half precision almost every object's score drops by 2-6 percentage points. So what do we do if we want both speed and accuracy?

At this point we can pull a little trick on box.score:

box[i].score += 0.04

This gives the following set of comparisons:

Why does this work? Don't ask, just do it, you silly goose... Note that the FPS numbers above include all five stages (image decode, pre-processing, forward inference, post-processing, and image saving), not just the forward pass.

Next, camera inference with a display window: as you can see, it still just about meets the real-time requirement. The FPS figure covers six stages: frame grab, frame decode, pre-processing, forward inference, post-processing, and window display.

All of the test numbers above were taken after a three-minute warm-up!

7. Optimization

Can it go even faster? If you follow the official tutorials and documentation, you can in theory squeeze out another 10%-20%, but it requires the quantized model. I am not giving a walkthrough here; I would rather leave it as a cliffhanger, and I would be very happy to see a PR implementing this idea.

Official manual: FAQ - MNN-Doc 2.1.1 documentation

I want to go even faster. How?

It depends on the specific problem. As an example, my graduation project last year had a scenario that required detecting e-bikes inside an elevator. Inside an elevator the targets are generally large, close-range objects, so we can apply a cell skip to the feature map responsible for small-scale targets: on that feature map I do not need every cell to make a prediction. With targets this big, plenty of cells will land on them anyway, so to cut computation I only run the prediction on every other cell. This kind of speed-up is essentially betting that the other guy's gun is not loaded; it does not work for small objects or heavily occluded scenes, which is why it never went into the repository. The pseudocode is as follows:

// Decode the three extracted output layers; the function looks like this
// (BoxInfo and yolocv::YoloSize stand in for the repo's box and size structs).
std::vector<BoxInfo> decode_infer(MNN::Tensor &data, int stride,
                                  const yolocv::YoloSize &frame_size, int net_size,
                                  int num_classes,
                                  const std::vector<yolocv::YoloSize> &anchors,
                                  float threshold, int skip) {
    std::vector<BoxInfo> result;
    int batchs, channels, height, width, pred_item;
    batchs    = data.shape()[0];
    channels  = data.shape()[1];
    height    = data.shape()[2];
    width     = data.shape()[3];
    pred_item = data.shape()[4];
    auto data_ptr = data.host<float>();
    for (int bi = 0; bi < batchs; bi++) {
        for (int ci = 0; ci < channels; ci++) {             // one channel per anchor
            for (int hi = 0; hi < height; hi += skip) {     // cell skip: step rows by `skip`
                for (int wi = 0; wi < width; wi += skip) {  // cell skip: step columns by `skip`
                    // decode this cell as usual: sigmoid, grid/anchor decode,
                    // threshold check, then push the box into `result`
                }
            }
        }
    }
    return result;
}
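For readers who prefer Python, the same cell-skip idea can be sketched on the PyTorch side: after reshaping a head output to (bs, na, ny, nx, no) as in the export code above, keep only every skip-th cell of the small-stride feature map before decoding it. This is a hedged illustration, not repository code; skip=2 matches the every-other-cell behaviour described above.

import torch

def cell_skip(y, skip=2):
    # y: (bs, na, ny, nx, no) raw head output for the small-stride (high-resolution) layer.
    # Keeping every `skip`-th cell in both spatial directions cuts the number of candidate
    # boxes, and therefore the decode/NMS cost, by roughly skip**2.
    # Note: the matching grid and anchor_grid tensors must be sliced the same way
    # before the xy/wh decode, or the boxes will be decoded against the wrong cells.
    return y[:, :, ::skip, ::skip, :]

# Example: a 40x40 grid shrinks to 20x20 candidate cells.
y = torch.rand(1, 3, 40, 40, 85)
print(cell_skip(y).shape)   # torch.Size([1, 3, 20, 20, 85])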

