Jetson Nano + yolov5 + TensorRT Acceleration + USB Camera Capture




Contents

Preface
I. Environment Setup
1. Install virtualenv (optional)
2. Set the CUDA environment variables (fixes "nvcc -V: command not found")
3. Update
4. Install PyTorch and torchvision
5. Install the yolov5 prerequisites
II. TensorRT Acceleration
III. USB Camera Capture
Summary

前言

For my graduation project I need to port yolov5 to an embedded board. I have previously deployed MobileNet-SSD on a Firefly-RK3399 with the Tengine framework and tinkered with a Raspberry Pi; this time I tried deploying yolov5 (v4.0) on a Jetson Nano. Things move fast, though: the tutorials online are dated, yolov5 v5.0 shipped just last month, and I stepped in plenty of pitfalls along the way, which convinced me to publish an updated guide. Over half a month I tried most of the approaches floating around online and reflashed the system eight times; what follows is the best procedure I arrived at. (2021-05-08)

This article skips the image-flashing process, since that part is trivial.

I. Environment Setup

1. Install virtualenv (optional)

Apart from my first install, when I was afraid of wrecking the environment, I never bothered with this step; it is mostly extra hassle.

sudo pip3 install virtualenv virtualenvwrapper

Edit your environment variables:

nano ~/.bashrc

Add these three lines at the very bottom:

export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

Next, create and enter the virtual environment, and link it to the system's native OpenCV:

# Create an environment named yolov5 (any other name works too):
mkvirtualenv yolov5
# Activate it:
workon yolov5
# Find the native OpenCV (look for the .so file; it should be in /usr/lib/python3.6/dist-packages/cv2/python-3.6/ for everyone):
sudo find / -name cv2
# Create the symlink
ln -s /usr/lib/python3.6/dist-packages/cv2/python-3.6/cv2.cpython-36m-aarch64-linux-gnu.so ~/.virtualenvs/your-env-name/lib/python3.6/site-packages
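Once the link is in place, a quick check from inside the activated environment confirms the native OpenCV is visible. A minimal sketch (the exact version printed depends on your JetPack image):

# run with: workon yolov5 && python3
import cv2
# a ModuleNotFoundError here means the symlink path above is wrong
print(cv2.__version__)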

For reference, the common virtualenv commands:

# Create an environment (installs a virtual environment called name under ~/.virtualenvs)
mkvirtualenv name
# Activate an environment (running workon with no argument lists all environments)
workon name
# Exit the current environment
deactivate
# Delete an environment (exit it first)
rmvirtualenv name
# List all environments
lsvirtualenv
# List all installed packages
lssitepackages
# To inspect the packages installed in the system environment, run (outside the virtualenv):
dpkg -l | grep nvinfer

2. Set the CUDA environment variables (fixes "nvcc -V: command not found")

cd ~
vim .bashrc

Append these three lines at the end:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_ROOT=/usr/local/cuda

Apply the changes:

source .bashrc

3. Update

sudo apt-get update

4. Install PyTorch and torchvision

It is best to follow the official guide and pay close attention to version numbers: https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-8-0-now-available/72048. I have transcribed the official steps here with my own notes:

# First download the official torch wheel. I used a VPN here because the direct download is painfully slow, and it is not mirrored on the Tsinghua mirror. If you download it yourself, mind the version: the stock image ships Python 3.6.9, so you need cp36, and the architecture must be aarch64. (If you grab the wrong one, no harm done: it simply won't install.)
wget https://nvidia.box.com/shared/static/p57jwntv436lfrd78inwl7iml6p13fzh.whl -O torch-1.8.0-cp36-cp36m-linux_aarch64.whl
# Install the prerequisites
sudo apt-get install python3-pip libopenblas-base libopenmpi-dev
pip3 install Cython
pip3 install numpy
# Install the wheel you just downloaded
pip3 install torch-1.8.0-cp36-cp36m-linux_aarch64.whl

Verify it worked:

python3
>>> import torch
>>> print(torch.__version__)  # seeing a version number means success
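The version number alone does not prove the GPU is usable. A minimal extra check that the wheel was built against the Nano's CUDA (my addition, not part of the official guide):

import torch

print(torch.__version__)          # 1.8.0 for the wheel above
print(torch.cuda.is_available())  # should print True on the Nano
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # the Nano's integrated GPU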

Next, install torchvision; it must match your PyTorch version. [Image in the original post: the PyTorch/torchvision compatibility table; PyTorch 1.8.0 pairs with torchvision v0.9.0.]

# Install the prerequisites first
sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev
# Forgive me, I used a VPN here again (the branch is the torchvision version matching your PyTorch; for PyTorch 1.8.0 that is v0.9.0)
git clone --branch v0.9.0 https://github.com/pytorch/vision torchvision
# Enter the folder you just cloned
cd torchvision
# Set a temporary variable (x is your minor version; mine is 9)
export BUILD_VERSION=0.x.0
# Build and install (--user installs into the system environment; inside a virtualenv, run it without --user)
python3 setup.py install --user

Verify it worked:

python3
>>> import torchvision
>>> print(torchvision.__version__)  # seeing a version number means success

5. Install the yolov5 prerequisites

Next comes scipy. You must install these three packages first, or the build will fail:

sudo apt-get install liblapack-dev
sudo apt-get install libblas-dev
sudo apt-get install gfortran
pip3 install scipy
# Then install the remaining packages (extremely slow)
pip3 install matplotlib pillow pyyaml tensorboard tqdm
# Installing seaborn pulls in matplotlib and pandas automatically
pip3 install seaborn
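Since a missing package only surfaces when detect.py starts, a quick import sanity check is worth running first. A minimal sketch covering the packages installed above:

# each of these is pulled in by the pip3 installs above
import scipy, matplotlib, yaml, tqdm, seaborn, pandas
from PIL import Image
print("yolov5 Python dependencies all import cleanly")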

Congratulations: with the steps above done, yolov5 will run; I have verified that versions 3.0, 4.0 and 5.0 all work. Clone https://github.com/ultralytics/yolov5.git and try python3 detect.py yourself; I won't walk through it here. Even ignoring the VPN download time, everything above takes about 3.5 hours, so I hope you still have the patience for the TensorRT acceleration steps.

II. TensorRT Acceleration

I used tensorrtx; here is the repo: https://github.com/wang-xinyu/tensorrtx.git. You could also try jetson-inference. It works (I have tried it), but the setup is tedious; see this guide: https://blog.csdn.net/qianbin3200896/article/details/108949723. If you want jetson-inference together with a virtualenv, you will need to symlink jetson into the environment; see method 3 under section 5.4 here: https://blog.csdn.net/u011119817/article/details/99679350.

Back on topic: tensorrtx ships its own official tutorial, which I have transcribed here:

# 1. Install pycuda
pip3 install pycuda
# 2. Copy tensorrtx/yolov5/gen_wts.py into the yolov5 folder you cloned earlier (step 5 above). If you skipped that, clone it now:
git clone https://github.com/ultralytics/yolov5.git
# 3. Download the official weights 'yolov5s.pt', or use your own. I used a model trained with yolov5 v4.0.
# 4. Run gen_wts.py to generate the .wts file:
python gen_wts.py yolov5s.pt
# 5. Go to the yolov5 folder inside tensorrtx.
# As usual, create a build folder and generate the makefile:
mkdir build
cd build
cmake ..
# 6. Change CLASS_NUM in yololayer.h to your own class count. The default is 80 because the official model uses COCO.
# 7. Run make (you must re-run make every time you change CLASS_NUM):
make
# 8. Copy the .wts file from step 4 into tensorrtx/yolov5.
# 9. Generate the .engine file (I used yolov5s, hence the trailing s):
sudo ./yolov5 -s ../yolov5s.wts yolov5s.engine s
# If you trained with custom depth_multiple and width_multiple, write it like this:
sudo ./yolov5 -s ../yolov5.wts yolov5.engine c 0.17 0.25
# tensorrtx 5.0 also adds yolov5's P6 models:
sudo ./yolov5 -s ../yolov5.wts yolov5.engine s6
# 10. Test with the bundled images (there are two in samples/):
sudo ./yolov5 -d yolov5s.engine ../samples
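For reference, the engine produced in step 9 can also be loaded from Python with the tensorrt and pycuda packages installed above. A minimal deserialization sketch (my assumptions: TensorRT 7.x as shipped with JetPack 4.x, and the engine file sitting in the current directory):

import tensorrt as trt
import pycuda.autoinit  # importing this creates a CUDA context

logger = trt.Logger(trt.Logger.WARNING)
with open("yolov5s.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
# tensorrtx's yolov5 exposes two bindings: "data" (input) and "prob" (output)
print([engine.get_binding_name(i) for i in range(engine.num_bindings)])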

You can also test with the bundled Python script, although it threw an error when I ran it:

python yolov5_trt.py

III. USB Camera Capture

My code is C++; if you use Python, see this write-up: https://blog.csdn.net/weixin_45569617/article/details/108145046
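If you do go the Python route, grabbing frames from a USB camera is straightforward with the OpenCV that ships on the image. A minimal capture-loop sketch (device index 0 and the inference hook are my assumptions):

import cv2

cap = cv2.VideoCapture(0)  # /dev/video0, the first USB camera
while cap.isOpened():
    ret, frame = cap.read()  # frame is a BGR numpy array
    if not ret:
        break
    # ... run preprocessing + TensorRT inference on `frame` here ...
    cv2.imshow("yolov5", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()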

For C++, tensorrtx does not provide camera-capture code. I found an older camera version on GitHub and adapted it slightly. I removed the part that generates the .engine file, so finish the steps above and generate your .engine before trying the code below. Replace tensorrtx\yolov5\yolov5.cpp with the following:

#include <iostream>  // the angle-bracket includes were stripped by the web page; restored to match upstream tensorrtx
#include <chrono>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"

#define USE_FP16  // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0  // GPU id
#define NMS_THRESH 0.4
#define CONF_THRESH 0.5
#define BATCH_SIZE 1

// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
// we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1;
const char* INPUT_BLOB_NAME = "data";
const char* OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;

char *my_classes[] = {
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
    "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
    "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
    "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
    "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush"
};

// round the scaled channel count up to a multiple of `divisor`
static int get_width(int x, float gw, int divisor = 8) {
    // return math.ceil(x / divisor) * divisor
    if (int(x * gw) % divisor == 0) {
        return int(x * gw);
    }
    return (int(x * gw / divisor) + 1) * divisor;
}

// scale the block depth by gd, but never below 1
static int get_depth(int x, float gd) {
    if (x == 1) {
        return 1;
    } else {
        return round(x * gd) > 1 ? round(x * gd) : 1;
    }
}

ICudaEngine* build_engine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);

    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);

    // the template arguments were also eaten by the page; upstream uses std::map<std::string, Weights>
    std::map<std::string, Weights> weightMap = loadWeights(wts_name);

    /* ------ yolov5 backbone ------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto bottleneck_CSP2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto bottleneck_csp4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto bottleneck_csp6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.7");
    auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), get_width(1024, gw), get_width(1024, gw), 5, 9, 13, "model.8");

    /* ------ yolov5 head ------ */
    auto bottleneck_csp9 = C3(network, weightMap, *spp8->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.9");
    auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), get_width(512, gw), 1, 1, 1, "model.10");

    auto upsample11 = network->addResize(*conv10->getOutput(0));
    assert(upsample11);
    upsample11->setResizeMode(ResizeMode::kNEAREST);
    upsample11->setOutputDimensions(bottleneck_csp6->getOutput(0)->getDimensions());

    ITensor* inputTensors12[] = { upsample11->getOutput(0), bottleneck_csp6->getOutput(0) };
    auto cat12 = network->addConcatenation(inputTensors12, 2);
    auto bottleneck_csp13 = C3(network, weightMap, *cat12->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.13");
    auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), get_width(256, gw), 1, 1, 1, "model.14");

    auto upsample15 = network->addResize(*conv14->getOutput(0));
    assert(upsample15);
    upsample15->setResizeMode(ResizeMode::kNEAREST);
    upsample15->setOutputDimensions(bottleneck_csp4->getOutput(0)->getDimensions());

    ITensor* inputTensors16[] = { upsample15->getOutput(0), bottleneck_csp4->getOutput(0) };
    auto cat16 = network->addConcatenation(inputTensors16, 2);
    auto bottleneck_csp17 = C3(network, weightMap, *cat16->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.17");

    // yolo layer 0
    IConvolutionLayer* det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.0.weight"], weightMap["model.24.m.0.bias"]);
    auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), get_width(256, gw), 3, 2, 1, "model.18");
    ITensor* inputTensors19[] = { conv18->getOutput(0), conv14->getOutput(0) };
    auto cat19 = network->addConcatenation(inputTensors19, 2);
    auto bottleneck_csp20 = C3(network, weightMap, *cat19->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.20");
    // yolo layer 1
    IConvolutionLayer* det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.1.weight"], weightMap["model.24.m.1.bias"]);
    auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), get_width(512, gw), 3, 2, 1, "model.21");
    ITensor* inputTensors22[] = { conv21->getOutput(0), conv10->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors22, 2);
    auto bottleneck_csp23 = C3(network, weightMap, *cat22->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.23");
    // yolo layer 2
    IConvolutionLayer* det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.2.weight"], weightMap["model.24.m.2.bias"]);

    auto yolo = addYoLoLayer(network, weightMap, det0, det1, det2);
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB; the page cuts the listing off mid-expression here, so the value is restored from upstream tensorrtx
    // ... (the remainder of the file -- FP16/INT8 flags, engine build and serialization,
    // preprocessing, and the USB-camera inference loop -- is truncated in the source page;
    // it follows the structure of upstream tensorrtx yolov5.cpp)

