Accelerating NCNN in Termux with Android's Vulkan library



2024-03-05 07:48 | Source: compiled from the web

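Termux is a terminal emulator app for Android phones that runs on top of Android's Linux kernel. Inside Termux we can also use proot to set up a complete Linux distribution (Ubuntu, Debian, etc.) and run software built for ARM64, such as Chromium, VS Code and WPS Office. Add a desktop such as Xfce, and doing simple office work on a tablet instead of a computer is no longer a dream! However, Linux distributions are generally GNU/Linux, and their standard library, glibc, differs from Bionic, the C library Android uses. As a result, the libraries that ship with Android (libOpenCL.so, libvulkan.so, ...) cannot be used inside such a distribution. P.S. There is an interesting project, libhybris, that can load Android libraries (mainly vendors' closed-source HAL libraries) on GNU/Linux. It is not very mature yet and you may run into lots of errors, but it is a fascinating attempt at reusing Android's infrastructure under GNU/Linux.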
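To see the glibc/Bionic split for yourself, you can look at which C library a shared object asks for. A heuristic sketch, assuming some `readelf` is installed: glibc-linked objects list `libc.so.6` among their NEEDED entries, whereas Android's system libraries list Bionic's plain `libc.so`. The function name `libc_flavor` is hypothetical, and since musl also uses the name `libc.so`, a `bionic` answer is only a hint.

```shell
#!/usr/bin/env bash
# Heuristic: guess which C library an ELF shared object was linked against,
# from the NEEDED entries that `readelf -d` prints.
# Reads readelf output on stdin; prints "glibc", "bionic" or "unknown".
libc_flavor() {
  local needed
  # Keep only the names inside "Shared library: [...]", one per line.
  needed=$(sed -n 's/.*Shared library: \[\([^]]*\)].*/\1/p')
  if printf '%s\n' "$needed" | grep -qxF 'libc.so.6'; then
    echo glibc            # GNU/Linux distributions (e.g. inside proot)
  elif printf '%s\n' "$needed" | grep -qxF 'libc.so'; then
    echo bionic           # Android and Termux; musl also uses this name
  else
    echo unknown
  fi
}
```

On a 64-bit phone the system Vulkan loader typically lives at `/system/lib64/libvulkan.so`, so `readelf -d /system/lib64/libvulkan.so | libc_flavor` would be expected to print `bionic`.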

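Back to the topic. The ncnn wiki does include a chapter, "Build for termux on android", but it also installs Ubuntu through proot (once Ubuntu is installed, the steps are the same as in "Build for linux"). Because the runtime environment is different, inside proot there is no way to use the device's GPU for acceleration, so all computation falls back to the CPU. To accelerate model inference with the device's own OpenCL/Vulkan, we have to build ncnn directly in Termux, so that the GPU is reached through Android's own native libraries.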
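Before building, it helps to know whether the current shell is native Termux or a proot distro. A heuristic sketch, assuming only that proot is implemented on top of ptrace: traced processes show a non-zero `TracerPid` in `/proc/self/status`. This may not hold for every proot build, and an attached debugger gives the same signal. The function name `running_under_proot` is hypothetical; its optional argument exists only so it can be tested against a sample file.

```shell
#!/usr/bin/env bash
# Heuristic: proot is implemented with ptrace, so processes inside a proot
# distro are normally traced (TracerPid != 0). In a plain Termux shell the
# value is 0 unless a debugger such as strace or gdb is attached.
# $1 (optional) overrides the status file, which is only useful for testing.
running_under_proot() {
  local status_file=${1:-/proc/self/status} tracer
  # /proc/self here refers to the awk process, a child of the current shell.
  tracer=$(awk '$1 == "TracerPid:" { print $2 }' "$status_file")
  [ "${tracer:-0}" != 0 ]
}
```

For example, `running_under_proot && echo "inside proot: the GPU is not reachable from here"`.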

1 Preliminary

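First, download the latest APK from the releases on the Termux project's GitHub page (https://github.com/termux/termux-app) and install it; Android 7.0 or later is required. After opening it, you can run the termux-change-repo command to switch to the Tsinghua mirror. Termux's package manager is pkg, which is used in basically the same way as apt (Termux also ships apt itself, and pkg is a wrapper around it). For example, run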

pkg install openssh
passwd
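For reference, a minimal sketch of the SSH side, assuming the standard Termux openssh package: after `pkg install openssh` and `passwd`, the server is started with `sshd`, and the computer connects to port 8022 as the Termux user. The helper `termux_ssh_hint` is hypothetical; it only prints the command to type on the computer, and the phone's IP address has to be passed in by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper, meant to be run inside Termux after `pkg install openssh`,
# `passwd` and `sshd`. It only prints the ssh command for the computer side.
# $1: the phone's IP address (looked up manually, e.g. in the Wi-Fi settings)
# $2: optional port, defaulting to Termux's sshd port 8022
termux_ssh_hint() {
  local ip=$1 port=${2:-8022}
  if [ -z "$ip" ]; then
    echo "usage: termux_ssh_hint <phone-ip> [port]" >&2
    return 1
  fi
  printf 'ssh -p %s %s@%s\n' "$port" "$(whoami)" "$ip"
}
```

Running `termux_ssh_hint 192.168.1.10` on the phone prints an `ssh -p 8022 ...@192.168.1.10` line with the Termux user name filled in.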

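After installing ssh and setting a password, you can connect to the phone from a terminal on your computer. Note that the default ssh port is 8022, because without root the phone cannot bind to port numbers below 1024 … a value >= 0 is the GPU id; on the phone, use 0.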
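The five positional arguments of benchncnn can be read off the log headers below: loop_count, num_threads, powersave, gpu_device and cooling_down, where gpu_device = -1 means CPU only. A hypothetical helper, `benchncnn_cmd`, that prints the two command lines used here (8 loops, powersave 0, cooling down on), assuming it is run from `~/ncnn/benchmark` like the author's prompt:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the benchncnn command for a CPU-only or Vulkan run.
# Positional arguments, in order (matching the log header):
#   loop_count num_threads powersave gpu_device cooling_down
# gpu_device = -1 means CPU only; a value >= 0 selects that Vulkan GPU.
benchncnn_cmd() {
  local mode=$1 threads=${2:-4} gpu
  case $mode in
    cpu) gpu=-1 ;;
    gpu) gpu=0 ;;   # the phone has a single GPU, id 0
    *) echo "usage: benchncnn_cmd cpu|gpu [threads]" >&2; return 1 ;;
  esac
  # loop_count=8, powersave=0 (all cores), cooling_down=1
  echo "../build/benchmark/benchncnn 8 $threads 0 $gpu 1"
}
```

Running it as `$(benchncnn_cmd gpu)` relies on plain word splitting, which is fine here because none of the arguments contain spaces.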

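The author is using an old phone from 2016 with a Kirin 655 SoC: an octa-core Cortex-A53 CPU plus a Mali-T830 MP2 GPU. It is far weaker than an RK3399 (Cortex-A72 big cores + Mali-T860 MP4), but it can still run in Vulkan mode, even with large networks such as vgg16. Here are the results, first in CPU mode with 4 threads: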

~/ncnn/benchmark $ ../build/benchmark/benchncnn 8 4 0 -1 1
loop_count = 8
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 1
squeezenet  min = 84.65  max = 86.07  avg = 85.24
squeezenet_int8  min = 82.10  max = 83.10  avg = 82.55
mobilenet  min = 113.70  max = 117.65  avg = 115.38
mobilenet_int8  min = 86.20  max = 88.93  avg = 87.09
mobilenet_v2  min = 138.55  max = 156.88  avg = 152.12
mobilenet_v3  min = 88.80  max = 93.86  avg = 92.43
shufflenet  min = 108.49  max = 112.89  avg = 109.60
shufflenet_v2  min = 66.64  max = 67.58  avg = 67.19
mnasnet  min = 92.42  max = 145.46  avg = 122.95
proxylessnasnet  min = 110.89  max = 113.93  avg = 112.48
efficientnet_b0  min = 180.27  max = 269.97  avg = 229.80
efficientnetv2_b0  min = 211.38  max = 213.64  avg = 212.44
regnety_400m  min = 189.46  max = 193.84  avg = 191.28
blazeface  min = 23.03  max = 26.67  avg = 23.75
googlenet  min = 259.65  max = 264.35  avg = 260.75
googlenet_int8  min = 229.00  max = 251.55  avg = 234.80
resnet18  min = 199.28  max = 208.02  avg = 202.51
resnet18_int8  min = 158.06  max = 163.77  avg = 160.61
alexnet  min = 269.18  max = 274.16  avg = 272.43
vgg16  min = 1155.47  max = 1235.34  avg = 1184.81
vgg16_int8  min = 747.65  max = 758.80  avg = 751.62
resnet50  min = 505.84  max = 580.00  avg = 531.49
resnet50_int8  min = 438.09  max = 440.49  avg = 438.78
squeezenet_ssd  min = 225.10  max = 499.12  avg = 274.91
squeezenet_ssd_int8  min = 197.45  max = 216.89  avg = 202.77
mobilenet_ssd  min = 234.62  max = 240.82  avg = 238.23
mobilenet_ssd_int8  min = 178.87  max = 180.79  avg = 179.63
mobilenet_yolo  min = 489.46  max = 499.22  avg = 493.67
mobilenetv2_yolov3  min = 310.20  max = 452.35  avg = 335.04
yolov4-tiny  min = 447.58  max = 536.35  avg = 495.77
nanodet_m  min = 160.12  max = 168.65  avg = 163.86
yolo-fastest-1.1  min = 80.50  max = 89.89  avg = 82.72
yolo-fastestv2  min = 68.25  max = 80.10  avg = 76.25

And in Vulkan mode:

~/ncnn/benchmark $ ../build/benchmark/benchncnn 8 4 0 0 1
[0 Mali-T830]  queueC=0[2]  queueG=0[2]  queueT=0[2]
[0 Mali-T830]  bugsbn1=0  bugbilz=0  bugcopc=0  bugihfa=0
[0 Mali-T830]  fp16-p/s/a=1/0/0  int8-p/s/a=1/0/0
[0 Mali-T830]  subgroup=16  basic=0  vote=0  ballot=0  shuffle=0
loop_count = 8
num_threads = 4
powersave = 0
gpu_device = 0
cooling_down = 1
squeezenet  min = 98.36  max = 101.33  avg = 100.14
squeezenet_int8  min = 93.73  max = 149.93  avg = 127.12
mobilenet  min = 148.93  max = 150.30  avg = 149.69
mobilenet_int8  min = 134.03  max = 147.65  avg = 137.13
mobilenet_v2  min = 99.72  max = 101.14  avg = 100.44
mobilenet_v3  min = 90.05  max = 94.20  avg = 91.74
shufflenet  min = 61.54  max = 65.67  avg = 62.83
shufflenet_v2  min = 76.51  max = 80.57  avg = 78.73
mnasnet  min = 100.05  max = 101.79  avg = 100.77
proxylessnasnet  min = 104.45  max = 107.86  avg = 106.07
efficientnet_b0  min = 147.30  max = 151.59  avg = 149.21
efficientnetv2_b0  min = 303.89  max = 417.94  avg = 338.40
regnety_400m  min = 119.83  max = 122.73  avg = 121.04
blazeface  min = 55.65  max = 59.06  avg = 57.05
googlenet  min = 324.63  max = 327.42  avg = 325.84
googlenet_int8  min = 254.78  max = 399.30  avg = 329.25
resnet18  min = 337.40  max = 347.44  avg = 342.30
resnet18_int8  min = 178.98  max = 189.48  avg = 185.73
alexnet  min = 277.58  max = 299.94  avg = 284.57
vgg16  min = 2690.19  max = 3180.77  avg = 2957.29
vgg16_int8  min = 748.23  max = 765.36  avg = 754.77
resnet50  min = 857.16  max = 1004.09  avg = 936.61
resnet50_int8  min = 435.07  max = 440.28  avg = 437.63
squeezenet_ssd  min = 390.19  max = 456.46  avg = 409.24
squeezenet_ssd_int8  min = 196.69  max = 202.96  avg = 198.65
mobilenet_ssd  min = 340.55  max = 378.76  avg = 363.61
mobilenet_ssd_int8  min = 179.10  max = 181.20  avg = 180.20
mobilenet_yolo  min = 735.47  max = 819.14  avg = 778.76
mobilenetv2_yolov3  min = 370.43  max = 407.27  avg = 392.68
yolov4-tiny  min = 757.71  max = 844.02  avg = 789.01
nanodet_m  min = 154.20  max = 193.19  avg = 168.09
yolo-fastest-1.1  min = 67.22  max = 74.97  avg = 71.27
yolo-fastestv2  min = 60.79  max = 76.87  avg = 68.21

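Comparing the average times, Vulkan mode only helped a handful of lightweight models, for example shufflenet (62.83 ms vs 109.60 ms on the CPU) and efficientnet_b0 (149.21 ms vs 229.80 ms), while the heavier networks slowed down considerably: vgg16 went from 1184.81 ms to 2957.29 ms and resnet50 from 531.49 ms to 936.61 ms. The moral of the story: if your GPU is this weak, you might as well stick with the CPU!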
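To turn the two logs into per-model speedups, here is a small sketch, assuming each result line has the `<model> min = <x> max = <y> avg = <z>` shape shown in the logs and that the CPU and Vulkan runs were saved to separate files. `compare_bench` is a hypothetical name; a speedup above 1 means the Vulkan run was faster.

```shell
#!/usr/bin/env bash
# Compare per-model average latency from two benchncnn logs.
# $1: log of the CPU run, $2: log of the Vulkan run.
# Prints one line per model: name cpu_avg gpu_avg speedup (cpu_avg / gpu_avg).
compare_bench() {
  awk '
    # Result lines: field 2 is "min" and the line ends with "avg = <value>".
    $2 == "min" && $(NF-1) == "=" && $(NF-2) == "avg" {
      if (FNR == NR) cpu[$1] = $NF            # first file: CPU run
      else if ($1 in cpu)                     # second file: Vulkan run
        printf "%s %.2f %.2f %.2f\n", $1, cpu[$1], $NF, cpu[$1] / $NF
    }
  ' "$1" "$2"
}
```

For example, save each run with `> cpu.log 2>&1` and `> gpu.log 2>&1` (capturing both streams, so it does not matter which one benchncnn writes to), then `compare_bench cpu.log gpu.log | sort -n -k4` lists the models from the worst to the best result for the GPU.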


