AI Vision in Practice 1: Real-Time Face Detection


1. Background

The most common AI applications in computer vision are face detection, face recognition, liveness detection, human body and behavior analysis, image recognition, and image enhancement. These are all fairly mature technologies by now: both commercial PaaS platforms and open-source models are available in abundance. A typical AI development workflow has the following steps:

1. Feature analysis
2. Data collection
3. Data annotation
4. Model training
5. Model inference

Inference can run in the cloud or on the client, and each side has its own use cases: face detection, for example, is usually done on the client, while face recognition runs in the cloud. This series focuses on the engineering practice of model inference for vision tasks.

2. Project Overview

We deploy models from Google's open-source project mediapipe on the client and run inference there. mediapipe provides the following capabilities:

- Face Detection
- Face Mesh (3D face landmark model)
- Iris (iris detection)
- Hands (hand tracking)
- Pose (pose estimation)
- Holistic (whole-body pose)
- Hair Segmentation
- Object Detection
- Box Tracking
- Instant Motion Tracking
- Objectron
- KNIFT
- ...

mediapipe builds with bazel; for example, running bazel build -c opt --config=android_arm64 mediapipe/examples/android/src/java/com/google/mediapipe/apps/handtrackinggpu:handtrackinggpu produces a runnable hand-tracking demo. For the mobile side, our development framework is based on the open-source project github.com/terryky/and…, which runs TensorFlow Lite models through the NDK and measures the performance of the TensorFlow Lite GPU Delegate. The whole app is built on the NativeActivity framework: it captures camera frames, then renders both the camera image and the performance overlay. In this article we get a real-time face detection model running end to end.

3. Understanding NativeActivity

NativeActivity is the base class Android provides for developing an app purely in C/C++. Even a pure C++ Android application ultimately needs a Java-side shell. The Android framework already ships that shell as an intermediate Java class, and the native library we write in C++ can run precisely because this class calls into it over JNI; that intermediate class is NativeActivity. Its core job is to invoke the callbacks in our native library when specific events occur. For example, in the familiar lifecycle method NativeActivity.onStart, it calls the native library's onStartNative function:

protected void onStart() {
    super.onStart();
    onStartNative(mNativeHandle);
}

On the native side, Android provides two headers:

- native_activity.h
- android_native_app_glue.h

android_native_app_glue.h wraps native_activity.h; with the glue library we only need to implement void android_main(struct android_app* state).
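
A minimal android_main looks like the sketch below. This is the glue library's usual event loop, not this project's exact code; per-frame rendering would go where the comment indicates.

#include <android_native_app_glue.h>

// Minimal event loop: poll looper events and hand them to the glue
// library's dispatcher until the activity is destroyed.
void android_main (struct android_app *state)
{
    while (1)
    {
        int events;
        struct android_poll_source *source;

        while (ALooper_pollAll (0, NULL, &events, (void **)&source) >= 0)
        {
            if (source != NULL)
                source->process (state, source);

            if (state->destroyRequested)
                return;
        }

        /* ... render one frame here ... */
    }
}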

More details on NativeActivity can be found in the official documentation: GameActivity | Android Developers.

4. Running the Model

The model we chose: storage.googleapis.com/mediapipe-a…

The overall flow:

1. Load the model
2. Convert the camera preview texture to RGBA
3. Feed the image data into the inference engine
4. Parse and render the results

4.1 Loading the Model

First we read the model file into memory. The model file sits under the Android project's asset directory; we load it into std::vector<uint8_t> m_tflite_model_buf:

bool asset_read_file (AAssetManager *assetMgr, char *fname, std::vector<uint8_t> &buf)
{
    AAsset *assetDescriptor = AAssetManager_open (assetMgr, fname, AASSET_MODE_BUFFER);
    if (assetDescriptor == NULL)
    {
        return false;
    }

    size_t fileLength = AAsset_getLength (assetDescriptor);
    buf.resize (fileLength);

    int64_t readSize = AAsset_read (assetDescriptor, buf.data(), buf.size());
    AAsset_close (assetDescriptor);

    return (readSize == (int64_t)buf.size());
}

asset_read_file (m_app->activity->assetManager, (char *)BLAZEFACE_MODEL_PATH, m_tflite_model_buf);

tflite provides FlatBufferModel::BuildFromBuffer to load a model from memory; it returns a pointer to a tflite::FlatBufferModel:

std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromBuffer (model_buf, model_size);
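
Applied to the buffer we just read from assets, a minimal sketch (the nullptr check is our addition) looks like this:

// The buffer must stay alive for as long as the model object is used.
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromBuffer (
        (const char *)m_tflite_model_buf.data(), m_tflite_model_buf.size());
if (model == nullptr)
{
    DBG_LOGE ("failed to load model\n");
    return -1;
}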

Once the model is loaded, we create the inference interpreter tflite::Interpreter from it. tflite provides the InterpreterBuilder utility to construct a tflite::Interpreter:

class InterpreterBuilder {
 public:
  InterpreterBuilder(const FlatBufferModel& model,
                     const OpResolver& op_resolver);

It takes the model and an OpResolver. OpResolver is an abstract interface that returns the tflite registration for a given op code or custom op name; this is the mechanism by which ops referenced in the flatbuffer model are mapped to executable function pointers (TfLiteRegistrations). InterpreterBuilder overloads operator():

TfLiteStatus operator()(std::unique_ptr<Interpreter>* interpreter);
TfLiteStatus operator()(std::unique_ptr<Interpreter>* interpreter,
                        int num_threads);

After constructing the InterpreterBuilder, create the tflite::Interpreter:

std::unique_ptr<tflite::FlatBufferModel> model;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::ops::builtin::BuiltinOpResolver resolver;

InterpreterBuilder(*model, resolver)(&interpreter);

Of the two operator() overloads, the second takes a thread-count parameter; we can also set it manually via tflite::Interpreter's SetNumThreads:

int num_threads = std::thread::hardware_concurrency();

char *env_tflite_num_threads = getenv ("FORCE_TFLITE_NUM_THREADS");
if (env_tflite_num_threads)
{
    num_threads = atoi (env_tflite_num_threads);
    DBG_LOGI ("@@@@@@ FORCE_TFLITE_NUM_THREADS=%d\n", num_threads);
}

DBG_LOG ("@@@@@@ TFLITE_NUM_THREADS=%d\n", num_threads);
interpreter->SetNumThreads (num_threads);

Next, allocate the tensor buffers. The header documents AllocateTensors as follows:

// Update allocations for all tensors. This will redim dependent tensors
// using the input tensor dimensionality as given. This is relatively
// expensive. This *must be* called after the interpreter has been created
// and before running inference (and accessing tensor buffers), and *must be*
// called again if (and only if) an input tensor is resized. Returns status of
// success or failure.
TfLiteStatus AllocateTensors();
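
A short usage sketch, mirroring the error-handling style used elsewhere in this article:

// Allocate buffers for all tensors; this must succeed before Invoke().
if (interpreter->AllocateTensors() != kTfLiteOk)
{
    DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
    return -1;
}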

Next we query the interpreter for the model's configuration, mainly its input and output tensors:
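
The helper below stores each tensor's metadata in a small struct, tflite_tensor_t. A plausible definition, reconstructed from the fields the code uses (the demo framework's actual definition may differ slightly):

// Reconstructed from the fields used below; not the verbatim upstream type.
typedef struct tflite_tensor_t
{
    int          idx;          /* tensor index inside the interpreter  */
    int          io;           /* 0: input tensor, 1: output tensor    */
    int          io_idx;       /* index within inputs()/outputs()      */
    TfLiteType   type;         /* kTfLiteUInt8 / kTfLiteFloat32 / ...  */
    void        *ptr;          /* pointer to the tensor's data buffer  */
    float        quant_scale;  /* quantization scale                   */
    int          quant_zerop;  /* quantization zero point              */
    int          dims[4];      /* up to 4 dimensions, e.g. NHWC        */
} tflite_tensor_t;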

int
tflite_get_tensor_by_name (std::unique_ptr<tflite::Interpreter> &interpreter, int io,
                           const char *name, tflite_tensor_t *ptensor)
{
    memset (ptensor, 0, sizeof (*ptensor));

    int tensor_idx;
    int io_idx = -1;
    int num_tensor = (io == 0) ? interpreter->inputs ().size() :
                                 interpreter->outputs().size();
    for (int i = 0; i < num_tensor; i ++)
    {
        tensor_idx = (io == 0) ? interpreter->inputs ()[i] :
                                 interpreter->outputs()[i];
        const char *tensor_name = interpreter->tensor(tensor_idx)->name;
        if (strcmp (tensor_name, name) == 0)
        {
            io_idx = i;
            break;
        }
    }

    if (io_idx < 0)
    {
        DBG_LOGE ("can't find tensor: \"%s\"\n", name);
        return -1;
    }

    void *ptr = NULL;
    TfLiteTensor *tensor = interpreter->tensor(tensor_idx);
    switch (tensor->type)
    {
    case kTfLiteUInt8:
        ptr = (io == 0) ? interpreter->typed_input_tensor <uint8_t>(io_idx) :
                          interpreter->typed_output_tensor<uint8_t>(io_idx);
        break;
    case kTfLiteFloat32:
        ptr = (io == 0) ? interpreter->typed_input_tensor <float>(io_idx) :
                          interpreter->typed_output_tensor<float>(io_idx);
        break;
    case kTfLiteInt64:
        ptr = (io == 0) ? interpreter->typed_input_tensor <int64_t>(io_idx) :
                          interpreter->typed_output_tensor<int64_t>(io_idx);
        break;
    default:
        DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
        return -1;
    }

    ptensor->idx    = tensor_idx;
    ptensor->io     = io;
    ptensor->io_idx = io_idx;
    ptensor->type   = tensor->type;
    ptensor->ptr    = ptr;
    ptensor->quant_scale = tensor->params.scale;
    ptensor->quant_zerop = tensor->params.zero_point;

    for (int i = 0; (i < 4) && (i < tensor->dims->size); i ++)
    {
        ptensor->dims[i] = tensor->dims->data[i];
    }

    return 0;
}

static tflite_tensor_t s_detect_tensor_input;
static tflite_tensor_t s_detect_tensor_scores;
static tflite_tensor_t s_detect_tensor_bboxes;

tflite_get_tensor_by_name (s_detect_interpreter, 0, "input",          &s_detect_tensor_input);
tflite_get_tensor_by_name (s_detect_interpreter, 1, "regressors",     &s_detect_tensor_bboxes);
tflite_get_tensor_by_name (s_detect_interpreter, 1, "classificators", &s_detect_tensor_scores);

From the model configuration we can read the input image width and height the model expects (the input tensor's dims are laid out NHWC, i.e. [batch, height, width, channels]):

int det_input_w = s_detect_tensor_input.dims[2];
int det_input_h = s_detect_tensor_input.dims[1];

4.2 Converting the Camera Preview Texture to RGBA

The texture captured from the camera has to be converted to RGBA before the model can consume it, so we read the texture back into CPU memory:

unsigned char *buf_ui8 = NULL;
static unsigned char *pui8 = NULL;
if (pui8 == NULL)
    pui8 = (unsigned char *)malloc(w * h * 4);

buf_ui8 = pui8;

draw_2d_texture_ex (srctex, 0, win_h - h, w, h, RENDER2D_FLIP_V);

glPixelStorei (GL_PACK_ALIGNMENT, 4);
glReadPixels (0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, buf_ui8);

We first draw the camera texture into the framebuffer, then read the pixels back into a memory buffer with the OpenGL call glReadPixels.

Note: glReadPixels is expensive; it stalls the pipeline because the CPU must wait for the GPU to finish rendering before the pixels can be copied back.
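
One common mitigation, sketched below as our own addition (not from the demo project) and assuming an OpenGL ES 3.0 context, is to route the readback through a pixel buffer object (PBO): glReadPixels then returns without blocking, and we map the result one frame later.

GLuint pbo;
glGenBuffers (1, &pbo);
glBindBuffer (GL_PIXEL_PACK_BUFFER, pbo);
glBufferData (GL_PIXEL_PACK_BUFFER, w * h * 4, NULL, GL_STREAM_READ);

/* With a PBO bound, the last argument is an offset into the PBO and the
 * copy is performed asynchronously by the driver. */
glReadPixels (0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, 0);

/* ... one frame later, fetch the previous readback ... */
void *mapped = glMapBufferRange (GL_PIXEL_PACK_BUFFER, 0, w * h * 4, GL_MAP_READ_BIT);
if (mapped)
{
    memcpy (buf_ui8, mapped, w * h * 4);
    glUnmapBuffer (GL_PIXEL_PACK_BUFFER);
}
glBindBuffer (GL_PIXEL_PACK_BUFFER, 0);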

4.3 Feeding Image Data to the Inference Engine

First, use the input tensor s_detect_tensor_input obtained above to get the input buffer the engine has allocated:

void *
get_blazeface_input_buf (int *w, int *h)
{
    *w = s_detect_tensor_input.dims[2];
    *h = s_detect_tensor_input.dims[1];
    return s_detect_tensor_input.ptr;
}

Then convert the RGBA pixels obtained above to float and write them into the input tensor. With mean = std = 128, each channel is normalized from [0, 255] to roughly [-1, 1]:

unsigned char *buf_ui8  = pui8;                         /* RGBA pixels read back in 4.2 */
float         *buf_fp32 = (float *)s_detect_tensor_input.ptr; /* engine's input buffer */
float mean = 128.0f;
float std  = 128.0f;

for (int y = 0; y < h; y ++)
{
    for (int x = 0; x < w; x ++)
    {
        int r = *buf_ui8 ++;
        int g = *buf_ui8 ++;
        int b = *buf_ui8 ++;
        buf_ui8 ++;            /* skip alpha */

        *buf_fp32 ++ = (float)(r - mean) / std;
        *buf_fp32 ++ = (float)(g - mean) / std;
        *buf_fp32 ++ = (float)(b - mean) / std;
    }
}

4.4 Parsing and Rendering the Results

Next, call the interpreter's Invoke() method to run inference:

if (interpreter->Invoke() != kTfLiteOk)
{
    DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
    return -1;
}

Then decode the detection results. Each anchor produces a raw score, which is squashed through a sigmoid; boxes and landmarks are predicted as offsets relative to their anchor and then normalized by the input image size:

static int
decode_bounds (std::list<face_t> &face_list, float score_thresh,
               int input_img_w, int input_img_h)
{
    face_t face_item;
    float *scores_ptr = (float *)s_detect_tensor_scores.ptr;

    int i = 0;
    for (auto itr = s_anchors.begin(); itr != s_anchors.end(); i ++, itr ++)
    {
        fvec2 anchor = *itr;
        float score0 = scores_ptr[i];
        float score  = 1.0f / (1.0f + exp(-score0));   /* sigmoid */

        if (score > score_thresh)
        {
            float *p = get_bbox_ptr (i);

            /* boundary box */
            float sx = p[0];
            float sy = p[1];
            float w  = p[2];
            float h  = p[3];

            float cx = sx + anchor.x;
            float cy = sy + anchor.y;

            cx /= (float)input_img_w;
            cy /= (float)input_img_h;
            w  /= (float)input_img_w;
            h  /= (float)input_img_h;

            fvec2 topleft, btmright;
            topleft.x  = cx - w * 0.5f;
            topleft.y  = cy - h * 0.5f;
            btmright.x = cx + w * 0.5f;
            btmright.y = cy + h * 0.5f;

            face_item.score    = score;
            face_item.topleft  = topleft;
            face_item.btmright = btmright;

            /* landmark positions (6 keys) */
            for (int j = 0; j < kFaceKeyNum; j ++)
            {
                float lx = p[4 + (2 * j) + 0];
                float ly = p[4 + (2 * j) + 1];
                lx += anchor.x;
                ly += anchor.y;
                lx /= (float)input_img_w;
                ly /= (float)input_img_h;

                face_item.keys[j].x = lx;
                face_item.keys[j].y = ly;
            }

            face_list.push_back (face_item);
        }
    }
    return 0;
}

face_t wraps the detection result: the score, the top-left and bottom-right corners, and the facial keypoints:

typedef struct _face_t
{
    float score;
    fvec2 topleft;
    fvec2 btmright;
    fvec2 keys[kFaceKeyNum];
} face_t;

Using these coordinates we can draw a box around each detected face, as sketched below.
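
A minimal sketch of that rendering step; draw_2d_rect is a hypothetical helper standing in for the demo framework's actual 2D drawing routine:

/* Convert normalized box coordinates to window pixels and draw an outline.
 * draw_2d_rect() is a hypothetical helper; the framework's real routine
 * may have a different name and signature. */
void
render_face_rect (const face_t &face, int win_w, int win_h)
{
    float x = face.topleft.x * win_w;
    float y = face.topleft.y * win_h;
    float w = (face.btmright.x - face.topleft.x) * win_w;
    float h = (face.btmright.y - face.topleft.y) * win_h;
    float col_red[4] = {1.0f, 0.0f, 0.0f, 1.0f};

    draw_2d_rect (x, y, w, h, col_red, 2.0f);   /* hypothetical helper */
}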

5. Summary

This article walked through the typical AI development steps and common AI vision applications. Through the face detection feature, we covered the core TensorFlow Lite APIs: loading a model, feeding input data, running inference, and retrieving the results.


