2022/09/07: a CPU version of the inference script has been added on GitHub.


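First, onnxruntime must be installed (I use the GPU build here), along with torch, torchvision, opencv-python and the matching CUDA toolkit. I won't cover installation here. The approach is:
1. Preprocess the image, resizing it to the ONNX model's input size.
2. Run inference to get all boxes, then use non_max_suppression to drop boxes whose confidence is too low and boxes that redundantly overlap a better one, based on the confidence and IoU thresholds.
3. Convert the box coordinates back to the original image's coordinates (the image was resized during preprocessing).
4. Draw the boxes and label names on the original image and save it (using opencv and cv2.putText).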



1. Load the ONNX model

    # detect.py
    device = select_device(device)
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
    stride, names, pt = model.stride, model.names, model.pt
    imgsz = check_img_size(imgsz, s=stride)  # check image size

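Line 1, device, selects whether to run on CPU or GPU. Since CUDA is available here, device should be set to cuda.
Line 2, DetectMultiBackend: in the latest YOLOv5 source (August 2022) this class is what loads the various model formats. This differs from earlier versions; in some 2021 YOLOv5 projects you find online, this step may use the attempt_load function instead. So it is worth reading how DetectMultiBackend is actually implemented.
Line 3 defines some basic model information: stride is usually 32, names holds the model's class labels, and pt indicates whether PyTorch .pt weights are used for inference. Since we use ONNX, pt=False. This matters later.
Line 4 checks whether the input image size is a multiple of 32. YOLOv5 requires this at training time; if the size is not a multiple of 32, it is adjusted.
Now let's take a closer look at DetectMultiBackend: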
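To make the size check in line 4 concrete, here is a minimal sketch of rounding a requested size up to the next multiple of the stride. The helper name round_up_to_stride is hypothetical and this is not the upstream check_img_size code, only the idea behind it.

```python
import math


def round_up_to_stride(size: int, stride: int = 32) -> int:
    """Round size up to the nearest multiple of stride (hypothetical helper)."""
    return max(math.ceil(size / stride) * stride, stride)


# 640 is already a multiple of 32; 650 and 100 are not and get rounded up.
for requested in (640, 650, 100):
    print(requested, "->", round_up_to_stride(requested))
```

With stride 32, a request of 650 becomes 672 and a request of 100 becomes 128.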

    # common.py
    elif onnx:  # ONNX Runtime
        LOGGER.info(f'Loading {w} for ONNX Runtime inference...')
        cuda = torch.cuda.is_available()
        check_requirements(('onnx', 'onnxruntime-gpu' if cuda else 'onnxruntime'))
        import onnxruntime
        providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if cuda else ['CPUExecutionProvider']
        session = onnxruntime.InferenceSession(w, providers=providers)
        meta = session.get_modelmeta().custom_metadata_map  # metadata
        if 'stride' in meta:
            stride, names = int(meta['stride']), eval(meta['names'])

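This code lives in common.py under the models folder. The earlier part is too long to show, so we look only at the ONNX branch we care about. It first checks whether CUDA is available and that the matching onnxruntime package (GPU or CPU build) is installed. If CUDA is usable, inference runs with 'CUDAExecutionProvider'. So this snippet is what loads the ONNX model.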
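The two decisions in that branch, choosing execution providers and reading stride and names from the model metadata, can be sketched without a model file. The helper names and the sample metadata below are made up for illustration, and ast.literal_eval is swapped in for the eval call in the excerpt as a stricter alternative. In a real script the list of available providers could come from onnxruntime.get_available_providers().

```python
import ast


def pick_providers(available):
    """Prefer CUDA when it is available, always keeping CPU as a fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available] or ["CPUExecutionProvider"]


def parse_metadata(meta, default_stride=32):
    """Read stride and class names from a custom metadata map of strings."""
    stride = int(meta.get("stride", default_stride))
    # literal_eval only accepts Python literals, unlike eval.
    names = ast.literal_eval(meta["names"]) if "names" in meta else None
    return stride, names


# Made-up metadata in the shape the excerpt expects: both values are strings.
sample_meta = {"stride": "32", "names": "{0: 'nofall', 1: 'fall'}"}
print(pick_providers(["CPUExecutionProvider"]))
print(parse_metadata(sample_meta))
```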

2. Preprocess the image

    # detect.py
    if webcam:
        view_img = check_imshow()
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt)
        bs = len(dataset)  # batch_size
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt)
        bs = 1  # batch_size

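Because we run inference on a single image only, the else branch with LoadImages is the one that runs. Two parameters matter here, img_size and auto:
img_size is the ONNX model's input size. The default export uses (640, 640), and you can change this value when exporting the ONNX model.
auto follows pt, that is, whether .pt weights are used. We use ONNX, so auto should be False, which makes letterbox pad the image to exactly img_size instead of to the smallest stride-aligned rectangle.
Next, look at the implementation (in dataloaders.py under utils):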

    # dataloaders.py
    else:
        # Read image
        self.count += 1
        img0 = cv2.imread(path)  # BGR
        assert img0 is not None, f'Image Not Found {path}'
        s = f'image {self.count}/{self.nf} {path}: '

    # Padded resize
    img = letterbox(img0, self.img_size, stride=self.stride, auto=self.auto)[0]

    # Convert
    img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
    img = np.ascontiguousarray(img)
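The arithmetic behind the padded resize can be worked through without OpenCV. The sketch below covers only the auto=False case used for ONNX; the function name letterbox_geometry and the 1280x720 example image are assumptions for illustration.

```python
def letterbox_geometry(h, w, new_shape=(640, 640)):
    """Return (scale, resized (w, h), (top, bottom, left, right) padding)."""
    # One scale for both axes so the aspect ratio is preserved.
    r = min(new_shape[0] / h, new_shape[1] / w)
    new_w, new_h = int(round(w * r)), int(round(h * r))
    # Remaining space is split evenly between the two sides.
    dw = (new_shape[1] - new_w) / 2
    dh = (new_shape[0] - new_h) / 2
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    return r, (new_w, new_h), (top, bottom, left, right)


# A 1280x720 (width x height) image going into a 640x640 model input.
scale, resized, padding = letterbox_geometry(720, 1280)
print(scale, resized, padding)
```

For this image the scale is 0.5, the resized image is 640x360, and 140 rows of padding go above and below it.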


3. Run inference

    # detect.py
    im = torch.from_numpy(im).to(device)
    im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
    im /= 255  # 0 - 255 to 0.0 - 1.0
    if len(im.shape) == 3:
        im = im[None]  # expand for batch dim
    t2 = time_sync()
    dt[0] += t2 - t1

    # Inference
    visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
    pred = model(im, augment=augment, visualize=visualize)


    # common.py
    elif self.onnx:  # ONNX Runtime
        im = im.cpu().numpy()  # torch to numpy
        y = self.session.run([self.session.get_outputs()[0].name], {self.session.get_inputs()[0].name: im})[0]




4. Apply non-maximum suppression

    # detect.py
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

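Here pred is the prediction from step 3. conf_thres defaults to 0.25 and iou_thres to 0.45: boxes whose confidence is below conf_thres are dropped, and among overlapping boxes whose IoU exceeds iou_thres only the highest-scoring one is kept. classes restricts prediction to the classes you specify and defaults to None (no filtering). agnostic_nms defaults to False; when enabled, NMS is applied across all classes together instead of per class. There are many other NMS variants, which I won't go into here. max_det defaults to 300 and is the maximum number of boxes kept per image.
This function lives in general.py under utils. Its implementation is somewhat involved, and we can strip out features we don't need. For example, classes is rarely needed, so the related code can be deleted.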
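The thresholds are easier to follow in a small pure-Python sketch of greedy NMS. This is not the upstream implementation, which works on tensors and calls torchvision.ops.nms; the function names and the sample detections are made up for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2, ...)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-7)


def greedy_nms(dets, conf_thres=0.25, iou_thres=0.45, agnostic=False, max_det=300):
    """dets: list of (x1, y1, x2, y2, conf, cls). Returns the kept detections."""
    # 1) Drop low-confidence boxes, then visit the rest from best to worst.
    candidates = sorted((d for d in dets if d[4] > conf_thres),
                        key=lambda d: d[4], reverse=True)
    kept = []
    for d in candidates:
        # 2) Suppress d if it overlaps a better box of the same class
        #    (or of any class when agnostic is True).
        clash = any((agnostic or k[5] == d[5]) and iou(d, k) > iou_thres
                    for k in kept)
        if not clash:
            kept.append(d)
        if len(kept) == max_det:  # 3) Cap the number of detections.
            break
    return kept


dets = [
    (10, 10, 110, 110, 0.9, 0),    # best box of class 0
    (12, 12, 112, 112, 0.8, 0),    # near-duplicate, same class
    (12, 12, 112, 112, 0.7, 1),    # near-duplicate, different class
    (200, 200, 250, 250, 0.1, 0),  # below conf_thres
]
print([d[4] for d in greedy_nms(dets)])
print([d[4] for d in greedy_nms(dets, agnostic=True)])
```

Per-class NMS keeps the 0.9 and 0.7 boxes because they belong to different classes, while the class-agnostic run keeps only the 0.9 box.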

    # general.py
    box = xywh2xyxy(x[:, :4])
    iou = box_iou(boxes[i], boxes)
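The model outputs boxes as center plus size, while IoU and drawing work on corner coordinates, which is why the conversion comes first. A minimal single-box sketch of that conversion, with a hypothetical function name and a made-up box:

```python
def xywh_to_xyxy(box):
    """Convert (center x, center y, width, height) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A 20x10 box centered at (50, 50).
print(xywh_to_xyxy((50, 50, 20, 10)))
```

The result is the top-left corner (40, 45) and the bottom-right corner (60, 55).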


5. Annotate the image and save it

    # detect.py
    for i, det in enumerate(pred):
        det[:, :4] = scale_coords(im.shape[2:], det[:, :4], img0.shape).round()
        # initialize annotator
        annotator = Annotator(img0, line_width=3)
        # annotate the image
        for *xyxy, conf, cls in reversed(det):
            c = int(cls)  # integer class
            label = f'{names[c]} {conf:.2f}'
            annotator.box_label(xyxy, label, color=colors(c, True))

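I modified this slightly to make it easier to read.
The first step converts the coordinates in the tensor from step 4 back to the original image size, using the scale_coords function. The annotation itself uses the Annotator class: once the labels are defined, you can annotate with Annotator and save the result.
scale_coords is in general.py under utils, and Annotator is in plots.py under utils.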
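The coordinate conversion simply undoes the letterbox step: subtract the padding, divide by the scale, and clip to the image. A single-box sketch in plain Python follows; the function name and the example numbers are assumptions, and the real scale_coords operates on tensors in place.

```python
def scale_box(box, model_hw, orig_hw):
    """Map an (x1, y1, x2, y2) box from model-input pixels to original-image pixels."""
    gain = min(model_hw[0] / orig_hw[0], model_hw[1] / orig_hw[1])
    pad_x = (model_hw[1] - orig_hw[1] * gain) / 2  # padding left of the image
    pad_y = (model_hw[0] - orig_hw[0] * gain) / 2  # padding above the image
    x1, y1, x2, y2 = box
    xs = [(x - pad_x) / gain for x in (x1, x2)]
    ys = [(y - pad_y) / gain for y in (y1, y2)]
    # Clip to the original image bounds (width for x, height for y).
    xs = [min(max(x, 0.0), float(orig_hw[1])) for x in xs]
    ys = [min(max(y, 0.0), float(orig_hw[0])) for y in ys]
    return (xs[0], ys[0], xs[1], ys[1])


# Box found in the 640x640 input of a letterboxed 1280x720 image.
print(scale_box((100, 240, 200, 340), (640, 640), (720, 1280)))
```

Here the gain is 0.5 and the vertical padding is 140, so the box maps to (200, 200, 400, 400) in the original image.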





    # inference only for onnx
    import onnxruntime
    import torch
    import torchvision
    import cv2
    import numpy as np
    import time

    w = 'best.onnx'  # model file name; change as needed
    cuda = torch.cuda.is_available()
    device = torch.device('cuda' if cuda else 'cpu')  # fall back to CPU when CUDA is unavailable
    providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if cuda else ['CPUExecutionProvider']
    session = onnxruntime.InferenceSession(w, providers=providers)

    # warmup to reduce the first inference time but useless in fact.
    # t1 = time.time()
    # im = torch.zeros((1, 3, 640, 640), dtype=torch.float, device=device)
    # im = im.cpu().numpy()  # torch to numpy
    # y = session.run([session.get_outputs()[0].name], {session.get_inputs()[0].name: im})[0]
    # t2 = time.time()
    # print(t2 - t1)


    # preprocess img to array
    def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
        # Resize and pad image while meeting stride-multiple constraints
        shape = im.shape[:2]  # current shape [height, width]
        if isinstance(new_shape, int):
            new_shape = (new_shape, new_shape)

        # Scale ratio (new / old)
        r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
        if not scaleup:  # only scale down, do not scale up (for better val mAP)
            r = min(r, 1.0)

        # Compute padding
        ratio = r, r  # width, height ratios
        new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
        dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
        if auto:  # minimum rectangle
            dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
        elif scaleFill:  # stretch
            dw, dh = 0.0, 0.0
            new_unpad = (new_shape[1], new_shape[0])
            ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

        dw /= 2  # divide padding into 2 sides
        dh /= 2

        if shape[::-1] != new_unpad:  # resize
            im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
        top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
        left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
        im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
        return im


    def xywh2xyxy(x):
        # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
        y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
        y[:, 0] = x[:, 0] - x[:, 2] / 2  # top left x
        y[:, 1] = x[:, 1] - x[:, 3] / 2  # top left y
        y[:, 2] = x[:, 0] + x[:, 2] / 2  # bottom right x
        y[:, 3] = x[:, 1] + x[:, 3] / 2  # bottom right y
        return y


    def box_area(box):
        # box = xyxy(4,n)
        return (box[2] - box[0]) * (box[3] - box[1])


    def box_iou(box1, box2, eps=1e-7):
        # inter(N,M) = (rb(N,M,2) - lt(N,M,2)).clamp(0).prod(2)
        (a1, a2), (b1, b2) = box1[:, None].chunk(2, 2), box2.chunk(2, 1)
        inter = (torch.min(a2, b2) - torch.max(a1, b1)).clamp(0).prod(2)

        # IoU = inter / (area1 + area2 - inter)
        return inter / (box_area(box1.T)[:, None] + box_area(box2.T) - inter + eps)


    def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, agnostic=False, max_det=300):
        bs = prediction.shape[0]  # batch size
        xc = prediction[..., 4] > conf_thres  # candidates

        # Settings
        # min_wh = 2  # (pixels) minimum box width and height
        max_wh = 7680  # (pixels) maximum box width and height
        max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
        redundant = True  # require redundant detections
        merge = False  # use merge-NMS

        output = [torch.zeros((0, 6), device=prediction.device)] * bs
        for xi, x in enumerate(prediction):  # image index, image inference
            # Apply constraints
            # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
            x = x[xc[xi]]  # confidence

            # If none remain process next image
            if not x.shape[0]:
                continue

            # Compute conf
            x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

            # Box (center x, center y, width, height) to (x1, y1, x2, y2)
            box = xywh2xyxy(x[:, :4])

            # Detections matrix nx6 (xyxy, conf, cls)
            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

            # Apply finite constraint
            # if not torch.isfinite(x).all():
            #     x = x[torch.isfinite(x).all(1)]

            # Check shape
            n = x.shape[0]  # number of boxes
            if not n:  # no boxes
                continue
            elif n > max_nms:  # excess boxes
                x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence

            # Batched NMS
            c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
            boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
            i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
            if i.shape[0] > max_det:  # limit detections
                i = i[:max_det]
            # This branch was garbled in the post and is restored from the upstream
            # YOLOv5 source; it never runs here because merge is False.
            if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
                iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
                weights = iou * scores[None]  # box weights
                x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
                if redundant:
                    i = i[iou.sum(1) > 1]  # require redundancy

            output[xi] = x[i]

        return output


    def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None):
        # Rescale coords (xyxy) from img1_shape to img0_shape
        if ratio_pad is None:  # calculate from img0_shape
            gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain = old / new
            pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # wh padding
        else:
            gain = ratio_pad[0][0]
            pad = ratio_pad[1]

        coords[:, [0, 2]] -= pad[0]  # x padding
        coords[:, [1, 3]] -= pad[1]  # y padding
        coords[:, :4] /= gain
        clip_coords(coords, img0_shape)
        return coords


    def clip_coords(boxes, shape):
        # Clip bounding xyxy bounding boxes to image shape (height, width)
        if isinstance(boxes, torch.Tensor):  # faster individually
            boxes[:, 0].clamp_(0, shape[1])  # x1
            boxes[:, 1].clamp_(0, shape[0])  # y1
            boxes[:, 2].clamp_(0, shape[1])  # x2
            boxes[:, 3].clamp_(0, shape[0])  # y2
        else:  # np.array (faster grouped)
            boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, shape[1])  # x1, x2
            boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, shape[0])  # y1, y2


    class Annotator:
        def __init__(self, im, line_width=None):
            assert im.data.contiguous, 'Image not contiguous. Apply np.ascontiguousarray(im) to Annotator() input images.'
            self.im = im
            self.lw = line_width or max(round(sum(im.shape) / 2 * 0.003), 2)  # line width

        def box_label(self, box, label='', color=(128, 128, 128), txt_color=(255, 255, 255)):
            # Add one xyxy box to image with label
            p1, p2 = (int(box[0]), int(box[1])), (int(box[2]), int(box[3]))
            cv2.rectangle(self.im, p1, p2, color, thickness=self.lw, lineType=cv2.LINE_AA)
            if label:
                tf = max(self.lw - 1, 1)  # font thickness
                w, h = cv2.getTextSize(label, 0, fontScale=self.lw / 3, thickness=tf)[0]  # text width, height
                outside = p1[1] - h >= 3
                p2 = p1[0] + w, p1[1] - h - 3 if outside else p1[1] + h + 3
                cv2.rectangle(self.im, p1, p2, color, -1, cv2.LINE_AA)  # filled
                cv2.putText(self.im,
                            label, (p1[0], p1[1] - 2 if outside else p1[1] + h + 2),
                            0,
                            self.lw / 3,
                            txt_color,
                            thickness=tf,
                            lineType=cv2.LINE_AA)

        # The next two methods are PIL-only leftovers from the upstream class. They are
        # never called in this script, and self.draw / self.font are not defined here.
        def rectangle(self, xy, fill=None, outline=None, width=1):
            # Add rectangle to image (PIL-only)
            self.draw.rectangle(xy, fill, outline, width)

        def text(self, xy, text, txt_color=(255, 255, 255)):
            # Add text to image (PIL-only)
            w, h = self.font.getsize(text)  # text width, height
            self.draw.text((xy[0], xy[1] - h + 1), text, fill=txt_color, font=self.font)

        def result(self):
            # Return annotated image as array
            return np.asarray(self.im)


    class Colors:
        def __init__(self):
            # hex = matplotlib.colors.TABLEAU_COLORS.values()
            hexs = ('FF3838', 'FF9D97', 'FF701F', 'FFB21D', 'CFD231', '48F90A', '92CC17', '3DDB86', '1A9334', '00D4BB',
                    '2C99A8', '00C2FF', '344593', '6473FF', '0018EC', '8438FF', '520085', 'CB38FF', 'FF95C8', 'FF37C7')
            self.palette = [self.hex2rgb(f'#{c}') for c in hexs]
            self.n = len(self.palette)

        def __call__(self, i, bgr=False):
            c = self.palette[int(i) % self.n]
            return (c[2], c[1], c[0]) if bgr else c

        @staticmethod
        def hex2rgb(h):  # rgb order (PIL)
            return tuple(int(h[1 + i:1 + i + 2], 16) for i in (0, 2, 4))


    colors = Colors()  # create instance for 'from utils.plots import colors'

    img0 = cv2.imread('test.png')  # input image; change the file name as needed
    img = letterbox(img0, (640, 640), stride=32, auto=False)  # only pt uses auto=True, but we are onnx
    img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
    img = np.ascontiguousarray(img)
    im = torch.from_numpy(img).to(device)
    im = im.float()
    im /= 255  # 0 - 255 to 0.0 - 1.0
    if len(im.shape) == 3:
        im = im[None]  # expand for batch dim
    im = im.cpu().numpy()  # torch to numpy
    # run the onnx model to get the raw output
    y = session.run([session.get_outputs()[0].name], {session.get_inputs()[0].name: im})[0]

    # non_max_suppression to remove redundant boxes
    y = torch.from_numpy(y).to(device)
    pred = non_max_suppression(y, conf_thres=0.25, iou_thres=0.45, agnostic=False, max_det=1000)

    # labels; change these to match your model
    names = ['nofall', 'fall']

    # transform coordinates to original picture size
    for i, det in enumerate(pred):
        det[:, :4] = scale_coords(im.shape[2:], det[:, :4], img0.shape).round()
        print(det)
        # initialize annotator
        annotator = Annotator(img0, line_width=3)
        # annotate the image
        for *xyxy, conf, cls in reversed(det):
            c = int(cls)  # integer class
            label = f'{names[c]} {conf:.2f}'
            annotator.box_label(xyxy, label, color=colors(c, True))

    # output file name; change as needed (as written, this overwrites the input image)
    cv2.imwrite('test.png', img0)

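A note on the warmup part: YOLOv5's detect.py first feeds the model an empty (all-zeros) tensor to preload it, so that the first real prediction has lower latency. In my test the empty-tensor pass took 0.73 s and the real prediction took 0.01 s.
Without warmup, however, the first prediction simply takes about the sum of those two, and from the second prediction onward each one takes about 0.01 s anyway. So this feature seemed of little use in my scenario, and I commented it out.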
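The timing pattern described here can be reproduced with a stand-in for the session. LazyModel below is a hypothetical class that pays a one-off setup cost on its first call; the sleep durations are arbitrary and are not the measurements quoted above.

```python
import time


def time_calls(fn, n=3):
    """Call fn n times and return the duration of each call in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


class LazyModel:
    """Stand-in for a model whose first call includes one-off initialization."""

    def __init__(self):
        self.ready = False

    def __call__(self):
        if not self.ready:
            time.sleep(0.05)  # simulated one-off setup cost
            self.ready = True
        time.sleep(0.001)  # simulated steady-state inference cost


durations = time_calls(LazyModel())
print([f"{d:.3f}s" for d in durations])
```

The first call carries the setup cost whether it is labeled a warmup or a real prediction, which matches the observation that warmup only moves the cost rather than removing it.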




