
YOLOv3 – Deep Learning Based Object Detection with OpenCV ( Python / C++ )

Sunita Nayak · August 20, 2018


In this post, we will understand what YOLOv3 is and learn how to use this state-of-the-art object detector with OpenCV.

YOLOv3 is the latest variant of the popular object detection algorithm YOLO (You Only Look Once). The published model recognizes 80 different object classes in images and videos; most importantly, it is super fast and nearly as accurate as the Single Shot MultiBox Detector (SSD).

Starting with OpenCV 3.4.2, you can easily use YOLOv3 models in your own OpenCV application.

This post mainly focuses on inference, but you can also find our tutorial on training a YOLOv3 model on your own dataset.

How does YOLO work?

We can think of an object detector as a combination of an object locator and an object recognizer.

Traditional computer vision approaches used a sliding window to look for objects at different locations and scales. Because this was such an expensive operation, the aspect ratio of the object was usually assumed to be fixed.

Early Deep Learning based object detection algorithms like the R-CNN and Fast R-CNN used a method called Selective Search to narrow down the number of bounding boxes that the algorithm had to test.

Another approach, called OverFeat, involved scanning the image at multiple scales using a sliding-window-like mechanism implemented convolutionally.

This was followed by Faster R-CNN, which used a Region Proposal Network (RPN) to identify bounding boxes that needed testing. By clever design, the features extracted for recognizing objects were also used by the RPN for proposing potential bounding boxes, thus saving a lot of computation.

YOLO, on the other hand, approaches the object detection problem in a completely different way: it forwards the whole image through the network only once. SSD is another object detection algorithm that forwards the image through a deep learning network once, but YOLOv3 is much faster than SSD while achieving comparable accuracy. YOLOv3 gives faster-than-real-time results on an M40, Titan X, or 1080 Ti GPU.


Let’s see how YOLO detects the objects in a given image.

First, it divides the image into a 13×13 grid of cells. The size of these 169 cells depends on the size of the input. For the 416×416 input size used in our experiments, each cell was 32×32 pixels. Each cell is then responsible for predicting a number of boxes in the image.
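As a quick back-of-the-envelope check on those numbers (a sketch for intuition, not part of the tutorial's code): a 416×416 input with a stride of 32 gives a 13×13 grid, and the published YOLOv3 model actually predicts boxes at three scales (strides 32, 16, and 8), with 3 anchor boxes per grid cell:

# Back-of-the-envelope count of YOLOv3 candidate boxes for a 416x416 input.
inp = 416
strides = [32, 16, 8]      # the three detection scales of YOLOv3
anchors_per_cell = 3       # anchor boxes predicted per grid cell

for s in strides:
    g = inp // s
    print(f"stride {s}: {g}x{g} grid -> {g * g * anchors_per_cell} boxes")

total = sum((inp // s) ** 2 * anchors_per_cell for s in strides)
print("total candidate boxes:", total)   # 10647

So the network proposes thousands of candidate boxes per image; almost all of them are filtered out by the confidence threshold and non-maximum suppression described next.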

For each bounding box, the network also predicts the confidence that the bounding box actually encloses an object, and the probability of the enclosed object being a particular class.

Most of these bounding boxes are eliminated, either because their confidence is low or because they enclose the same object as another bounding box with a higher confidence score. This technique is called non-maximum suppression (NMS).
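To make the idea concrete, here is a minimal, illustrative NMS sketch in plain Python. This is not the tutorial's code; the pipeline later in this post uses OpenCV's built-in cv.dnn.NMSBoxes, which implements the same greedy procedure:

def iou(a, b):
    # a, b: boxes as (left, top, width, height)
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.4):
    # Keep the highest-scoring box, drop boxes that overlap it too much, repeat.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

The key design choice is greediness: the highest-scoring box always survives, and only boxes that overlap it strongly are discarded.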

The authors of YOLOv3, Joseph Redmon and Ali Farhadi, have made YOLOv3 faster and more accurate than their previous work, YOLOv2. YOLOv3 handles multiple scales better. They have also improved the network by making it bigger and moving it towards residual networks by adding shortcut connections.

Why use OpenCV for YOLO?

Here are a few reasons why you may want to use OpenCV for YOLO:

1. Easy integration with an OpenCV application: If your application already uses OpenCV and you want to use YOLOv3, you don't have to worry about compiling and building the extra Darknet code.

2. OpenCV's CPU version is 9x faster: OpenCV's CPU implementation of the DNN module is astonishingly fast. For example, Darknet, when used with OpenMP, takes about 2 seconds on a CPU for inference on a single image. In contrast, OpenCV's implementation runs in a mere 0.22 seconds! Check out the table below.

3. Python support: Darknet is written in C and does not officially support Python. In contrast, OpenCV does. (There are Python ports of Darknet available, though.)

Speed Test for YOLOv3 on Darknet and OpenCV

The following table shows the performance of YOLOv3 on Darknet vs. OpenCV. The input size in all cases is 416×416. It is not surprising that the GPU version of Darknet outperforms everything else. It is also not surprising that Darknet with OpenMP works much better than Darknet without OpenMP, because OpenMP enables the use of multiple processors.

What is indeed surprising is that OpenCV's CPU implementation of DNN is 9x faster than Darknet with OpenMP.
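If you want to reproduce this kind of measurement, OpenCV's Net.getPerfProfile reports the time spent in the last forward pass. A minimal sketch, assuming the model files downloaded in Step 1 below and a hypothetical test image bird.jpg:

import cv2 as cv

net = cv.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)

img = cv.imread("bird.jpg")  # hypothetical test image
blob = cv.dnn.blobFromImage(img, 1/255.0, (416, 416), (0, 0, 0), swapRB=True, crop=False)
net.setInput(blob)
# getUnconnectedOutLayersNames needs a recent OpenCV; older versions can use
# the getOutputsNames helper shown later in this post.
outs = net.forward(net.getUnconnectedOutLayersNames())

# getPerfProfile returns the total time of the last forward pass, in ticks.
t, _ = net.getPerfProfile()
print("Inference time: %.2f ms" % (t * 1000.0 / cv.getTickFrequency()))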

Note: We ran into problems using OpenCV's GPU implementation of the DNN module. The documentation indicates that it is tested only with Intel's GPUs, so the code automatically falls back to the CPU if you do not have an Intel GPU.

Object Detection using YOLOv3 in C++/Python

Let us now see how to use YOLOv3 in OpenCV to perform object detection.

Step 1 : Download the models

We will start by downloading the models using the script file getModels.sh from the command line.

sudo chmod a+x getModels.sh
./getModels.sh

This will download the yolov3.weights file (containing the pre-trained network's weights), the yolov3.cfg file (containing the network configuration), and the coco.names file, which contains the 80 class names used in the COCO dataset.
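If you prefer not to use the script, the same three files can be fetched directly. A minimal Python sketch; the weights URL is the one published on the official YOLO site and the other two come from the Darknet GitHub repository, so verify them if they have moved:

import urllib.request

# Sources as published for YOLOv3; the weights file is roughly 240 MB.
files = {
    "yolov3.weights": "https://pjreddie.com/media/files/yolov3.weights",
    "yolov3.cfg": "https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg",
    "coco.names": "https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names",
}
for name, url in files.items():
    print("Downloading", name)
    urllib.request.urlretrieve(url, name)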

Step 2 : Initialize the parameters

The YOLOv3 algorithm generates bounding boxes as the predicted detection outputs. Every predicted box is associated with a confidence score. In the first stage, all the boxes below the confidence threshold parameter are ignored for further processing.

The rest of the boxes undergo non-maximum suppression, removing redundant, overlapping bounding boxes. Non-maximum suppression is controlled by a parameter nmsThreshold. You can try to change these values and see how the number of output predicted boxes changes.

Next, the default values for the input width (inpWidth) and height (inpHeight) of the network's input image are set. We set each of them to 416 so that we can compare our runs with the Darknet C code released by YOLOv3's authors. You can change both of them to 320 to get faster results, or to 608 to get more accurate results.
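These dimensions are used when each frame is converted into the network's input blob. A minimal sketch of that conversion, using a hypothetical test image bird.jpg:

import cv2 as cv

inpWidth, inpHeight = 416, 416   # try 320 for speed, 608 for accuracy

frame = cv.imread("bird.jpg")    # hypothetical test image
# Scale pixel values to [0, 1], resize to the network's input size, and
# swap BGR -> RGB (Darknet models expect RGB channel order).
blob = cv.dnn.blobFromImage(frame, 1/255.0, (inpWidth, inpHeight),
                            (0, 0, 0), swapRB=True, crop=False)
print(blob.shape)  # (1, 3, 416, 416)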


Python

# Initialize the parameters
confThreshold = 0.5  # Confidence threshold
nmsThreshold = 0.4   # Non-maximum suppression threshold
inpWidth = 416       # Width of network's input image
inpHeight = 416      # Height of network's input image

C++

// Initialize the parameters
float confThreshold = 0.5; // Confidence threshold
float nmsThreshold = 0.4;  // Non-maximum suppression threshold
int inpWidth = 416;        // Width of network's input image
int inpHeight = 416;       // Height of network's input image

Step 3 : Load the model and classes

The file coco.names contains all the object classes for which the model was trained. We first read the class names from it.

Next, we load the network, which has two parts —

yolov3.weights : The pre-trained weights.
yolov3.cfg : The network configuration file.

We set the DNN backend to OpenCV here and the target to CPU. You could try setting the preferable target to cv.dnn.DNN_TARGET_OPENCL to run it on a GPU, but keep in mind that the DNN module in the current OpenCV version is tested only with Intel's GPUs; it automatically falls back to the CPU if you do not have an Intel GPU.
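For reference, the GPU variant is a one-line change (a sketch; it only helps if an OpenCL-capable Intel GPU is present):

# Run on an (Intel) GPU via OpenCL; OpenCV silently falls back to the CPU otherwise.
net.setPreferableTarget(cv.dnn.DNN_TARGET_OPENCL)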

Python

# Load names of classes
classesFile = "coco.names"
classes = None
with open(classesFile, 'rt') as f:
    classes = f.read().rstrip('\n').split('\n')

# Give the configuration and weight files for the model and load the network using them.
modelConfiguration = "yolov3.cfg"
modelWeights = "yolov3.weights"

net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)

C++

// Load names of classes
string classesFile = "coco.names";
ifstream ifs(classesFile.c_str());
string line;
while (getline(ifs, line)) classes.push_back(line);

// Give the configuration and weight files for the model
String modelConfiguration = "yolov3.cfg";
String modelWeights = "yolov3.weights";

// Load the network
Net net = readNetFromDarknet(modelConfiguration, modelWeights);
net.setPreferableBackend(DNN_BACKEND_OPENCV);
net.setPreferableTarget(DNN_TARGET_CPU);

Step 4 : Read the input

In this step, we read the image, video stream, or webcam input. We also initialize the video writer to save frames with the detected output bounding boxes.

Python

outputFile = "yolo_out_py.avi"
if (args.image):
    # Open the image file
    if not os.path.isfile(args.image):
        print("Input image file ", args.image, " doesn't exist")
        sys.exit(1)
    cap = cv.VideoCapture(args.image)
    outputFile = args.image[:-4] + '_yolo_out_py.jpg'
elif (args.video):
    # Open the video file
    if not os.path.isfile(args.video):
        print("Input video file ", args.video, " doesn't exist")
        sys.exit(1)
    cap = cv.VideoCapture(args.video)
    outputFile = args.video[:-4] + '_yolo_out_py.avi'
else:
    # Webcam input
    cap = cv.VideoCapture(0)

# Get the video writer initialized to save the output video
if (not args.image):
    vid_writer = cv.VideoWriter(outputFile, cv.VideoWriter_fourcc('M','J','P','G'), 30,
                     (round(cap.get(cv.CAP_PROP_FRAME_WIDTH)), round(cap.get(cv.CAP_PROP_FRAME_HEIGHT))))

C++

outputFile = "yolo_out_cpp.avi";
try {
    if (parser.has("image")) {
        // Open the image file
        str = parser.get<String>("image");
        ifstream ifile(str);
        if (!ifile) throw("error");
        cap.open(str);
        str.replace(str.end()-4, str.end(), "_yolo_out_cpp.jpg");
        outputFile = str;
    }
    else if (parser.has("video")) {
        // Open the video file
        str = parser.get<String>("video");
        ifstream ifile(str);
        if (!ifile) throw("error");
        cap.open(str);
        str.replace(str.end()-4, str.end(), "_yolo_out_cpp.avi");
        outputFile = str;
    }
    // Open the webcam
    else cap.open(parser.get<int>("device"));
}
catch(...) {
    cout << "Could not open the input image/video stream" << endl;
    return 0;
}

Step 4a : Process each frame

The frames are then processed in a loop: each frame is converted to an input blob, passed through the network in a forward pass, and the network's outputs are handed to the post-processing step below.

C++

// Process frames
while (waitKey(1) < 0)
{
    // Get frame from the video
    cap >> frame;

    // Stop the program if reached end of video
    if (frame.empty()) {
        cout << "Done processing !!!" << endl;
        cout << "Output file is stored as " << outputFile << endl;
        waitKey(3000);
        break;
    }

    // Create a 4D blob from a frame
    blobFromImage(frame, blob, 1/255.0, Size(inpWidth, inpHeight), Scalar(0,0,0), true, false);

    // Set the input to the network
    net.setInput(blob);

    // Run the forward pass to get output of the output layers
    vector<Mat> outs;
    net.forward(outs, getOutputsNames(net));

    // Remove the bounding boxes with low confidence
    postprocess(frame, outs);
}

Step 4b : Post-processing the network's output

Each output box is represented by its center, width, and height (all relative to the frame size), followed by an objectness score and the per-class scores. Boxes whose best class score falls below confThreshold are discarded; the survivors go through non-maximum suppression.

C++

// Remove the bounding boxes with low confidence using non-maximum suppression
void postprocess(Mat& frame, const vector<Mat>& outs)
{
    vector<int> classIds;
    vector<float> confidences;
    vector<Rect> boxes;

    for (size_t i = 0; i < outs.size(); ++i)
    {
        // Scan through all the bounding boxes output from the network and keep only
        // the ones with high confidence scores. Assign the box's class label as the
        // class with the highest score for the box.
        float* data = (float*)outs[i].data;
        for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)
        {
            Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
            Point classIdPoint;
            double confidence;
            // Get the value and location of the maximum score
            minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
            if (confidence > confThreshold)
            {
                int centerX = (int)(data[0] * frame.cols);
                int centerY = (int)(data[1] * frame.rows);
                int width = (int)(data[2] * frame.cols);
                int height = (int)(data[3] * frame.rows);
                int left = centerX - width / 2;
                int top = centerY - height / 2;

                classIds.push_back(classIdPoint.x);
                confidences.push_back((float)confidence);
                boxes.push_back(Rect(left, top, width, height));
            }
        }
    }

    // Perform non maximum suppression to eliminate redundant overlapping boxes with
    // lower confidences
    vector<int> indices;
    NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); ++i)
    {
        int idx = indices[i];
        Rect box = boxes[idx];
        drawPred(classIds[idx], confidences[idx], box.x, box.y,
                 box.x + box.width, box.y + box.height, frame);
    }
}
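The forward pass above asks the network for its unconnected output layers by name via a helper, getOutputsNames. A minimal Python sketch of such a helper (the exact indexing returned by getUnconnectedOutLayers varies across OpenCV versions, hence the flatten):

def getOutputsNames(net):
    # Names of all layers in the network
    layersNames = net.getLayerNames()
    # The output layers are those with unconnected outputs; OpenCV returns
    # their 1-based indices (nested arrays in some versions, hence flatten()).
    return [layersNames[int(i) - 1] for i in net.getUnconnectedOutLayers().flatten()]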

Non-maximum suppression is controlled by the nmsThreshold parameter. If nmsThreshold is set too low, e.g. 0.1, we might not detect overlapping objects of the same or different classes. But if it is set too high, e.g. 1.0, we get multiple boxes for the same object. So we used an intermediate value of 0.4 in our code above. The gif below shows the effect of varying the NMS threshold.


Figure 1: Effect of changing the parameter nmsThreshold
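You can also see this effect numerically by running cv.dnn.NMSBoxes on a toy example (the boxes below are hypothetical, not from the tutorial):

import cv2 as cv

# Two heavily overlapping boxes (left, top, width, height) and one separate box.
boxes = [[100, 100, 50, 80], [105, 102, 50, 80], [300, 40, 60, 60]]
confidences = [0.9, 0.8, 0.7]

for nmsThreshold in (0.1, 0.4, 1.0):
    keep = cv.dnn.NMSBoxes(boxes, confidences, score_threshold=0.5,
                           nms_threshold=nmsThreshold)
    print(nmsThreshold, "->", len(keep), "boxes kept")

# At 0.1 and 0.4 the near-duplicate of the first box is suppressed (2 boxes kept);
# at 1.0 nothing is suppressed and all 3 boxes survive.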

Step 4c : Draw the predicted boxes

Finally, we draw the boxes that survived non-maximum suppression on the input frame, with their assigned class labels and confidence scores.

Python

# Draw the predicted bounding box
def drawPred(classId, conf, left, top, right, bottom):
    # Draw a bounding box.
    cv.rectangle(frame, (left, top), (right, bottom), (255, 178, 50), 3)

    label = '%.2f' % conf

    # Get the label for the class name and its confidence
    if classes:
        assert(classId < len(classes))
        label = '%s:%s' % (classes[classId], label)

    # Display the label at the top of the bounding box
    labelSize, baseLine = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 0.5, 1)
    top = max(top, labelSize[1])
    cv.rectangle(frame, (left, top - round(1.5*labelSize[1])),
                 (left + round(1.5*labelSize[0]), top + baseLine), (255, 255, 255), cv.FILLED)
    cv.putText(frame, label, (left, top), cv.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 0), 1)

C++

// Draw the predicted bounding box
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
    // Draw a rectangle displaying the bounding box
    rectangle(frame, Point(left, top), Point(right, bottom), Scalar(255, 178, 50), 3);

    // Get the label for the class name and its confidence
    string label = format("%.2f", conf);
    if (!classes.empty())
    {
        CV_Assert(classId < (int)classes.size());
        label = classes[classId] + ":" + label;
    }

    // Display the label at the top of the bounding box
    int baseLine;
    Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
    top = max(top, labelSize.height);
    rectangle(frame, Point(left, top - round(1.5*labelSize.height)),
              Point(left + round(1.5*labelSize.width), top + baseLine), Scalar(255, 255, 255), FILLED);
    putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0,0,0), 1);
}


References:

YOLOv3 Tech Report: Redmon, J. and Farhadi, A., "YOLOv3: An Incremental Improvement" (arXiv:1804.02767)

We used video clips from the following sources: Pixabay: [1], [2], [3], [4], [5], [6]; Pexels: [2].

Tags: advantages of YOLOV3 blobFromImage boundingBox C++ cv.dnn cv.dnn.blobFromImage cv.dnn.DNN_BACKEND_OPENCV cv.dnn.NMSBoxes cv.dnn.readNetFromDarknet DNN_BACKEND_OPENCV inference NMSBoxes nmsThreshold non-maximumSuppression OpenCV performanceAnalysis Python readNetFromDarknet What is YOLOv3? YOLOv3 yolov3 architecture yolov3 github yolov3 model yolov3 object detection yolov3 paper yolov3 python yolov3 pytorch

Filed Under: Deep Learning, Object Detection, OpenCV, OpenCV DNN, YOLO


