使用深度学习和OpenCV实现实时目标检测

Real-time object detection with deep learning and OpenCV

 

 

Real-time object detection with deep learning and OpenCV

Today’s blog post is broken into two parts.

In the first part we’ll learn how to extend last week’s tutorial to apply real-time object detection using deep learning and OpenCV to work with video streams and video files. This will be accomplished using the highly efficient VideoStream  class discussed in this tutorial.

From there, we’ll apply our deep learning + object detection code to actual video streams and measure the FPS processing rate.

Object detection in video with deep learning and OpenCV

To build our deep learning-based real-time object detector with OpenCV we’ll need to (1) access our webcam/video stream in an efficient manner and (2) apply object detection to each frame.

To see how this is done, open up a new file, name it  real_time_object_detection.py  and insert the following code:

We begin by importing packages on Lines 2-8. For this tutorial, you will need imutils and OpenCV 3.3.

To get your system set up, simply install OpenCV using the relevant instructions for your system (while ensuring you’re following any Python virtualenv commands).

Note: Make sure to download and install opencv and and opencv-contrib releases for OpenCV 3.3. This will ensure that the deep neural network ( dnn) module is installed. You must have OpenCV 3.3 (or newer) to run the code in this tutorial.

Next, we’ll parse our command line arguments:

Compared to last week, we don’t need the image argument since we’re working with streams and videos — other than that the following arguments remain the same:

  • prototxt : The path to the Caffe prototxt file.
  • model : The path to the pre-trained model.
  • confidence : The minimum probability threshold to filter weak detections. The default is 20%.

We then initialize a class list and a color set:

On Lines 22-26 we initialize CLASS  labels and corresponding random COLORS . For more information on these classes (and how the network was trained), please refer to last week’s blog post.

Now, let’s load our model and set up our video stream:

We load our serialized model, providing the references to our prototxt and model files on Line 30— notice how easy this is in OpenCV 3.3.

Next let’s initialize our video stream (this can be from a video file or a camera). First we start the VideoStream  (Line 35), then we wait for the camera to warm up (Line 36), and finally we start the frames per second counter (Line 37). The VideoStream  and FPS  classes are part of my imutils  package.

Now, let’s loop over each and every frame (for speed purposes, you could skip frames):

First, we read  a frame  (Line 43) from the stream, followed by resizing it (Line 44).

Since we will need the width and height later, we grab these now on Line 47. This is followed by converting the frame  to a blob  with the dnn  module (Lines 48 and 49).

Now for the heavy lifting: we set the blob  as the input to our neural network (Line 53) and feed the input through the net  (Line 54) which gives us our detections .

At this point, we have detected objects in the input frame. It is now time to look at confidence values and determine if we should draw a box + label surrounding the object– you’ll recognize this code block from last week:

We start by looping over our detections , keeping in mind that multiple objects can be detected in a single image. We also apply a check to the confidence (i.e., probability) associated with each detection. If the confidence is high enough (i.e. above the threshold), then we’ll display the prediction in the terminal as well as draw the prediction on the image with text and a colored bounding box. Let’s break it down line-by-line:

Looping through our detections , first we extract the confidence  value (Line 60).

If the confidence  is above our minimum threshold (Line 64), we extract the class label index (Line 68) and compute the bounding box coordinates around the detected object (Line 69).

Then, we extract the (x, y)-coordinates of the box (Line 70) which we will will use shortly for drawing a rectangle and displaying text.

We build a text label  containing the CLASS  name and the confidence  (Lines 73 and 74).

Let’s also draw a colored rectangle around the object using our class color and previously extracted (x, y)-coordinates (Lines 75 and 76).

In general, we want the label to be displayed above the rectangle, but if there isn’t room, we’ll display it just below the top of the rectangle (Line 77).

Finally, we overlay the colored text onto the frame  using the y-value that we just calculated (Lines 78 and 79).

The remaining steps in the frame capture loop involve (1) displaying the frame, (2) checking for a quit key, and (3) updating our frames per second counter:

The above code block is pretty self-explanatory — first we display the frame (Line 82). Then we capture a key press (Line 83) while checking if the ‘q’ key (for “quit”) is pressed, at which point we break out of the frame capture loop (Lines 86 and 87).

Finally we update our fps counter (Line 90).

If we break out of the loop (‘q’ key press or end of the video stream), we have some housekeeping to take care of:

When we’ve exited the loop, we stop the fps  counter (Line 93) and print information about the frames per second to our terminal (Lines 94 and 95).

We close the open window (Line 98) followed by stopping the video stream (Line 99).

If you’ve made it this far, you’re probably ready to give it a try with your webcam — to see how it’s done, let’s move on to the next section.

Real-time deep learning object detection results

From there, open up a terminal and execute the following command:

Provided that OpenCV can access your webcam you should see the output video frame with any detected objects. I have included sample results of applying deep learning object detection to an example video below:

Figure 1: A short clip of real-time object detection with deep learning and OpenCV + Python.

Notice how our deep learning object detector can detect not only myself (a person), but also the sofa I am sitting on and the chair next to me — all in real-time!

 

Summary

In today’s blog post we learned how to perform real-time object detection using deep learning + OpenCV + video streams.

 

推荐文章

沪公网安备 31010702002009号