Object Detection with SIFT Features and Brute Force Matcher using OpenCV

Feature extraction and matching are two fundamental steps in object detection using computer vision techniques. Here's an overview of how these two steps work together:

1. Feature Extraction:
In this step, we extract features from an object in an image that are distinctive and robust to changes in scale, rotation, illumination, and other imaging conditions. A feature is a measurable and distinctive aspect of an object, such as the edges, corners, or texture patterns.

Feature extraction involves two key sub-steps: feature detection and feature description. Feature detection is the process of finding salient points or regions in an image that can be described by local features. Feature description involves computing a feature descriptor for each keypoint, which captures the local appearance and geometry of the region around the keypoint.
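To make the two sub-steps concrete, here is a minimal sketch using OpenCV's SIFT (assuming OpenCV 4.4+ is installed; 'image.jpg' is a placeholder path), running detection and description as separate calls:

import cv2

# Placeholder path: replace with a real image file
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints = sift.detect(img, None)                      # feature detection: find salient points
keypoints, descriptors = sift.compute(img, keypoints)   # feature description: one 128-D vector per keypoint

In practice the two calls are usually combined into a single detectAndCompute() call, as done in the full example later in this article.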

Examples of popular feature detection and description algorithms include SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), ORB (Oriented FAST and Rotated BRIEF), and AKAZE (Accelerated-KAZE).
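In OpenCV, each of these algorithms is created through a factory function. A rough sketch (SURF is patented and only available in opencv-contrib builds with the non-free modules enabled, which is why it is commented out here):

import cv2

sift = cv2.SIFT_create()      # floating-point 128-D descriptors
orb = cv2.ORB_create()        # binary descriptors, very fast
akaze = cv2.AKAZE_create()    # binary descriptors
# surf = cv2.xfeatures2d.SURF_create()  # requires opencv-contrib with non-free modules enabled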

2. Matching:
In this step, we compare the features extracted from the object in the input image with those extracted from a reference image (i.e., the object template) to find the correspondences between them. The goal is to find a set of matches that accurately and robustly align the object in the input image with the object template.

Matching algorithms typically use distance metrics (e.g., Euclidean distance, Hamming distance, cosine similarity) to measure the similarity between the feature descriptors. One common approach is to use the nearest-neighbor rule, where we match each feature in the input image with the closest feature in the reference image.
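As a sketch of how this looks in OpenCV, the brute-force matcher implements the nearest-neighbor rule directly, and the distance metric should be chosen to match the descriptor type (the match() call is commented out because it assumes descriptor arrays `des1` and `des2` computed beforehand):

import cv2

bf_l2 = cv2.BFMatcher(cv2.NORM_L2)            # Euclidean distance: SIFT, SURF (float descriptors)
bf_hamming = cv2.BFMatcher(cv2.NORM_HAMMING)  # Hamming distance: ORB, AKAZE, BRIEF (binary descriptors)

# Pair each descriptor in des1 with its nearest neighbor in des2:
# matches = bf_l2.match(des1, des2)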

However, due to the presence of noise, occlusion, and other factors, not all matches will be correct. Hence, we need to filter out the incorrect matches and retain only the good matches. This can be done using various techniques, such as the ratio test, RANSAC (Random Sample Consensus), or machine learning-based approaches.
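The full example later in this article uses the ratio test; the following self-contained sketch illustrates the RANSAC part on synthetic data, showing how a homography fitted with cv2.findHomography() separates correct correspondences from injected outliers (all numbers here are made up purely for illustration):

import cv2
import numpy as np

# 25 synthetic correspondences: the first 20 follow a known homography (a pure translation),
# the last 5 are replaced by random points to simulate wrong matches.
rng = np.random.default_rng(0)
src = rng.uniform(0, 100, (25, 1, 2)).astype(np.float32)
H_true = np.array([[1.0, 0.0, 10.0],
                   [0.0, 1.0, 20.0],
                   [0.0, 0.0, 1.0]])
dst = cv2.perspectiveTransform(src, H_true)
dst[20:] = rng.uniform(0, 100, (5, 1, 2))   # corrupt the last 5 correspondences

H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print('matches kept as inliers:', int(mask.sum()))   # the injected outliers are rejected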

By combining these two steps, we can detect the presence and location of an object in an input image by comparing its features with those of a reference object template. The accuracy and robustness of the detection depend on the quality of the features and the matching algorithm used.

Object Detection with SIFT Features

In this example, we load the scene image and the object template, initialize the SIFT feature detector and descriptor extractor, and use it to detect keypoints and extract descriptors from both images. Next, we initialize the Brute-Force Matcher and match the descriptors between the two images. We keep only the good matches using the ratio test (a match is kept when the distance to its nearest neighbor is less than 70% of the distance to the second-nearest neighbor), and calculate the transformation between the object template and the image from these good matches. Finally, we draw a bounding box around the object in the image using the calculated transformation and display the result.

Code 1-1 Feature extraction using SIFT and matching with the Brute-Force Matcher for object detection.
 1  import cv2
 2  import numpy as np
 3
 4  # Load the object template and the scene image
 5  template = cv2.imread(r'D:\img\object4_template.jpg')
 6  img = cv2.imread(r'D:\img\objects4.jpg')
 7
 8  # Convert both images to grayscale
 9  img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
10  template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
11
12  # Initialize the SIFT feature detector and descriptor extractor
13  sift = cv2.SIFT_create()
14
15  # Detect keypoints and extract descriptors from both images
16  kp1, des1 = sift.detectAndCompute(template_gray, None)
17  kp2, des2 = sift.detectAndCompute(img_gray, None)
18
19  # Initialize the Brute-Force Matcher and find the two nearest neighbors for each descriptor
20  bf = cv2.BFMatcher(cv2.NORM_L2)
21  matches = bf.knnMatch(des1, des2, k=2)
22
23  # Select only the good matches using the ratio test
24  good_matches = []
25  for m, n in matches:
26      if m.distance < 0.7 * n.distance:
27          good_matches.append(m)
28
29  # Calculate the transformation between the object template and the image using the good matches
30  src_pts = np.float32([kp1[m.queryIdx].pt for m in good_matches]).reshape(-1, 1, 2)
31  dst_pts = np.float32([kp2[m.trainIdx].pt for m in good_matches]).reshape(-1, 1, 2)
32  M, _ = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
33
34  # Draw a bounding box around the object in the image using the calculated transformation
35  h, w = template_gray.shape
36  pts = np.float32([[0, 0], [0, h-1], [w-1, h-1], [w-1, 0]]).reshape(-1, 1, 2)
37  dst = cv2.perspectiveTransform(pts, M)
38  img = cv2.polylines(img, [np.int32(dst)], True, (0, 0, 255), 2)
39
40  # Display the image with the bounding box
41  cv2.imshow('Object Detection', img)
42  cv2.waitKey(0)
43  cv2.destroyAllWindows()



Line 1-2: We import the required libraries, OpenCV (`cv2`) and NumPy.
Line 5-6: Here, we load a scene image and the object image (template) from their respective files.
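One practical note (not part of the listing): cv2.imread() returns None instead of raising an error when a path is wrong, so a small guard right after loading avoids a confusing failure later in cvtColor():

if template is None or img is None:
    raise FileNotFoundError('Check the image paths passed to cv2.imread()')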
Line 9-10: Next, we convert both images to grayscale, since most feature detectors and descriptor extractors operate on grayscale images.
Line 13: We initialize the SIFT (Scale-Invariant Feature Transform) feature detector and descriptor extractor using the SIFT_create() function. In OpenCV 4.4 and later, SIFT is part of the main `cv2` module; in older versions it was available as `cv2.xfeatures2d.SIFT_create()` in the opencv-contrib package.
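SIFT_create() also accepts optional tuning parameters; for example (values close to the defaults, only worth changing if too few or too many keypoints are detected):

sift = cv2.SIFT_create(nfeatures=0,             # 0 means keep all detected keypoints
                       contrastThreshold=0.04,  # raise to keep only stronger keypoints
                       edgeThreshold=10)        # filters out edge-like responses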
Line 16-17: Here, we detect keypoints and extract descriptors from both the template image and the input image using the detectAndCompute() function of the SIFT algorithm. This function returns two outputs: `kp`, a list of keypoints detected in the image, and `des`, a NumPy array of shape (number_of_keypoints, 128) containing the descriptors of those keypoints.
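Continuing with the variables from the listing, these outputs can be inspected directly:

print(len(kp1), 'keypoints in the template,', len(kp2), 'in the scene')
print('descriptor array shape:', des1.shape)   # (number_of_keypoints, 128)
print('first keypoint: position', kp1[0].pt, 'scale', kp1[0].size, 'orientation', kp1[0].angle)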
Line 20-21: We initialize the Brute-Force Matcher using the BFMatcher() function from the `cv2` module, with L2 (Euclidean) distance as the distance measure, which is the appropriate metric for SIFT's floating-point descriptors. We then call the knnMatch() function with k=2, which finds, for every descriptor of the template image, its two nearest neighbors among the descriptors of the input image and returns them as pairs of `DMatch` objects.
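Each `DMatch` records which two descriptors were paired and how far apart they are; a quick inspection, again using the variables from the listing:

best, second = matches[0]   # the two nearest scene neighbors of the first template descriptor
print('template keypoint index:', best.queryIdx)
print('scene keypoint index   :', best.trainIdx)
print('L2 distances           :', best.distance, second.distance)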
Line 24-27: Here, we select only the good matches using the ratio test. For each template descriptor, we compare the distance to its nearest neighbor in the input image with the distance to its second-nearest neighbor; if the best distance is less than 70% of the second-best, the match is unambiguous and likely to be correct. We add these matches to a new list called `good_matches`.
Line 30-32: Here, we calculate the transformation between the object template and the input image using the good matches. We first extract the coordinates of the matched keypoints from both images; `queryIdx` indexes the keypoints of the template and `trainIdx` those of the input image. We then use the findHomography() function of OpenCV with the RANSAC method and a reprojection threshold of 5.0 pixels to compute the perspective transformation matrix `M` that maps points from the object template to the input image while discarding any remaining outlier matches.
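findHomography() also returns an inlier mask, which the listing discards with `_`. Keeping it (a small optional extension, not in the original code) lets us check that `M` was actually found and how many matches support it:

M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
if M is None:
    raise RuntimeError('Not enough good matches to estimate a homography')
print('RANSAC kept', int(mask.sum()), 'of', len(good_matches), 'matches as inliers')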
Line 35-38: Next, we take the height and width of the template and define an array of four points `pts` that represent its corners; these define the bounding box around the object. We then use the `perspectiveTransform` function to apply the homography matrix `M` to these points, which maps them to their corresponding positions in the input image. Finally, we use the `polylines` function to draw the resulting quadrilateral (the bounding box) around the object in the input image, in red with a thickness of 2 pixels.
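As an optional complementary visualization (not part of the listing), cv2.drawMatches() places the template and the scene side by side and connects the good matches, which makes it easy to see where the filtering went wrong if the bounding box looks off:

matches_img = cv2.drawMatches(template, kp1, img, kp2, good_matches, None)
cv2.imshow('Good Matches', matches_img)
cv2.waitKey(0)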
Line 41-43: The modified input image is then displayed using cv2.imshow(); cv2.waitKey(0) waits for a key press, after which cv2.destroyAllWindows() closes the window.

The result obtained by running the above code is shown in Figure 1.

Figure 1: Object detection using SIFT and Brute Force Matcher.
