Text Line Detection and Recognition using OpenCV: A Comprehensive Guide
- Load and Preprocess Images: cv2.imread(), cv2.cvtColor(), cv2.GaussianBlur(), cv2.threshold()
- Perform Closing Operation: cv2.morphologyEx()
- Find and Filter Contours then Extract Text Lines: cv2.findContours()
- Recognize Text using OCR: pytesseract.image_to_string()
The installation instructions below assume you are using Anaconda as your Python distribution. If you prefer a virtual environment manager such as venv, you can create and activate a virtual environment instead of using Conda.
Once you have installed the required software and set up the Conda environment, you can proceed with running the code provided.
Software Requirements:
- OpenCV (Open Source Computer Vision Library)
- Tesseract OCR Engine
- pytesseract (Python wrapper for Tesseract)
1. Install OpenCV:
- Run the following command in PowerShell or Command Prompt:
conda install -c conda-forge opencv
2. Install Tesseract OCR Engine:
- Run the following command in PowerShell or Command Prompt (the same command works on Windows, Linux, and macOS):
conda install -c conda-forge tesseract=4
3. Install pytesseract:
- Run the following command in PowerShell or Command Prompt:
pip install pytesseract
Introduction
Text line detection and recognition is a crucial task in various applications such as document processing, OCR (Optical Character Recognition), and computer vision. In this blog post, we will explore how to perform text line detection and recognition using the popular computer vision library OpenCV. We will cover the entire process, including image preprocessing, contour detection, filtering, and sorting, to obtain accurate text line detection results. The Code 1-1 snippet shows how to detect text lines and recognize them.
Code 1-1 Text line detection and recognition in OpenCV-Python
 1  import cv2
 2  import pytesseract
 3  import numpy as np
 4  # Step 1: Load the image
 5  image_path = r'D:\img\text.jpg'
 6  image = cv2.imread(image_path)
 7
 8  # Check if the image is None
 9  if image is None:
10      raise ValueError("Invalid image file or path.")
11
12  # Step 2: Preprocess the image
13  gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
14  blur = cv2.GaussianBlur(gray, (3, 3), 0)
15  bw = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
16
17  # Use a wide kernel so characters on the same line are connected horizontally
18  kernel_size = (15, 1)
19  kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernel_size)
20
21  # Step 3: Perform the closing operation (dilation followed by erosion)
22  bw_closed = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel)
23
24  # Find contours for each text line
25  contours, _ = cv2.findContours(bw_closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
26
27  # Filter contours to keep those whose width is at least 3 times their height
28  filtered_contours = [cnt for cnt in contours if cv2.boundingRect(cnt)[2] >= 3 * cv2.boundingRect(cnt)[3]]
29
30  # Sort contours based on y-coordinate (top to bottom)
31  sorted_contours = sorted(filtered_contours, key=lambda contour: cv2.boundingRect(contour)[1])
32
33  padding = 3
34  for contour in sorted_contours:
35      x, y, w, h = cv2.boundingRect(contour)
36      x, y, w, h = (max(x - padding, 0), max(y - padding, 0), w + 2 * padding, h + 2 * padding)
37      # Recognize each line: crop it before drawing, then pass it to the OCR engine
38      line_image = image[y:y + h, x:x + w]
39      line_text = pytesseract.image_to_string(line_image)
40      print(line_text)
41      cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
42
43  cv2.imshow('Text Lines', image)
44  cv2.waitKey(0)
45  cv2.destroyAllWindows()
46  cv2.imwrite('opencv_detect_text_lines.jpg', image)
Line 1-3: We import the necessary libraries, including OpenCV (`cv2`), pytesseract for OCR, and numpy for numerical operations.
Line 5-6: We specify the path of the image file and use cv2.imread() to load the image into a variable called `image`.
Line 9-10: Here, we check if the loaded image is `None`. If it is, we raise a `ValueError` with a descriptive error message. This step ensures that the image file is successfully loaded before proceeding with further operations.
Line 13-15: In this step, we preprocess the image to enhance text regions for better recognition. We convert the image to grayscale using cv2.cvtColor(), then apply Gaussian blur using cv2.GaussianBlur() to reduce noise. Next, we use Otsu's thresholding method with cv2.threshold() to convert the blurred image into a binary image (`bw`), where text appears as white on a black background.
Line 18-19: Here, we define a kernel with a larger width (15) and a height of 1 using cv2.getStructuringElement(). This kernel will be used for morphological operations to connect characters and words belonging to the same text line in the horizontal direction.
Line 22: In this step, we perform the closing operation on the binary image (`bw`) using cv2.morphologyEx() with cv2.MORPH_CLOSE. This operation combines dilation and erosion to connect nearby text regions and form cohesive text lines. The result is stored in `bw_closed`.
Line 25: Using cv2.findContours(), we detect contours in the closed binary image. The `RETR_EXTERNAL` flag retrieves only the external contours, and the `CHAIN_APPROX_SIMPLE` flag approximates the contour's shape. The detected contours are stored in `contours`.
Line 28: In this line, we filter the detected contours to retain only those whose width is at least 3 times their height. This filtering helps remove noise and retain contours that are likely to represent text lines.
Line 31: We sort the filtered contours based on their y-coordinate so that we can output the recognized text in the same sequence as the text lines appear in the image. The `key` parameter specifies the sorting criterion: the y-coordinate of each contour's bounding rectangle, obtained with `cv2.boundingRect`. Sorting in ascending order of the y-coordinate maintains the order of the text lines from top to bottom.
Line 33-41: In this loop, we iterate over the sorted contours. For each contour, we obtain the bounding rectangle with cv2.boundingRect() and expand it by 3 pixels of padding on each side (clamped to the image border) to include some extra space around the text line. We crop the corresponding region of interest (ROI) from the image, pass it to the pytesseract OCR engine with `pytesseract.image_to_string` to recognize the text in that line, and print the result. We then draw a green rectangle around the line using cv2.rectangle(); cropping before drawing keeps the rectangle out of the region the OCR engine sees.
Line 43-46: We display the image with the detected text lines using cv2.imshow(), wait for a key press using cv2.waitKey(), and then close the window using cv2.destroyAllWindows(). Additionally, we save the annotated image with the detected text lines using cv2.imwrite().
A sample output is shown in Figure 1.
Figure 1: Text lines detected and outlined with bounding rectangles.
The output of the Tesseract OCR after recognition is shown below:
Akhtar Jamil
post doctorate - Professor (Associate) at National University of
Computer and Emerging Sciences
islamabad, Pakistan
Exploring the potential of Deep Learning