HOW TO REMOVE CURVED NOISE LINES IN OCR

Asked 2 years ago, Updated 2 years ago, 72 views

Let's say there is some kind of line noise in an image that is going to be OCRed.

If the noise is close to a straight line, it can be detected with the Hough transform and deleted.
There is also the following method for removing it:
http://www.morethantechnical.com/2015/02/05/using-hidden-markov-models-for-staff-line-removal-in-omr-wcode/
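
For the near-straight case, the Hough idea can be sketched roughly as follows. This is only an illustration written with NumPy so that the voting is visible; the names strongest_line and erase_line, the angle sampling, and the 1-pixel tolerance are my own assumptions, and in practice OpenCV's cv2.HoughLines or cv2.HoughLinesP would normally do the detection.

```python
import numpy as np

def strongest_line(ink, n_theta=180):
    """Return (rho, theta, votes) for the straight line with the most votes.

    ink is a 2D boolean array (True = dark pixel). Lines are parameterised as
    rho = x*cos(theta) + y*sin(theta), with theta sampled over [0, pi).
    """
    ys, xs = np.nonzero(ink)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    diag = int(np.ceil(np.hypot(*ink.shape)))
    # Every dark pixel votes once per angle; shift rho by diag so indices are >= 0.
    rhos = np.round(xs[:, None] * np.cos(thetas) + ys[:, None] * np.sin(thetas))
    rho_idx = rhos.astype(int) + diag
    theta_idx = np.broadcast_to(np.arange(n_theta), rho_idx.shape)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    np.add.at(acc, (rho_idx, theta_idx), 1)
    r, t = np.unravel_index(np.argmax(acc), acc.shape)
    return int(r) - diag, float(thetas[t]), int(acc[r, t])

def erase_line(ink, rho, theta, tol=1.0):
    """Clear every dark pixel lying within tol pixels of the line (rho, theta)."""
    ys, xs = np.nonzero(ink)
    near = np.abs(xs * np.cos(theta) + ys * np.sin(theta) - rho) <= tol
    cleaned = ink.copy()
    cleaned[ys[near], xs[near]] = False
    return cleaned

# Toy input: one straight noise line plus a small blob standing in for a character.
ink = np.zeros((60, 100), dtype=bool)
ink[20, :] = True
ink[40:45, 10:15] = True
rho, theta, votes = strongest_line(ink)
cleaned = erase_line(ink, rho, theta)
```

On this toy input the horizontal line collects the most votes and is erased, while the blob is left alone. A curved line spreads its votes over many (rho, theta) cells, which is why this only works when the noise is nearly straight.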

If the darkness of the text and the noise differs, the noise can be dropped by binarization.
If the line thickness of the characters and the noise differs, the noise can be removed by dilation and erosion.
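
To make these two easy cases concrete, here is a minimal sketch in plain NumPy. The threshold of 100, the 3x3 square structuring element, and the helper names (erode, dilate, open_mask) are assumptions for illustration; with OpenCV the same steps would typically be cv2.threshold followed by cv2.erode and cv2.dilate.

```python
import numpy as np

def _shifted_views(mask, k):
    """Yield the k*k shifted copies of mask covered by a k x k square (k odd)."""
    pad = k // 2
    padded = np.pad(mask, pad, constant_values=False)
    h, w = mask.shape
    for dy in range(k):
        for dx in range(k):
            yield padded[dy:dy + h, dx:dx + w]

def erode(mask, k=3):
    """A pixel survives only if its whole k x k neighbourhood is set."""
    out = np.ones_like(mask, dtype=bool)
    for view in _shifted_views(mask, k):
        out &= view
    return out

def dilate(mask, k=3):
    """A pixel is set if any pixel in its k x k neighbourhood is set."""
    out = np.zeros_like(mask, dtype=bool)
    for view in _shifted_views(mask, k):
        out |= view
    return out

def open_mask(mask, k=3):
    """Erosion followed by dilation: removes structures thinner than k pixels."""
    return dilate(erode(mask, k), k)

# Toy input: white background, one thick dark stroke, one light line, one thin dark line.
gray = np.full((20, 40), 255, dtype=np.uint8)
gray[5:15, 10:15] = 30    # thick dark stroke standing in for a character
gray[2, :] = 150          # light noise line: removed by binarization
gray[17, :] = 30          # thin dark noise line: removed by erosion and dilation
ink = gray < 100          # binarization, True = dark pixel
opened = open_mask(ink, 3)
```

Neither step helps once the noise line has the same darkness and the same thickness as the strokes: it then survives both the threshold and the opening.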
But when the noise line is

 ① the same darkness as the text
 ② the same thickness as the text
 ③ curved

is it possible to delete it?
I could train an OCR model on images that contain the noise,
but I think that is difficult in practice.
I would like to ask whether there is any logic that can solve this problem.
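
If one did go the training route, the noisy training images could in principle be generated synthetically. The following is a minimal sketch under the assumption that the noise can be approximated by a sinusoidal line; add_curved_line and all of its parameter ranges are hypothetical and would need tuning against real samples.

```python
import numpy as np

def add_curved_line(gray, rng, value=0, thickness=2):
    """Return a copy of gray with one random sinusoidal line drawn across its width.

    gray is a 2D uint8 image; rng is a numpy.random.Generator.
    """
    h, w = gray.shape
    xs = np.arange(w)
    amplitude = rng.uniform(0.05, 0.25) * h
    period = rng.uniform(0.5, 2.0) * w
    phase = rng.uniform(0.0, 2.0 * np.pi)
    center = rng.uniform(0.3, 0.7) * h
    ys = np.round(center + amplitude * np.sin(2.0 * np.pi * xs / period + phase)).astype(int)
    out = gray.copy()
    for dy in range(thickness):
        rows = np.clip(ys + dy, 0, h - 1)   # keep the line inside the image
        out[rows, xs] = value
    return out

# Draw one random curved line on a blank white image.
gray = np.full((40, 120), 255, dtype=np.uint8)
rng = np.random.default_rng(0)
noisy = add_curved_line(gray, rng)
```

Setting value and thickness to match the text strokes would reproduce conditions ① and ② above.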

[image]

The desired final image after correction is as follows:

[image]

python c++ opencv image

2022-09-29 22:16

1 Answer

This is difficult when the darkness and the thickness are both the same.

Someone who understands Japanese can recognize that the line is noise rather than a strikethrough, and can read the text as 3-15-1 Takasago, Urawa-ku, Saitama Prefecture (because any other reading would give an impossible address).
Someone who doesn't understand Japanese, however, cannot tell whether it is noise or a strikethrough, so I think high-level recognition is essential.

If you give up on high-level recognition, I think there are many possible approaches. I considered two, so I will write them down. Neither gives satisfactory results.

The first approach searches for the character regions and fills everything else with the background pixel value.
Specifically, binarize the image and sum the character pixels along the y-axis direction to get one value per column; columns whose sum is low are treated as background.
Processing example:
As you can see, the problem is that the parts of the noise line whose x range overlaps the text area do not disappear.

import numpy as np
import cv2
import matplotlib.pyplot as plt

file = "./saitama.png"
original_img = cv2.imread(file)
gray_img = cv2.cvtColor(original_img, cv2.COLOR_BGR2GRAY)
(height, width) = gray_img.shape

# Blur, binarize, then invert so that dark (character) pixels become 255
gray_img = cv2.GaussianBlur(gray_img, (5, 5), 0)
ret, th_img = cv2.threshold(gray_img, 100, 255, cv2.THRESH_BINARY)
# th_img = cv2.adaptiveThreshold(gray_img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
th_img = cv2.bitwise_not(th_img)

# Sum along the y-axis: one value per column x
hist = np.sum(th_img, axis=0)
# mean = np.mean(hist)
neighborhood = 200
th_index = neighborhood // 2
# Zero-pad the histogram so that every column has a full window around it
extend_hist = np.zeros(width + neighborhood * 2)
extend_hist[neighborhood:width + neighborhood] = hist

for x in range(width):
    x_extend = x + neighborhood
    around_hist = extend_hist[x_extend - neighborhood:x_extend + neighborhood]
    # Local threshold: the th_index-th smallest value in the window
    th = np.sort(around_hist)[th_index]
    if hist[x] < th:
        # Background column: fill it with its mean pixel value
        mean = np.mean(original_img[:, x])
        original_img[:, x] = mean * np.ones((height, 3))

# OpenCV loads images as BGR; matplotlib expects RGB
disp_img = cv2.cvtColor(original_img, cv2.COLOR_BGR2RGB)
x = np.arange(0, width, 1)
plt.subplot(2, 1, 1)
plt.plot(x, hist)
plt.subplot(2, 1, 2)
plt.imshow(disp_img)
plt.show()
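
The limitation of this first approach is easy to reproduce on a toy input. The sketch below is a simplified stand-in, not the script above: it uses a single global threshold on the column sums instead of the sorted local window, and the name clear_background_columns and the value min_ink=5 are assumptions for illustration.

```python
import numpy as np

def clear_background_columns(ink, min_ink):
    """Clear every column whose dark-pixel count is below min_ink.

    ink is a 2D boolean array (True = dark pixel). Returns the cleaned mask and
    a boolean vector marking which columns were kept as text columns.
    """
    column_ink = ink.sum(axis=0)              # projection onto the x-axis
    is_text_column = column_ink >= min_ink
    cleaned = ink & is_text_column[np.newaxis, :]
    return cleaned, is_text_column

# Toy input: a block standing in for a character, crossed below by a noise line.
ink = np.zeros((20, 30), dtype=bool)
ink[5:15, 10:20] = True
ink[17, :] = True
cleaned, is_text_column = clear_background_columns(ink, min_ink=5)
```

The noise line is cleared to the left and right of the block, but the segment in columns 10 to 19 remains because those columns are classified as text.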

The second approach considers a generalized Hough transform for a class of smooth curves, and detects and erases any curve with a large number of votes, i.e. one longer than a certain length.
One problem with this method is that parts of characters that overlap the noise line, such as the character "高" in this example, are erased as well.
It also assumes that the noise line is longer than the character strokes, so a noise line shorter than a character does not disappear.
Finally, noise lines that fall outside the configured class of smooth curves (for example, sharply bending noise lines) do not disappear.
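
As a rough illustration of this voting idea, the sketch below uses parabolas y = a*(x - x0)**2 + c as the curve class. That choice is purely an assumption for the example (the class of smooth curves is not fixed above), and strongest_parabola, erase_parabola, and the candidate grids for a and x0 are hypothetical.

```python
import numpy as np

def strongest_parabola(ink, a_values, x0_values):
    """Return (a, x0, c, votes) of the curve y = a*(x - x0)**2 + c with the most votes.

    ink is a 2D boolean array (True = dark pixel).
    """
    ys, xs = np.nonzero(ink)
    best = (None, None, None, 0)
    if ys.size == 0:
        return best
    for a in a_values:
        for x0 in x0_values:
            # For a fixed (a, x0), each dark pixel votes for the offset c it would need.
            c = np.round(ys - a * (xs - x0) ** 2).astype(int)
            values, counts = np.unique(c, return_counts=True)
            i = np.argmax(counts)
            if counts[i] > best[3]:
                best = (a, x0, int(values[i]), int(counts[i]))
    return best

def erase_parabola(ink, a, x0, c, tol=1.0):
    """Clear every dark pixel within tol pixels (vertically) of the curve."""
    ys, xs = np.nonzero(ink)
    near = np.abs(ys - (a * (xs - x0) ** 2 + c)) <= tol
    cleaned = ink.copy()
    cleaned[ys[near], xs[near]] = False
    return cleaned

# Toy input: one curved noise line plus a block standing in for a character.
ink = np.zeros((60, 100), dtype=bool)
cols = np.arange(100)
rows = np.round(0.01 * (cols - 50) ** 2 + 10).astype(int)
ink[rows, cols] = True
ink[45:55, 20:30] = True
a, x0, c, votes = strongest_parabola(ink, [0.005, 0.01, 0.02], range(0, 100, 10))
cleaned = erase_parabola(ink, a, x0, c)
```

Here the curve wins because it is much longer than any run of pixels the block can contribute. If the curve passed through the block, the overlapping block pixels would be erased too, and a curve outside the candidate grid would not be found at all, which matches the three problems listed above.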


2022-09-29 22:16


