Original link: OpenCV Tesseract CAPTCHA recognition and text recognition

Environment setup

Installing Tesseract

Download the 64-bit installer:

https://github.com/UB-Mannheim/tesseract/wiki

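During installation you can select the language packs you need, then click Next through the remaining steps.

Add the installation directory to the PATH environment variable and check that the tesseract command works. PyCharm must be restarted afterwards, otherwise it will not find Tesseract.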

C:\Program Files (x86)\Tesseract-OCR
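
To confirm from Python that the PATH change took effect, a minimal sketch using only the standard library is shown here. The helper name find_tesseract and its extra_dirs parameter are hypothetical, introduced for illustration; the directory passed in is the install path mentioned above.

```python
import shutil


def find_tesseract(extra_dirs=()):
    """Return the full path of the tesseract executable, or None if it is not found.

    extra_dirs is a hypothetical convenience parameter: directories to try
    when the executable is not on PATH.
    """
    # First look on the current PATH, as a terminal would.
    found = shutil.which("tesseract")
    if found:
        return found
    # Then try each extra directory on its own.
    for directory in extra_dirs:
        found = shutil.which("tesseract", path=directory)
        if found:
            return found
    return None


if __name__ == "__main__":
    path = find_tesseract([r"C:\Program Files (x86)\Tesseract-OCR"])
    print("tesseract:", path if path else "not found, check PATH")
```

If the executable is found only in the extra directory, pytesseract can be pointed at it by assigning that path to pytesseract.pytesseract.tesseract_cmd instead of relying on PATH.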

Installing the Python libraries

pip install opencv-contrib-python -i  https://pypi.doubanio.com/simple/  --trusted-host pypi.doubanio.com

pip install pytesseract -i  https://pypi.doubanio.com/simple/  --trusted-host pypi.doubanio.com

Recognizing English letters and digits

Result:

text: import cv2 as cv

import numpy as np

import pytesseract as tess
from PIL import Image

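Recognizing a CAPTCHA takes more preprocessing: the interference lines and noise have to be removed before the image is passed to Tesseract.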

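import cv2 as cv
import numpy as np
import pytesseract as tess
from PIL import Image

# Load the CAPTCHA image and display it
img = cv.imread('code2.jpg')
cv.imshow('img', img)

# Convert to grayscale, then binarize; with THRESH_OTSU the threshold is
# chosen automatically, so the 0 passed as the threshold is ignored
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)
cv.imshow('bin', binary)

# Opening (erosion followed by dilation) removes bright details smaller
# than the 2x2 rectangular kernel
kernel = cv.getStructuringElement(cv.MORPH_RECT, (2, 2))
open_out = cv.morphologyEx(binary, cv.MORPH_OPEN, kernel)
cv.imshow('open', open_out)

# Invert the image in place
cv.bitwise_not(open_out, open_out)
cv.imshow('open_out', open_out)

# Wrap the array in a PIL image and run OCR (default language)
text_img = Image.fromarray(open_out)
text = tess.image_to_string(text_img)
print('text:', text)
cv.waitKey(0)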
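
When a CAPTCHA is known to contain only letters and digits on a single line, it may help to tell Tesseract so. The sketch below builds a configuration string from two standard Tesseract options, --psm (page segmentation mode, where 7 means a single text line) and the tessedit_char_whitelist variable. The helper name build_tess_config is hypothetical, and whether these settings improve results on a given CAPTCHA is an assumption to be tested, not something the original script does.

```python
import string


def build_tess_config(psm=7, whitelist=string.ascii_letters + string.digits):
    """Build a Tesseract config string (hypothetical helper for illustration).

    psm       -- page segmentation mode; 7 treats the image as one text line
    whitelist -- characters Tesseract is allowed to output; empty to disable
    """
    parts = ["--psm {}".format(psm)]
    if whitelist:
        parts.append("-c tessedit_char_whitelist={}".format(whitelist))
    return " ".join(parts)


config = build_tess_config(psm=7, whitelist=string.digits)
print(config)
```

The resulting string would be passed through the config keyword of pytesseract, for example tess.image_to_string(text_img, config=config). How strictly the whitelist is honored depends on the Tesseract version and engine, so results should be checked on real samples.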

Recognizing Chinese

List the languages that are installed:

tesseract --list-langs
eng     English
chi_tra Traditional Chinese
chi_sim Simplified Chinese
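
The same check can be done from a script before attempting Chinese OCR. The sketch below parses text in the shape that tesseract --list-langs prints: a header line ending in a colon followed by one language code per line. The function name parse_langs and the sample string are illustrative assumptions, not captured output.

```python
def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line, which ends with a colon.
        if not line or line.endswith(":"):
            continue
        langs.append(line)
    return langs


# Illustrative sample only; real output depends on the installed language packs.
sample = """List of available languages (3):
chi_sim
chi_tra
eng
"""

langs = parse_langs(sample)
print("chi_sim installed:", "chi_sim" in langs)
```

To use real output, the text could be captured with subprocess.run(["tesseract", "--list-langs"], capture_output=True, text=True) and passed to parse_langs.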

Only one argument needs to change: pass the language code 'chi_sim' to image_to_string.

D:\ProgramData\Anaconda3\python.exe D:/code/py/blogsolr/验证码识别.py
text: API层面

, 学会使用OpenCy 形态学与二值化API做预处理
,使用Tesseract- OCR做文字识别
4 识别率问题讨论

import cv2 as cv
import numpy as np
import pytesseract as tess
from PIL import Image

# Load the image containing Chinese text and display it
img = cv.imread('code3.jpg')
cv.imshow('img', img)

# Grayscale, then Otsu binarization (the threshold is chosen automatically)
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)
cv.imshow('bin', binary)

# Opening removes bright details smaller than the 2x2 kernel
kernel = cv.getStructuringElement(cv.MORPH_RECT, (2, 2))
open_out = cv.morphologyEx(binary, cv.MORPH_OPEN, kernel)
cv.imshow('open', open_out)

# Invert the image in place
cv.bitwise_not(open_out, open_out)
cv.imshow('open_out', open_out)

# Run OCR with the Simplified Chinese language pack
text_img = Image.fromarray(open_out)
text = tess.image_to_string(text_img, 'chi_sim')
print('text:', text)
cv.waitKey(0)
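
Both scripts rely on cv.THRESH_OTSU to pick the binarization threshold. To illustrate the idea behind that flag, here is a pure-Python sketch of Otsu's method: it tries every threshold from 0 to 255 and keeps the one that maximizes the between-class variance of the two pixel groups. This is a teaching sketch, not OpenCV's implementation; the function name otsu_threshold and the toy pixel list are assumptions made for the example.

```python
def otsu_threshold(pixels):
    """Return the threshold in 0..255 that maximizes between-class variance.

    pixels is a flat sequence of integer gray levels in the range 0..255.
    """
    # Histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break               # all pixels are in the dark class
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Toy image: half dark pixels, half bright pixels.
toy = [10] * 50 + [200] * 50
t = otsu_threshold(toy)
# Same rule as THRESH_BINARY: values above the threshold become 255.
binary = [255 if p > t else 0 for p in toy]
print("threshold:", t)
```

On a real image the threshold chosen by cv.threshold is available as the first return value (ret in the scripts above).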
