Text recognition
Python's pytesseract library has solid support for text recognition (OCR). The core of the implementation is a single line of code.
Prerequisites

    yum install -y tesseract-langpack-chi_sim tesseract-langpack-chi_tra tesseract
    pip install pytesseract

Code example

    from PIL import Image
    import pytesseract

    # file_path is the path of the image to recognize
    text = pytesseract.image_to_string(Image.open(file_path), lang='chi_sim')
    print(text)

Recognition languages: Simplified Chinese (chi_sim), Traditional Chinese (chi_tra)
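Tesseract accepts several language codes at once when they are joined with `+`, so both Chinese models can be used in a single call. The following is a minimal sketch of that idea, not part of the original post: `build_lang` and `recognize` are hypothetical helper names, and the OCR imports are done lazily inside `recognize` so the string helper can be tried without Tesseract installed.

```python
def build_lang(codes):
    """Join Tesseract language codes with '+', dropping blanks and duplicates."""
    seen = []
    for code in codes:
        code = code.strip()
        if code and code not in seen:
            seen.append(code)
    return "+".join(seen)


def recognize(file_path, codes=("chi_sim", "chi_tra")):
    """Run OCR on file_path with every language in codes (hypothetical helper)."""
    # Imported here so build_lang works even without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(file_path) as img:
        return pytesseract.image_to_string(img, lang=build_lang(codes))


print(build_lang(["chi_sim", "chi_tra"]))  # chi_sim+chi_tra
```

Loading more than one language model may slow recognition down, so it is probably worth listing only the languages the images are expected to contain.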
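QR code recognition
Before I started using Python, I used the Zbar client for recognition. In a test on a few hundred relatively blurry images, Zbar was much faster than Zxing and its recognition rate was also slightly higher. However, once an image is even a little too blurry, it cannot be decoded at all.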
Prerequisites

    pip install -U pip
    pip install Pillow
    pip install pyzbar
    pip install qrcode

Code example

    from PIL import Image
    import pyzbar.pyzbar as pyzbar

    # file_path is the path of the image to decode
    bar_code_info = ""  # must be initialized before appending to it
    bar_codes = pyzbar.decode(Image.open(file_path))
    for bar_code in bar_codes:
        bar_code_info += bar_code.data.decode("utf-8")
    print(bar_code_info)

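`pyzbar.decode` returns one result per symbol it finds, and each result carries a `type` field (for example `QRCODE` or `EAN13`) next to `data`. If only QR codes are of interest, the results can be filtered before the payloads are joined. The sketch below assumes that requirement. `extract_payloads`, `read_qr_codes` and `FakeDecoded` are hypothetical names introduced for illustration, and `FakeDecoded` is only a stand-in for pyzbar's result objects so the filtering can be tried without an image.

```python
from collections import namedtuple


def extract_payloads(decoded, symbol_type="QRCODE"):
    """Return the text of every decoded symbol whose type matches symbol_type."""
    return [
        item.data.decode("utf-8", errors="replace")  # tolerate non-UTF-8 bytes
        for item in decoded
        if item.type == symbol_type
    ]


def read_qr_codes(file_path):
    """Decode file_path with pyzbar and keep only QR code payloads."""
    # Imported here so extract_payloads works without pyzbar installed.
    from PIL import Image
    from pyzbar import pyzbar

    with Image.open(file_path) as img:
        return extract_payloads(pyzbar.decode(img))


# Stand-in for pyzbar results (assumption: only .data and .type are needed).
FakeDecoded = namedtuple("FakeDecoded", ["data", "type"])
sample = [
    FakeDecoded(b"https://example.com", "QRCODE"),
    FakeDecoded(b"0123456789012", "EAN13"),
]
print(extract_payloads(sample))  # ['https://example.com']
```

Returning a list instead of one concatenated string keeps the payloads apart when an image contains several codes.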
Full code

    import io

    import requests
    import pytesseract
    from PIL import Image
    import pyzbar.pyzbar as pyzbar


    def check_image_qrcode(image):
        bar_code_info = ""
        buf_image = io.BytesIO()
        # Download the image into an in-memory buffer
        ir = requests.get(image, stream=True)
        if ir.ok:
            for chunk in ir:
                buf_image.write(chunk)
        else:
            return bar_code_info
        img = Image.open(buf_image)
        # First try to decode QR codes / barcodes
        if img:
            bar_codes = pyzbar.decode(img)
            for bar_code in bar_codes:
                bar_code_info += bar_code.data.decode("utf-8")

        if len(bar_code_info) > 0:
            return bar_code_info

        # No code found: fall back to text recognition
        if img:
            bar_code_info = pytesseract.image_to_string(img, lang='chi_tra')

        return bar_code_info


    def main():
        image = "https://pic2.zhimg.com/80/v2-50eaea949ac63de5d5a84813d9efe491_720w.jpg"
        image_info = check_image_qrcode(image)
        if len(image_info) == 0:
            print("11111")  # placeholder output: nothing was recognized
        else:
            print(image_info)


    if __name__ == "__main__":
        main()

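The full script tries QR code decoding first and falls back to OCR only when no code is found. That "first recognizer that returns something wins" pattern can be written once and reused. The following is a rough sketch under that assumption rather than a rewrite of the script: `first_non_empty` is a hypothetical helper, and the two lambdas are toy stand-ins for the pyzbar and pytesseract calls.

```python
def first_non_empty(source, recognizers):
    """Try each recognizer in order and return the first non-blank result."""
    for recognizer in recognizers:
        try:
            result = recognizer(source)
        except Exception:
            # A failing recognizer should not stop the rest of the chain.
            continue
        if result and result.strip():
            return result.strip()
    return ""


# Toy stand-ins: the first finds nothing, the second returns text.
find_qr_code = lambda source: ""
run_ocr = lambda source: "  hello  \n"

print(first_non_empty("image.jpg", [find_qr_code, run_ocr]))  # hello
```

In the real script, the list would hold one function wrapping `pyzbar.decode` and one wrapping `pytesseract.image_to_string`, both taking the already opened image.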