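Tesseract: an open-source OCR engine. It was originally developed at HP Labs and later released as open source; Google has since improved it, fixed bugs, optimized it, and re-released it.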
neo@MacBook-Pro-Neo ~/workspace/python/ocr % brew install tesseract
neo@MacBook-Pro-Neo ~/workspace/python/ocr % brew install tesseract-lang
neo@MacBook-Pro-Neo ~/workspace/python/ocr % pip3 install pytesseract
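Before running any recognition, it can help to confirm that the tesseract binary is on the PATH and to see which language packs are available. The following is a minimal sketch, not part of the original setup: the helper name `installed_languages` is hypothetical, and it assumes the output of `tesseract --list-langs` starts with a "List of ..." header line followed by one language code per line.

```python
import shutil
import subprocess


def installed_languages(binary="tesseract"):
    """Return language codes reported by `tesseract --list-langs`.

    Returns an empty list when the binary cannot be found on the PATH.
    (Hypothetical helper, for illustration only.)
    """
    path = shutil.which(binary)
    if path is None:
        return []
    result = subprocess.run(
        [path, "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some older versions print the list to stderr instead of stdout.
    output = result.stdout or result.stderr
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the assumed "List of available languages" header.
        if not line or line.startswith("List of"):
            continue
        langs.append(line)
    return langs


if __name__ == "__main__":
    langs = installed_languages()
    if not langs:
        print("tesseract not found, or no language data installed")
    else:
        print("chi_sim available:", "chi_sim" in langs)
```

If `chi_sim` is missing from the list, the Simplified Chinese example further down will most likely fail until the language data is installed.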
#!/usr/bin/python3
# -*- coding:utf-8 -*-
from PIL import Image
import pytesseract
# English recognition (the default language is English)
english = pytesseract.image_to_string(Image.open("english.png"))
print(english)
print('-' * 50)
# Simplified Chinese recognition (requires the chi_sim language data, installed above via tesseract-lang)
chinese = pytesseract.image_to_string(Image.open("chinese.png"), lang='chi_sim')
print(chinese)
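Images often mix Chinese and English text. Tesseract accepts several language codes joined with `+`, and pytesseract passes the `lang` string through unchanged. The sketch below builds on the script above; the helper names `join_langs` and `recognize` and the file name `mixed.png` are hypothetical, introduced only for illustration.

```python
def join_langs(*codes):
    """Join Tesseract language codes into one string, e.g. 'chi_sim+eng'."""
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def recognize(image_path, *codes):
    """Run OCR on image_path with the given language codes (hypothetical helper)."""
    # Imported lazily so join_langs can be used without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=join_langs(*codes))


if __name__ == "__main__":
    # "mixed.png" is a placeholder for an image containing both scripts.
    print(recognize("mixed.png", "chi_sim", "eng"))
```

The order of the codes may matter: Tesseract treats the first language as the primary one, so for mostly Chinese text it is probably better to list `chi_sim` first.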