Zhihu Column
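Tesseract: an open-source OCR engine. It was originally developed at HP Labs and later released as open source; Google then improved it, fixed bugs, optimized it, and re-released it.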
neo@MacBook-Pro-Neo ~/workspace/python/ocr % brew install tesseract
neo@MacBook-Pro-Neo ~/workspace/python/ocr % brew install tesseract-lang
neo@MacBook-Pro-Neo ~/workspace/python/ocr % pip3 install pytesseract
#!/usr/bin/python3
# -*- coding:utf-8 -*-
from PIL import Image
import pytesseract

# English recognition
english = pytesseract.image_to_string(Image.open("english.png"))
print(english)

print('-' * 50)

# Simplified Chinese recognition
chinese = pytesseract.image_to_string(Image.open("chinese.png"), lang='chi_sim')
print(chinese)
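Images often mix Chinese and English text. Tesseract accepts several language codes joined with `+`, so the two calls above can be combined into one. The following is only a minimal sketch: the helper names `build_lang` and `ocr_mixed` and the file name `mixed.png` are hypothetical and do not come from the original post, and it assumes the `chi_sim` and `eng` language data are installed (for example via `tesseract-lang`).

```python
from typing import Iterable


def build_lang(codes: Iterable[str]) -> str:
    """Join Tesseract language codes with '+', e.g. ['chi_sim', 'eng'] -> 'chi_sim+eng'."""
    cleaned = [c.strip() for c in codes if c.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_mixed(image_path: str, codes: Iterable[str] = ("chi_sim", "eng")) -> str:
    """Recognize text in an image that may contain several languages.

    Hypothetical helper for illustration; not part of pytesseract.
    """
    # Imported here so build_lang() stays usable without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=build_lang(codes))


# Example call (the file name is a placeholder):
# print(ocr_mixed("mixed.png"))
```

Recognition quality with combined languages may differ from single-language runs, so it is worth comparing the output against `lang='chi_sim'` alone on your own images.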