The default OCR software is Tesseract-OCR 5, a great neural-network (LSTM) based OCR engine that supports more than 100 languages. However, Tesseract-OCR cannot convert scanned PDF documents to editable Word documents. If you need that function, change the OCR software option to "ExtendedOCR". If the target format is set to TXT, the recognized text is displayed in a text editor.

Different OCR software may recognize different text from the same image, so we designed this online OCR program to be open to all kinds of open-source OCR software. More OCR software will be tested and deployed later.

| OCR software | Description | Input formats | Output formats | Languages |
|---|---|---|---|---|
| Tesseract-OCR | Tesseract Open Source OCR Engine | JPG, PNG, GIF, BMP, TIFF | TXT, PDF, HOCR, TSV | 159 languages and scripts |
| ExtendedOCR | Extended OCR engine that supports converting scanned PDF to editable Word | Scanned PDF, JPG, PNG, TIFF | TXT, PDF, EPUB, XPS, DOC, DOCX, RTF | 128 languages |
| PaddleOCR | Awesome multilingual OCR toolkits based on PaddlePaddle | JPG, PNG | TXT | 80 languages |
| OCRmyPDF | Adds an OCR text layer to scanned PDF files | PDF | TXT | 27 languages |
| EasyOCR | Ready-to-use OCR with 80+ supported languages and all popular writing scripts | JPG, PNG | TXT | 80 languages |
| chineseocr | Chinese OCR implemented using TensorFlow and Keras | JPG, PNG, TIFF | TXT | 2 languages |
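The rule described above (use Tesseract-OCR by default, switch to another engine such as ExtendedOCR when the requested conversion is not supported) can be sketched as a lookup over the input and output formats listed for each engine. This is a minimal illustration, not the service's actual code: the `ENGINES` dictionary, `DEFAULT_ENGINE`, and `pick_engine` are hypothetical names, and "Scanned PDF" is treated simply as "PDF".

```python
from typing import Optional

# Hypothetical capability table transcribed from the engine list above
# (input formats, output formats). Not part of any real service API.
ENGINES = {
    "Tesseract-OCR": ({"JPG", "PNG", "GIF", "BMP", "TIFF"}, {"TXT", "PDF", "HOCR", "TSV"}),
    "ExtendedOCR": ({"PDF", "JPG", "PNG", "TIFF"}, {"TXT", "PDF", "EPUB", "XPS", "DOC", "DOCX", "RTF"}),
    "PaddleOCR": ({"JPG", "PNG"}, {"TXT"}),
    "OCRmyPDF": ({"PDF"}, {"TXT"}),
    "EasyOCR": ({"JPG", "PNG"}, {"TXT"}),
    "chineseocr": ({"JPG", "PNG", "TIFF"}, {"TXT"}),
}

DEFAULT_ENGINE = "Tesseract-OCR"


def pick_engine(input_format: str, output_format: str) -> Optional[str]:
    """Return the default engine if it handles the conversion, otherwise the
    first other engine (in table order) that does, or None if none can."""
    src, dst = input_format.upper(), output_format.upper()
    # Try the default first, then the remaining engines in listed order.
    candidates = [DEFAULT_ENGINE] + [n for n in ENGINES if n != DEFAULT_ENGINE]
    for name in candidates:
        inputs, outputs = ENGINES[name]
        if src in inputs and dst in outputs:
            return name
    return None


if __name__ == "__main__":
    print(pick_engine("png", "txt"))   # image to text: the default engine
    print(pick_engine("pdf", "docx"))  # scanned PDF to Word: needs ExtendedOCR
```

With these assumptions, an image-to-text job stays on Tesseract-OCR, while a scanned-PDF-to-DOCX job falls through to ExtendedOCR, matching the advice given for Word output.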