Project Details

Optimized use of OCR methods – Tesseract as a component of the OCR-D workflow

Term from 2018 to 2020
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 394264782
 
Tesseract is free software for text recognition (optical character recognition, OCR). It has a history of more than 30 years of continuous development and improvement. Among the small group of open-source OCR products, Tesseract is one of the programs with the best recognition rates. Since the end of 2016, Tesseract has supported state-of-the-art text recognition using neural networks (LSTM). The OCR-D context requires well-defined interfaces for OCR software. The project will actively contribute to the definition of such interfaces and will implement them for Tesseract so that it can be included in an OCR workflow. The project also strives to improve the stability, performance and practical usability of Tesseract.
DFG Programme Research data and software (Scientific Library Services and Information Systems)
Co-Investigator Stefan Weil
 
 
