Project Details

Automated postcorrection of OCRed historical printings with integrated optional interactive postcorrection

Subject Area: General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Term: from 2018 to 2020
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 393215159
 
The need to improve current methods for the full-text digitization of historical printings forms the general background of the DFG program "Skalierbare Verfahren der Text- und Strukturerkennung für die Volltextdigitalisierung historischer Drucke". Module 3 of this program in particular sets out the need for high-level postcorrection of OCR output. Over several years, our team has developed PoCoTo, a specialized system for the interactive postcorrection of OCRed historical printings. In the context of mass digitization, however, systems for automated postcorrection are clearly preferable. The main problem for automated postcorrection is to avoid replacing correct OCR tokens that are not covered by the background correction dictionary. Building on PoCoTo, we want to develop an advanced system for automated postcorrection that largely avoids such "infelicitous correction steps". To this end, the PoCoTo background technology will be substantially extended. Since fully automated postcorrection will not always reach the very high quality standards required, the automated correction can be complemented by an optional semi-automated or interactive postcorrection. Methods for semi-automated or interactive postcorrection that take advantage of all data and insights from the automated correction phase will be directly integrated into the system.
DFG Programme: Research data and software (Scientific Library Services and Information Systems)
Cooperation Partner: Privatdozent Dr. Alexander Geyken
 
 

