Project Details
Automated postcorrection of OCRed historical printings with integrated optional interactive postcorrection
Applicant
Professor Dr. Klaus U. Schulz
Subject Area
General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Term
from 2018 to 2020
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 393215159
The need to improve current methods for full-text digitization of historical printings is the general background of the DFG program "Skalierbare Verfahren der Text- und Strukturerkennung für die Volltextdigitalisierung historischer Drucke". Module 3 of this program in particular sets out the need for high-level postcorrection of OCR output. Over several years, our team has developed PoCoTo, a specialized system for the interactive postcorrection of OCRed historical printings. In the context of mass digitization, however, systems for automated postcorrection are clearly preferable. The main problem for automated postcorrection is to avoid replacing correct OCR tokens that are not covered by the background correction dictionary. Building on PoCoTo, we want to develop an advanced system for automated postcorrection that largely avoids such "infelicitous correction steps". To this end, the PoCoTo background technology will be substantially extended. Since fully automated postcorrection will not always reach the very high quality standards required, the automated correction can be complemented by an optional semi-automated or interactive postcorrection. Methods for semi-automated or interactive postcorrection that exploit all data and insights from the automated correction phase will be integrated directly into the system.
DFG Programme
Research data and software (Scientific Library Services and Information Systems)
Cooperation Partner
Privatdozent Dr. Alexander Geyken