Project Details

Cross-language Learning-to-Rank for Patent Retrieval, Phase 2: Weakly Supervised Learning of Cross-lingual Systems

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Term from 2012 to 2019
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 211613886
 
Final Report Year 2018

Final Report Abstract

Effective information search across languages is a key problem in today’s information society. For example, cross-lingual patent prior art search is an important tool for determining a patent’s novelty and avoiding patent infringement. To achieve high accuracy, machine learning approaches to cross-lingual retrieval require costly manual annotation of supervision signals, such as relevance links across languages. We showed that cross-lingual rankings can instead be learned directly from weakly supervised data that are not strictly parallel. Such weak supervision signals can be relevance indicators such as citations in patents or hyperlinks in Wikipedia pages. Our project showed that similar techniques can be successfully applied to optimize cross-lingual retrieval and to train machine translation systems on massive non-parallel data.

