
Grammar formalisms beyond context-free grammars and their use in statistical machine learning

Subject area General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Funding Funded from 2010 to 2021
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 183821958
 
Year of creation 2023

Summary of project results

The project BeyondCFG addressed the question of how to deal with discontinuous constituents in parsing and machine translation. A particular focus was on approaches based on mildly context-sensitive grammar formalisms. We developed new models and algorithms for probabilistic constituency parsing and for statistical machine translation, using formalisms such as Linear Context-Free Rewriting Systems (LCFRS) and variants of Tree Adjoining Grammar (TAG), extensions of context-free grammars (CFG) that combine aspects of synchronous grammars with the capacity to describe discontinuities. The project developed new mildly context-sensitive (MCS) grammar formalisms, investigated their formal properties, and developed both symbolic and statistical parsers. The latter yield transparent, grammar-based characterizations of syntactic structure while achieving state-of-the-art parsing accuracy. The project also developed the first approach to grammar-less, transition-based parsing of discontinuous constituents. Linked to discontinuous constituency parsing, BeyondCFG also developed several methods for treebanking, combining approaches such as active learning with an intuitive annotation interface. Finally, BeyondCFG developed a grammar-based statistical machine translation system that allows for discontinuous constituents and complex types of alignment.
One topic that was not planned at the outset was morpho-syntactic processing of Arabic. Due to the lack of Arabic constituency treebanks of sufficiently high quality at the time, our focus moved from constituency parsing to morphology. Arabic is interesting in this respect since it displays discontinuous units in morphology. An additional complication was that many Arabic texts contain code switching between a dialect and Modern Standard Arabic. In the context of morpho-syntactic processing of Arabic, the project contributed important results on segmentation, language identification, and POS tagging for Arabic NLP.
The project has produced several implementations, including several parsers, tools for processing discontinuous constituency trees, and tools for Arabic NLP, which are publicly available and still in use.

Project-related publications (selection)

  • 2015. Discontinuous Incremental Shift-reduce Parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 1202–1212. Beijing, China: Association for Computational Linguistics
    Maier, W.
    (See online at https://doi.org/10.3115/v1/P15-1116)
  • 2015. Hierarchical Machine Translation With Discontinuous Phrases. In Proceedings of the Tenth Workshop on Statistical Machine Translation, 228–238. Lisbon, Portugal: Association for Computational Linguistics
    Kaeshammer, M.
    (See online at https://doi.org/10.18653/v1/W15-3028)
  • 2015. On the Mild Context-Sensitivity of k-Tree Wrapping Grammar. In Proceedings of the 20th and 21st International Conferences on Formal Grammar - Volume 9804, 77–93. Berlin, Heidelberg: Springer-Verlag
    Kallmeyer, L.
    (See online at https://doi.org/10.1007/978-3-662-53042-9_5)
  • 2016. Data-oriented parsing with discontinuous constituents and function tags. Journal of Language Modelling 4(1). 57–111
    van Cranenburgh, A., R. Scha & R. Bod
    (See online at https://doi.org/10.15398/jlm.v4i1.100)
  • 2016. Discontinuous parsing with continuous trees. In Proceedings of the Workshop on Discontinuous Structures in Natural Language Processing, 47–57. San Diego, California: Association for Computational Linguistics
    Maier, W. & T. Lichte
    (See online at https://doi.org/10.18653/v1/W16-0906)
  • 2016. LR Parsing for LCFRS. Algorithms 9(3)
    Kallmeyer, L. & W. Maier
    (See online at https://doi.org/10.3390/a9030058)
  • 2016. Multilingual Code-switching Identification via LSTM Recurrent Neural Networks. In Proceedings of the Second Workshop on Computational Approaches to Code Switching, 50–59. Austin, Texas: Association for Computational Linguistics
    Samih, Y., S. Maharjan, M. Attia, L. Kallmeyer & T. Solorio
    (See online at https://doi.org/10.18653/v1/W16-5806)
  • 2017. Learning from Relatives: Unified Dialectal Arabic Segmentation. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), 432–441. Vancouver, Canada: Association for Computational Linguistics
    Samih, Y., M. Eldesouki, M. Attia, K. Darwish, A. Abdelali, H. Mubarak & L. Kallmeyer
    (See online at https://doi.org/10.18653/v1/K17-1043)
  • 2018. Active DOP: A constituency treebank annotation tool with online learning. In Proceedings of COLING system demonstrations, 38–42
    van Cranenburgh, A.
  • 2019. A Neural Graph-based Approach to Verbal MWE Identification. In Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), 114–124. Florence, Italy: Association for Computational Linguistics
    Waszczuk, J., R. Ehren, R. Stodden & L. Kallmeyer
    (See online at https://doi.org/10.18653/v1/W19-5113)
  • 2019. From partial neural graph-based LTAG parsing towards full parsing. Computational Linguistics in the Netherlands Journal 9. 3–26
    Bladier, T., J. Waszczuk, L. Kallmeyer & J. Janke
  • 2020. Statistical Parsing of Tree Wrapping Grammars. In Proceedings of the 28th International Conference on Computational Linguistics, 6759–6766. Barcelona, Spain (Online): International Committee on Computational Linguistics
    Bladier, T., J. Waszczuk & L. Kallmeyer
    (See online at https://doi.org/10.18653/v1/2020.coling-main.595)
 
 
