Project Details

RECOLAGE: Real-Time Vision-Grounded Collaborative Language Generation

Subject Area General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Term from 2019 to 2024
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 423217434
 
Current spoken dialogue systems mostly use simple canned utterances when producing verbal output. In theory, data-driven natural language generation (NLG) systems that map non-verbal data (e.g. images) to verbal output in a flexible way promise to provide means for more fluid and natural interaction between humans and machines. Unfortunately, certain assumptions made in state-of-the-art NLG systems are heavily tailored to text and cannot easily be transferred to spoken interaction: essentially, existing frameworks conceive of NLG as an autonomous process that is entirely decoupled from an interlocutor and a dynamic visual environment. This assumption is particularly problematic for spoken task-oriented interaction in visual contexts. Here, human interlocutors expect the speaker to behave collaboratively even while speaking, to react to concurrent events, and to adapt their utterances accordingly, e.g. by extending or revising them if needed. The central objective of RECOLAGE is to develop a data-driven model for real-time and collaborative language generation in visually grounded dialogue systems that maintains a close feedback loop between a user's non-verbal actions and the system's verbal actions. This framework will implement an approach to visual and conversational language grounding that is able to package its verbal output in real time and revise it if uncertainty or changes in the world make this desirable. Producing speaking behaviour of this kind requires the coordination and interleaving of tasks that are traditionally handled sequentially, namely the prediction of system actions (action management, AM), the generation of utterances (natural language generation, NLG), and the synthesis of speech (speech synthesis, SYN).
RECOLAGE will model AM as a continuous decision-making process that schedules tasks for NLG and SYN; these in turn retain autonomy over the linguistic decisions they have to make (which words to say, and how to say them), but are adapted to operate on minimal chunks and under strong mutual contextual constraints. Building on the substantial relevant prior work of the applicants, RECOLAGE will follow a data-driven approach in which linguistic decisions are optimised through machine learning techniques.
DFG Programme Research Grants
 
 
