Project Details
Long-term and Few-shot Action Anticipation using Causal Representation Learning
Applicant
Professor Dr. Andreas Bulling
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Methods in Artificial Intelligence and Machine Learning
Term
since 2025
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 564158652
With artificial intelligence (AI) systems entering ever more corners of our lives, most of them will have to interact closely with humans. Key to efficient interaction is the ability of these systems to proactively adapt to their interaction partner, which in turn requires anticipating what the partner will do next. By incorporating knowledge about which actions are required to complete a particular task, these systems can plan ahead and suggest, or even proactively carry out, important actions on behalf of their human interaction partner. Two of the most pressing open challenges in action anticipation, which also remain to be addressed within a joint computational framework, are: 1) Existing works have mainly focused on anticipation time spans of only a few seconds. This is insufficient for most practical applications, given that human behaviour has rich internal structure and long-term interdependencies between individual actions. 2) Given the large variability of human behaviour, for action anticipation to be practically useful, we cannot assume that interactive AI systems can be trained sufficiently on all possible action sequences and behaviours. A system needs to generalise to actions and tasks that it has seen rarely or not at all. In this project, we will investigate causal representation learning (CRL) for long-term and few-shot action anticipation. Representations in the form of structural causal models (SCMs) have two distinct advantages that make them particularly suited for these tasks: First, given that SCMs encode "which action(s) cause(s) which action(s)", they effectively reduce the search space when predicting possible future actions. This reduction at each prediction step becomes more crucial the further into the future actions are predicted. Second, the modular representations in SCMs encode "robust knowledge" and can thus be applied in different activity domains. They can potentially identify unknown (out-of-distribution) actions and activities, thus leading to more efficient few-shot learning.
The project addresses four objectives: 1) Developing a method to perform activity parsing on videos to obtain rich semantic information for learning causal representations. 2) Developing a causal method for long-term action anticipation by adapting the learned causal representation to predict future actions. 3) Generalising activity parsing and long-term action anticipation to unknown activities in a few-shot setting. Our approach will build on the learned causal representation and reuse it for unknown activities. 4) Integrating all developed methods into one joint system.
DFG Programme
Research Grants
