Project Details

JUMP: Joint Spatio-Temporal Scene Understanding with Motion Prior

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Methods in Artificial Intelligence and Machine Learning
Term since 2025
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 563662485
 
Self-supervised foundation models are becoming the bedrock of AI. However, despite a flurry of foundation models for visual understanding, a critical issue remains: each model accommodates only a narrow fraction of downstream applications, and no single model is universally applicable. My project JUMP, Joint spatio-temporal scene Understanding with Motion Prior, will close this gap. JUMP will bring the deep representations of vision foundation models to the next level of generalization across computer vision domains and beyond. Leveraging large public video collections, JUMP will distil motion patterns to enhance both the semantic and geometric properties of deep visual representations. On the one hand, these properties will reveal scene composition at a finer level of granularity ("movable" objects). On the other hand, JUMP will make visual representations more geometrically grounded. These enhancements will advance core geometric tasks in computer vision, such as 3D reconstruction and visual odometry, alongside semantic tasks, such as panoptic and open-vocabulary segmentation. Overall, visual representations enhanced by JUMP will provide a unified backbone for visual scene understanding, improving downstream accuracy in a cost-efficient manner across applications in robotics, augmented reality, medical imaging, and human-assistive systems.
DFG Programme WBP Fellowship
International Connection United Kingdom
