Project Details

Video Segmentation from Multiple Representations using Lifted Multicuts

Subject Area: Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term: from 2017 to 2022
Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 360826079
 
Video segmentation is an important technique for many high-level tasks such as object detection, action recognition, or 3D reconstruction. Key challenges include:

1. Determining which objects are salient.
2. The ambiguity concerning the desired level of detail of a segmentation.
3. The discrepancy between visible image gradients and semantic object boundaries.
4. The lack of proper training data.
5. Handling the large amounts of data.

In contrast to image segmentation, video data offers many additional cues for segmentation. The motion present in videos appears to be a strong cue for object saliency. At the same time, motion makes the notion of objectness easier to define and allows improved estimates of a scene's geometry. However, current video segmentation methods can barely exploit these advantages because the large amount of data is difficult to handle. To keep (computationally expensive) graphical models tractable, most state-of-the-art video segmentation methods build graphs on precomputed frame-wise superpixels. Furthermore, supervised learning-based methods can only learn from the temporally sparse annotations that are currently available. Temporal consistency is mostly addressed by using frame-by-frame optical flow.

In this project, we want to build a flexible, probabilistic, graph-based framework for temporally consistent video segmentation. The intended framework combines all sorts of temporal and spatial cues, such as local pixel information, point trajectories, or object and object-part detections, into one objective function. It provides segmentations (groupings) for all of these representations at different levels of detail.

The aim is to generate the most likely segmentation given cut probabilities between entities at different levels of granularity. To this end, the project builds upon the Minimum Cost Lifted Multicut Problem formulation to generate segmentations.
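For reference, a common statement of the Minimum Cost Lifted Multicut Problem is sketched below. It summarizes the general formulation, not this project's specific model.

The formulation assumes the following:

- a base graph G = (V, E) whose edges connect neighboring entities, for example pixels, superpixels, or trajectories;
- a lifted graph G' = (V, E') with E ⊆ E' that adds long-range edges;
- a cost c_e for every edge e in E'.

A binary variable x_e = 1 marks e as cut. The cycle constraints make the cut edges a valid decomposition of G. The path and cut constraints force a lifted edge vw to be cut exactly when v and w end up in different connected components of G.

```latex
\begin{aligned}
\min_{x \in \{0,1\}^{E'}} \quad & \sum_{e \in E'} c_e \, x_e \\
\text{s.t.} \quad & x_e \le \sum_{f \in C \setminus \{e\}} x_f
  && \forall \text{ cycles } C \text{ of } G,\ \forall e \in C \\
& x_{vw} \le \sum_{f \in P} x_f
  && \forall vw \in E' \setminus E,\ \forall vw\text{-paths } P \text{ in } G \\
& 1 - x_{vw} \le \sum_{f \in S} \bigl(1 - x_f\bigr)
  && \forall vw \in E' \setminus E,\ \forall vw\text{-cuts } S \text{ in } G
\end{aligned}
```

Suppose the cut probabilities p_e are treated as independent. Setting c_e = log((1 - p_e) / p_e) then makes the minimum cost solution the most likely feasible segmentation. This cost mapping is an assumption here, chosen because it matches the motivation stated above.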
The lifted multicut formulation makes it possible to optimize not only over the segment label assignments but also over the number of segments, and thus to determine the right number of objects. Our most important planned contributions are:

- first, to provide a fast Minimum Cost Lifted Multicut Problem solver that can handle video-scale problem instances;
- second, to extract boundary estimates from video that properly include temporal information;
- third, to generate properly optimized point trajectories so that long-term motion can be represented well;
- last, to formulate video segmentation over these multiple representations as a single Minimum Cost Lifted Multicut Problem, which allows joint optimization.
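The sketch below illustrates why the number of segments does not have to be fixed in advance. It implements greedy additive edge contraction, a simple heuristic that has been applied to lifted multicut problems. It is not the fast solver this project plans to develop.

The following are illustrative assumptions:

- the function names;
- the dictionary-based graph encoding;
- the cost convention: a positive cost means the edge prefers to stay joined, and the cost is paid when the edge is cut.

```python
import math
from collections import defaultdict


def cost_from_probability(p, eps=1e-6):
    """Map a cut probability p to the edge cost log((1 - p) / p).

    Positive cost (p < 0.5) means the edge prefers to stay joined.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0) and division by 0
    return math.log((1.0 - p) / p)


def lifted_multicut_objective(labels, costs):
    """Sum the costs of all base and lifted edges whose endpoints get different labels."""
    return sum(c for (u, v), c in costs.items() if labels[u] != labels[v])


def greedy_additive_edge_contraction(num_nodes, base_edges, costs):
    """Greedily merge base-adjacent clusters while the summed cost between them is positive.

    base_edges: list of (u, v) pairs of the base graph G.
    costs: dict {(u, v): c} over base and lifted edges (the lifted graph E').
    Returns one segment label per node; the number of segments emerges from the costs.
    """
    parent = list(range(num_nodes))  # merge history, resolved by find() below
    weight = defaultdict(dict)       # weight[a][b]: total cost between clusters a and b
    base_adj = defaultdict(set)      # pairs of clusters linked by at least one base edge
    for u, v in base_edges:
        base_adj[u].add(v)
        base_adj[v].add(u)
    for (u, v), c in costs.items():
        weight[u][v] = weight[u].get(v, 0.0) + c
        weight[v][u] = weight[v].get(u, 0.0) + c

    while True:
        # Scan for the base-adjacent cluster pair with the largest positive total cost.
        best, best_w = None, 0.0
        for a in base_adj:
            for b in base_adj[a]:
                w = weight[a].get(b, 0.0)
                if w > best_w:
                    best, best_w = (a, b), w
        if best is None:
            break  # no remaining merge decreases the objective
        a, b = best

        # Contract b into a: accumulate b's base and lifted costs onto a.
        for n, w in weight.pop(b, {}).items():
            if n == a:
                continue
            weight[a][n] = weight[a].get(n, 0.0) + w
            weight[n][a] = weight[n].get(a, 0.0) + w
            del weight[n][b]
        weight[a].pop(b, None)

        # Contract b into a in the base adjacency, so only connected clusters can merge.
        for n in base_adj.pop(b):
            base_adj[n].discard(b)
            if n != a:
                base_adj[a].add(n)
                base_adj[n].add(a)
        parent[b] = a

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    relabel = {}  # map cluster roots to consecutive labels 0, 1, 2, ...
    return [relabel.setdefault(find(v), len(relabel)) for v in range(num_nodes)]


if __name__ == "__main__":
    # Toy path 0-1-2-3: every base edge prefers joining, but a lifted edge (0, 3) does not.
    costs = {(0, 1): 2, (1, 2): 1, (2, 3): 2, (0, 3): -5}
    labels = greedy_additive_edge_contraction(4, [(0, 1), (1, 2), (2, 3)], costs)
    print(labels)                                    # [0, 0, 1, 1]
    print(lifted_multicut_objective(labels, costs))  # -4
```

In the toy example, every base edge favors joining. The strongly negative lifted edge (0, 3) nevertheless forces a cut between nodes 1 and 2. The result is two segments with objective -4, which beats joining everything (objective 0). Without the lifted edge, all four nodes merge into one segment. Practical solvers replace the full scan over cluster pairs in each iteration with a priority queue.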
DFG Programme: Research Grants
Cooperation Partner: Professor Dr. Björn Andres