Learning-based computer vision approach for the analysis and synthesis of photorealistic 3D facial expressions
Final Report Abstract
Interfacing computer science with human perception research aims at a better understanding of cognitive processes and, ultimately, of the underlying neural mechanisms. With this working hypothesis, we aimed at a better understanding of dynamic human facial expression recognition, aided by novel continuous perceptual spaces made possible by newly developed technologies. In particular, highly controllable yet realistic computer-animated faces enabled advanced experiments: constraints derived from perception inspired the development of technology, which in turn offered novel tools for perception studies. We focused on computer vision and machine learning algorithms to reconstruct, parameterize, and animate human facial performance from 3D depth data, and from this derived animations for rendering realistic computer-generated face avatars. The semantic parameterization of the high-dimensional data spaces produced by 3D capture has been an important goal throughout the project. The resulting animations have already been used for insightful perception experiments and have been perceptually validated.
Semantic expression modeling
Rich face models are of growing interest in computer vision, perception research, and computer graphics and animation. Attributes such as descriptiveness, semantic meaning, and intuitive control are desirable but hard to achieve. One way to obtain high-quality face models is to use real-world scanned data that is parameterized for animation and driven by motion retargeting. Our approach to retargeting facial motion from motion capture onto 3D-scanned head models is based on semantic keyshapes, i.e., shapes that correspond to action units of the Facial Action Coding System (FACS). This technology is now established in many movie productions; moreover, it offers new opportunities for rich stimulus generation in perception research, going beyond earlier work that used animation to reveal the role that facial motion, in addition to facial form, plays in identity recognition. Nevertheless, this and similar approaches have mostly been marker-based and have required manually designed correspondences for animatable head models. Statistical 3D models of moving heads were thus not yet available, although urgently needed: it was desirable to estimate and animate facial expressions entirely from real-world measured data.
Markerless semantic 3D movement tracking, reconstruction, and animation
In particular, we presented a novel 3D model-based analysis-by-synthesis approach that automatically parameterizes 3D facial depth data for animation and estimates the state of semantically meaningful components, even from noisy depth data such as that produced by Time-of-Flight (ToF) cameras or devices such as the Microsoft Kinect. This work was complemented by 4D implicit face surface tracking.
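To make the analysis-by-synthesis idea concrete, here is a minimal sketch of one of its core steps: expressing a depth scan as the neutral face plus a non-negative combination of FACS-inspired keyshape displacements. This is an illustration under simplifying assumptions, not the published system; it assumes a linear keyshape model and known point correspondences, substitutes a generic non-negative least-squares solver for the actual robust fitting, and all function names and data are made up for the example.

```python
import numpy as np
from scipy.optimize import nnls

def fit_keyshape_weights(depth_points, neutral, keyshapes):
    """Illustrative sketch, not the authors' code.

    depth_points : (N, 3) noisy 3D points, assumed already in correspondence
                   with the model vertices (real systems must solve this too).
    neutral      : (N, 3) neutral face vertices.
    keyshapes    : (K, N, 3) per-vertex displacements for K action units.
    Returns non-negative weights w (K,) minimizing ||neutral + B w - scan||.
    """
    K = keyshapes.shape[0]
    B = keyshapes.reshape(K, -1).T            # (3N, K) linear keyshape basis
    r = (depth_points - neutral).reshape(-1)  # (3N,) residual to explain
    w, _ = nnls(B, r)                         # non-negative least squares
    return w

# Toy usage with synthetic data: recover known weights from a noisy "scan".
rng = np.random.default_rng(1)
N, K = 100, 4
neutral = rng.normal(size=(N, 3))
keyshapes = rng.normal(size=(K, N, 3))
true_w = np.array([0.8, 0.0, 0.3, 0.0])
scan = neutral + np.tensordot(true_w, keyshapes, axes=1) \
       + 0.01 * rng.normal(size=(N, 3))
print(np.round(fit_keyshape_weights(scan, neutral, keyshapes), 2))  # ~ true_w
```

Non-negativity is what keeps the recovered weights interpretable as activation levels of individual, semantically meaningful action units.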
Novel facial movement metrics based on anechoic mixture models
In a collaboration within Perceptual Graphics, we exploited unsupervised learning methods for the estimation of facial movement synergies. This led to a compact description for compressing multidimensional time signals, opening a new window for experiments by allowing, for the first time, the manipulation of the temporal structure of facial movements for face generation (a toy sketch of the underlying mixture model follows the application list below).
Human perception studies
In addition to this signal processing work, we collaborated extensively with the other partner of Perceptual Graphics, the Section for Computational Sensomotorics in Tübingen, on psychophysical experiments on the perception of dynamic facial expressions. This work exploited the learnt generative models for the synthesis of realistic dynamic faces, studying fundamental mechanisms in the processing of dynamic faces, such as the representation of temporal order and the relevance of high-level aftereffects for dynamic stimuli. We found, for the first time, aftereffects for dynamic facial stimuli and could show, by analyzing the associated optic-flow patterns, that the observed effects were not based on low-level motion aftereffects. Applications of the developed framework for facial expression analysis and synthesis range from basic research to technical applications in computer graphics, e.g., non-intrusive human-computer interfaces in virtual reality. In brief, ongoing work exploiting the technology lies in the following fields:
1. Emotion and face recognition research
2. Perceptual realism of dynamic faces
3. Real-time animation for interactions (ongoing EU project TANGO)
4. Studying plasticity mechanisms in the brain with clinical relevance (in planning)
Possible application scenarios:
1. Medical applications: movement disorder monitoring, surgery planning, rehabilitation
2. Wearable computing, embedded real-time implementation of the 3D analysis framework
3. Safety in cars, and market research
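For concreteness, the anechoic mixture model referenced above describes each observed movement trajectory x_i(t) as a weighted sum of a small number of source functions s_j, each shifted by its own time delay: x_i(t) = sum_j a_ij * s_j(t - tau_ij). The sketch below merely synthesizes toy trajectories from such a model to show its structure; estimating sources, weights, and delays from data (the actual unsupervised step) is beyond this illustration, and all names and values are made up.

```python
import numpy as np

# Anechoic mixture model:
#   x_i(t) = sum_j a_ij * s_j(t - tau_ij)
# Each trajectory x_i is a weighted sum of a few shared source functions s_j,
# where every (trajectory, source) pair carries its own time delay tau_ij.

rng = np.random.default_rng(0)
T = 200                                   # number of time samples
n_sources, n_signals = 2, 5

t = np.linspace(0.0, 1.0, T)
sources = np.stack([np.sin(2 * np.pi * 3 * t),          # toy source s_1
                    np.exp(-((t - 0.5) / 0.1) ** 2)])   # toy source s_2

A   = rng.uniform(0.0, 1.0, size=(n_signals, n_sources))  # weights a_ij
tau = rng.integers(0, 20, size=(n_signals, n_sources))    # delays in samples

def anechoic_mix(sources, A, tau):
    """Synthesize x_i(t) = sum_j A[i, j] * s_j(t - tau[i, j])."""
    n_signals, n_sources = A.shape
    X = np.zeros((n_signals, sources.shape[1]))
    for i in range(n_signals):
        for j in range(n_sources):
            shifted = np.roll(sources[j], tau[i, j])
            shifted[: tau[i, j]] = 0.0    # zero-pad instead of wrapping around
            X[i] += A[i, j] * shifted
    return X

X = anechoic_mix(sources, A, tau)
print(X.shape)  # (5, 200): five trajectories sharing two delayed sources
```

In such a parameterization the temporal structure of the movement is exposed directly through the delay parameters tau_ij, which is what makes experiments that manipulate the timing of facial movements convenient.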
Publications
- (July 2006) Semantic 3D motion retargeting for facial animation. 3rd Symposium on Applied Perception in Graphics and Visualization (APGV '06), ACM Press, New York, NY, USA, 77-84.
Curio C, Breidt M, Kleiner M, Vuong QC, Giese MA and Bülthoff HH
(See online at https://doi.org/10.1145/1140491.1140508)
- (2008) Probing Dynamic Human Facial Action Recognition From The Other Side Of The Mean. 5th Symposium on Applied Perception in Graphics and Visualization (APGV 2008, co-located with SIGGRAPH 2008), ACM Press, New York, NY, USA, 59-66.
Curio C, Giese MA, Breidt M, Kleiner M and Bülthoff HH
(See online at https://doi.org/10.1145/1394281.1394293)
- (2009) Markerless 3D Face Tracking. DAGM Symposium, 1-10.
Walder CJ, Breidt M, Bülthoff HH, Schölkopf B and Curio C
(See online at https://dx.doi.org/10.1007/978-3-642-03798-6_5)
- (2010) Dynamic Faces: Insights from Experiments and Computation. MIT Press, Cambridge, MA, USA.
Curio C, Bülthoff HH and Giese MA
(See online at https://doi.org/10.7551/mitpress/9780262014533.001.0001)
- (December 2010) Face Models from Noisy 3D Cameras. 3rd ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia 2010), ACM Press, New York, NY, USA, 1-2.
Breidt M, Bülthoff HH and Curio C
(See online at https://doi.org/10.1145/1899950.1899962)
- (2011) Robust Semantic Analysis by Synthesis of 3D Facial Motion. Ninth IEEE International Conference on Automatic Face and Gesture Recognition (FG 2011), 1-8.
Breidt M, Bülthoff HH and Curio C
(See online at https://doi.org/10.1109/FG.2011.5771336)