Speech driven photo realistic facial animation based on an articulatory DBN model and AAM features
This publication appears in: Multimedia Tools and Applications
Authors: D. Jiang, Y. Zhao, H. Sahli and Y. Zhang
Volume: 73, Issue: 1, Pages: 397-415
Publication Year: 2014
Abstract: This paper presents a photo realistic facial animation synthesis approach based on an audio visual articulatory dynamic Bayesian network model (AF_AVDBN), in which the maximum asynchronies between the articulatory features, such as lips, tongue and glottis/velum, can be controlled. Perceptual Linear Prediction (PLP) features from audio speech, as well as active appearance model (AAM) features from face images of an audio visual continuous speech database, are adopted to train the AF_AVDBN model parameters. Based on the trained model, given an input audio speech, the optimal AAM visual features are estimated via a maximum likelihood estimation (MLE) criterion, and are then used to construct face images for the animation. In our experiments, facial animations are synthesized for 20 continuous audio speech sentences using the proposed AF_AVDBN model, as well as the state-of-the-art methods, namely the audio visual state synchronous DBN model (SS_DBN), which implements a multi-stream Hidden Markov Model, and the state asynchronous DBN model (SA_DBN). Objective evaluations on the learned AAM features show that much more accurate visual features can be learned from the AF_AVDBN model. Subjective evaluations show that the synthesized facial animations using AF_AVDBN are better than those using the state based SA_DBN and SS_DBN models, in both the overall naturalness and the matching accuracy of the mouth movements to the speech content.
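The synthesis pipeline described in the abstract (audio features in, ML-estimated AAM visual features out) can be illustrated with a greatly simplified sketch. The function and variable names below are hypothetical, and the single-stream nearest-state decoding used here is only a stand-in for the paper's actual AF_AVDBN inference, which models multiple asynchronous articulatory streams:

```python
import numpy as np

def mle_visual_features(audio_feats, state_means_audio, state_means_visual):
    """Toy stand-in for MLE-based visual feature estimation.

    For each audio frame (e.g. a PLP vector), select the model state whose
    audio mean is closest (a crude proxy for maximum-likelihood state
    decoding), then emit that state's associated visual mean as the
    estimated AAM feature vector for the corresponding animation frame.
    """
    estimated = []
    for frame in audio_feats:
        # Distance of this audio frame to every state's audio mean.
        dists = np.linalg.norm(state_means_audio - frame, axis=1)
        best_state = int(np.argmin(dists))
        # The chosen state's visual mean serves as the AAM estimate.
        estimated.append(state_means_visual[best_state])
    return np.array(estimated)
```

In the actual system, each estimated AAM vector would then drive the appearance model to render one photo realistic face image per frame; this sketch stops at the feature estimation step.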