Multimodal Coherency Issues in Designing and Optimizing Audiovisual Speech Synthesis Techniques
Host Publication: Finds and Results from the Swedish Cyprus Expedition: A Gender Perspective at the Medelhavsmuseet
Authors: W. Mattheyses, L. Latacz and W. Verhelst
Publication Date: Sep. 2009
Number of Pages: 6
Abstract: This paper proposes a 2D audiovisual text-to-speech synthesis system that constructs the output signal by selecting and concatenating multimodal segments containing natural combinations of audio and video. We describe the experiments conducted to assess the impact of this joint audio/video synthesis technique on the perceived quality of the synthetic speech. The experiments indicate that maximizing the audiovisual coherence of the output speech improves the perceived quality compared to the traditional approach of synthesizing the visual signal separately from the audio. In addition, we found that the maximum allowable desynchronization between the audio and the image sequence is the same irrespective of whether the degree of desynchronization is constant or time-varying. The synthesizer uses this tolerance to further optimize the segment cut points in the audio and video modes.