AV Lab provides the infrastructure and engineering of a professional-grade recording studio for the production of audiovisual databases. Multi-sensor signal processing (the synchronized capture and processing of multiple audio, video and human physiological signals) is an emerging research area in which data obtained through microphones, cameras and other sensors are used to develop, train and test new multimedia applications that enrich human-machine interaction.
We produce databases for the technologies being developed at ETRO, such as microphone arrays, 3D surround sound, audiovisual photorealistic text-to-speech synthesis, and emotional and expressive speech synthesis and recognition. Together with other departments, researchers use this facility for human behavior analysis, studying mother-child communication and ways of enriching human-robot interaction or the gaming experience. Researchers can also test their algorithms with microphones, loudspeakers or video cameras under variable acoustic and lighting conditions.
Some examples of work that has been done for research:
- Audio-visual database for research (2010): HD video (uncompressed 720p / 60fps) recordings with blue-key in controlled conditions. Multi-channel audio capture. Recording protocol. Database optimized for research in photorealistic audio-visual text-to-speech synthesis.