An Iterative Bilinear Frequency Warping Approach To Robust Speaker-Independent Time Synchronization
Host Publication: 20th European Signal Processing Conference (EUSIPCO-2012)
Authors: P. Soens and W. Verhelst
Publisher: IEEE
Publication Year: 2012
ISBN: 978-1-4673-1068-0
Abstract: Vocal Tract Length Normalization is a widely deployed speaker normalization technique that compensates for vocal tract length differences among speakers by appropriately warping the frequency axis of the speech signal. In this work, we study the use of this technique within the time synchronization paradigm. An efficient bilinear frequency warping procedure is proposed, in which the amount of warping is iteratively optimized according to a criterion that is directly related to the output of the standard Dynamic Time Warping algorithm. Subjective listening tests performed on mixed-gender time-aligned results obtained with a subset of data from the English EUROM1 Many Talker Set have shown that the proposed procedure significantly improves the overall speech quality and the time synchronization accuracy with 85% and 91%, respectively.
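
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the textbook first-order all-pass (bilinear) warping function, takes the accumulated Dynamic Time Warping cost as the optimization criterion, and replaces the paper's iterative update with a plain search over candidate warping factors. All function names and the toy spectra are hypothetical.

```python
import math

def bilinear_warp(omega, alpha):
    """Warp a normalized frequency omega (radians, 0..pi) with factor alpha.
    Textbook first-order all-pass formula; assumed, not taken from the paper."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

def warp_spectrum(spectrum, alpha):
    """Resample a magnitude spectrum (bins uniform on 0..pi) on the warped
    frequency axis, using linear interpolation between neighbouring bins."""
    n = len(spectrum)
    out = []
    for k in range(n):
        omega = math.pi * k / (n - 1)
        pos = bilinear_warp(omega, alpha) / math.pi * (n - 1)
        pos = min(max(pos, 0.0), float(n - 1))
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return out

def dtw_cost(a, b):
    """Accumulated cost of standard DTW between two sequences of feature
    vectors, with Euclidean local distance and unit step weights."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = math.dist(a[i - 1], b[j - 1])
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

def search_alpha(ref, test, alphas):
    """Return the candidate alpha whose warped test frames give the lowest
    DTW cost against the reference (a simple search, assumed here in place
    of the paper's iterative optimization)."""
    return min(alphas,
               key=lambda al: dtw_cost(ref, [warp_spectrum(f, al) for f in test]))

# Toy usage: two short sequences of 8-bin spectra (made-up values).
ref = [[0, 1, 4, 1, 0, 0, 0, 0], [0, 0, 1, 4, 1, 0, 0, 0]]
best = search_alpha(ref, ref, [-0.2, 0.0, 0.2])
```

With identical reference and test sequences, the search selects a warping factor of zero, since any nonzero warp moves the spectral peaks and raises the DTW cost. On real data the candidate set, feature type, and stopping rule would need to follow the paper.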