ETRO VUB

ETRO Events

A list of events ETRO is organizing or participating in.

PhD Defense
Data Representation and Kernel-based Machine Learning Methods for Speech Emotion Recognition

Presenter

Miss Fengna Wang

Abstract

This dissertation aims at highlighting potential solutions, from both the model aspect and the feature aspect, for recognizing the latent emotions of humans from their speech signals. In this work, two sparse kernel machines, i.e., relevance vector machine (RVM) and relevance units machine (RUM), have been proposed as recognition models for speech emotion recognition. Moreover, sparse coding (SC) is employed for emotional feature representation.
Support vector machines (SVMs), a popular machine learning approach, have been applied in several domains, including speech emotion recognition. Although the SVM is theoretically sound, its model relies on kernel functions that must satisfy Mercer’s condition, and the required number of support vectors (SVs) typically grows linearly with the size of the training data.
To alleviate this limitation, the RVM and the RUM are adopted in this dissertation as alternative kernel approaches. RVM, a Bayesian kernel method, can achieve performance comparable to or even better than SVM while providing a much sparser model. RUM, a further extension of RVM under the Bayesian framework, relaxes the constraint that relevance units (RUs) have to be selected from the training samples. Moreover, RUM treats RUs and kernel parameters as part of the model parameters. Therefore, RUM maintains all advantages of RVM, offers superior sparsity, and has better generalization performance for unseen data.
Finding an appropriate feature representation for audio data is central to speech emotion recognition. Most existing audio features rely on hand-crafted signal processing techniques. An alternative approach is to use features that are instead learned automatically. This has the advantage of generalizing well to new data, particularly if the features are learned in an unsupervised manner.
We propose using sparse coding (SC), a popular representative of this class, as a means to automatically learn features from audio data.
Two SC-based frameworks are proposed for speech emotion recognition, namely pooling shift-invariant sparse coding (PSISC) and hierarchical sparse coding (HSC). Overall, our experiments offer insights into what makes the proposed frameworks work well on speech emotion recognition benchmarks.

Short CV

Master's degree in Computer Application, Northwestern Polytechnical University, 2009

Logistics

Date: 15.01.2015

Time: 13:00

Location: Promotion room Building D


Contact

ETRO Department

info@etro.vub.ac.be

Tel: +32 2 629 29 30

©2024 • Vrije Universiteit Brussel • ETRO Dept. • Pleinlaan 2 • 1050 Brussels • Tel: +32 2 629 2930 (secretariat) • Fax: +32 2 629 2883