ETRO VUB

Master theses

Current and past ideas and concepts for Master Theses.

Audio-Visual Consistency Retrieval

Subject

Video content has become markedly more available and widely consumed in recent years, driven largely by social platforms and streaming services. Audio is a modality closely interlinked with video, and the two can fall out of sync. This is a common problem in situations of limited, overloaded, or unreliable bandwidth. The aim of this project is to detect temporal inconsistencies between audio and video (e.g., lagging audio or playback mismatch).

Kind of work

The work for this project will involve using video and audio models jointly. The primary goal is to discover the correspondence, across time, between visual features and auditory features. This will be done by jointly encoding visual and audio embeddings into a common feature space and then studying their cycle consistency over time. The two embeddings are considered consistent (i.e., temporally aligned) if the projection of one embedding at a given time step directly corresponds to the embedding of the other modality at the same time step. The two embeddings are cycle-consistent if this consistency holds in both directions, from video to audio and from audio to video.
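The same-time-step consistency test described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming that both modalities have already been encoded into the common feature space as arrays of shape (T, D) with one row per time step, and that "corresponds to" means "is the nearest neighbour in Euclidean distance". The helper names `nearest_neighbours` and `cycle_consistent_steps` are hypothetical and not part of the project description.

```python
import numpy as np


def nearest_neighbours(queries: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """For every row of `queries`, the index of the closest row of `keys`."""
    # Pairwise squared Euclidean distances, shape (len(queries), len(keys)).
    dists = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def cycle_consistent_steps(video: np.ndarray, audio: np.ndarray) -> np.ndarray:
    """Boolean mask over time steps (hypothetical helper).

    Step t is marked consistent when the nearest audio embedding to video
    step t is audio step t AND the nearest video embedding to audio step t
    is video step t, i.e. consistency holds in both directions.
    """
    if video.shape != audio.shape:
        raise ValueError("expected (T, D) arrays of the same shape")
    steps = np.arange(len(video))
    video_to_audio = nearest_neighbours(video, audio)  # audio index matched to each video step
    audio_to_video = nearest_neighbours(audio, video)  # video index matched to each audio step
    return (video_to_audio == steps) & (audio_to_video == steps)


# Toy example: audio embeddings that closely track the video ones,
# then the same audio circularly delayed by 3 steps.
rng = np.random.default_rng(0)
video = rng.normal(size=(20, 16))
audio = video + 1e-3 * rng.normal(size=(20, 16))
print(cycle_consistent_steps(video, audio).mean())                       # 1.0
print(cycle_consistent_steps(video, np.roll(audio, 3, axis=0)).mean())   # 0.0
```

The fraction of `True` entries gives a simple per-clip alignment score. A constant audio lag shows up as nearest-neighbour indices that are offset from the current step by the same amount at every step.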
Finally, by enforcing cycle consistency as the main training objective, the audio and video streams can be aligned.
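One way to turn cycle consistency into a trainable objective is the cycle-back classification loss of Dwibedi et al. (2019), listed in the framework of the thesis. It uses a soft nearest neighbour so that the cycle is differentiable. The sketch adapts it to the audio-visual case by cycling video to audio to video and audio to video to audio, then summing the two losses. It only computes the loss value in NumPy, whereas actual training would use an autodiff framework. The function names and the `temperature` parameter are illustrative assumptions, not the project's prescribed design.

```python
import numpy as np


def log_softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def squared_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances, shape (len(a), len(b))."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)


def cycle_back_loss(u: np.ndarray, v: np.ndarray, temperature: float = 1.0) -> float:
    """Cycle-back classification loss for the cycle u -> v -> u (after TCC)."""
    # Soft nearest neighbour of every u_i among the time steps of v.
    alpha = np.exp(log_softmax(-squared_distances(u, v) / temperature, axis=1))
    v_tilde = alpha @ v  # shape (T_u, D)
    # Cycle back to u: log-probabilities over which step of u we landed on.
    log_beta = log_softmax(-squared_distances(v_tilde, u) / temperature, axis=1)
    # Cross-entropy with the starting index i as the target class.
    return float(-np.mean(np.diag(log_beta)))


def audio_visual_cycle_loss(video: np.ndarray, audio: np.ndarray,
                            temperature: float = 1.0) -> float:
    """Hypothetical objective: sum of both cycle directions."""
    return (cycle_back_loss(video, audio, temperature)
            + cycle_back_loss(audio, video, temperature))
```

Because this loss only asks each step to return to itself, it can also be low when one stream is a shifted copy of the other. The lag is therefore read off separately, for example as the offset between a step and its nearest neighbour in the other modality, which is what the same-time-step definition of consistency captures.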

Framework of the Thesis

Dwibedi, D., Aytar, Y., Tompson, J., Sermanet, P. and Zisserman, A., 2019. Temporal cycle-consistency learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1801-1810).
Feichtenhofer, C., 2020. X3D: Expanding architectures for efficient video recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 203-213).
Kazakos, E., Nagrani, A., Zisserman, A. and Damen, D., 2021, June. Slow-fast auditory streams for audio recognition. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 855-859). IEEE.

Number of Students

1

Expected Student Profile

The project requires proficiency in Python and (some) previous experience in computer vision. Audio will be processed in the form of spectrograms (i.e., 2D frequency × time representations), so no previous knowledge of audio recognition is required.
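As a rough illustration of the 2D frequency × time representation, the sketch below computes a log-magnitude spectrogram with NumPy's FFT. The window length (`n_fft=512`), the hop size (`hop=160`, i.e. 10 ms at an assumed 16 kHz sample rate) and the Hann window are assumptions chosen for illustration. The project may well use a different audio front end, such as log-mel spectrograms.

```python
import numpy as np


def log_spectrogram(signal: np.ndarray, n_fft: int = 512, hop: int = 160,
                    eps: float = 1e-10) -> np.ndarray:
    """Log-magnitude spectrogram of a mono signal, shape (n_fft // 2 + 1, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames: shape (frames, n_fft).
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    # Magnitude of the one-sided FFT of every frame: shape (frames, n_fft // 2 + 1).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose so that rows are frequency bins and columns are time frames.
    return np.log(magnitude + eps).T


sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
spec = log_spectrogram(np.sin(2 * np.pi * 1000 * t))
print(spec.shape)  # (257, 97)
```

For this 1 s, 1 kHz test tone the energy sits in frequency bin 32 (1000 Hz divided by 31.25 Hz per bin) in every frame. The resulting array can be treated as a single-channel image by a 2D model.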

Promotors

Prof. Dr. Ir. Nikos Deligiannis

+32 (0)2 629 1683

ndeligia@etrovub.be

Prof. Hichem Sahli

+32 (0)2 629 2916

hsahli@etrovub.be

Supervisor

Mr. Alexandros Stergiou

+32 (0)2 629 2930

astergio@etrovub.be

Contact

ETRO Department

info@etro.vub.ac.be

Tel: +32 2 629 29 30

©2024 • Vrije Universiteit Brussel • ETRO Dept. • Pleinlaan 2 • 1050 Brussels • Tel: +32 2 629 2930 (secretariat) • Fax: +32 2 629 2883