Subject
Multi-modal representation learning is a machine learning approach that combines information from multiple data modalities (e.g., images, text, audio, video, or sensor data) into a shared representation space. The motivation is that different modalities often provide complementary perspectives on the same phenomenon. By aligning or fusing these heterogeneous inputs, multi-modal representation learning enables models to:
1. Leverage complementary information across modalities.
2. Improve generalization on downstream tasks such as classification, retrieval, and reasoning.
3. Handle missing information by transferring knowledge from one modality to another.
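The idea of a shared representation space can be sketched as follows. This is a hypothetical toy example, not from the thesis description: two modality-specific feature vectors of different dimensionalities are mapped into a common space by linear projections (which a real system, e.g. one built in PyTorch, would learn with a contrastive or alignment loss; here the weights are random) and compared with cosine similarity.

```python
import numpy as np

# Toy sketch of a shared embedding space (illustrative assumption, not the
# thesis method): project image and text features of different sizes into a
# common 4-dimensional space, then measure alignment via cosine similarity.
rng = np.random.default_rng(0)

image_feat = rng.standard_normal(8)   # stand-in for a vision encoder output
text_feat = rng.standard_normal(6)    # stand-in for a text encoder output

W_img = rng.standard_normal((4, 8))   # projection image -> shared space
W_txt = rng.standard_normal((4, 6))   # projection text  -> shared space

def to_shared(W, x):
    """Project a modality-specific feature into the shared space, unit-normalized."""
    z = W @ x
    return z / np.linalg.norm(z)

z_img = to_shared(W_img, image_feat)
z_txt = to_shared(W_txt, text_feat)

# Cosine similarity of the two embeddings; training would push matching
# cross-modal pairs toward 1 and mismatched pairs apart.
similarity = float(z_img @ z_txt)
print(round(similarity, 3))
```

Because both embeddings live in the same space, the same similarity score also supports cross-modal retrieval and lets one modality stand in for another when data is missing.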
Kind of work
Current work in the literature typically considers only a small number of modalities, and most evaluations are restricted to a subset of datasets. This raises concerns about whether these methods scale to large datasets.
Framework of the Thesis
The thesis will begin with a thorough literature review, covering existing research on techniques for addressing the computational costs of uni-modal representation learning. In parallel, the student should learn how to run their implementations on a hardware cluster infrastructure such as Hydra [1]. The student will then implement and evaluate existing approaches for tackling the aforementioned issues in a vision classification setting. In the next step, the student will investigate how to extend these approaches to multi-modal representation learning. The research findings, results, and contributions will be documented in a thesis manuscript.
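The evaluation structure of the vision classification step can be illustrated with a minimal sketch. This is an assumed toy pipeline, not the thesis implementation: on the real task the student would train a CNN in PyTorch on ImageNet [2]; here a linear softmax classifier on random data shows the same train-then-evaluate loop.

```python
import numpy as np

# Toy stand-in for a vision classification pipeline (illustrative only):
# a linear softmax classifier trained by gradient descent on synthetic data.
rng = np.random.default_rng(42)
n, d, c = 200, 16, 2                      # samples, flattened pixels, classes
X = rng.standard_normal((n, d))
true_w = rng.standard_normal((d, c))
y = np.argmax(X @ true_w, axis=1)         # labels from a hidden linear rule

W = np.zeros((d, c))                      # classifier weights
for _ in range(300):                      # plain gradient descent
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)     # softmax probabilities
    grad = X.T @ (p - np.eye(c)[y]) / n   # cross-entropy gradient
    W -= 0.5 * grad

# Evaluate: fraction of samples assigned the correct class.
accuracy = float(np.mean(np.argmax(X @ W, axis=1) == y))
print(accuracy)
```

The same loop structure (forward pass, loss gradient, weight update, accuracy evaluation) carries over directly to the PyTorch/ImageNet setting, where the model, data loading, and hardware placement on the cluster become the main engineering concerns.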
References
[1] https://hpc.vub.be/docs/infrastructure/
[2] Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009.
Expected Student Profile
- The ideal candidate for this master thesis should have a strong background and interest in deep learning and Python programming.
- The candidate should have a background in convolutional neural networks for vision.
- The candidate should have experience with the PyTorch framework.
- The candidate should have an interest in learning to develop code on a cluster of GPU resources.
- The candidate should have an interest in working with large-scale datasets like ImageNet [2].