Subject
Multi-modal representation learning is a machine learning approach that combines information from multiple data modalities (e.g., images, text, audio, video, or sensor data) into a shared representation space. The motivation is that different modalities often provide complementary perspectives on the same phenomenon. By aligning or fusing these heterogeneous inputs, multi-modal representation learning enables models to:
1. Leverage complementary information across modalities.
2. Improve generalization on downstream tasks such as classification, retrieval, and reasoning.
3. Handle missing information by transferring knowledge from one modality to another.
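The idea of a shared representation space can be sketched as follows. This is a hypothetical toy example, not from the thesis description: two modality-specific feature vectors of different dimensionalities are mapped into a common space by linear projections (which a real system, e.g. one built in PyTorch, would learn with a contrastive or alignment loss; here the weights are random) and compared with cosine similarity.

```python
import numpy as np

# Toy sketch of a shared embedding space (illustrative assumption, not the
# thesis method): project image and text features of different sizes into a
# common 4-dimensional space, then measure alignment via cosine similarity.
rng = np.random.default_rng(0)

image_feat = rng.standard_normal(8)   # stand-in for a vision encoder output
text_feat = rng.standard_normal(6)    # stand-in for a text encoder output

W_img = rng.standard_normal((4, 8))   # projection image -> shared space
W_txt = rng.standard_normal((4, 6))   # projection text  -> shared space

def to_shared(W, x):
    """Project a modality-specific feature into the shared space, unit-normalized."""
    z = W @ x
    return z / np.linalg.norm(z)

z_img = to_shared(W_img, image_feat)
z_txt = to_shared(W_txt, text_feat)

# Cosine similarity of the two embeddings; training would push matching
# cross-modal pairs toward 1 and mismatched pairs apart.
similarity = float(z_img @ z_txt)
print(round(similarity, 3))
```

Because both embeddings live in the same space, the same similarity score also supports cross-modal retrieval and lets one modality stand in for another when data is missing.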
Kind of work
Current work in the literature typically considers only a small number of modalities, and most evaluations are restricted to a subset of datasets. This raises concerns about whether these methods scale to large datasets.
Framework of the Thesis
The thesis will begin with a thorough literature review, covering existing research on techniques for addressing the computational costs of uni-modal representation learning. In parallel, the student should learn how to run their implementations on a hardware cluster infrastructure such as Hydra [1]. The student will then implement and evaluate existing approaches for tackling the aforementioned issues in a vision classification setting. In the next step, the student will investigate how to extend these approaches to multi-modal representation learning. The research findings, results, and contributions will be documented in a thesis manuscript.
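The evaluation structure of the vision classification step can be illustrated with a minimal sketch. This is an assumed toy pipeline, not the thesis implementation: on the real task the student would train a CNN in PyTorch on ImageNet [2]; here a linear softmax classifier on random data shows the same train-then-evaluate loop.

```python
import numpy as np

# Toy stand-in for a vision classification pipeline (illustrative only):
# a linear softmax classifier trained by gradient descent on synthetic data.
rng = np.random.default_rng(42)
n, d, c = 200, 16, 2                      # samples, flattened pixels, classes
X = rng.standard_normal((n, d))
true_w = rng.standard_normal((d, c))
y = np.argmax(X @ true_w, axis=1)         # labels from a hidden linear rule

W = np.zeros((d, c))                      # classifier weights
for _ in range(300):                      # plain gradient descent
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)     # softmax probabilities
    grad = X.T @ (p - np.eye(c)[y]) / n   # cross-entropy gradient
    W -= 0.5 * grad

# Evaluate: fraction of samples assigned the correct class.
accuracy = float(np.mean(np.argmax(X @ W, axis=1) == y))
print(accuracy)
```

The same loop structure (forward pass, loss gradient, weight update, accuracy evaluation) carries over directly to the PyTorch/ImageNet setting, where the model, data loading, and hardware placement on the cluster become the main engineering concerns.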
References
[1] https://hpc.vub.be/docs/infrastructure/
[2] Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009.
Expected Student Profile
- The ideal candidate for this master thesis should have a strong background and interest in deep learning and Python programming.
- The candidate should have a background in convolutional neural networks for vision.
- The candidate should have experience with the PyTorch framework.
- The candidate should have an interest in learning to develop code on a cluster of GPU resources.
- The candidate should have an interest in working with large-scale datasets like ImageNet [2].