Subject
Semi-supervised learning is the problem of learning with limited labels. As an example, consider a dataset of ~1 million images of which only 1,000 are labeled, leaving 999K unlabeled images. In semi-supervised learning, we try to learn a model from the unlabeled images (the 999K) and the labeled images (the 1K) together. There are several ways to do this, such as pretraining the model in a self-supervised fashion on the unlabeled data and then fine-tuning it on the labeled data, or using techniques such as consistency regularization and confidence thresholding to exploit labeled and unlabeled images jointly. Many real-world scenarios offer little labeled data (e.g. medical imaging), so this line of research is important in deep learning, and its performance has almost reached the supervised baseline. Given this significance, it is necessary to understand how these models learn. The task is to develop new explainability techniques, similar to the ones proposed for image classification (e.g. Integrated Gradients, Occlusion, SmoothGrad, Grad-CAM and its variants, LIME, etc.), adapted to semi-supervised models.
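To make the starting point concrete, the sketch below applies one of the simplest of the listed techniques, a vanilla gradient saliency map, in PyTorch. The tiny CNN is a hypothetical stand-in for a semi-supervised model's backbone; the shapes and class count are illustrative assumptions, not part of the project description.

```python
import torch
import torch.nn as nn

# Hypothetical small CNN standing in for the backbone of a
# semi-supervised model (e.g. one pretrained on unlabeled data).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),  # 10 classes, chosen arbitrarily
)
model.eval()

def saliency_map(model, image, target_class):
    """Vanilla gradient saliency: |d(class score)/d(pixel)|, max over channels."""
    image = image.clone().requires_grad_(True)
    score = model(image.unsqueeze(0))[0, target_class]
    score.backward()
    return image.grad.abs().max(dim=0).values  # shape (H, W)

img = torch.rand(3, 32, 32)
sal = saliency_map(model, img, target_class=3)
print(sal.shape)  # torch.Size([32, 32])
```

Integrated Gradients, SmoothGrad, and Grad-CAM all build on this same backward pass, so adapting them to a semi-supervised model mainly means choosing what to attribute (e.g. a pseudo-label's score or a consistency loss) rather than changing the mechanics above.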
Kind of work
The student will need to develop and/or apply explainability techniques in order to understand how semi-supervised models learn from such a small amount of labeled data. Since no training is involved, a large number of GPUs is not required: one GPU is enough, or even a good-performing CPU. The tasks can also easily be performed on Google Colab.
Number of Students
1-2
Expected Student Profile
Prior knowledge in Machine Learning
Prior knowledge in Python and PyTorch