ETRO VUB

Master theses

Current and past ideas and concepts for Master's theses.

Federated Learning for Language Models

Subject

Language models have become fundamental tools in natural language processing (NLP), powering applications such as machine translation, text generation, and sentiment analysis. Traditional training methods for these models require vast centralized datasets, which often pose significant privacy and security risks. Federated learning (FL) offers an innovative solution by enabling the decentralized training of models across multiple devices or institutions while keeping raw data localized. This thesis aims to investigate the application of federated learning to language models, focusing on optimizing training efficiency, preserving model performance, and ensuring data privacy. The research will develop and evaluate novel FL algorithms tailored for language models, and explore potential real-world applications.
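The decentralized training loop described above can be sketched as Federated Averaging (FedAvg): clients run local SGD on their private data, and a server aggregates the resulting weights. The toy linear model, learning rate, and synthetic client shards below are illustrative placeholders, not part of the proposal:

```python
import numpy as np

def local_update(w_global, data, lr=0.05, epochs=2):
    """One client's local SGD on a toy least-squares model y ~ w.x
    (model and data here are illustrative placeholders)."""
    w = w_global.copy()
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w @ x - y) * x  # gradient of (w.x - y)^2
            w -= lr * grad
    return w

def fedavg_round(w_global, client_data):
    """One FedAvg round: each client trains locally on its own shard,
    then the server takes a dataset-size-weighted average of the results."""
    sizes = [len(d) for d in client_data]
    local_ws = [local_update(w_global, d) for d in client_data]
    total = sum(sizes)
    return sum((n / total) * w for n, w in zip(sizes, local_ws))

# Two clients hold private shards generated from the same w_true = [1, 2];
# raw data never leaves a client, only model weights are exchanged.
rng = np.random.default_rng(0)
w_true = np.array([1.0, 2.0])
shards = []
for _ in range(2):
    xs = rng.normal(size=(8, 2))
    shards.append([(x, float(w_true @ x)) for x in xs])

w = np.zeros(2)
for _ in range(30):  # repeated communication rounds
    w = fedavg_round(w, shards)
```

After a few dozen rounds the averaged model recovers the underlying parameters, even though neither client ever shared its raw data.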

Kind of work

The objectives of the thesis are:
1. Literature Review: Conduct an extensive review of current federated learning methodologies and language models, understanding their applications, advantages, and limitations.
2. Algorithm Development: Design and implement federated learning algorithms specifically optimized for training language models.
3. Privacy: Explore and integrate advanced techniques such as differential privacy and compression to enhance data security within the federated learning framework.
4. Efficiency Optimization: Develop strategies to minimize communication overhead and enhance the efficiency of the federated training process, using methods like model compression and asynchronous updates.
5. Performance Evaluation: Assess the performance of the federated learning algorithms on various NLP benchmarks, comparing them with traditional centralized training methods.
6. Applications: Identify and evaluate potential applications of federated learning for language models in sectors such as healthcare, where data privacy is crucial.
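Objective 3 can be grounded in a standard recipe: bound the norm of each client update (clipping) and add calibrated Gaussian noise before it leaves the device, as in DP-FedAvg-style schemes. The clip norm and noise multiplier below are illustrative placeholders, not values prescribed by the proposal:

```python
import numpy as np

def sanitize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip a client update to bound its L2 sensitivity, then add Gaussian
    noise scaled to that bound -- the core mechanism behind differentially
    private federated averaging."""
    rng = rng if rng is not None else np.random.default_rng(42)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

raw = np.array([3.0, 4.0])  # norm 5, exceeds the clip bound
private = sanitize_update(raw)
clipped_only = sanitize_update(raw, noise_multiplier=0.0)
```

Because the clipped norm is bounded, the added noise can be calibrated to a formal privacy budget; choosing that budget for language-model updates is part of the research question.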

Framework of the Thesis

Expected Outcomes:
- Development of novel federated learning algorithms optimized for language models.
- Enhanced privacy measures integrated into the federated training process.
- Improved training efficiency through optimized federated learning techniques.
- Comprehensive evaluation results showcasing the feasibility and effectiveness of federated learning for language models.
- Case studies highlighting the practical applications and benefits of federated learning.
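One way to realize the communication-efficiency outcome is top-k sparsification: each client transmits only the k largest-magnitude entries of its update. This is a generic sketch of the idea, not the specific compression scheme of the Chen et al. paper listed below:

```python
import numpy as np

def topk_compress(update, k):
    """Keep the k largest-magnitude entries; the (indices, values) pair
    is what would actually be transmitted to the server."""
    idx = np.argsort(np.abs(update))[-k:]
    return idx, update[idx]

def topk_decompress(idx, vals, dim):
    """Server-side reconstruction of the sparse update as a dense vector."""
    dense = np.zeros(dim)
    dense[idx] = vals
    return dense

u = np.array([0.1, -2.0, 0.05, 3.0, -0.2])
idx, vals = topk_compress(u, k=2)
restored = topk_decompress(idx, vals, dim=u.size)
```

For a model with millions of parameters, sending only a small fraction of entries per round can cut upload traffic by orders of magnitude, at the cost of an approximation error the thesis would need to quantify.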

A list of publications on federated learning, language models, and privacy-preserving technologies:
A. AbhishekV, S. Binny, R. JohanT, N. Raj, and V. Thomas, “Federated learning: Collaborative machine learning without centralized training data,” International Journal of Engineering Technology and Management Sciences, 2022.
J. Sun, A. Li, B. Wang, H. Yang, H. Li, and Y. Chen, “Soteria: Provable defense against privacy leakage in federated learning from representation perspective,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9307–9315, 2021.
Y. Chen, L. Abrahamyan, H. Sahli, and N. Deligiannis, “Learned Model Compression for Efficient and Privacy-Preserving Federated Learning,” Authorea Preprints, 2024.

Number of Students

1

Expected Student Profile

Good knowledge of machine learning, NLP, and Python (especially PyTorch).

Promotor

Prof. Dr. Ir. Nikos Deligiannis

+32 (0)2 629 1683

ndeligia@etrovub.be

Contact

ETRO Department

info@etro.vub.ac.be

Tel: +32 2 629 29 30

©2024 • Vrije Universiteit Brussel • ETRO Dept. • Pleinlaan 2 • 1050 Brussels • Tel: +32 2 629 2930 (secretariat) • Fax: +32 2 629 2883