Subject
Breast cancer is the most common cancer among women worldwide and remains a leading cause of cancer-related mortality. A key early indicator of breast cancer is the presence of breast microcalcifications (MCs): tiny calcium deposits that appear as small white spots in mammography images. Detecting and accurately classifying them is crucial, as they are often the first visible sign of malignancy and guide radiologists in early diagnosis. Extensive research exists on the classification of MCs using machine learning (ML) and deep learning (DL) techniques. A recent comprehensive study evaluated various state-of-the-art algorithms for this task, including Convolutional Neural Networks (CNNs) and Transformers [1]. It identified several top-performing models, providing valuable insights into effective architectures for MC classification. Building on this, the aim of this thesis is to re-apply and evaluate some of these top-performing CNN and Transformer models on a newly released dataset, the EMBED dataset [2]. The focus will be on comparing model performance for individual MC classification and cluster-level classification. These models have not yet been applied to this dataset, and the individual vs. cluster-level comparison has not yet been explored.
Kind of work
Objective: To explore, develop, and evaluate CNN- and Transformer-based methods for classifying breast MCs from 2D mammography images using the EMBED dataset, with a focus on comparing individual vs. cluster-level classification performance across selected top-performing models.
Description of work:
- Literature Review (ETOC: 2 months): Review the key study identifying top CNN and Transformer models for MC classification. Explore additional relevant literature and techniques.
- Dataset Familiarization (ETOC: 1 month): Understand the EMBED dataset's structure, image annotations, and preprocessing requirements. Explore whether segmentation or detection of individual MCs is necessary based on the available annotations.
- Implementation (ETOC: 6 months): Implement selected CNN and Transformer models (open-source or custom, where needed). Preprocess data, train models, and compare their performance for individual MC classification vs. cluster-level classification. Evaluate models using classification metrics (e.g., accuracy, AUC, F1-score). Analyse and compare the performance of each model across both classification levels.
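As an illustration of the evaluation step, the metrics named above (accuracy, AUC, F1-score) could be computed along the following lines. This is a minimal sketch in plain Python, assuming a binary benign/malignant labelling (0/1) and one predicted probability per sample. The function names, the 0.5 threshold, and the toy values are illustrative assumptions, not part of the thesis specification or of the EMBED dataset. In practice an established library would likely be used instead.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample scores higher
    than a random negative one (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Threshold the scores (assumed cut-off 0.5) and report all three metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "accuracy": accuracy(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc(y_true, scores),
    }


# Toy values for illustration only (not taken from any dataset).
toy_labels = [0, 0, 1, 1, 1, 0]
toy_scores = [0.1, 0.6, 0.8, 0.4, 0.9, 0.2]
toy_report = evaluate(toy_labels, toy_scores)
```

The same `evaluate` call could be applied once to per-MC predictions and once to per-cluster predictions, so that both classification levels are compared on identical metrics.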
Framework of the Thesis
Related work:
[1] Cantone, M., Marrocco, C., Tortorella, F. and Bria, A., 2023. Convolutional networks and transformers for mammography classification: an experimental study. Sensors, 23(3), p.1229.
[2] Jeong, J.J., Vey, B.L., Bhimireddy, A., Kim, T., Santos, T., Correa, R., Dutt, R. et al., 2023. The EMory BrEast imaging Dataset (EMBED): A racially diverse, granular dataset of 3.4 million screening and diagnostic mammographic images. Radiology: Artificial Intelligence, 5(1), p.e220047.
Expected Student Profile
Currently pursuing an MSc in a field related to one or more of the following: Computer Science, Biomedical Engineering, Applied Computer Science - Digital Health. Strong programming skills (Python). Ability to write scientific reports and to communicate research results in English at conferences.