DEEP LEARNING FOR BIG DATA ANALYSIS

Presenter
Mr Minh Duc Nguyen - ETRO, VUB [Email]

Abstract
Recent years have witnessed massive growth in the amount of data generated by applications in various domains, spanning from smart cities and healthcare to e-commerce and social networks. This phenomenon, often captured by the term Big Data, has led to significant progress in machine learning research. For instance, deep-learning-based models trained on large-scale datasets can generate text that is plausible to a human reader, achieve near human-level performance in object recognition, or defeat the world champion in Go, a complex strategy game. Despite this progress, the full potential of big data has not yet been unleashed. To make full sense of big data, machine learning algorithms, and deep learning algorithms in particular, need to address numerous challenges, including data quality and data modelling.
In this thesis, we focus on two challenges of machine learning for big data, namely, dealing with incomplete data and learning on graphs. The first challenge arises in different areas due to the nature of the data collection processes. In recommender systems, for example, the observed user-item interactions are far outnumbered by the unobserved ones, resulting in large, incomplete data matrices. In e-commerce systems, user-generated data usually has one or more missing attributes, resulting in incomplete data tables. While the first challenge concerns the quality of the data, the second challenge is about effectively making sense of graph data, which is the most natural way to represent information in various applications, such as user interactions in social networks, traffic flows in traffic monitoring and forecasting systems, and semantic relationships among entities in knowledge graphs. Methods that overcome these two challenges could benefit a great number of big data applications.
We address the two challenges by leveraging the deep-learning paradigm. Our first contribution in this thesis is a set of deep-learning-based matrix completion solutions to deal with the incomplete data problem. Our solutions employ novel neural network architectures and consider challenging yet meaningful requirements, namely, the ability to extend to new rows and columns of data, to deal effectively with discrete data matrices in certain applications, and to perform robustly when observations are scarce. Our second contribution is a deep neural network model for learning on graphs, which effectively incorporates the relationships among the graph nodes at the learning and inference stages.
The models developed in this thesis are generic and can lead to improvements in many big data application domains, such as e-commerce and social media analysis. Our comprehensive experiments in the recommender systems and rumour detection settings demonstrate the effectiveness of these models and their advantages over existing ones.
Short CV
Minh Duc Nguyen received the B.Sc. degree in Computer Science from Vietnam National University, Hanoi, Vietnam in 2012, and the M.Sc. degree in Computer Science from the Vrije Universiteit Brussel, Brussels, Belgium in 2015.