Author name | Ιγνάτιος Χατζηγιανέλλης |
---|---|
Title | Greek news topics classification using graph neural networks |
Year | 2019-2020 |
Supervisor | George Petasis |

Recent advances in deep learning have increased research interest in the field of Natural Language Processing (NLP). This has led to various state-of-the-art breakthroughs, with the family of graph-based methods being no exception. The aim of this thesis is to contribute to the progress of Graph Neural Networks (GNNs) and the field of NLP by studying the problem of text classification in the Greek language. We start by evaluating the task with well-established machine learning and deep learning methods, and then conclude our work by investigating graph-based approaches and proposing a novel addition to them. GNN methods use different kinds of neural networks to process spatial information and construct their graphs at the document or corpus level. However, regardless of their structure, all studies use non-contextual embeddings for training. Drawing inspiration from contextual language models, in this work we propose a method based on a recent study, which we modify to use quantized contextual embeddings. To achieve this, we first employ a pre-trained BERT model, which produces the contextual embeddings for our vocabulary. Using every available embedding, though, would lead to a very sparse and inefficient graph. To overcome this issue, we quantize the numerous representations of every word with K-means, clustering the multiple embeddings of each word into a fixed number of centroids. Finally, we use those centroids as the actual input of the graphs. Multiple experiments on our dataset show that our suggested method outperforms both the baseline experiments and the original method on which it is based.
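
The quantization step described above can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the abstract does not state the number of centroids, the BERT checkpoint, or how centroids are attached to graph nodes, so the value `k=3`, the 8-dimensional random vectors standing in for BERT output, the transliterated example words, and the helper names `kmeans` and `quantize_embeddings` are all assumptions made for illustration. The sketch uses a plain Lloyd-style K-means written in pure Python so that it runs without external libraries.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style K-means; returns a list of k centroid vectors."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def quantize_embeddings(occurrences, k=3):
    """Map each word to at most k centroids of its contextual embeddings."""
    by_word = defaultdict(list)
    for word, vector in occurrences:
        by_word[word].append(vector)
    # A word seen fewer than k times cannot yield k centroids,
    # so the cluster count is capped at its number of occurrences.
    return {
        word: kmeans(vectors, min(k, len(vectors)))
        for word, vectors in by_word.items()
    }


# Toy stand-in for BERT output (assumption): one random 8-dimensional
# vector per word occurrence. Words are transliterated Greek examples.
rng = random.Random(42)
occurrences = [
    (word, [rng.gauss(0.0, 1.0) for _ in range(8)])
    for word, count in [("trapeza", 50), ("agonas", 30), ("nomos", 2)]
    for _ in range(count)
]

centroids = quantize_embeddings(occurrences, k=3)
for word, vectors in centroids.items():
    print(word, len(vectors))
```

In this toy setting, 82 occurrence-level vectors are reduced to at most three vectors per word, which is the property that keeps the resulting graph from growing with the number of word occurrences. In practice the random vectors would be replaced by embeddings extracted from a pre-trained BERT model, and a library implementation of K-means could be substituted for the hand-written one.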