| Author name | Ioannis Atlamazoglou |
|---|---|
| Title | Document Clustering And Topic Mining |
| Year | 2020-2021 |
| Supervisor | George Petasis |

The purpose of this thesis is to extract topics from Greek-language documents and to cluster the documents according to those topics, so that documents that refer to the same topic, or are otherwise similar, belong to the same cluster. After a review of related work, popular topic extraction models such as LDA and text representation methods such as BERT and FASTTEXT, which are among the state-of-the-art technologies for producing vector representations of text, were explored and applied. To evaluate clustering performance on the resulting document embeddings, several metrics suited to such tasks are applied.