| Author name | Angelos Geroulanos |
|---|---|
| Title | Deep Learning for Music Emotion Recognition |
| Year | 2020-2021 |
| Supervisor | Theodoros Giannakopoulos |

Music carries many powerful emotions. With the growth of technology and the internet, huge amounts of music content can be accessed instantly from almost anywhere. Despite this availability, selecting music to match a listener's emotional state remains a difficult task. This work uses deep learning to investigate how well-known CNN architectures (VGG, AlexNet, DenseNet, Inception, ResNeXt, SqueezeNet) perform in music emotion recognition under scarce-data conditions, with diverse and not always balanced sets. The techniques used are transfer learning and data augmentation via Generative Adversarial Networks (GANs). First, however, traditional machine learning is applied: hand-crafted features are extracted from all audio samples and classified with well-known classifiers (SVM, K-NN, Random Forest, Extra Trees) to provide a reference point for the aggregated results.
Next, the samples are converted into Mel-spectrograms, which serve as inputs to the convolutional networks. The networks are trained under two transfer learning scenarios, and the resulting models are tested in emotion classification experiments. Finally, data augmentation is performed using StyleGAN2-ADA, producing a new artificial set that is in turn tested in classification experiments. The ground truth for these experiments is the 360-set from Eerola & Vuoskoski's research, which is fully tagged by experts in the music field, a fact that makes it quite rare. It consists of 360 excerpts of movie music, each 15 to 30 seconds long, labeled by Energy (high, medium, low), Valence (positive, neutral, negative), Tension (high, medium, low) and Emotions (anger, fear, happy, sad, tender). To our knowledge, this is the first study to conduct such extensive experiments on this set.
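
As a rough illustration of the Mel-spectrogram step mentioned above, the following minimal sketch shows how frequency bands are spaced on the Mel scale. It assumes the common HTK-style formula m = 2595 · log10(1 + f / 700). The abstract does not state which Mel variant, number of bands, or frequency range was used, so the function names and the values `n_mels=8` and `f_max=8000.0` are purely illustrative.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert Hz to Mel using the HTK-style formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_mels: int, f_min: float, f_max: float) -> list:
    """Center frequencies (Hz) of n_mels bands equally spaced on the Mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]


# Illustrative values only; not taken from the thesis.
centers = mel_center_frequencies(n_mels=8, f_min=0.0, f_max=8000.0)
for i, c in enumerate(centers):
    print(f"band {i}: {c:.1f} Hz")
```

Because the bands are equally spaced in Mel rather than in Hz, the center frequencies sit closer together at low frequencies and spread out at high frequencies. This is the property that makes a Mel-spectrogram a perceptually motivated image-like input for a CNN.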
Related publication:
Geroulanos, A., Giannakopoulos, T. (2023). Emotion Recognition in Music Using Deep Neural Networks. In: Biswas, A., Wennekes, E., Wieczorkowska, A., Laskar, R.H. (eds) Advances in Speech and Music Technology. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-031-18444-4_10