Machine Learning on Multimedia Data

Course semester: 2nd
Course category: Elective
ECTS credits: 7.5
Tutors: Th. Giannakopoulos, I. Maglogiannis

Goal

Upon successful completion of the course, the student will be able to:

  • Identify the opportunities and limitations of applying multimedia signal analysis and recognition techniques in various areas of modern life
  • Identify the specific characteristics of individual problems, and select and adapt the appropriate multimedia signal analysis and recognition techniques to them
  • Plan comparative evaluations of machine learning methods and recognize the capabilities and limitations of each method or technique, always taking into account the particular characteristics of the multimedia data under analysis
  • Ultimately, design, build and evaluate systems for the segmentation, analysis, recognition and visualization of multimedia content

The course also targets the following general competencies:

  • Ability to organize and plan work and manage time effectively
  • Ability to communicate effectively (orally and in writing)
  • Ability to solve problems
  • Ability to think critically and approach problems critically
  • Ability to work in a team
  • Ability to apply theoretical knowledge in practice
  • Ability to conduct research
  • Ability to adapt methods and techniques to new situations and conditions

Contents

  • Signal and image analysis topics
  • Audio representations and feature extraction
  • Audio signal characterization: classification, segmentation, clustering, matching
  • Voice recognition
  • Introduction to image data, coding and representation, basic machine vision concepts
  • Image processing with machine learning: segmentation, edge detection, alignment, feature extraction, classification, search and retrieval
  • Video analysis: motion and optical flow analysis, temporal event recognition, video metadata and annotation, search and retrieval
  • Deep learning for image and video classification: convolutional neural networks, network visualization and understanding, transfer learning
  • Temporal representation models for video analysis

Bibliography

  • Digital Image Processing, 4th Edition, by Rafael C. Gonzalez and Richard E. Woods
  • Computer Vision: Models, Learning, and Inference, 1st Edition, by Simon J. D. Prince
  • Theory and Applications of Digital Speech Processing, by Lawrence Rabiner and Ronald Schafer
  • MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval, by Hyoung-Gook Kim, Nicolas Moreau and Thomas Sikora
  • Introduction to Audio Analysis: A MATLAB® Approach, by Theodoros Giannakopoulos and Aggelos Pikrakis
  • Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications, by Meinard Müller
  • Discrete-Time Speech Signal Processing: Principles and Practice, by Thomas F. Quatieri