Goal
Upon successful completion of the course, the student will be able to:
- Identify the opportunities and limitations of applying multimedia signal analysis and recognition techniques in various areas of modern life
- Identify the specific characteristics of individual problems, and select and adapt appropriate multimedia signal analysis and recognition techniques to them
- Plan comparative evaluations of machine learning methods and recognize the possibilities and limitations of each method or technique, always taking into account the specific characteristics of the multimedia data under analysis
- Ultimately, design, build and evaluate systems for the segmentation, analysis, recognition and visualization of multimedia data content
The course also targets the following general competencies:
- Ability to organize and plan work and manage time effectively
- Ability to communicate effectively (orally and in writing)
- Ability to solve problems
- Ability to think critically and approach problems critically
- Ability to work in a team
- Ability to apply theoretical knowledge in practice
- Ability to conduct research
- Ability to adapt methods and techniques to new situations and conditions
Contents
- Signal and image analysis topics
- Audio representations and feature extraction
- Audio signal characterization: classification, segmentation, clustering, matching
- Voice recognition
- Introduction to image data, coding and representation, basic machine vision concepts
- Image processing with machine learning: segmentation, edge detection, alignment, feature extraction, classification, search and retrieval
- Video analysis: motion and flow analysis, temporal event recognition, video metadata and annotation, search and retrieval
- Using deep learning for image and video classification, convolutional neural networks, visualization and understanding, transfer learning
- Using temporal representation models for video analysis
Bibliography
- Digital Image Processing, 4th Edition, by Rafael C. Gonzalez, Richard E. Woods
- Computer Vision: Models, Learning, and Inference, 1st Edition, by Simon J. D. Prince
- Theory and Applications of Digital Speech Processing, by Lawrence R. Rabiner, Ronald W. Schafer
- MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval, by Hyoung-Gook Kim, Nicolas Moreau, Thomas Sikora
- Introduction to Audio Analysis: A MATLAB® Approach, by Theodoros Giannakopoulos, Aggelos Pikrakis
- Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications, by Meinard Müller
- Discrete-Time Speech Signal Processing: Principles and Practice, by Thomas F. Quatieri