Goal
Upon successful completion of the course, the student will be able to:
- Understand the levels of Natural Language Analysis and Processing (NLP)
- Recognize, understand, and explain NLP techniques together with their corresponding applications
- Identify the specific characteristics of individual NLP problems, and select and adapt appropriate techniques to them
- Plan comparative evaluations of NLP methods, recognizing the capabilities and limitations of each
- Communicate ideas related to the application of NLP techniques in a clear, concise, and formal manner
The overall aim is for students to be able to design, build, and evaluate NLP systems that solve real-world problems, and to explain how those systems operate.
The course also targets the following general competencies:
- Ability to organize and plan work and manage time effectively
- Ability to communicate effectively (orally and in writing)
- Ability to solve problems
- Ability to develop critical thinking and capacity for critical approaches
- Ability to work in a team
- Ability to adopt interdisciplinary approaches
- Ability to apply theoretical knowledge in practice
- Ability to evaluate algorithms, analyze and explain results
- Ability to conduct research
- Ability to adapt methods and techniques to new situations and conditions
- Ability to generate new ideas (creativity)
Contents
- Introduction to natural language processing: basic concepts, layers of linguistic analysis, application examples
- Morphological analysis, text tokenization, sentence splitting, subword tokenization, regular expressions, text normalization, statistical properties of text and corpora
- Language modeling: n-gram models, smoothing techniques, neural language models, evaluation of language models
- Vector representation of words and texts, topic models, static embeddings
- Pre-trained language models and deep learning, contextualized embeddings
- Sequence-to-sequence (seq2seq) methods, encoder-decoder models, machine translation, text summarization
- Sequence classification: methods and applications
- Sequence labeling, named-entity recognition and part-of-speech tagging
- Syntactic analysis: context-free grammars, probabilistic grammars, dependency parsing, full and partial parsing
- Semantic analysis, word sense disambiguation, semantic role labeling
Bibliography
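- Κωνσταντίνος Τ. Φράγγος, Αναστάσιος Π. Κουτσούκος. «Η τεχνολογία της πληροφορίας στην επεξεργασία φυσικής γλώσσας – προβλήματα επεξεργασίας φυσικής γλώσσας» [Information Technology in Natural Language Processing: Problems of Natural Language Processing; in Greek]. ΜΥΡΜΙΔΟΝΕΣ Publications, 2010. ISBN: 978-960-992790-1.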
- Jurafsky, Daniel, and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2009. https://web.stanford.edu/~jurafsky/slp3/
- Manning, Christopher D., and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999. https://nlp.stanford.edu/fsnlp/