Lecture “Natural language processing – Chapter 4: Computational linguistics” covers: what computational linguistics is, corpus definitions, corpus categories, applications of parallel corpora, alignment methods, normalization, lemmatization, and tokenization.

Industrial University of Ho Chi Minh City (Trường Đại học Công nghiệp Tp. HCM)
Faculty of Information Technology
Natural Language Processing (NLP)
Teacher: Lê Ngọc Tấn
Email: letan.dhcn@gmail.com
Blog: http://lengoctan.wordpress.com

Chapter 4: Computational Linguistics

What is computational linguistics?
It is an interdisciplinary field dealing with the statistical or rule-based modeling of natural language from a computational perspective.
Topics in this chapter:
– Corpus, corpora
– Pre-processing: normalization, tokenization
– Alignment methods
– Programming

Corpus Definitions
What is a corpus?
– A corpus is a large collection of texts
– Corpora: the plural of corpus, i.e. a set of such collections
Golden corpus:
– Brown Corpus
– Susanne Corpus
– EUROPARL Corpus
A corpus can be annotated, for example with part-of-speech (POS) tags.

Corpus Categories (1)
[Figure: schema of corpus evolution]
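
The pre-processing steps named in the chapter outline (normalization and tokenization) can be illustrated with a minimal Python sketch. This is not the lecture's own code: the function names `normalize` and `tokenize` and the regular expression are assumptions chosen for illustration, and the sketch uses only the standard library. Lemmatization is not covered, since it usually requires a dictionary or a trained model.

```python
import re
import unicodedata
from typing import List


def normalize(text: str) -> str:
    """Illustrative normalization: Unicode NFC, lowercasing, whitespace collapse."""
    # NFC merges a base letter and its combining marks into one code point,
    # which matters for accented text such as Vietnamese.
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    # Replace every run of whitespace with a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


# A token is either a run of word characters (optionally joined by an
# internal apostrophe or hyphen) or a single non-space symbol.
TOKEN_PATTERN = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Illustrative rule-based tokenizer: split text into words and punctuation."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    sentence = "  A corpus   can be annotated, e.g. POS-tagged.  "
    print(tokenize(normalize(sentence)))
```

A rule-based tokenizer like this is only a starting point; real corpora usually need language-specific rules for abbreviations, numbers, and multi-word units.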