A clustering technique for the Vietnamese word categorization


Dalat University Journal of Science (Tạp chí Khoa học Đại học Đà Lạt), Special Issue on Information Technology, Vol. 6, No. 2, 2016, pp. 207–218

Nguyen Minh Hiep (a), Nguyen Thi Minh Huyen (b), Ngo The Quyen (b), Tran Thi Phuong Linh (a)
(a) The Faculty of Information Technology, Dalat University, Lamdong, Vietnam
(b) The Faculty of Informatics, VNU University of Science, Hanoi, Vietnam
Corresponding author email: hiepnm@dlu.edu.vn

Article history
Received: January 4, 2016. Received in revised form: March 10, 2016. Accepted: March 16, 2016.

Abstract
In natural language processing, part-of-speech (POS) tagging plays an important role, as its output is the input of many other tasks (syntactic analysis, semantic analysis, and so on). One of the problems related to POS tagging is defining the POS set. This could be solved using unsupervised machine learning methods. This paper presents an application of the DBSCAN clustering algorithm to classify Vietnamese words from a large corpus. The features used to characterize each word are naturally defined by the context of that word in a sentence. We use a large corpus containing sentences automatically extracted from the online Nhan Dan newspaper.

Keywords: Clustering; Corpus; DBSCAN; POS; POS tagging; Tag set.

1. INTRODUCTION
The question of Vietnamese word classification has been discussed in several linguistic studies [1]. This problem can be addressed with unsupervised machine learning. We present a technique that clusters Vietnamese words from a document collection in order to identify lexical classes for tagging. The feature used to cluster each word is the context of that word in the sentence. The DBSCAN algorithm is used to cluster the words. The training data is a large collection of Vietnamese documents automatically extracted from the Nhan Dan online and Thanh Nien online newspapers.

This article comprises three parts. Part 1 introduces the research motivation.
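The approach outlined in the introduction (describe each word by its sentence context, then cluster the words with DBSCAN) can be illustrated with a minimal sketch. It is not the authors' implementation: this excerpt does not specify the exact context features, the distance measure, or the DBSCAN parameters, so the one-word left/right neighbour counts, the cosine distance, the values of eps and min_pts, and the four-sentence toy corpus below are all assumptions chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Map each word to counts of its left and right neighbours (assumed features)."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            vectors[padded[i]][("L", padded[i - 1])] += 1
            vectors[padded[i]][("R", padded[i + 1])] += 1
    return dict(vectors)


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def dbscan(items, vectors, eps, min_pts):
    """Plain DBSCAN over `items`; returns {item: cluster_id}, with -1 for noise."""
    def neighbours(p):
        return [q for q in items if cosine_distance(vectors[p], vectors[q]) <= eps]

    labels = {}
    cluster = -1
    for p in items:
        if p in labels:
            continue
        nbrs = neighbours(p)
        if len(nbrs) < min_pts:
            labels[p] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in nbrs if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == -1:
                labels[q] = cluster  # noise reached from a core point becomes border
            if q in labels:
                continue
            labels[q] = cluster
            q_nbrs = neighbours(q)
            if len(q_nbrs) >= min_pts:  # q is a core point: keep expanding
                queue.extend(q_nbrs)
    return labels


# Hypothetical toy corpus (not from the paper): pre-tokenized sentences.
corpus = [
    ["con", "mèo", "ngủ"],
    ["con", "chó", "ngủ"],
    ["con", "mèo", "chạy"],
    ["con", "chó", "chạy"],
]
vectors = context_vectors(corpus)
labels = dbscan(list(vectors), vectors, eps=0.3, min_pts=2)
for word, cluster_id in labels.items():
    print(word, cluster_id)
```

On this toy input, "mèo" and "chó" fall into one cluster and "ngủ" and "chạy" into another, because each pair occurs in identical contexts, while "con" has no close neighbour and is left as noise. A real corpus would need word segmentation, frequency thresholds, and tuned parameters, none of which are covered here.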

