Scientific report: "INTEGRATING WORD BOUNDARY IDENTIFICATION WITH SENTENCE UNDERSTANDING"

INTEGRATING WORD BOUNDARY IDENTIFICATION WITH SENTENCE UNDERSTANDING

Kok Wee Gan
Department of Information Systems & Computer Science
National University of Singapore
Kent Ridge Crescent, Singapore 0511
Internet: gankw@

Abstract

Chinese sentences are written with no special delimiters such as space to indicate word boundaries. Existing Chinese NLP systems therefore employ preprocessors to segment sentences into words. Contrary to the conventional wisdom of separating this issue from the task of sentence understanding, we propose an integrated model that performs word boundary identification in lockstep with sentence understanding. In this approach, there is no distinction between rules for word boundary identification and rules for sentence understanding. These two functions are combined. Word boundary ambiguities are detected, especially the fallacious ones, when they block the primary task of discovering the inter-relationships among the various constituents of a sentence, which essentially is the essence of the understanding process. In this approach, statistical information is also incorporated, providing the system a quick and fairly reliable starting ground to carry out the primary task of relationship-building.

1 THE PROBLEM

Chinese sentences are written with no special delimiters such as space to indicate word boundaries. Existing Chinese NLP systems therefore employ preprocessors to segment sentences into words. Many techniques have been developed for this task, from simple pattern matching methods (e.g., maximum matching, reverse maximum matching) (Wang et al., 1990; Kang & Zheng, 1991), to statistical methods (e.g., word association, relaxation) (Sproat & Shih, 1990; Fan & Tsai, 1988), to rule-based approaches (Huang, 1989; Yeh & Lee, 1991; He et al., 1991). However, it is observed that simple pattern matching methods and stochastic methods perform poorly in sentences such as (1), (2) and (3), where word boundary ambiguities exist.¹

(1) ta benren sheng le
    She alone give birth to ASP
    san ge haizi .
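To make the pattern matching methods named above concrete, here is a minimal Python sketch of forward and reverse maximum matching applied to the romanized syllables of sentence (1). It is an illustration only, not the author's system. The toy lexicon, the two-syllable word limit, the single-syllable fallback, and the function names are all assumptions introduced here. In particular, the entries `benren` and `rensheng` are hypothetical and were chosen so that the two scanning directions disagree; the excerpt does not say where the ambiguity in (1) lies.

```python
# Sketch of forward and reverse maximum matching over romanized syllables.
# LEXICON is a hypothetical toy word list, not taken from the paper.

SYLLABLES = ["ta", "ben", "ren", "sheng", "le", "san", "ge", "hai", "zi"]
LEXICON = {"ta", "ben", "ren", "sheng", "le", "san", "ge",
           "haizi", "benren", "rensheng"}


def forward_max_match(syllables, lexicon, max_len=2):
    """Scan left to right, taking the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            # A single syllable is accepted even if unknown, so the scan
            # always advances.
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


def reverse_max_match(syllables, lexicon, max_len=2):
    """Scan right to left, taking the longest lexicon word ending at each position."""
    words, j = [], len(syllables)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            candidate = "".join(syllables[j - n:j])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                j -= n
                break
    words.reverse()  # words were collected from the end of the sentence
    return words


if __name__ == "__main__":
    print(forward_max_match(SYLLABLES, LEXICON))
    print(reverse_max_match(SYLLABLES, LEXICON))
```

With this toy lexicon the forward scan groups `ben ren` into one word, while the reverse scan groups `ren sheng`. Each method commits to one segmentation purely from string matching, with no access to the relationships among the sentence constituents, which is the kind of failure the integrated model is meant to address.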
