Scientific report: "Character-Level Dependencies in Chinese: Usefulness and Learning"

Hai Zhao
Department of Chinese, Translation and Linguistics
City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong, China
haizhao@

Abstract

We investigate the possibility of exploiting character-based dependencies for Chinese information processing. Because Chinese text is made up of character sequences rather than word sequences, the word is not as natural a concept in Chinese as it is in English, nor is it easy to define without controversy for such a language. We therefore propose a character-level dependency scheme to represent the primary linguistic relationships within a Chinese sentence. The usefulness of character dependencies is verified through two specialized dependency parsing tasks. The first handles trivial character dependencies that are transformed equivalently from traditional word boundaries. The second additionally considers the case in which annotated character dependencies inside a word are involved. The results of character-level dependency parsing are positive in both tasks. This study provides an alternative way to formalize basic character- and word-level representation for Chinese.

1 Introduction

In many human languages, words can be naturally identified from writing. However, this is not the case for Chinese, because Chinese is written as a sequence of characters¹ rather than a sequence of words; that is, no natural separators such as blanks exist between words. As words do not appear in a natural way as they do in most European languages², this brings the argument about how to determine wordhood in Chinese. Linguists views .

¹ Character here stands for the various tokens occurring in naturally written Chinese text, including Chinese characters (hanzi), punctuation, and foreign letters. However, Chinese characters usually make up the largest part.

² Even in European languages, a naive but necessary method to properly define words is to list them all by hand. We thank the first anonymous reviewer for pointing this out.
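
The abstract's first task treats word boundaries as trivial character dependencies. The excerpt does not say which arc convention the paper uses, so the following Python sketch assumes one hypothetical convention purely for illustration: inside a word, each non-initial character depends on the character before it; the first character of each word depends on the first character of the next word; and the first character of the last word is the root. The function names words_to_char_heads and heads_to_words are invented here and are not taken from the paper.

```python
from typing import List


def words_to_char_heads(words: List[str]) -> List[int]:
    """Turn a segmented sentence into one head index per character.

    Positions are 1-based and 0 marks the root. ASSUMED convention
    (hypothetical, not confirmed by the source text):
      - a non-initial character of a word depends on the previous character;
      - a word-initial character depends on the first character of the
        next word;
      - the first character of the last word is the root.
    """
    starts: List[int] = []
    pos = 1
    for w in words:
        if not w:
            raise ValueError("empty word in segmentation")
        starts.append(pos)
        pos += len(w)

    heads: List[int] = []
    for i, w in enumerate(words):
        # Word-initial character: points right to the next word, or to root.
        heads.append(starts[i + 1] if i + 1 < len(words) else 0)
        # Remaining characters: each points left to its predecessor.
        for p in range(starts[i] + 1, starts[i] + len(w)):
            heads.append(p - 1)
    return heads


def heads_to_words(chars: str, heads: List[int]) -> List[str]:
    """Recover the segmentation from the head indices produced above."""
    if len(chars) != len(heads):
        raise ValueError("chars and heads must have the same length")
    words: List[str] = []
    for i, (ch, h) in enumerate(zip(chars, heads), start=1):
        if words and h == i - 1:
            words[-1] += ch   # arc to the previous character: same word
        else:
            words.append(ch)  # arc to the right (or root): a new word starts
    return words


if __name__ == "__main__":
    segmented = ["我们", "喜欢", "语言学"]
    heads = words_to_char_heads(segmented)
    print(heads)
    print(heads_to_words("".join(segmented), heads))
```

Under this assumed convention, arcs inside a word always point left and arcs between words always point right, so the mapping is reversible: word segmentation can be read back from a character-level dependency parse without loss. Other conventions are possible, and the paper's own may differ.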
