Scientific report: "USE OF HEURISTIC KNOWLEDGE IN CHINESE LANGUAGE ANALYSIS"

Yiming Yang, Toyoaki Nishida and Shuji Doshita
Department of Information Science, Kyoto University
Sakyo-ku, Kyoto 606, JAPAN

ABSTRACT

This paper describes an analysis method which uses heuristic knowledge to find local syntactic structures of Chinese sentences. We call it preprocessing, because we use it before we do global syntactic structure analysis [1] of the input sentence. Our purpose is to guide the global analysis through the search space, to avoid unnecessary computation. To realize this, we use a set of special words that appear in commonly used patterns in Chinese. We call them "characteristic words". They enable us to pick out fragments that might figure in the syntactic structure of the sentence. Knowledge concerning the use of characteristic words enables us to rate alternative fragments according to pattern statistics, fragment length, distance between characteristic words, and so on. The preprocessing system proposes to the global analysis level a most likely partial structure. In case this choice is rejected, backtracking looks for a second choice, and so on.

For our system, we use 200 characteristic words. Their rules are written by 101 automata. We tested them against 120 sentences taken from a Chinese physics textbook. For this limited set, correct partial structures were proposed as first choice for 94% of sentences. Allowing a 2nd choice, the score is 98%; with a 3rd choice, the score is 100%.

1. THE PROBLEM OF CHINESE LANGUAGE ANALYSIS

Being a language in which only characters (ideograms) are used, the Chinese language has specific problems. Compared to languages such as English, there are few formal inflections to indicate the grammatical category of a word, and the few inflections that do exist are often omitted. In English, suffixes are often used to distinguish syntactical categories (translation, translate; difficult, difficulty), but in Chinese it is very common to use the same word (characters) for a verb, a ...
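The propose-and-backtrack loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two-entry pattern table, the scoring rule (pattern weight divided by the distance between the two characteristic words), and the names `Fragment`, `PATTERNS`, `propose` and `analyze` are hypothetical. They stand in for the paper's 200 characteristic words and 101 automata, which are not reproduced in this excerpt. The `accept` callback plays the role of the global analysis level.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Fragment(NamedTuple):
    start: int    # index of the opening characteristic word
    end: int      # index of the closing characteristic word (inclusive)
    score: float  # higher means more plausible


# Hypothetical pattern table, NOT the paper's word list:
# (opening word, closing word) -> assumed pattern weight.
PATTERNS = {("在", "上"): 0.9, ("从", "到"): 0.7}


def propose(tokens: List[str]) -> List[Fragment]:
    """Return every fragment delimited by a known word pair, best first."""
    fragments = []
    for i, opener in enumerate(tokens):
        for j in range(i + 1, len(tokens)):
            weight = PATTERNS.get((opener, tokens[j]))
            if weight is not None:
                # Assumed scoring: pattern weight discounted by the
                # distance between the two characteristic words.
                fragments.append(Fragment(i, j, weight / (j - i)))
    return sorted(fragments, key=lambda f: f.score, reverse=True)


def analyze(
    tokens: List[str],
    accept: Callable[[Fragment], bool],
) -> Optional[Tuple[int, Fragment]]:
    """Offer candidates in rank order; fall back when one is rejected.

    Returns (rank, fragment) for the first accepted candidate,
    or None if the global analysis rejects all of them.
    """
    for rank, fragment in enumerate(propose(tokens), start=1):
        if accept(fragment):
            return rank, fragment
    return None
```

For the pre-segmented toy input `["他", "在", "桌子", "上", "放", "上", "书"]`, the word 上 occurs twice, so `propose` yields two candidates for the 在 ... 上 pattern: the shorter span (1, 3) is offered first, and the longer span (1, 5) is tried only if `accept` rejects the first. Counting how often the accepted rank is 1, 2 or 3 over a test set gives the kind of first-choice, second-choice and third-choice scores reported in the abstract.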
