Scientific report: "A Stochastic Finite-State Morphological Parser for Turkish"

Haşim Sak, Tunga Güngör
Dept. of Computer Engineering, Boğaziçi University, TR-34342 Bebek, Istanbul, Turkey
gungort@

Murat Saraçlar
Dept. of Electrical and Electronics Engineering, Boğaziçi University, TR-34342 Bebek, Istanbul, Turkey

Abstract

This paper presents the first stochastic finite-state morphological parser for Turkish. The non-probabilistic parser is a standard finite-state transducer implementation of the two-level morphology formalism. A disambiguated text corpus of 200 million words is used to stochastize the morphotactics transducer, which is then composed with the morphophonemics transducer to obtain a stochastic morphological parser. We present two applications to evaluate the effectiveness of the stochastic parser: spelling correction and morphology-based language modeling for speech recognition.

1 Introduction

Turkish is an agglutinative language with a highly productive inflectional and derivational morphology. The computational aspects of Turkish morphology have been well studied, and several morphological parsers have been built (Oflazer, 1994; Güngör, 1995). In language processing applications, we may need to estimate a probability distribution over all word forms. For example, we need probability estimates for unigrams to rank misspelling suggestions for spelling correction. None of the previous studies for Turkish has addressed this problem.
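As a rough illustration of why unigram estimates matter for spelling correction, the sketch below ranks candidate corrections by an add-alpha smoothed unigram probability. It is not the method of this paper: the toy corpus, the function names `unigram_model` and `rank_suggestions`, and the smoothing choice are all assumptions made for the example.

```python
from collections import Counter


def unigram_model(tokens, alpha=1.0):
    """Return a function P(word) using add-alpha smoothing.

    One extra vocabulary slot is reserved for unseen words, so an
    out-of-vocabulary word still receives a small non-zero probability.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 slot for any unseen word

    def prob(word):
        return (counts[word] + alpha) / (total + alpha * vocab_size)

    return prob


def rank_suggestions(candidates, prob):
    """Sort candidates by descending probability, ties broken alphabetically."""
    return sorted(candidates, key=lambda w: (-prob(w), w))


# Hypothetical toy corpus of Turkish word forms (not data from the paper).
corpus = "ev evde evde evler evde ev evler evden".split()
prob = unigram_model(corpus)
ranked = rank_suggestions(["evden", "evde", "evler"], prob)
print(ranked)
```

A model of this kind is tied to a fixed word list, so every unseen word form receives the same smoothed probability.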
For morphologically complex languages, estimating a probability distribution over a static vocabulary is not very desirable due to high out-of-vocabulary rates. It would be very convenient for a morphological parser, as a word generator/analyzer, to also output a probability estimate for each word generated/analyzed. In this work, we build such a stochastic morphological parser for Turkish¹ and give two example applications for evaluation.

¹ The stochastic morphological parser is available for research purposes at http .
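The idea of a parser that also outputs a probability can be sketched without any finite-state toolkit. The example below estimates transition probabilities between consecutive morphemes from disambiguated analyses, scores an analysis as the product of its transitions, and scores a word as the sum over its analyses. This is only an analogy to weighting the arcs of a morphotactics transducer, not the authors' implementation; the tag names, the toy analyses, and the bigram assumption are all hypothetical.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"


def estimate_transitions(analyses):
    """Maximum-likelihood P(next | current) over morpheme sequences."""
    counts = defaultdict(Counter)
    for tags in analyses:
        path = [START, *tags, END]
        for cur, nxt in zip(path, path[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }


def analysis_prob(tags, trans):
    """Probability of one analysis: product of its transition probabilities."""
    path = [START, *tags, END]
    p = 1.0
    for cur, nxt in zip(path, path[1:]):
        p *= trans.get(cur, {}).get(nxt, 0.0)  # unseen transition gives 0
    return p


def word_prob(candidate_analyses, trans):
    """Probability of a word form: sum over all of its analyses."""
    return sum(analysis_prob(tags, trans) for tags in candidate_analyses)


# Hypothetical disambiguated analyses (root followed by morphological tags).
training = [
    ("ev", "+Noun", "+Pl"),
    ("ev", "+Noun"),
    ("ev", "+Noun", "+Loc"),
    ("gel", "+Verb", "+Past"),
]
trans = estimate_transitions(training)
p_evler = word_prob([("ev", "+Noun", "+Pl")], trans)
print(p_evler)
```

Because probabilities attach to morpheme transitions rather than to whole words, a word form absent from the training data can still receive a non-zero estimate as long as each of its transitions was observed.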
