TAILIEUCHUNG - Scientific report: "A Language-Independent Unsupervised Model for Morphological Segmentation"

A Language-Independent Unsupervised Model for Morphological Segmentation

Vera Demberg
School of Informatics, University of Edinburgh
Edinburgh EH8 9LW, GB

Abstract

Morphological segmentation has been shown to be beneficial to a range of NLP tasks such as machine translation, speech recognition, speech synthesis and information retrieval. Recently, a number of approaches to unsupervised morphological segmentation have been proposed. This paper describes an algorithm that draws from previous approaches and combines them into a simple model for morphological segmentation that outperforms other approaches on English and German, and also yields good results on agglutinative languages such as Finnish and Turkish. We also propose a method for detecting variation within stems in an unsupervised fashion. The segmentation quality reached with the new algorithm is good enough to improve grapheme-to-phoneme conversion.

1 Introduction

Morphological segmentation has been shown to be beneficial to a number of NLP tasks such as machine translation (Goldwater and McClosky, 2005), speech recognition (Kurimo et al., 2006), information retrieval (Monz and de Rijke, 2002) and question answering. Segmenting a word into meaning-bearing units is particularly interesting for morphologically complex languages, where words can be composed of several morphemes through inflection, derivation and composition.
Data sparseness for such languages can be significantly decreased when words are decomposed morphologically. There exist a number of rule-based morphological segmentation systems for a range of languages. However, expert knowledge and labour are expensive, and the analyzers must be updated on a regular basis in order to cope with language change (the emergence of new words and their inflections). One might argue that unsupervised algorithms are not an interesting option from the engineering point of view, because rule-based systems usually lead to better results. However, segmentations ...
