Scientific report: "Decompounding query keywords from compounding languages"

Enrique Alfonseca, Google Inc., ealfonseca@
Slaven Bilac, Google Inc., slaven@
Stefan Pharies, Google Inc., stefanp@

Abstract

Splitting compound words has proved to be useful in areas such as Machine Translation, Speech Recognition or Information Retrieval (IR). Furthermore, real-time IR systems (such as search engines) need to cope with noisy data, as user queries are sometimes written quickly and submitted without review. In this paper we apply a state-of-the-art procedure for German decompounding to other compounding languages, and we show that it is possible to have a single decompounding model that is applicable across languages.

1 Introduction

Compounding languages (Krott, 1999), such as German, Dutch, Danish, Norwegian, Swedish, Greek or Finnish, allow the generation of complex words by merging together simpler ones. So, for instance, the flower bouquet can be expressed in German as Blumensträuße, made up of Blumen (flower) and sträuße (bouquet), and in Finnish as kukkakimppu, from kukka (flower) and kimppu (bunch, collection). For many language processing tools that rely on lexicons or language models, it is very useful to be able to decompose compounds to increase their coverage and reduce out-of-vocabulary terms. Decompounders have been used successfully in Information Retrieval (Braschler and Ripplinger, 2004), Machine Translation (Brown, 2002; Koehn and Knight, 2003) and Speech Recognition (Adda-Decker et al., 2000).
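To make the idea of decomposing a compound concrete, the following is a minimal sketch of a purely lexicon-based splitter. It is not the procedure applied in this paper: the toy lexicon, the function name split_compound and the min_len parameter are all assumptions made for illustration, and the sketch ignores linking morphemes, word frequencies and misspellings.

```python
from functools import lru_cache

# Toy lexicon: an assumption for illustration, not a resource from the paper.
LEXICON = frozenset({"blumen", "sträuße", "kukka", "kimppu"})


def split_compound(word, lexicon=LEXICON, min_len=3):
    """Return the parts of `word` if lexicon entries cover it completely,
    otherwise return the (lowercased) word as a single part."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def best_split(start):
        # Fewest-parts segmentation of word[start:], or None if none exists.
        if start == len(word):
            return ()
        best = None
        for end in range(start + min_len, len(word) + 1):
            piece = word[start:end]
            if piece in lexicon:
                rest = best_split(end)
                if rest is not None:
                    candidate = (piece,) + rest
                    if best is None or len(candidate) < len(best):
                        best = candidate
        return best

    parts = best_split(0)
    return list(parts) if parts else [word]


print(split_compound("Blumensträuße"))  # ['blumen', 'sträuße']
print(split_compound("kukkakimppu"))    # ['kukka', 'kimppu']
print(split_compound("query"))          # ['query'] (no split found)
```

Preferring the segmentation with the fewest parts is only one possible tie-breaking rule, chosen here to keep the sketch short.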
The Cross Language Evaluation Forum (CLEF) competitions have shown that very simple approaches can produce big gains in Cross Language Information Retrieval (CLIR) for German and Dutch (Monz and de Rijke, 2001) and for Finnish (Adafre et al., 2004). When working with web data, which has not necessarily been reviewed for correctness, many of the words are more difficult to analyze than when working with standard texts. There are more words with spelling mistakes, and many texts mix words from different languages.
