Scientific report: "Arabic Retrieval Revisited: Morphological Hole Filling"

Kareem Darwish, Ahmed M. Ali
Qatar Computing Research Institute, Qatar Foundation, Doha, Qatar
kdarwish@ amali@

Abstract

Due to Arabic's morphological complexity, Arabic retrieval benefits greatly from morphological analysis, particularly stemming. However, the best known stemming does not handle linguistic phenomena such as broken plurals and malformed stems. In this paper we propose a model of character-level morphological transformation that is trained using Wikipedia hypertext to page title links. The use of our model yields statistically significant improvements in Arabic retrieval over the use of the best statistical stemming technique. The technique can potentially be applied to other languages.

1. Introduction

Arabic exhibits rich morphological phenomena that complicate retrieval. Arabic nouns and verbs are typically derived from a set of 10,000 roots that are cast into stems using templates that may add infixes, double letters, or remove letters. Stems can accept the attachment of clitics in the form of prefixes or suffixes, such as prepositions, determiners, pronouns, etc. Orthographic rules can cause the addition, deletion, or substitution of letters during suffix and prefix attachment. Further, stems can be inflected to obtain plural forms via the addition of suffixes or through using a different stem form altogether, producing so-called broken (aka irregular) plurals.

For retrieval, we would ideally like to match related stem forms regardless of inflected form or attached clitic. Tolerating some form of derivational morphology, where nouns are transformed into adjectives via the attachment of the suffix ي (y) (ex. مصر (mSr) → مصري (mSry)), is desirable as they are semantically related. Matching all stems that are cast from the same root would introduce undesired ambiguity, because a single root can produce up to 1,000 stems.

Two general approaches have been shown to improve Arabic retrieval. The first approach
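The matching goal described in the introduction can be illustrated with a toy sketch. It is not the authors' model or any published stemmer: the affix lists, the minimum stem length, and the function names `strip_clitics` and `conflate` are all hypothetical, and words are written in the same Latin transliteration the text uses (mSr, mSry). The sketch strips a few clitics and the adjectival suffix y so that related forms share one index term, and it shows why a broken plural, which uses a different stem form, is left unmatched by affix stripping alone.

```python
# Toy illustration only. The affix lists are hypothetical and far from complete;
# they are NOT taken from the paper or from any real stemmer.
PREFIXES = ("wAl", "bAl", "Al", "w", "b", "l")  # clitic prefixes, longest first
SUFFIXES = ("hm", "hA", "h")                    # a few pronoun clitics
ADJ_SUFFIX = "y"                                # noun -> adjective suffix from the text
MIN_STEM_LEN = 3                                # assumed guard against over-stripping


def strip_clitics(word: str) -> str:
    """Remove at most one listed prefix and one listed suffix."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


def conflate(word: str) -> str:
    """Map a surface form to an index term, also tolerating the suffix y."""
    stem = strip_clitics(word)
    if stem.endswith(ADJ_SUFFIX) and len(stem) - len(ADJ_SUFFIX) >= MIN_STEM_LEN:
        stem = stem[:-len(ADJ_SUFFIX)]
    return stem


if __name__ == "__main__":
    # mSr and mSry (the example from the text) end up with the same index term.
    for form in ("mSr", "mSry", "wAlmSry", "bmSr"):
        print(form, "->", conflate(form))
    # A broken plural changes the stem itself, so stripping affixes cannot
    # relate the singular ktAb to its plural ktb.
    print(conflate("ktAb") == conflate("ktb"))
```

The last line prints `False`, which is the kind of gap the abstract refers to when it says the best known stemming does not handle broken plurals.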
