Scientific report: "Alignment-Based Discriminative String Similarity"


Alignment-Based Discriminative String Similarity

Shane Bergsma and Grzegorz Kondrak
Department of Computing Science
University of Alberta
Edmonton, Alberta, Canada, T6G 2E8
{bergsma,kondrak}@cs.ualberta.ca

Abstract

A character-based measure of similarity is an important component of many natural language processing systems, including approaches to transliteration, coreference, word alignment, spelling correction, and the identification of cognates in related vocabularies. We propose an alignment-based discriminative framework for string similarity. We gather features from substring pairs consistent with a character-based alignment of the two strings. This approach achieves exceptional performance; on nine separate cognate identification experiments using six language pairs, we more than double the precision of traditional orthographic measures like Longest Common Subsequence Ratio and Dice's Coefficient. We also show strong improvements over other recent discriminative and heuristic similarity functions.

1 Introduction

String similarity is often used as a means of quantifying the likelihood that two strings have the same underlying meaning, based purely on the character composition of the two words. Strube et al. (2002) use Edit Distance as a feature for determining if two words are coreferent. Taskar et al. (2005) use French-English common letter sequences as a feature for discriminative word alignment in bilingual texts. Brill and Moore (2000) learn misspelled-word to correctly-spelled-word similarities for spelling correction. In each of these examples, a similarity measure can make use of the recurrent substring pairings that reliably occur between words having the same meaning.

Across natural languages, these recurrent substring correspondences are found in word pairs known as cognates: words with a common form and meaning across languages. Cognates arise either from words in a common ancestor language (e.g. light/Licht, night/Nacht in English/German) or from foreign ...
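
The two traditional orthographic baselines named in the abstract can be sketched in a few lines. This is a minimal illustration using the commonly cited definitions, not the authors' implementation: Longest Common Subsequence Ratio is taken as the LCS length divided by the length of the longer word, and Dice's Coefficient is computed over character bigrams counted as multisets. The function names, the bigram choice, and the lowercasing of the example words are assumptions made for this sketch.

```python
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the table
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: LCS length over the longer length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def bigrams(s: str) -> list[str]:
    """All adjacent character pairs of s, in order."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice(a: str, b: str) -> float:
    """Dice's Coefficient over character bigrams (assumed variant)."""
    count_a, count_b = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(count_a.values()) + sum(count_b.values())
    if total == 0:
        return 0.0
    shared = sum((count_a & count_b).values())  # multiset intersection
    return 2 * shared / total


if __name__ == "__main__":
    # The cognate pairs from the introduction, lowercased for comparison.
    for x, y in [("light", "licht"), ("night", "nacht")]:
        print(x, y, lcsr(x, y), dice(x, y))
    # light licht 0.8 0.5
    # night nacht 0.6 0.25
```

Both measures score only identical characters, so a regular correspondence such as English "gh" against German "ch" earns no credit. That limitation is the motivation the text gives for learning from recurrent substring pairings instead.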

