AN ALGORITHM FOR IDENTIFYING COGNATES BETWEEN RELATED LANGUAGES

Jacques B.M. Guy
Linguistics Department (RSPacS)
Australian National University
GPO Box 4, Canberra 2601
AUSTRALIA

ABSTRACT

The algorithm takes as its only input a list of words, preferably but not necessarily in phonemic transcription, in any two putatively related languages, and sorts it into decreasing order of probable cognation. Processing a 250-item bilingual list takes about five seconds of CPU time on a DEC KL1091 and requires 56 pages of core memory. The algorithm is given no information whatsoever about the phonemic transcription used. Even though cognate identification is carried out on the basis of a context-free, one-for-one matching of individual characters, its cognation decisions are bettered by a trained linguist using more information only in cases of wordlists sharing less than 40 cognates and involving complex multiple sound correspondences.

I FUNDAMENTAL PROCEDURES

A. Identifying Sound Correspondences

Consider the following wordlist from two hypothetical Austronesian-like languages:

                Titia    Sese
    eye         mata     nas
    sea         tasi     sab
    father      tama     san
    mother      mama     nan
    tongue      mimi     nen
    shellfish   sisi     hehe
    bad         sati     has
    to stand    ti       se
    to come     ma       na
    with        mi       ne
    not         sa       ha

Take the first word pair, mata/nas. We have no information about the phonetic values of their constituent characters, and we do not know whether the same system of transcription was used in both wordlists: for all we know, a might denote a high back rounded vowel in Titia and a uvular trill in Sese. The only assumption allowed is that, within each wordlist, the same characters represent more or less the same sounds. Under this assumption, the possibility that any one character of a member of a word pair may correspond to any character of the other member cannot be discarded. Thus, in the pair mata/nas, Titia m may correspond to Sese n, a, or s, and so may each of the remaining Titia characters a, t, and a. We summarize the evidence for these possible correspondences in a TxS matrix where
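The definition of the matrix entries is not given in the text above, so the following Python sketch shows only one plausible reading, not the author's stated procedure. It assumes that rows are the distinct Titia characters, columns are the distinct Sese characters, and each cell counts the word pairs in which the two characters co-occur. The names `WORD_PAIRS`, `cooccurrence_matrix`, and `as_table` are hypothetical, introduced for illustration.

```python
from collections import Counter

# Titia/Sese word pairs from the hypothetical wordlist (glosses omitted).
# The shellfish entry is read here as "sisi".
WORD_PAIRS = [
    ("mata", "nas"), ("tasi", "sab"), ("tama", "san"), ("mama", "nan"),
    ("mimi", "nen"), ("sisi", "hehe"), ("sati", "has"), ("ti", "se"),
    ("ma", "na"), ("mi", "ne"), ("sa", "ha"),
]


def cooccurrence_matrix(pairs):
    """Count, for each (Titia char, Sese char), the word pairs containing both.

    Assumption: a character pair is counted once per word pair, however
    many times either character occurs inside the two words.
    """
    counts = Counter()
    for titia_word, sese_word in pairs:
        for t in set(titia_word):
            for s in set(sese_word):
                counts[(t, s)] += 1
    return counts


def as_table(counts):
    """Return (row labels, column labels, grid) for the sparse counts."""
    rows = sorted({t for t, _ in counts})
    cols = sorted({s for _, s in counts})
    grid = [[counts[(t, s)] for s in cols] for t in rows]
    return rows, cols, grid


if __name__ == "__main__":
    rows, cols, grid = as_table(cooccurrence_matrix(WORD_PAIRS))
    print("   " + "  ".join(cols))
    for label, row in zip(rows, grid):
        print(label + "  " + "  ".join(str(n) for n in row))
```

Under this counting assumption, no knowledge of the transcription is needed: the code treats every character as an opaque symbol, in keeping with the only assumption the text allows.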
