Scientific report: "Automatic Sanskrit Segmentizer Using Finite State Transducers"


Vipul Mittal
Language Technologies Research Center, IIIT-H, Gachibowli, Hyderabad, India
vipulmittal@research.iiit.ac.in

Abstract

In this paper, we propose a novel method for automatic segmentation of a Sanskrit string into different words. The input for our segmentizer is a Sanskrit string, encoded either as a Unicode string or as a Roman transliterated string, and the output is a set of possible splits with weights associated with each of them. We followed two different approaches to segment a Sanskrit text using sandhi[1] rules extracted from a parallel corpus of manually sandhi-split text. While the first approach augments the finite state transducer used to analyze Sanskrit morphology and traverses it to segment a word, the second approach generates all possible segmentations and validates each constituent using a morph analyzer.

1 Introduction

Sanskrit has a rich tradition of oral transmission of texts, and this process causes the text to undergo euphonic changes at the word boundaries. In oral transmission, the text is predominantly spoken as continuous speech. However, continuous speech makes the text ambiguous. To overcome this problem, there is also a tradition of reciting the pada-patha (recitation of words) in addition to the recitation of a samhita (a continuous sandhied text). In the written form, because of the dominance of oral transmission, the text is written as a continuous string of letters rather than a sequence of words. Thus the Sanskrit texts consist of a very long sequence of phonemes with the word boundaries having undergone ...

[1] Sandhi means euphony transformation of words when they are consecutively pronounced. Typically, when a word w1 is followed by a word w2, some terminal segment of w1 merges with some initial segment of w2 to be replaced by a smoothed phonetic interpolation, corresponding to minimizing the energy necessary to reconfigure the vocal organs at the juncture between the words.
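The second approach mentioned in the abstract (generate all possible segmentations, then validate each constituent) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two sandhi rules, the four-word lexicon standing in for a morph analyzer, the function names, and the ASCII transliteration in which A denotes the long vowel (as in the WX and SLP1 schemes). It is not the paper's rule set or implementation, and it omits the weights because the excerpt does not say how they are computed.

```python
# Toy sandhi splitter: a sketch of "generate all splits, validate each word".
# All data below is hypothetical and deliberately incomplete.

# Reverse sandhi rules: surface string -> possible (end of w1, start of w2).
SANDHI_RULES = {
    "e": [("a", "i")],  # a + i -> e
    "A": [("a", "a")],  # a + a -> A (long a)
}

# Stand-in for a morphological analyzer: a word is valid if it is listed here.
LEXICON = {"na", "ca", "iti", "asti"}


def is_valid_word(word):
    """Hypothetical validator; a real system would call a morph analyzer."""
    return word in LEXICON


def segment(text):
    """Return every split of `text` into valid words, undoing sandhi if needed."""
    if text == "":
        return [[]]
    results = []
    # Case 1: a word boundary with no sandhi change.
    for i in range(1, len(text) + 1):
        left = text[:i]
        if is_valid_word(left):
            for rest in segment(text[i:]):
                results.append([left] + rest)
    # Case 2: a boundary hidden inside a sandhi surface form starting at i.
    # i starts at 1 so the remainder is always shorter than `text`.
    for i in range(1, len(text)):
        for surface, sources in SANDHI_RULES.items():
            if not text.startswith(surface, i):
                continue
            for end_w1, start_w2 in sources:
                left = text[:i] + end_w1
                remainder = start_w2 + text[i + len(surface):]
                if is_valid_word(left):
                    for rest in segment(remainder):
                        results.append([left] + rest)
    return results


print(segment("neti"))   # [['na', 'iti']]
print(segment("nAsti"))  # [['na', 'asti']]
```

With a realistic rule set and analyzer, a single input typically yields several candidate splits, which is why the segmentizer described above attaches a weight to each one.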

