Scientific report: "Automatic Sanskrit Segmentizer Using Finite State Transducers"

Vipul Mittal
Language Technologies Research Center, IIIT-H, Gachibowli, Hyderabad, India.
vipulmittal@

Abstract

In this paper, we propose a novel method for automatic segmentation of a Sanskrit string into different words. The input for our segmentizer is a Sanskrit string, encoded either as a Unicode string or as a Roman transliterated string, and the output is a set of possible splits with weights associated with each of them. We followed two different approaches to segment a Sanskrit text using sandhi [1] rules extracted from a parallel corpus of manually sandhi-split text. The first approach augments the finite state transducer used to analyze Sanskrit morphology and traverses it to segment a word. The second approach generates all possible segmentations and validates each constituent using a morphological analyzer.

1 Introduction

Sanskrit has a rich tradition of oral transmission of texts, and this process causes the text to undergo euphonic changes at the word boundaries. In oral transmission, the text is predominantly spoken as continuous speech. However, continuous speech makes the text ambiguous. To overcome this problem, there is also a tradition of reciting the pada-pāṭha (recitation of words) in addition to the recitation of a saṃhitā (a continuous sandhied text). In the written form, because of the dominance of oral transmission, the text is written as a continuous string of letters rather than a sequence of words. Thus, the Sanskrit texts consist of a very long sequence of phonemes with the word boundaries having undergone ...

[1] Sandhi means euphonic transformation of words when they are consecutively pronounced. Typically, when a word w1 is followed by a word w2, some terminal segment of w1 merges with some initial segment of w2 to be replaced by a "smoothed" phonetic interpolation, corresponding to minimizing the energy necessary to reconfigure the vocal organs at the juncture between the words.
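The second approach described in the abstract (generate candidate segmentations, then validate each constituent) can be illustrated with a minimal Python sketch. This is not the paper's implementation. The rule table `SANDHI_RULES`, the toy `LEXICON` standing in for a morphological analyzer, and the function names are all assumptions made for illustration, and the weighting of splits is omitted. Words are written in an ASCII transliteration where `A` stands for long ā.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical reverse-sandhi rules: a surface sound maps to the possible
# (end of w1, start of w2) pairs that could have merged to produce it.
# A real system would extract these from a sandhi-split parallel corpus.
SANDHI_RULES: Dict[str, List[Tuple[str, str]]] = {
    "A": [("a", "a"), ("a", "A"), ("A", "a"), ("A", "A")],
    "e": [("a", "i"), ("A", "i")],
    "o": [("a", "u")],
}

# Toy lexicon standing in for the morphological analyzer (assumption).
LEXICON: Set[str] = {"rAma", "Alaya", "ca", "iti", "na", "asti"}


def is_valid_word(word: str) -> bool:
    """Stand-in for a morphological analyzer lookup."""
    return word in LEXICON


def segment(text: str) -> List[List[str]]:
    """Return every split of `text` whose constituents are all valid words."""
    results: List[List[str]] = []
    if is_valid_word(text):
        results.append([text])
    # Start at 1 so the first word always consumes at least one character;
    # with single-character rules this guarantees the recursion terminates.
    for i in range(1, len(text)):
        for surface, sources in SANDHI_RULES.items():
            if not text.startswith(surface, i):
                continue
            for left_end, right_start in sources:
                first = text[:i] + left_end
                if not is_valid_word(first):
                    continue
                # Undo the merge and segment the remainder recursively.
                rest = right_start + text[i + len(surface):]
                for tail in segment(rest):
                    results.append([first] + tail)
    return results


print(segment("nAsti"))     # [['na', 'asti']]
print(segment("ceti"))      # [['ca', 'iti']]
print(segment("rAmAlaya"))  # [['rAma', 'Alaya']]
```

Validating the first constituent before recursing prunes most candidate splits early. A fuller version would also attach a weight to each split, as the abstract describes, for example from rule frequencies in the corpus.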
