TAILIEUCHUNG - Scientific report: "Splitting Long or Ill-formed Input for Robust Spoken-language Translation"

Splitting Long or Ill-formed Input for Robust Spoken-language Translation

Osamu FURUSE*, Setsuo YAMADA, Kazuhide YAMAMOTO
ATR Interpreting Telecommunications Research Laboratories
2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
syamada yamamoto @

Abstract

This paper proposes an input-splitting method for translating spoken language, which includes many long or ill-formed expressions. The proposed method splits input into well-balanced translation units based on a semantic distance calculation. The splitting is performed during left-to-right parsing and does not degrade translation efficiency. The complete translation result is formed by concatenating the partial translation results of each split unit. The proposed method can be incorporated into frameworks like TDMT, which utilize left-to-right parsing and a score for a substructure. Experimental results show that the proposed method gives TDMT the following advantages: (1) elimination of null outputs, (2) splitting of utterances into sentences, and (3) robust translation of erroneous speech recognition results.

1 Introduction

A spoken-language translation system requires the ability to treat long or ill-formed input. An utterance given as input to a spoken-language translation system is not always one well-formed sentence. Also, when treating an utterance in speech translation, the speech recognition result, which is the input of the translation component, might be corrupted even though the input utterance is well-formed. Such a misrecognized result can cause a parsing failure, and consequently no translation output would be produced. Furthermore, we cannot expect a speech recognition result to include punctuation marks, such as a comma or a period between words, which are useful information for parsing.¹

As a solution for treating long input, long-sentence splitting techniques such as that of

* Current affiliation is NTT Communication Science Laboratories.
¹ Punctuation marks are not used
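
The idea summarized in the abstract (split the input into units while scanning left to right, score each candidate unit, then concatenate the partial translations) can be illustrated with a minimal sketch. This is not the TDMT implementation: the names `split_input`, `translate`, `toy_score`, `KNOWN_UNITS`, and `UNPARSED_WORD_PENALTY` are hypothetical, and the scorer is an assumed stand-in for the semantic distance calculation, returning a lower cost for a better unit and `None` for a span that cannot be analyzed as one unit. The sketch uses a simple dynamic program over prefixes of the input, which is a swapped-in technique chosen for brevity.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical scorer type: a distance-like cost (lower is better) for a span
# that can be analyzed as one unit, or None if the span cannot be analyzed.
SpanScorer = Callable[[Sequence[str]], Optional[float]]

# Assumed cost for passing a single word through without any analysis.
UNPARSED_WORD_PENALTY = 10.0


def split_input(words: Sequence[str], score_span: SpanScorer) -> List[List[str]]:
    """Split words into units with the lowest total cost, scanning left to right."""
    n = len(words)
    # best[i] = (lowest total cost for words[:i], start index of the last unit)
    best: List[Tuple[float, int]] = [(0.0, 0)] + [(float("inf"), 0)] * n
    for end in range(1, n + 1):
        for start in range(end):
            cost = score_span(words[start:end])
            if cost is None:
                if end - start > 1:
                    continue  # an unanalyzable multi-word span is not a unit
                cost = UNPARSED_WORD_PENALTY  # a lone word is always allowed
            total = best[start][0] + cost
            if total < best[end][0]:
                best[end] = (total, start)
    # Follow the stored start indices backwards to recover the chosen units.
    units: List[List[str]] = []
    end = n
    while end > 0:
        start = best[end][1]
        units.append(list(words[start:end]))
        end = start
    units.reverse()
    return units


def translate(
    words: Sequence[str],
    score_span: SpanScorer,
    translate_unit: Callable[[List[str]], str],
) -> str:
    """Concatenate the partial translations of each split unit."""
    return " ".join(translate_unit(unit) for unit in split_input(words, score_span))


# Toy scorer for illustration only: two spans are treated as analyzable.
KNOWN_UNITS = {("thank", "you"): 0.1, ("see", "you", "tomorrow"): 0.2}


def toy_score(span: Sequence[str]) -> Optional[float]:
    return KNOWN_UNITS.get(tuple(span))
```

With the toy scorer, `split_input("thank you uh see you tomorrow".split(), toy_score)` yields the three units `thank you`, `uh`, and `see you tomorrow`: the unanalyzable word is isolated instead of causing the whole input to fail, which mirrors the "elimination of null outputs" advantage stated in the abstract.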
