Splitting Long or Ill-formed Input for Robust Spoken-language Translation

Osamu FURUSE*, Setsuo YAMADA, Kazuhide YAMAMOTO
ATR Interpreting Telecommunications Research Laboratories
2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
furuse@cslab.keel.ntt.co.jp, {syamada, yamamoto}@itl.atr.co.jp

Abstract

This paper proposes an input-splitting method for translating spoken language, which includes many long or ill-formed expressions. The proposed method splits the input into well-balanced translation units based on a semantic distance calculation. The splitting is performed during left-to-right parsing and does not degrade translation efficiency. The complete translation result is formed by concatenating the partial translation results of the split units. The proposed method can be incorporated into frameworks such as TDMT (Transfer-Driven Machine Translation), which use left-to-right parsing and assign a score to each substructure. Experimental results show that the proposed method gives TDMT the following advantages: (1) elimination of null outputs, (2) splitting of utterances into sentences, and (3) robust translation of erroneous speech recognition results.

1 Introduction

A spoken-language translation system must be able to handle long or ill-formed input. An utterance given as input to a spoken-language translation system is not always a single well-formed sentence.
Also, in speech translation, the speech recognition result, which is the input to the translation component, may be corrupted even when the input utterance itself is well-formed. Such a misrecognized result can cause a parsing failure, in which case no translation output is produced. Furthermore, we cannot expect a speech recognition result to include punctuation marks such as commas or periods between words, although these provide useful information for parsing.¹

As a solution for treating long input, long-sentence splitting techniques such as that of …

* Current affiliation is NTT Communication Science Laboratories.
¹ Punctuation marks are not used …
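The abstract describes splitting the input into well-balanced units according to a score and then concatenating the partial translations of those units. This excerpt does not describe TDMT's parser or its semantic distance calculation, so the sketch below swaps in a simpler, plainly different technique: a left-to-right dynamic-programming segmentation that, for each prefix of the input, keeps the split minimizing the sum of per-unit costs plus a fixed penalty per split. The phrase table `KNOWN_UNITS`, the cost function `toy_cost`, the penalty value and the per-unit translator `toy_translate` are all hypothetical stand-ins, not part of TDMT.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical phrase table: word sequences the toy "parser" can analyze,
# mapped to a cost standing in for a semantic distance (lower is better).
KNOWN_UNITS: Dict[Tuple[str, ...], float] = {
    ("yes",): 0.2,
    ("i", "would", "like", "a", "room"): 0.5,
}


def toy_cost(unit: Sequence[str]) -> float:
    """Hypothetical unit cost: table value if known, else a large per-word cost."""
    return KNOWN_UNITS.get(tuple(unit), 10.0 * len(unit))


def toy_translate(unit: Sequence[str]) -> str:
    """Hypothetical per-unit translator: it only brackets the unit."""
    return "[" + " ".join(unit) + "]"


def segment(
    words: Sequence[str],
    unit_cost: Callable[[Sequence[str]], float],
    split_penalty: float = 1.0,
) -> List[List[str]]:
    """Split words into units minimizing total unit cost plus a penalty per split."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: lowest cost for segmenting words[:i]
    back = [0] * (n + 1)  # back[i]: start index of the last unit in that segmentation
    best[0] = 0.0
    for end in range(1, n + 1):  # extend the analyzed prefix left to right
        for start in range(end):
            cost = best[start] + unit_cost(words[start:end])
            if start > 0:  # every unit after the first introduces one split
                cost += split_penalty
            if cost < best[end]:
                best[end], back[end] = cost, start
    units: List[List[str]] = []
    end = n
    while end > 0:  # follow back-pointers to recover the chosen units
        start = back[end]
        units.append(list(words[start:end]))
        end = start
    units.reverse()
    return units


def split_and_translate(
    words: Sequence[str],
    unit_cost: Callable[[Sequence[str]], float],
    translate_unit: Callable[[Sequence[str]], str],
    split_penalty: float = 1.0,
) -> str:
    """Translate each unit separately and concatenate the partial results."""
    units = segment(words, unit_cost, split_penalty)
    return " ".join(translate_unit(u) for u in units)


if __name__ == "__main__":
    words = "yes i would like a room".split()
    print(segment(words, toy_cost))
    # [['yes'], ['i', 'would', 'like', 'a', 'room']]
    print(split_and_translate(words, toy_cost, toy_translate))
    # [yes] [i would like a room]
```

Under the toy table the whole utterance costs 60 as a single unit, while two known units plus one split cost 0.2 + 0.5 + 1.0, so the split wins. The exhaustive inner loop makes this sketch quadratic in input length; the paper instead states that its splitting happens during parsing without degrading translation efficiency.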
