Scientific report: "Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining"


Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur
Human Language Technology Center of Excellence
Center for Language and Speech Processing, Johns Hopkins University
Baltimore, MD, USA
{ariya,mdredze,khudanpur}@jhu.edu

Abstract

Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. However, these language models can be difficult to use in practice because of the time required to generate features for rescoring a large hypothesis set. In this work, we propose substructure sharing, which saves duplicate work in processing hypothesis sets with redundant hypothesis structures. We apply substructure sharing to a dependency parser and part-of-speech tagger to obtain significant speedups, and further improve the accuracy of these tools through up-training. When using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill climbing rescoring, and show that up-training leads to WER reduction.

1 Introduction

Language models (LMs) are crucial components in tasks that require the generation of coherent natural language text, such as automatic speech recognition (ASR) and machine translation (MT). While traditional LMs use word n-grams, where the n − 1 previous words predict the next word, newer models integrate long-span information in making decisions. For example, incorporating long-distance dependencies and syntactic structure can help the LM better predict words by complementing the predictive power of n-grams (Chelba and Jelinek, 2000; Collins et al., 2005; Filimonov and Harper, 2009; Kuo et al., 2009).

The long-distance dependencies can be modeled in either a generative or a discriminative framework. Discriminative models, which directly distinguish correct from incorrect hypotheses, are particularly attractive because they allow the inclusion of arbitrary features.
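The idea of saving duplicate work across hypotheses with redundant structure can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes a hypothetical toy tagger (`toy_tag`) whose decision for a word depends only on a three-word window, and a hypothetical `SharedTagger` class that caches each window's result so that hypotheses in an N-best list sharing the same window reuse it instead of recomputing it.

```python
from typing import Callable, Dict, List, Tuple

# (previous word, current word, next word): the only context the toy tagger sees.
Window = Tuple[str, str, str]


def toy_tag(window: Window) -> str:
    """Hypothetical stand-in for an expensive local tagging decision."""
    _, word, _ = window
    if word in {"the", "a", "an"}:
        return "DET"
    if word.endswith("s"):
        return "NOUN_OR_VERB"
    return "OTHER"


class SharedTagger:
    """Caches per-window decisions so repeated substructures are tagged once."""

    def __init__(self, tag_fn: Callable[[Window], str]) -> None:
        self._tag_fn = tag_fn
        self._cache: Dict[Window, str] = {}
        self.calls = 0  # number of times the underlying tagger actually ran

    def tag(self, words: List[str]) -> List[str]:
        padded = ["<s>"] + words + ["</s>"]
        tags = []
        for i in range(1, len(padded) - 1):
            window = (padded[i - 1], padded[i], padded[i + 1])
            if window not in self._cache:
                # First time this substructure is seen: do the real work.
                self._cache[window] = self._tag_fn(window)
                self.calls += 1
            tags.append(self._cache[window])
        return tags


# An illustrative (made-up) N-best list whose hypotheses overlap heavily.
n_best = [
    "the cat sat on the mat".split(),
    "the cat sat on a mat".split(),
    "the cat sat on the hat".split(),
]

tagger = SharedTagger(toy_tag)
tagged = [tagger.tag(hyp) for hyp in n_best]
total_positions = sum(len(hyp) for hyp in n_best)

print(f"tagger invocations: {tagger.calls} for {total_positions} word positions")
```

Because the hypotheses differ in only a few words, most windows are served from the cache. A real parser or tagger conditions on richer state than a fixed word window, so the key that identifies a shareable substructure would need to capture whatever that model's features depend on.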

