Robust PCFG-Based Generation using Automatically Acquired LFG Approximations

Aoife Cahill¹ and Josef van Genabith¹,²
¹ National Centre for Language Technology (NCLT), School of Computing, Dublin City University, Dublin 9, Ireland
² Center for Advanced Studies, IBM Dublin, Ireland
{acahill, josef}@

Abstract

We present a novel PCFG-based architecture for robust probabilistic generation based on wide-coverage LFG approximations (Cahill et al., 2004) automatically extracted from treebanks, maximising the probability of a tree given an f-structure. We evaluate our approach using string-based evaluation. We currently achieve coverage of , a BLEU score of and string accuracy of on the Penn-II WSJ Section 23 sentences of length ≤20.

1 Introduction

Wide-coverage grammars automatically extracted from treebanks are a corner-stone technology in state-of-the-art probabilistic parsing. They achieve robustness and coverage at a fraction of the development cost of hand-crafted grammars. It is surprising to note that, to date, such grammars do not usually figure in the complementary operation to parsing: natural language surface realisation. Research on statistical natural language surface realisation has taken three broad forms, differing in where statistical information is applied in the generation process.
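The core idea of ranking trees by probability under a PCFG can be illustrated with a minimal sketch. This is a simplification for exposition only: the toy grammar and candidate trees below are invented, and the sketch ignores the f-structure conditioning that the paper's model adds. A tree's probability is the product of the probabilities of the rules used to build it, and the generator selects the highest-probability candidate.

```python
import math

# Toy PCFG mapping (parent, children) rules to probabilities
# (invented for illustration, not the paper's acquired grammar).
pcfg = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 0.6,
    ("NP", ("N",)): 0.4,
    ("VP", ("V",)): 0.7,
    ("VP", ("V", "NP")): 0.3,
}

def tree_logprob(tree):
    """Log probability of a tree = sum of log rule probabilities.
    A tree is (label, [children]); leaf children are plain strings."""
    label, children = tree
    if all(isinstance(c, str) for c in children):
        return 0.0  # lexical rules assumed to have probability 1 here
    rule = (label, tuple(child[0] for child in children))
    return math.log(pcfg[rule]) + sum(tree_logprob(c) for c in children)

# Two candidate analyses; generation picks the more probable tree.
t1 = ("S", [("NP", [("N", ["dogs"])]),
            ("VP", [("V", ["bark"])])])
t2 = ("S", [("NP", [("D", ["the"]), ("N", ["dogs"])]),
            ("VP", [("V", ["bark"])])])
best = max([t1, t2], key=tree_logprob)
```

Here `t2` wins because its NP rule (probability 0.6) outscores the bare-noun NP rule (0.4), all other rule applications being shared.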
Langkilde (2000), for example, uses n-gram word statistics to rank alternative output strings from symbolic hand-crafted generators, selecting paths in parse forest representations. Bangalore and Rambow (2000) use n-gram word sequence statistics in a TAG-based generation model to rank output strings, with additional statistical and symbolic resources applied at intermediate generation stages. Ratnaparkhi (2000) uses maximum entropy models to drive generation with word bigram or dependency representations, taking into account unrealised semantic features. Velldal and Oepen (2005) present a discriminative disambiguation model using a hand-crafted HPSG grammar for generation. Belz (2005) describes a method for building statistical generation models using an automatically created generation treebank for weather forecasts.
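The n-gram ranking idea behind the first family of approaches can be sketched in a few lines (the corpus, candidate strings, and add-alpha smoothing scheme below are invented for illustration): train a bigram language model on a corpus, then score each candidate realisation and keep the highest-scoring string.

```python
import math
from collections import Counter

def train_bigram_lm(corpus, alpha=1.0):
    """Train an add-alpha smoothed bigram model over tokenised sentences
    and return a function scoring a sentence's log probability."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def logprob(sent):
        tokens = ["<s>"] + sent + ["</s>"]
        lp = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            lp += math.log((bigrams[(prev, cur)] + alpha) /
                           (unigrams[prev] + alpha * vocab_size))
        return lp

    return logprob

# Toy training corpus and candidate realisations (invented).
corpus = [["the", "dog", "barks"],
          ["the", "cat", "sleeps"],
          ["a", "dog", "sleeps"]]
logprob = train_bigram_lm(corpus)

candidates = [["dog", "the", "barks"], ["the", "dog", "barks"]]
best = max(candidates, key=logprob)  # the grammatical order should win
```

Systems like Langkilde's apply the same ranking over the (much larger) string space encoded by a generation forest rather than an explicit candidate list.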
