Reranking and Self-Training for Parser Adaptation

David McClosky, Eugene Charniak and Mark Johnson
Brown Laboratory for Linguistic Information Processing (BLLIP)
Brown University
Providence, RI 02912
{dmcc,ec,mj}@cs.brown.edu

Abstract

Statistical parsers trained and tested on the Penn Wall Street Journal (WSJ) treebank have shown vast improvements over the last 10 years. Much of this improvement, however, is based upon an ever-increasing number of features to be trained on (typically) the WSJ treebank data. This has led to concern that such parsers may be too finely tuned to this corpus at the expense of portability to other genres. Such worries have merit. The standard "Charniak parser" checks in at a labeled precision/recall f-measure of 89.7% on the Penn WSJ test set, but only 82.9% on the test set from the Brown treebank corpus. This paper should allay these fears. In particular, we show that the reranking parser described in Charniak and Johnson (2005) improves performance of the parser on Brown to 85.2%. Furthermore, use of the self-training techniques described in McClosky et al. (2006) raises this to 87.8% (an error reduction of 28%), again without any use of labeled Brown data. This is remarkable since training the parser and reranker on labeled Brown data achieves only 88.4%.
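As a quick check of the figures in the abstract, here is a minimal sketch assuming the usual definitions: the labeled f-measure is the harmonic mean of labeled precision and recall, and an error reduction is the share of the baseline's remaining error (100 - F) that the new system removes. The function names are hypothetical and not taken from the paper. Using the rounded Brown scores quoted above, the self-trained reranking parser removes about 28.7% of the 82.9% baseline's error, which the abstract reports as 28%; the small gap presumably comes from rounding of the quoted scores.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced f-measure: the harmonic mean of labeled precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(baseline_f: float, new_f: float) -> float:
    """Fraction of the baseline's remaining error (100 - F) removed by the new system."""
    baseline_error = 100.0 - baseline_f
    new_error = 100.0 - new_f
    return (baseline_error - new_error) / baseline_error


# Brown test-set f-scores (in %) as quoted in the abstract
baseline = 82.9      # WSJ-trained Charniak parser
reranked = 85.2      # plus the Charniak and Johnson (2005) reranker
self_trained = 87.8  # plus self-training, still no labeled Brown data

print(f"reranker only:          {100 * relative_error_reduction(baseline, reranked):.1f}%")
print(f"reranker + self-train:  {100 * relative_error_reduction(baseline, self_trained):.1f}%")
```

By the same measure, the reranker alone removes roughly 13.5% of the baseline's error on Brown.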
1 Introduction

Modern statistical parsers require treebanks to train their parameters, but their performance declines when one parses genres more distant from the training data's domain. Furthermore, the treebanks required to train said parsers are expensive and difficult to produce. Naturally, one of the goals of statistical parsing is to produce a broad-coverage parser which is relatively insensitive to textual domain. But the lack of corpora has led to a situation where much of the current work on parsing is performed on a single domain using training data from that domain: the Wall Street Journal (WSJ) section of the Penn Treebank (Marcus et al., 1993). Given the aforementioned costs, it is unlikely that many ...