Native Language Detection with Tree Substitution Grammars

Ben Swanson, Brown University, chonger@
Eugene Charniak, Brown University, ec@

Abstract

We investigate the potential of Tree Substitution Grammars as a source of features for native language detection, the task of inferring an author's native language from text in a different language. We compare two state-of-the-art methods for Tree Substitution Grammar induction and show that features from both methods outperform previous state-of-the-art results at native language detection. Furthermore, we contrast these two induction algorithms and show that the Bayesian approach produces superior classification results with a smaller feature set.

1 Introduction

The correlation between a person's native language (L1) and aspects of their writing in a second language (L2) can be exploited to predict the L1 label given L2 text. The International Corpus of Learner English (Granger et al., 2002), or ICLE, is a large set of English student essays annotated with L1 labels, which allows us to bring the power of supervised machine learning techniques to bear on this task. In this work we explore the possibility of using automatically induced Tree Substitution Grammar (TSG) rules as features for a logistic regression model¹ trained to predict these L1 labels.

Automatic TSG induction is made difficult by the exponential number of possible TSG rules given a corpus. This is an active area of research with two distinct, effective solutions. The first uses a nonparametric Bayesian model to handle the large number of rules (Cohn and Blunsom, 2010), while the second is inspired by tree kernel methods and extracts common subtrees from pairs of parse trees (Sangati and Zuidema, 2011). While both are effective, we show that the Bayesian method of TSG induction produces superior features and achieves a new best result at the task of native language detection.

¹ Also known as a Maximum Entropy model.

2 Related Work

Native Language Detection. Work in ...
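The introduction uses TSG rules as classifier features and points out that the number of possible rules grows exponentially with the corpus. The Python sketch below illustrates both ideas under loudly stated assumptions. Parse trees are assumed to be nested tuples. Instead of *inducing* a grammar (the Bayesian model of Cohn and Blunsom or the pairwise subtree extraction of Sangati and Zuidema), it brute-force enumerates every fragment up to a hypothetical depth bound `max_depth` and counts how often each occurs. The helper names (`expansions`, `internal_nodes`, `fragment_features`, `common_fragments`) are illustrative and do not come from the paper. `common_fragments` is only a crude set intersection, not Sangati and Zuidema's maximal-subtree algorithm.

```python
from collections import Counter
from itertools import product

# Assumed representation (hypothetical): an internal node is a tuple
# (label, child, ...); a leaf word is a plain str, e.g. ("PRP", "I").


def expansions(node, depth):
    """All fragments rooted at `node` that use its full production, at most `depth` levels deep."""
    label, children = node[0], node[1:]
    child_options = []
    for child in children:
        if isinstance(child, str):
            child_options.append([child])  # words are always kept
        else:
            options = [child[0]]  # stop here: child becomes a frontier nonterminal
            if depth > 1:
                options.extend(expansions(child, depth - 1))  # or keep expanding it
            child_options.append(options)
    # Every combination of per-child choices is a distinct fragment.
    return [f"({label} {' '.join(combo)})" for combo in product(*child_options)]


def internal_nodes(tree):
    """Yield every non-leaf node of the tree in pre-order."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from internal_nodes(child)


def fragment_features(tree, max_depth=2):
    """Count every fragment (up to max_depth) rooted anywhere in the tree."""
    counts = Counter()
    for node in internal_nodes(tree):
        counts.update(expansions(node, max_depth))
    return counts


def common_fragments(tree_a, tree_b, max_depth=2):
    """Fragments shared by two trees (a brute-force stand-in for pairwise extraction)."""
    return set(fragment_features(tree_a, max_depth)) & set(fragment_features(tree_b, max_depth))


if __name__ == "__main__":
    t1 = ("S", ("NP", ("PRP", "I")), ("VP", ("VBD", "saw"), ("NP", ("PRP", "it"))))
    t2 = ("S", ("NP", ("PRP", "You")), ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NN", "dog"))))
    feats = fragment_features(t1)
    print(feats["(NP PRP)"])  # 2: one NP -> PRP fragment per noun phrase
    print(sorted(common_fragments(t1, t2)))
```

The resulting `Counter` objects can be turned into sparse count vectors for any logistic regression (maximum entropy) classifier. Raising `max_depth` shows why dedicated induction methods are needed: the number of fragments rooted at a node is the product of the options for its children, so exhaustive enumeration quickly becomes impractical on a full corpus.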
