Scientific report: "GRAMMATICAL ANALYSIS BY COMPUTER OF THE LANCASTER-OSLO/BERGEN (LOB) CORPUS OF BRITISH ENGLISH TEXTS."

Andrew David Beale
Unit for Computer Research on the English Language
Bowland College, University of Lancaster
Bailrigg, Lancaster, England LA1 4YT

ABSTRACT

Research has been under way at the Unit for Computer Research on the English Language at the University of Lancaster, England, to develop a suite of computer programs which provide a detailed grammatical analysis of the LOB corpus, a collection of about 1 million words of British English texts available in machine-readable form. The first phase of the project, completed in September 1983, produced a grammatically annotated version of the corpus giving a tag showing the word class of each word token. Over 93 per cent of the word tags were correctly selected by using a matrix of tag pair probabilities, and this figure was upgraded by a further 3 per cent by retagging problematic strings of words prior to disambiguation and by altering the probability weightings for sequences of three tags. The remaining 3 to 4 per cent were corrected by a human post-editor. The system was originally designed to run in batch mode over the corpus, but we have recently modified procedures to run interactively for sample sentences typed in by a user at a terminal. We are currently extending the word tag set and improving the word tagging procedures to further reduce manual intervention.
A similar probabilistic system is being developed for phrase and clause tagging.

THE STRUCTURE AND PURPOSE OF THE LOB CORPUS

The LOB Corpus (Johansson, Leech and Goodluck 1978), like its American English counterpart the Brown Corpus (Kučera and Francis 1964; Hauge and Hofland 1978), is a collection of 500 samples of British English texts, each containing about 2,000 word tokens. The samples represent 15 different text categories:

A. Press Reportage
B. Press Editorial
C. Press Reviews
D. Religion
E. Skills and Hobbies
F. Popular Lore
G. Belles Lettres
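
The abstract says that word tags were selected using a matrix of tag pair probabilities. The following is a minimal sketch of that general idea, not the authors' program: it assumes a toy lexicon, an invented three-tag set (`DET`, `NOUN`, `VERB`), made-up probability values, and a Viterbi-style search over candidate tags, none of which come from the paper.

```python
# Hypothetical tag-pair probabilities P(tag | previous tag).
# All values are invented for illustration only.
PAIR_PROB = {
    ("START", "DET"): 0.6, ("START", "NOUN"): 0.3, ("START", "VERB"): 0.1,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05,
    ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 0.3,
    ("VERB", "NOUN"): 0.3, ("VERB", "VERB"): 0.05,
}
DEFAULT_PROB = 0.01  # assumed fallback for tag pairs missing from the matrix

# Hypothetical lexicon: each word maps to its candidate tags.
LEXICON = {
    "the": ["DET"],
    "dog": ["NOUN", "VERB"],
    "barks": ["NOUN", "VERB"],
}


def disambiguate(words):
    """Pick the tag sequence with the highest product of tag-pair probabilities."""
    # best maps each possible current tag to (score, tag path ending in that tag).
    best = {"START": (1.0, [])}
    for word in words:
        new_best = {}
        for tag in LEXICON[word]:  # raises KeyError for words outside the toy lexicon
            # Keep only the best-scoring way of arriving at this tag.
            score, path = max(
                (
                    (prev_score * PAIR_PROB.get((prev_tag, tag), DEFAULT_PROB), prev_path)
                    for prev_tag, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[tag] = (score, path + [tag])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    print(disambiguate(["the", "dog", "barks"]))
```

In this toy setup, both "dog" and "barks" are ambiguous between noun and verb, and the pair probabilities alone resolve them. The abstract's further refinements (retagging problematic word strings and reweighting sequences of three tags) are not modelled here.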
