TAILIEUCHUNG - Báo cáo khoa học: "Using Machine Learning Techniques to Build a Comma Checker for Basque"

In this paper, we describe the research using machine learning techniques to build a comma checker to be integrated in a grammar checker for Basque. After several experiments, and trained with a little corpus of 100,000 words, the sys­ tem guesses correctly not placing com­ mas with a precision of 96% and a re­ call of 98%. It also gets a precision of 70% and a recall of 49% in the task of placing commas. Finally, we have shown that these results can be im­ proved using a bigger and a more ho­ mogeneous corpus to train, that is,. | Using Machine Learning Techniques to Build a Comma Checker for Basque Inaki Alegria Bertol Arrieta Arantza Diaz de Ilarraza Eli Izagirre Montse Maritxalar Computer Engineering Faculty. University of the Basque Country. Manuel de Lardizabal Pasealekua 1 20018 Donostia Basque Country Spain. acpalloi bertol jipdisaa jibizole jipmaanm @ Abstract In this paper we describe the research using machine learning techniques to build a comma checker to be integrated in a grammar checker for Basque. After several experiments and trained with a little corpus of 100 000 words the system guesses correctly not placing commas with a precision of 96 and a recall of 98 . It also gets a precision of 70 and a recall of 49 in the task of placing commas. Finally we have shown that these results can be improved using a bigger and a more homogeneous corpus to train that is a bigger corpus written by one unique author. 1 Introduction In the last years there have been many studies aimed at building a grammar checker for the Basque language Ansa et al. 2004 Diaz De Il-arraza et al. 2005 . These works have been focused mainly on building rule sets taking into account syntactic information extracted from the corpus automatically that detect some erroneous grammar forms. The research here presented wants to complement the earlier work by focusing on the style and the punctuation of the texts. To be precise we have experimented using machine learning techniques for the special case of the comma to evaluate their performance and to analyse the possibility of applying it in other tasks of the grammar checker. However developing a punctuation checker encounters one problem in particular the fact that the punctuation rules are not totally established. In general there is no problem when using the full stop the question mark or the exclamation mark. Santos 1998 highlights these marks are reliable punctuation marks while all the rest are unreliable. Errors related to the reli able ones putting or

TỪ KHÓA LIÊN QUAN
TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.