Development of a Stemming Algorithm

Mechanical Translation and Computational Linguistics, nos. 1 and 2, March and June 1968

by Julie Beth Lovins
Electronic Systems Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

A stemming algorithm, a procedure to reduce all words with the same stem to a common form, is useful in many areas of computational linguistics and information-retrieval work. While the form of the algorithm varies with its application, certain linguistic problems are common to any stemming procedure. As a basis for evaluating previous attempts to deal with these problems, this paper first discusses the theoretical and practical attributes of stemming algorithms. Then a new version of a context-sensitive, longest-match stemming algorithm for English is proposed; though developed for use in a library information transfer system, it is of general application. A major linguistic problem in stemming, variation in the spelling of stems, is discussed in some detail, and several feasible programmed solutions are outlined, along with sample results of one of these methods.

I. Introduction

A stemming algorithm is a computational procedure which reduces all words with the same root (or, if prefixes are left untouched, the same stem) to a common form, usually by stripping each word of its derivational and inflectional suffixes. Researchers in many areas of computational linguistics and information retrieval find this a desirable step, but for varying reasons.
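The two ideas named in the abstract, context-sensitive longest-match suffix removal and a separate step for variation in the spelling of stems, can be illustrated with a small Python sketch. Everything concrete in it is a hypothetical assumption made for illustration: the ending list, the context conditions, the recoding rules, and the function names (`strip_ending`, `recode`, `stem_word`) are a tiny invented subset, not the rules proposed in this paper.

```python
from typing import Callable

# A context condition decides whether an ending may be removed,
# given the stem that would remain after removal.
Condition = Callable[[str], bool]


def min_len(n: int) -> Condition:
    """Condition: the remaining stem must have at least n letters."""
    return lambda s: len(s) >= n


# HYPOTHETICAL ending list: a tiny illustrative subset, not the paper's list.
ENDINGS: dict[str, Condition] = {
    "ations": min_len(2),
    "ation": min_len(2),
    "ions": min_len(2),
    "ion": min_len(2),
    "ings": min_len(3),
    "ing": min_len(3),
    "ed": min_len(3),
    # Keep words like "pass" intact: never leave a stem ending in "s".
    "s": lambda s: len(s) >= 2 and not s.endswith("s"),
}

# HYPOTHETICAL recoding rules that normalize stem-final spelling variants.
RECODE_RULES: list[tuple[str, str]] = [
    ("rpt", "rb"),  # absorpt(ion) -> absorb
    ("tt", "t"),    # sitt(ing) -> sit
    ("pp", "p"),    # stopp(ed) -> stop
]


def strip_ending(word: str) -> str:
    """Remove at most one ending: the longest match whose condition holds."""
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if word.endswith(ending):
            remaining = word[: -len(ending)]
            if ENDINGS[ending](remaining):
                return remaining
    return word


def recode(stem: str) -> str:
    """Apply the first recoding rule whose pattern ends the stem."""
    for pattern, replacement in RECODE_RULES:
        if stem.endswith(pattern):
            return stem[: -len(pattern)] + replacement
    return stem


def stem_word(word: str) -> str:
    """Strip one ending (longest match first), then recode the stem."""
    return recode(strip_ending(word.lower()))


if __name__ == "__main__":
    for w in ["absorption", "absorbed", "absorbing", "sitting", "sits", "pass", "sing"]:
        print(w, "->", stem_word(w))
```

In this sketch, suffix removal alone would leave "absorpt" from "absorption" but "absorb" from "absorbed"; the recoding step is what brings the two spellings of the stem together. The conditions show why matching must be context-sensitive: the longest matching ending of "sing" is "ing", but removing it would leave a one-letter stem, so the word is left unchanged.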
In automated morphological analysis, the root of a word may be of less immediate interest than its suffixes, which can be used as clues to grammatical structure (see Earl [2, 3] and Resnikoff and Dolby [6]). This field has also been reported on by S. Silver and M. Lott (Machine Translation Project, University of California, Berkeley, personal communication). At the other extreme, what suffixes are found may be subsidiary to the problem of removing them consistently enough to obtain sets of exactly
