Automatic Detection of Text Genre

Brett Kessler, Geoffrey Nunberg, Hinrich Schütze
Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA
Department of Linguistics, Stanford University, Stanford, CA 94305-2150, USA
Email: schuetze…; URL: ftp…com/pub/qca/papers/genre

Abstract

As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of classification. We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection based on deeper structural properties.

1 Introduction

Computational linguists have been concerned for the most part with two aspects of texts: their structure and their content. That is, we consider texts on the one hand as formal objects, and on the other as symbols with semantic or referential values. In this paper we want to consider texts from the point of view of genre, that is, according to the various functional roles they play.

Genre is necessarily a heterogeneous classificatory principle, which is based, among other things, on the way a text was created, the way it is distributed, the register of language it uses, and the kind of audience it is addressed to. For all its complexity, this attribute can be extremely important for many of the core problems that computational linguists are concerned with. Parsing accuracy could be increased by taking genre into account; for example, certain object-less constructions occur only in recipes in English. Similarly, for POS tagging, the frequency of uses of "trend" as a verb in the Journal of Commerce is 35 times higher than in Sociological Abstracts. In word-sense disambiguation, many senses are largely restricted to texts of a particular style, such as colloquial or formal; for example, the word "pretty" is far more …

In information retrieval, genre classification could enable users to sort search results according to their immediate interests. People who go into a bookstore or library are not usually looking simply for information about a particular topic, but rather have requirements of genre as well: they are looking for scholarly articles about hypnotism, novels about the French Revolution, editorials about the supercollider, and so forth.

If genre classification is so useful, why hasn't it figured much in computational linguistics before now? One important reason is that, up to now, the digitized corpora and collections which are the subject of much …
