Scientific report: "THE DISTRIBUTION OF WORD LENGTH IN TECHNICAL RUSSIAN"

Mechanical Translation, December 1954, pp. 38-40
Anthony G. Oettinger, Computation Laboratory, Harvard University

In the course of an analysis of several samples of technical Russian, undertaken as part of a study in mechanical translation, a number of statistical data reflecting the structure of these samples were compiled. One of these, the distribution of word length, is presented here as Fig. 1.

The theoretical interest of this distribution arises from the possibility of using it as a basis for an operational definition of words in printed texts. If texts are considered purely as sequences of symbols, including the letters, punctuation marks, and space, the resulting sequences are of a length which no practicable machine can manage. A study of the distribution of the number of symbols between pairs of successive symbols of certain classes would be one way to reveal structural characteristics of the text sequences potentially useful toward the definition of manageable and significant subsequences.

The subsequences included between successive occurrences of letter pairs have not been investigated. Those included between successive pairs of periods, exclamation points, or question marks can be identified with the classical sentence, and finally, those included between successive pairs of punctuation marks or spaces can be identified with words. The length distribution of the latter subsequences has the desirable property, not shared by the others, of being concentrated at relatively low values of length and of having no elements exceeding a certain length (Fig. 1).
Words defined in this fashion can readily be identified by a machine, and they are of limited variety, so that their listing in a dictionary is practicable. From the practical point of view, the distribution is useful in planning input and storage facilities in experimental translating equipment. The samples used were relatively small and Fig. 1 should ...
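
The operational definition described above, a word as the run of symbols lying between successive punctuation marks or spaces, can be illustrated with a short Python sketch. It is not taken from the article: the delimiter set, the function names `split_words` and `length_distribution`, and the sample sentence are all assumptions chosen for illustration, and the article's own samples and counts are not reproduced here.

```python
from collections import Counter

# Assumption: an illustrative delimiter class made of whitespace and a few
# common punctuation marks. The article does not list its exact symbol set.
DELIMITERS = set(" \t\n.,;:!?()\"«»")


def split_words(text):
    """Return the maximal runs of non-delimiter symbols found in text."""
    words = []
    current = []
    for symbol in text:
        if symbol in DELIMITERS:
            # A delimiter closes the current run, if one is open.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(symbol)
    # Close a run that reaches the end of the text.
    if current:
        words.append("".join(current))
    return words


def length_distribution(text):
    """Map each word length to the number of words having that length."""
    return Counter(len(word) for word in split_words(text))


# Hypothetical sample sentence, not drawn from the article's data.
sample = "Машина читает текст, и текст делится на слова."
dist = length_distribution(sample)

# Print (length, count) pairs in order of increasing length.
for length, count in sorted(dist.items()):
    print(length, count)
```

Because every run ends at the next delimiter, the largest key of `dist` gives the longest word in the sample, which is the kind of bound that matters when sizing input and storage facilities.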
