Scientific report: "Multilingual Text Processing in a Two-Byte Code"

Lloyd B. Anderson
Ecological Linguistics
316 A St. S.E., Washington, D.C. 20003

ABSTRACT

National and international standards committees are now discussing a two-byte code for multilingual information processing. This provides for 65,536 separate character and control codes, enough to make permanent code assignments for all the characters of all national alphabets of the world, and also to include Chinese/Japanese characters. This paper discusses the kinds of flexibility required to handle both Roman and non-Roman alphabets. It is crucial to separate information units (codes) from graphic forms, to maximize processing power. Comparing alphabets around the world, we find that the graphic devices (letters, digraphs, accent marks, punctuation, spacing, etc.) represent a very limited number of information units. It is possible to arrange alphabet codes to provide transliteration equivalence, the best of three solutions compared as a framework for code assignments.

Information vs. Form

In developing proposals for codes in information processing, the most important decisions are the choices of what to code.
In a proposal for a multilingual two-byte code, Xerox Corporation has made explicit a principle which we can state precisely as follows:

Basic codes stand for independently functioning information units, not for visual forms.

The choice of type font, the presence or absence of serifs, and variations like boldface, italics, or underlining are matters of form. Such choices are normally made once for spans at least as long as one word. We do not use ComPLeX mIXturEs, but consistent strings like this, THIS, this, or THIS. By assigning the same basic code to variations of a single letter (as a, a, A, a), all variants will automatically be alphabetized the same way, which is as it should be. The choice of variant forms is specified by supplementary "looks" information. The capitalization of first letters of sentences, proper names, or nouns is a kind of punctuation.
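The separation of basic codes from "looks" can be illustrated with a minimal Python sketch. It is not the Xerox proposal's actual layout: the code table BASIC_CODE, the Look and StyledText records, and the style names ("caps", "bold", "italic") are hypothetical, chosen only to show that variants of a letter share one two-byte code while form travels separately as span information.

```python
from dataclasses import dataclass

# Hypothetical code table: one 16-bit basic code per letter,
# shared by every case and font variant of that letter.
BASIC_CODE = {ch: 0x0100 + i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}


@dataclass(frozen=True)
class Look:
    """Hypothetical 'looks' record: one style applied to a span of codes."""
    start: int  # index of the first code in the span
    end: int    # index one past the last code in the span
    style: str  # e.g. "caps", "bold", "italic" (illustrative names)


@dataclass(frozen=True)
class StyledText:
    codes: tuple       # basic codes: the information units
    looks: tuple = ()  # supplementary form information


def encode(word: str, style: str = "plain") -> StyledText:
    """Map a word to basic codes and record its form as whole-word looks."""
    codes = tuple(BASIC_CODE[ch.lower()] for ch in word)
    looks = []
    if word.isupper():
        # All-capitals is treated as form, not as different letters.
        # (Sentence-initial capitals are not modeled in this sketch.)
        looks.append(Look(0, len(codes), "caps"))
    if style != "plain":
        looks.append(Look(0, len(codes), style))
    return StyledText(codes, tuple(looks))


def sort_key(text: StyledText) -> tuple:
    """Alphabetize on basic codes only; looks never affect the order."""
    return text.codes


def to_bytes(text: StyledText) -> bytes:
    """Serialize each basic code as two bytes (big-endian, by assumption)."""
    return b"".join(code.to_bytes(2, "big") for code in text.codes)


words = [encode("this", "bold"), encode("THIS"), encode("that", "italic")]
ordered = sorted(words, key=sort_key)
```

Because sort_key ignores the looks, the bold "this" and the capitalized "THIS" compare as equal, and both sort after "that". A two-byte code gives 2**16 = 65,536 possible values, so to_bytes emits exactly two bytes per information unit.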
