TAILIEUCHUNG - Báo cáo khoa học: "Directed Replacement"

This paper introduces to the finite-state calculus a family of directed replace operators. In contrast to the simple replace expression, UPPER - LOWER,defined in K a r t t u n e n (1995), the new directed version, UPPER ©- LOWER, yields an unambiguous transducer if the lower language consists of a single string. It transduces the input string from left to right, making only the longest possible replacement at each point. A new type of replacement expression, UPPER @- PREFIX . . . SUFFIX, yields a transducer that inserts text around strings that are instances of UPPER. . | Directed Replacement Lauri Karttunen Rank Xerox Research Centre Grenoble 6 chemin de Maupertuis F-38240 MEYLAN FRANCE Abstract This paper introduces to the finite-state calculus a family of directed replace operators. In contrast to the simple replace expression UPPER - LOWER defined in Karttunen 1995 the new directed version UPPER fl- LOWER yields an unambiguous transducer if the lower language consists of a single string. It transduces the input string from left to right making only the longest possible replacement at each point. A new type of replacement expression UPPER a- PREFIX . SUFFIX yields a transducer that inserts text around strings that are instances of UPPER. The symbol . denotes the matching part of the input which itself remains unchanged. PREFIX and SUFFIX are regular expressions describing the insertions. Expressions of the type UPPER 3- PREFIX . . . SUFFIX may be used to compose a deterministic parser for a local grammar in the sense of Gross 1989 . Other useful applications of directed replacement include tokenization and filtering of text streams. 1 Introduction Transducers compiled from simple replace expressions UPPER - LOWER Karttunen 1995 Kempe and Karttunen 1996 are generally nondeterministic in the sense that they may yield multiple results even if the lower language consists of a single string. For example let US consider the transducer in Figure 1 representing a b I b I b a I a b a - 1The regular expression formalism and other notational conventions used in the paper are explained in the Appendix at the end. Figure 1 ab b ba aba- x. The four paths with aba on the upper side are 0 a 0 b x 2 a 0 0 a 0 b x 2 a 0 0 0 a x 1 b 0 2 a 0 and 0 a x 1 b 0 2 a 0 0 . The application of this transducer to the input aba produces four alternate results axa ax xa and x as shown in Figure 1 since there are four paths in the network that contain aba on the upper side with different strings on the lower side. This .

TỪ KHÓA LIÊN QUAN
TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.