Scientific paper: "A Unified Tagging Approach to Text Normalization"

A Unified Tagging Approach to Text Normalization

Conghui Zhu, Harbin Institute of Technology, Harbin, China (chzhu@mtlab.hit.edu.cn)
Jie Tang, Department of Computer Science, Tsinghua University, China (jietang@tsinghua.edu.cn)
Hang Li, Microsoft Research Asia, Beijing, China (hangli@microsoft.com)
Hwee Tou Ng, Department of Computer Science, National University of Singapore, Singapore (nght@comp.nus.edu.sg)
Tiejun Zhao, Harbin Institute of Technology, Harbin, China (tjzhao@mtlab.hit.edu.cn)

Abstract

This paper addresses the issue of text normalization, an important yet often overlooked problem in natural language processing. By text normalization, we mean converting 'informally inputted' text into its canonical form by eliminating 'noise' in the text and detecting paragraph and sentence boundaries. Previously, text normalization issues were often handled in an ad-hoc fashion or studied separately. This paper first gives a formalization of the entire problem. It then proposes a unified tagging approach that performs the task using Conditional Random Fields (CRF). The paper shows that, with the introduction of a small set of tags, most text normalization tasks can be performed within this approach. The accuracy of the proposed method is high because the subtasks of normalization are interdependent and should be performed together.
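The abstract's central idea is that noise removal and boundary detection can both be expressed as a single tag assigned to each token. The sketch below illustrates that idea. It is not the authors' implementation, and its tag names (KEEP, DELETE, SENT_END, PARA_END) and its function `apply_tags` are hypothetical. The sketch takes tokens whose tags are already known and rebuilds the normalized paragraphs and sentences from them. In the paper's setting, a CRF would predict those tags jointly. Here they are written by hand.

```python
from typing import List

# Hypothetical tag set for illustration only (not the paper's actual tags):
#   KEEP     - keep the token as ordinary content
#   DELETE   - drop the token as noise (extra line break, space, or punctuation)
#   SENT_END - keep the token and close the current sentence after it
#   PARA_END - keep the token and close both the sentence and the paragraph
TAGS = {"KEEP", "DELETE", "SENT_END", "PARA_END"}
PUNCT = {".", ",", "!", "?", ";", ":"}  # attached to the previous word when joining


def join_words(words: List[str]) -> str:
    """Join tokens with spaces, but put no space before punctuation."""
    out = ""
    for w in words:
        if out and w not in PUNCT:
            out += " "
        out += w
    return out


def apply_tags(tokens: List[str], tags: List[str]) -> List[List[str]]:
    """Turn a tagged token sequence into paragraphs of normalized sentences."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    paragraphs: List[List[str]] = []
    sentences: List[str] = []  # finished sentences of the current paragraph
    words: List[str] = []      # tokens of the sentence being built
    for token, tag in zip(tokens, tags):
        if tag not in TAGS:
            raise ValueError(f"unknown tag: {tag}")
        if tag == "DELETE":
            continue  # noise token: skip it entirely
        words.append(token)
        if tag in ("SENT_END", "PARA_END"):
            sentences.append(join_words(words))
            words = []
        if tag == "PARA_END":
            paragraphs.append(sentences)
            sentences = []
    # Flush any trailing content that was not closed by a boundary tag.
    if words:
        sentences.append(join_words(words))
    if sentences:
        paragraphs.append(sentences)
    return paragraphs


if __name__ == "__main__":
    # A noisy email fragment: a duplicated "!", a line break inside a sentence,
    # and a blank line between paragraphs.
    tokens = ["Hi", "all", "!", "!", "\n", "The", "meeting", "is", "\n",
              "moved", ".", "\n", "\n", "Thanks", "."]
    tags = ["KEEP", "KEEP", "SENT_END", "DELETE", "DELETE", "KEEP", "KEEP",
            "KEEP", "DELETE", "KEEP", "PARA_END", "DELETE", "DELETE", "KEEP",
            "SENT_END"]
    print(apply_tags(tokens, tags))
    # [['Hi all!', 'The meeting is moved.'], ['Thanks.']]
```

The example shows why one tag sequence is enough to express several subtasks. Deleting the duplicated "!" and the stray line break is noise removal, and SENT_END and PARA_END mark the boundaries. All of these are decisions made in the same labeling pass, which is what allows a sequence model to make them jointly.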
Experimental results on email data cleaning show that the proposed method significantly outperforms both the approach of using cascaded models and that of employing independent models.

1 Introduction

More and more informally inputted text data, such as raw text in emails, newsgroups, forums, and blogs, is becoming available to natural language processing. Consequently, how to process such data effectively and make it suitable for natural language processing has become a challenging issue. This is because informally inputted text data is usually very noisy and is not properly segmented. For example, it may contain extra line breaks, extra spaces, and extra punctuation