A Broad-Coverage Normalization System for Social Media Language

Fei Liu, Fuliang Weng, Xiao Jiang
Research and Technology Center, Robert Bosch LLC

Abstract

Social media language contains a huge amount and wide variety of nonstandard tokens, created both intentionally and unintentionally by users. It is crucial to normalize these noisy nonstandard tokens before applying other NLP techniques. A major challenge for this task is system coverage: for any user-created nonstandard term, the system should be able to restore the correct word within its top n output candidates. In this paper, we propose a cognitively-driven normalization system that integrates different human perspectives on normalizing nonstandard tokens, including enhanced letter transformation, visual priming, and string/phonetic similarity. The system was evaluated at both the word level and the message level using four SMS and Twitter data sets. Results show that our system achieves over 90% word-coverage across all data sets, a 10% absolute increase compared to the state of the art. This broad word-coverage also translates into message-level performance gains, yielding a 6% absolute increase over the best prior approach.

1 Introduction

The amount of user-generated content has increased drastically in the past few years, driven by the rapid development of social media websites such as Twitter, Facebook, and Google+. As of June 2011, Twitter had attracted over 300 million users and produced more than 2 billion tweets per week (Twitter, 2011). In a broader sense, Twitter messages, SMS messages, Facebook updates, chat logs, emails, etc. can all be considered social text, which differs significantly from traditional news text because of its informal writing style and conversational nature. Social text serves as a valuable information source for many NLP applications, such as the information …
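The word-coverage criterion in the abstract (the correct word must appear among the system's top n output candidates) can be made concrete with a short sketch. This is not the authors' system: the candidate generator below ranks a toy vocabulary with Python's `difflib.SequenceMatcher` ratio, a plain string-similarity stand-in for the paper's combined letter-transformation, visual-priming, and string/phonetic model. The names `rank_candidates` and `top_n_coverage`, the vocabulary, and the token/gold pairs are all hypothetical, invented for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple

# Hypothetical candidate generator signature: (token, n) -> ranked list of words.
CandidateFn = Callable[[str, int], List[str]]


def rank_candidates(token: str, vocabulary: Sequence[str], n: int) -> List[str]:
    """Return the n vocabulary words most similar to `token`.

    Stand-in scorer only: difflib's SequenceMatcher ratio (plain string
    similarity), not the paper's letter-transformation / visual-priming /
    phonetic model. Ties are broken alphabetically for determinism.
    """
    ranked = sorted(
        vocabulary,
        key=lambda word: (-SequenceMatcher(None, token, word).ratio(), word),
    )
    return ranked[:n]


def top_n_coverage(pairs: Sequence[Tuple[str, str]],
                   candidate_fn: CandidateFn, n: int) -> float:
    """Fraction of (nonstandard token, gold word) pairs whose gold word
    appears among the top-n candidates returned by `candidate_fn`."""
    if not pairs:
        return 0.0
    hits = sum(1 for token, gold in pairs if gold in candidate_fn(token, n))
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy vocabulary and token/gold pairs, invented for illustration only.
    vocab = ["tomorrow", "tonight", "because", "great", "you", "good"]
    pairs = [("2moro", "tomorrow"), ("tonite", "tonight"),
             ("bcuz", "because"), ("gr8", "great"), ("u", "you")]

    def generate(token: str, n: int) -> List[str]:
        # Bind the toy vocabulary so the generator matches CandidateFn.
        return rank_candidates(token, vocab, n)

    for n in (1, 3, len(vocab)):
        print(f"top-{n} coverage: {top_n_coverage(pairs, generate, n):.2f}")
```

Because each top-n list is a prefix of the same ranking, coverage can only stay the same or grow as n increases; a system aiming for broad coverage tries to push the correct word into a small n rather than relying on a long candidate list.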
