Scientific report: "Learning to Identify Fragmented Words in Spoken Discourse"

Piroska Lendvai
ILK Research Group, Tilburg University, The Netherlands

Abstract

Disfluent speech adds to the difficulty of processing spoken language utterances. In this paper we concentrate on identifying one disfluency phenomenon: fragmented words. Our data, from the Spoken Dutch Corpus, samples nearly 45,000 sentences of human discourse, ranging from spontaneous chat to media broadcasts. We classify each lexical item in a sentence as either a completely uttered or an incompletely uttered, i.e. fragmented, word. The task is carried out by both the IB1 and RIPPER machine learning algorithms, trained on a variety of features with an extensive optimization strategy. Our best classifier achieves an F-score that is a significant improvement over the baseline. We discuss why memory-based learning has more success than rule induction in correctly classifying fragmented words.

1 Introduction

Although human listeners are good at handling disfluent items (self-corrections, repetitions, hesitations, incompletely uttered words and the like; cf. Shriberg, 1994) in spoken language utterances, these are likely to cause confusion when used as input to automatic natural language processing (NLP) systems, resulting in poor human-computer interaction (Nakatani and Hirschberg, 1994; Eklund and Shriberg, 1998). Detecting disfluent passages can help clean the spoken input and improve further processing such as parsing.
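
As a rough illustration of the memory-based setup named in the abstract, the sketch below classifies a word as "fragment" or "complete" by looking up its nearest stored example and then scores the predictions with an F-score. It is a minimal sketch under stated assumptions, not the paper's system: the three boolean features, the toy examples, and the unweighted overlap distance are all hypothetical, and real memory-based learners typically also weight features.

```python
from collections import Counter

def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))

def ib1_classify(memory, instance, k=1):
    """Label an instance by majority vote among its k nearest stored examples."""
    ranked = sorted(memory, key=lambda ex: overlap_distance(ex[0], instance))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def f_score(gold, predicted, positive="fragment"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical features, invented for this sketch (not taken from the paper):
# (word is a prefix of the next word, word is in a lexicon, word is followed by a pause)
memory = [
    ((True, False, True), "fragment"),
    ((True, False, False), "fragment"),
    ((False, True, False), "complete"),
    ((False, True, True), "complete"),
    ((True, True, False), "complete"),
]

test_items = [(True, False, True), (False, True, False), (True, True, False)]
gold = ["fragment", "complete", "fragment"]
predicted = [ib1_classify(memory, item) for item in test_items]
score = f_score(gold, predicted)
```

In this toy run the third test item matches a stored "complete" example exactly, so it is misclassified and recall on the fragment class drops. This shows why the F-score of the fragment class, rather than plain accuracy, is the informative measure when fragments are rare.
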
By treating fragments we cover a considerable portion of the occurring disfluencies, as incompletely uttered words often occur as part of a speaker's self-repair (Bear et al., 1992; Nakatani and Hirschberg, 1994). Moreover, if an incompletely pronounced item is identified, we thereby determine the interruption point, a central phenomenon in disfluencies (Bear et al., 1992; Heeman, 1999; Shriberg et al., 2001). The surroundings of this disfluency element are to be treated with greater care, as before an interruption point there might be word
