**Scientific report: "A Model of Lexical Attraction and Repulsion*"**

Doug Beeferman, Adam Berger, John Lafferty. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA (dougb, aberger, lafferty)

\*Research supported in part by NSF grant IRI-9314969, DARPA AASERT award DAAH04-95-1-0475, and the ATR Interpreting Telecommunications Research Laboratories.

Abstract

This paper introduces new methods based on exponential families for modeling the correlations between words in text and speech. While previous work assumed the effects of word co-occurrence statistics to be constant over a window of several hundred words, we show that their influence is nonstationary on a much smaller time scale. Empirical data drawn from English and Japanese text, as well as conversational speech, reveals that the "attraction" between words decays exponentially, while stylistic and syntactic constraints create a "repulsion" between words that discourages close co-occurrence. We show that these characteristics are well described by simple mixture models based on two-stage exponential distributions, which can be trained using the EM algorithm. The resulting distance distributions can then be incorporated as penalizing features in an exponential language model.

1 Introduction

One of the fundamental characteristics of language, viewed as a stochastic process, is that it is highly nonstationary. Throughout a written document, and during the course of spoken conversation, the topic evolves, affecting local statistics on word occurrences. The standard trigram model disregards this nonstationarity, as does any stochastic grammar which assigns probabilities to sentences in a context-independent fashion.

Stationary models are used to describe such a dynamic source for at least two reasons. The first is convenience: stationary models require a relatively small amount of computation to train and to apply. The second is ignorance: we know so little about how to model effectively the nonstationary characteristics of language
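To make the idea of fitting a mixture of exponential distance distributions with EM more concrete, here is a minimal sketch. It is not the paper's two-stage exponential model: it swaps in a plain mixture of two ordinary exponential components, one fast-decaying and one slow-decaying, and fits it to synthetic distances. The function name `em_exponential_mixture`, the initial guesses, and the synthetic rates 0.2 and 0.01 are all assumptions made for illustration.

```python
import math
import random


def em_exponential_mixture(distances, n_iter=200):
    """Fit a two-component exponential mixture to positive distances with EM.

    Illustrative stand-in only; the paper's two-stage exponential
    distributions are a different family.
    """
    n = len(distances)
    mean = sum(distances) / n
    # Assumed starting point: one fast-decaying and one slow-decaying component.
    weight, rate_fast, rate_slow = 0.5, 2.0 / mean, 0.5 / mean
    for _ in range(n_iter):
        # E-step: posterior probability that each distance belongs
        # to the fast-decaying component.
        resp = []
        for d in distances:
            p_fast = weight * rate_fast * math.exp(-rate_fast * d)
            p_slow = (1.0 - weight) * rate_slow * math.exp(-rate_slow * d)
            resp.append(p_fast / (p_fast + p_slow))
        # M-step: re-estimate the mixing weight and both decay rates
        # from the soft assignments.
        total_fast = sum(resp)
        total_slow = n - total_fast
        weight = total_fast / n
        rate_fast = total_fast / sum(r * d for r, d in zip(resp, distances))
        rate_slow = total_slow / sum((1.0 - r) * d for r, d in zip(resp, distances))
    return weight, rate_fast, rate_slow


# Synthetic distances (hypothetical data): half drawn with rate 0.2,
# half with rate 0.01.
random.seed(0)
sample = [random.expovariate(0.2) for _ in range(2000)]
sample += [random.expovariate(0.01) for _ in range(2000)]

weight, rate_fast, rate_slow = em_exponential_mixture(sample)
print(f"weight={weight:.3f} rate_fast={rate_fast:.4f} rate_slow={rate_slow:.4f}")
```

One property worth noting about this simplified setup: after every M-step the mean of the fitted mixture, `weight / rate_fast + (1 - weight) / rate_slow`, equals the sample mean of the distances, which gives a quick sanity check on the updates.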
