Scientific report: "DISTRIBUTIONAL CLUSTERING OF ENGLISH WORDS"

Fernando Pereira, AT&T Bell Laboratories, 600 Mountain Ave., Murray Hill, NJ 07974, USA (pereira@)
Naftali Tishby, Dept. of Computer Science, Hebrew University, Jerusalem 91904, Israel (tishby@ . il)
Lillian Lee, Dept. of Computer Science, Cornell University, Ithaca, NY 14850, USA (llee@)

Abstract

We describe and experimentally evaluate a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy between those distributions is used as the similarity measure for clustering. Clusters are represented by average context distributions derived from the given words according to their probabilities of cluster membership. In many cases the clusters can be thought of as encoding coarse sense distinctions. Deterministic annealing is used to find lowest-distortion sets of clusters: as the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word cooccurrence, and the models are evaluated with respect to held-out test data.

INTRODUCTION

Methods for automatically classifying words according to their contexts of use have both scientific and practical interest.
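The clustering procedure summarized in the abstract can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation. Each word x is a row p_x of a context-distribution matrix P, distortion is the relative entropy D(p_x || p_c), soft memberships are assumed to take the Gibbs form p(c|x) ∝ exp(-β D(p_x || p_c)), and each cluster centroid p_c is the membership-weighted average of the word distributions under assumed word priors p(x). Instead of explicitly detecting and splitting unstable clusters, the sketch keeps a fixed number k of cluster slots, perturbs them slightly at each value of β, and lets coincident slots separate as β grows. The function names, the β schedule and the noise level are all hypothetical.

```python
import numpy as np


def kl_matrix(P, C, eps=1e-12):
    """Relative entropy D(P[i] || C[j]) for every word i and cluster j, shape (n, k)."""
    C = np.maximum(C, eps)  # numerical guard (assumption): keeps log finite if a centroid lacks a context
    logP = np.log(np.where(P > 0, P, 1.0))  # log p where p > 0, else 0, so 0 * log 0 counts as 0
    return (P * logP).sum(axis=1)[:, None] - P @ np.log(C).T


def memberships(D, beta):
    """Soft assignments p(c | x) proportional to exp(-beta * D(x, c)), normalized per word."""
    W = np.exp(-beta * (D - D.min(axis=1, keepdims=True)))  # shift by row minimum to avoid underflow
    return W / W.sum(axis=1, keepdims=True)


def centroids(P, prior, M, old):
    """p_c = sum_x p(x | c) p_x, with p(x | c) proportional to p(x) p(c | x)."""
    w = prior[:, None] * M  # joint weights p(x) p(c | x), shape (n, k)
    mass = w.sum(axis=0)    # cluster masses p(c), shape (k,)
    new = (w.T @ P) / np.where(mass > 0, mass, 1.0)[:, None]
    return np.where(mass[:, None] > 0, new, old)  # leave an empty slot where it was


def anneal(P, prior, k=2, betas=(0.1, 1.0, 10.0, 100.0), iters=50, noise=1e-3, seed=0):
    """Simplified deterministic annealing over a fixed number k of cluster slots (hypothetical)."""
    rng = np.random.default_rng(seed)
    C = np.tile(prior @ P, (k, 1))  # all slots start at the marginal context distribution
    for beta in betas:
        # Perturb the slots so that coincident ones can separate once they become unstable.
        C = C + noise * rng.random(C.shape)
        C /= C.sum(axis=1, keepdims=True)
        for _ in range(iters):
            M = memberships(kl_matrix(P, C), beta)
            C = centroids(P, prior, M, C)
    return memberships(kl_matrix(P, C), betas[-1]), C
```

On a toy matrix in which two words favor contexts 0 and 1 and two favor contexts 2 and 3, the slots collapse onto the marginal distribution at small β and separate along that split once β is large enough. Recording the β at which slots separate is one way such a loop could expose a hierarchy of splits.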
The scientific questions arise in connection with distributional views of linguistic (particularly lexical) structure, and also in relation to the question of lexical acquisition, from both psychological and computational learning perspectives. From the practical point of view, word classification addresses questions of data sparseness and generalization in statistical language models, particularly models for deciding among alternative analyses proposed by a grammar. It is well known that a simple tabulation of frequencies of certain words participating in certain configurations, for example of …
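The sparseness problem can be made concrete with a toy example. The sketch below is illustrative only. It assumes, as one natural class model consistent with the abstract, that p(v | n) is estimated as the sum over clusters c of p(c | n) p_c(v), where each centroid p_c is a membership-weighted average of relative frequencies. The (noun, verb) pairs, the single cluster "liquid", and the hand-set memberships are hypothetical; in practice the memberships would come from the clustering itself. Both models are scored by average negative log2-probability on held-out pairs.

```python
import math
from collections import Counter


def relative_frequencies(pairs):
    """Maximum-likelihood estimate p(v | n) from (noun, verb) pairs."""
    joint = Counter(pairs)
    noun_totals = Counter(n for n, _ in pairs)
    return {(n, v): c / noun_totals[n] for (n, v), c in joint.items()}


def class_centroids(pairs, membership):
    """p_c(v) = sum_n p(n | c) p(v | n), with p(n | c) proportional to p(n) p(c | n)."""
    rel = relative_frequencies(pairs)
    noun_prior = {n: c / len(pairs) for n, c in Counter(n for n, _ in pairs).items()}
    mass = Counter()
    for n, classes in membership.items():
        for c, w in classes.items():
            mass[c] += noun_prior.get(n, 0.0) * w
    centroid = {}
    for (n, v), p in rel.items():
        for c, w in membership.get(n, {}).items():
            if mass[c] > 0:
                centroid.setdefault(c, Counter())[v] += noun_prior[n] * w * p / mass[c]
    return centroid


def class_model_prob(n, v, membership, centroid):
    """Hypothetical class-based estimate p(v | n) = sum_c p(c | n) p_c(v)."""
    return sum(w * centroid.get(c, {}).get(v, 0.0)
               for c, w in membership.get(n, {}).items())


def avg_log_loss(test_pairs, prob):
    """Average -log2 p(v | n) over held-out pairs; infinite if any pair gets probability 0."""
    total = 0.0
    for n, v in test_pairs:
        p = prob(n, v)
        if p == 0.0:
            return math.inf
        total -= math.log2(p)
    return total / len(test_pairs)


# Hypothetical toy data: "beer" is never seen with "pour" in training.
train = [("wine", "drink"), ("wine", "pour"), ("beer", "drink"), ("beer", "drink")]
test = [("beer", "pour")]
membership = {"wine": {"liquid": 1.0}, "beer": {"liquid": 1.0}}

rel = relative_frequencies(train)
cent = class_centroids(train, membership)
mle_loss = avg_log_loss(test, lambda n, v: rel.get((n, v), 0.0))
class_loss = avg_log_loss(test, lambda n, v: class_model_prob(n, v, membership, cent))
print(mle_loss, class_loss)  # inf 2.0
```

The relative-frequency table assigns probability zero to the unseen pair, so its held-out loss is infinite. The class model borrows the "pour" evidence from "wine" through the shared cluster and assigns the pair probability 0.25 (2 bits).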
