Choosing seeds for semi-supervised graph based clustering

Journal of Computer Science and Cybernetics, 2019, pp. 373-384. DOI 1813-9663 35 4 14123

CHOOSING SEEDS FOR SEMI-SUPERVISED GRAPH BASED CLUSTERING

Cuong Le (1), Viet-Vu Vu (1), Le Thi Kieu Oanh (2), Nguyen Thi Hai Yen (3)
(1) VNU Information Technology Institute, Vietnam National University, Hanoi
(2) University of Economic and Technical Industries
(3) Hanoi Procuratorate University
vuvietvu@

Abstract. Although clustering algorithms have a long history, clustering still attracts a lot of attention because many applications, such as social networks, electronic commerce, and GIS, need efficient data analysis tools. Recently, semi-supervised clustering, for example semi-supervised K-Means, semi-supervised DBSCAN, and semi-supervised graph-based clustering (SSGC), which uses side information to boost the performance of the clustering process, has received a great deal of attention. Generally, there are two forms of side information: seeds (labeled data) and constraints (must-link, cannot-link). By integrating information provided by the user or a domain expert, semi-supervised clustering can produce the results users expect. In fact, clustering results usually depend on the side information provided, so different side information will produce different results. In some cases, the performance of clustering may decrease if the side information is not carefully chosen.

This paper addresses the problem of choosing seeds for semi-supervised clustering, especially for graph-based clustering by seeding (SSGC). Properly collected seeds can boost the quality of clustering and minimize the number of queries solicited from users. For this purpose, we propose an active learning algorithm called SKMMM for the seed collection task, which identifies candidates to present to users by using the K-Means and min-max algorithms. Experiments conducted on some real data sets from UCI and a real collected document data set show the effectiveness of our approach.
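The abstract names a min-max step but does not spell it out. As a rough illustration only, the sketch below shows one common reading of min-max selection, the farthest-first traversal: each new query candidate is the point whose distance to its nearest already-chosen point is largest. The function name min_max_select, the start parameter, and the toy candidates are hypothetical and are not taken from the paper. The K-Means step that would presumably produce the candidate points is omitted, so this is not the SKMMM algorithm itself.

```python
import math


def min_max_select(points, n_queries, start=0):
    """Farthest-first (min-max) selection of query candidates.

    Hypothetical helper, not the paper's SKMMM: returns indices into
    `points`, each chosen to maximize the distance to its nearest
    previously chosen point.
    """
    if not points or n_queries <= 0:
        return []
    n_queries = min(n_queries, len(points))
    selected = [start]
    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_queries:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        if min_dist[nxt] == 0.0:
            break  # every remaining point coincides with a selected one
        selected.append(nxt)
        # Refresh nearest-selected distances with the new pick
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy candidates (assumed data): two tight pairs and one isolated point
candidates = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
picked = min_max_select(candidates, 3)
print(picked)
```

On the toy data the picks spread across the three separated regions rather than taking two points from the same tight pair, which is the property that makes this kind of selection attractive when each user query is costly.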




