Choosing seeds for semi-supervised graph based clustering

Journal of Computer Science and Cybernetics, 2019, pp. 373-384. DOI 1813-9663 35 4 14123

CUONG LE (1), VIET-VU VU (1), LE THI KIEU OANH (2), NGUYEN THI HAI YEN (3)
(1) VNU Information Technology Institute, Vietnam National University, Hanoi
(2) University of Economic and Technical Industries
(3) Hanoi Procuratorate University
vuvietvu@

Abstract. Although clustering algorithms have a long history, clustering still attracts a lot of attention because many applications, such as social networks, electronic commerce, and GIS, need efficient data analysis tools. Recently, semi-supervised clustering (for example, semi-supervised K-Means, semi-supervised DBSCAN, and semi-supervised graph-based clustering (SSGC)), which uses side information to boost the performance of the clustering process, has received a great deal of attention. Generally, there are two forms of side information: seeds (labeled data) and constraints (must-link, cannot-link). By integrating information provided by the user or a domain expert, semi-supervised clustering can produce the results users expect. In fact, clustering results usually depend on the side information provided, so different side information will produce different results. In some cases, the performance of clustering may decrease if the side information is not carefully chosen.

This paper addresses the problem of choosing seeds for semi-supervised clustering, especially for graph-based clustering by seeding (SSGC). Properly collected seeds can boost the quality of clustering and minimize the number of queries solicited from users. For this purpose, we propose an active learning algorithm called SKMMM for the seed collection task, which identifies candidates to put to users by using the K-Means and min-max algorithms. Experiments conducted on real data sets from UCI and a real collected document data set show the effectiveness of our approach.
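
The abstract names the two ingredients of SKMMM (K-Means and min-max) but does not describe how they are combined. The sketch below is only an illustration of one plausible combination, not the authors' algorithm: it assumes K-Means is used to pick representative points, and that a min-max (farthest-first) rule then orders those representatives so that each new query is as far as possible from the ones already chosen. All function names (kmeans, min_max_order, select_seed_candidates) and parameters are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist(p, centroids[c]))
            groups[j].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def min_max_order(candidates, n_queries):
    """Farthest-first selection: each new pick maximizes its minimum
    distance to the points already picked."""
    if n_queries <= 0 or not candidates:
        return []
    chosen = [candidates[0]]
    while len(chosen) < min(n_queries, len(candidates)):
        remaining = [c for c in candidates if c not in chosen]
        nxt = max(remaining, key=lambda c: min(dist(c, s) for s in chosen))
        chosen.append(nxt)
    return chosen


def select_seed_candidates(points, k, n_queries):
    """Assumed pipeline: K-Means representatives, then min-max ordering.
    The returned points are the ones a user would be asked to label."""
    reps = []
    for c in kmeans(points, k):
        rep = min(points, key=lambda p: dist(p, c))  # real point nearest centroid
        if rep not in reps:
            reps.append(rep)
    return min_max_order(reps, n_queries)
```

In this sketch, the points returned by select_seed_candidates would be shown to the user for labeling, and the resulting labeled points would serve as seeds for a seed-based method such as SSGC. Choosing k larger than the expected number of clusters is a design assumption made here so that every true cluster is likely to contribute at least one representative.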
