Data Mining over Large Datasets Using Hadoop in Cloud Environment

ISSN: 2249-5789
V Nappinna Lakshmi et al, International Journal of Computer Science & Communication Networks, Vol 3(2), 73-78

V Nappinna Lakshmi1, N. Revathi2*
1 PG Scholar, 2 Assistant Professor
Department of Information Technology, Sri Venkateswara College of Engineering, Sriperumbudur – 602105, Chennai, INDIA.
1 Nappinnavenkat@ 2 revathi@*
* Corresponding author

Abstract- Data in web applications and social networking is growing drastically, and such data is referred to as Big Data. Hive queries integrated with Hadoop are used to generate report analyses over thousands of datasets. Retrieving those datasets takes a large amount of time, and the approach lacks performance analysis. To overcome this problem, Market Basket Analysis, a very popular data mining algorithm, is used in the Amazon cloud environment by integrating it with the Hadoop ecosystem and HBase. The objective is to store the data persistently, along with the past history of the dataset, and to perform report analysis on that data. The main aim of the system is to improve performance by parallelizing operations such as loading the data, building indexes, and evaluating queries. The performance analysis is carried out with a minimum of three nodes within the Amazon cloud environment. HBase is an open-source, non-relational, distributed database. It runs on top of Hadoop. Its data model consists of a single key with multiple values.
Looping is avoided when retrieving a particular record from huge datasets, which reduces the execution time. HDFS is used to store the data after the MapReduce operations are performed, and the execution time decreases as the number of nodes increases. Performance is tuned with parameters such as the HBase heap memory and the caching parameter.

Keywords- HBase, Cloud computing, .
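
The abstract names Market Basket Analysis and MapReduce but does not show how the two fit together. The following is a minimal single-process sketch in Python, assuming the common pair-counting formulation: the map step emits every unordered item pair in a basket, and the reduce step sums the counts per pair. The function names, the sample baskets, and the `min_support` threshold are all hypothetical and are not taken from the paper. On a Hadoop cluster the map calls would run in parallel across nodes; here they run in one loop.

```python
from collections import Counter
from itertools import combinations


def map_basket(basket):
    """Map step: emit ((item_a, item_b), 1) for each unordered item pair in one basket."""
    items = sorted(set(basket))  # de-duplicate and sort so (a, b) and (b, a) share one key
    return [(pair, 1) for pair in combinations(items, 2)]


def reduce_counts(emitted):
    """Reduce step: sum the counts emitted for each pair key."""
    totals = Counter()
    for pair, count in emitted:
        totals[pair] += count
    return totals


def frequent_pairs(baskets, min_support):
    """Run map then reduce, keeping pairs seen in at least min_support baskets."""
    emitted = [kv for basket in baskets for kv in map_basket(basket)]
    totals = reduce_counts(emitted)
    return {pair: n for pair, n in totals.items() if n >= min_support}


# Hypothetical transactions, invented for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["eggs", "milk"],
]
result = frequent_pairs(baskets, min_support=2)
print(result)
```

Because each pair becomes a single key with an aggregated value, the result maps naturally onto a key-value store: a later lookup of one pair is a direct key access rather than a scan over all transactions.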
