Improving MapReduce Performance in Heterogeneous Environments

Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, Ion Stoica
University of California, Berkeley
matei, andyk, adj, randy, stoica @

Abstract

MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-source implementation of MapReduce enjoying wide adoption and is often used for short jobs where low response time is critical. Hadoop's performance is closely tied to its task scheduler, which implicitly assumes that cluster nodes are homogeneous and that tasks make progress linearly, and uses these assumptions to decide when to speculatively re-execute tasks that appear to be stragglers. In practice, the homogeneity assumptions do not always hold. An especially compelling setting where this occurs is a virtualized data center, such as Amazon's Elastic Compute Cloud (EC2). We show that Hadoop's scheduler can cause severe performance degradation in heterogeneous environments. We design a new scheduling algorithm, Longest Approximate Time to End (LATE), that is highly robust to heterogeneity. LATE can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.
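The abstract names the LATE heuristic but does not spell out how the time to end is estimated. As a minimal sketch, assume each running task reports a progress fraction and an elapsed time, and that the remaining time is the remaining work divided by the observed progress rate. The names `RunningTask`, `estimated_time_left`, and `pick_speculation_candidate` are hypothetical and are not Hadoop APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningTask:
    task_id: str
    progress: float   # fraction of work done, in [0, 1]
    elapsed_s: float  # seconds since the task started


def estimated_time_left(task: RunningTask) -> float:
    """Assumed estimate: remaining work divided by observed progress rate."""
    if task.progress <= 0.0 or task.elapsed_s <= 0.0:
        return float("inf")  # no progress observed yet
    rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / rate


def pick_speculation_candidate(tasks: List[RunningTask]) -> Optional[RunningTask]:
    """Return the unfinished task with the longest estimated time to end."""
    unfinished = [t for t in tasks if t.progress < 1.0]
    if not unfinished:
        return None
    return max(unfinished, key=estimated_time_left)


if __name__ == "__main__":
    tasks = [
        RunningTask("a", progress=0.9, elapsed_s=90.0),
        RunningTask("b", progress=0.5, elapsed_s=100.0),
        RunningTask("c", progress=0.2, elapsed_s=20.0),
    ]
    print(pick_speculation_candidate(tasks).task_id)  # prints: b
```

In this example, task `c` has the lowest progress but is advancing quickly, so under the assumed estimate the slower task `b` is the one expected to finish last and is chosen for speculative re-execution.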
1 Introduction

Today's most popular computer applications are Internet services with millions of users. The sheer volume of data that these services work with has led to interest in parallel processing on commodity clusters. The leading example is Google, which uses its MapReduce framework to process 20 petabytes of data per day [1]. Other Internet services, such as e-commerce websites and social networks, also cope with enormous volumes of data. These services generate clickstream data from millions of users every day, which is a potential gold mine for understanding access patterns and increasing ad revenue. Furthermore, for each user action, a web application generates one or two orders of magnitude more ...
