TAILIEUCHUNG - Parallel Programming: for Multicore and Cluster Systems- P45

Innovations in hardware architecture, like hyper-threading or multicore processors, mean that parallel computing resources are available for inexpensive desktop computers. In only a few years, many standard software products will be based on concepts of parallel programming implemented on such hardware, and the range of applications will be much broader than that of scientific computing, up to now the main application area for parallel computing.

Conjugate Gradient Method

... processor performs the arithmetic operations locally, and the vector $x_{k+1}$ results in a blockwise distribution.

(4) The axpy-operation $g_{k+1} = g_k + \alpha_k w_k$ is computed analogously to computation step (3), and the result vector $g_{k+1}$ is distributed in a blockwise way.

(5) The scalar product $\gamma_{k+1} = g_{k+1}^T g_{k+1}$ is computed analogously to computation step (2). The resulting scalar value $\beta_k$ is computed by the root processor of a single-accumulation operation and then broadcast to all other processors.

(6) The axpy-operation $d_{k+1} = -g_{k+1} + \beta_k d_k$ is computed analogously to computation step (3). The result vector $d_{k+1}$ has a blockwise distribution.

Parallel Execution Time

The parallel execution time of one iteration step of the CG method is the sum of the parallel execution times of the basic operations involved. We derive the parallel execution time for $p$ processors; $n$ is the system size. It is assumed that $n$ is a multiple of $p$. The parallel execution time of one axpy-operation is given by

$$T_{axpy} = 2 \cdot \frac{n}{p} \cdot t_{op},$$

since each processor computes $n/p$ components and the computation of each component needs one multiplication and one addition. As in earlier sections, the time for one arithmetic operation is denoted by $t_{op}$. The parallel execution time of a scalar product is

$$T_{scal\_prod} = \left(2 \cdot \frac{n}{p} - 1\right) \cdot t_{op} + T_{acc}(+)(p, 1) + T_{sb}(p, 1),$$

where $T_{acc}(op)(p, m)$ denotes the communication time of a single-accumulation operation with reduction operation $op$ on $p$ processors and message size $m$.
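The blockwise axpy-operation and the scalar product with accumulation and broadcast can be illustrated with a small simulation. This is only a sketch under stated assumptions: the p "processors" are simulated by plain Python lists in one process, no real message passing takes place, and the helper names `split_blocks`, `block_axpy`, and `block_dot` are hypothetical. Each helper also reports the number of arithmetic operations one processor performs, so the counts can be compared with the terms 2 n/p and 2 n/p - 1 of the execution-time formulas.

```python
# Sketch only: p "processors" are simulated as list blocks inside one process.
# All helper names are hypothetical and not taken from the book.

def split_blocks(v, p):
    """Blockwise distribution: processor q owns n/p consecutive components."""
    n = len(v)
    assert n % p == 0, "n is assumed to be a multiple of p"
    size = n // p
    return [v[q * size:(q + 1) * size] for q in range(p)]

def block_axpy(a, x_blocks, y_blocks):
    """Every processor computes a*x + y on its own block; no communication."""
    result = [[a * xi + yi for xi, yi in zip(xb, yb)]
              for xb, yb in zip(x_blocks, y_blocks)]
    # one multiplication and one addition per local component
    ops_per_proc = 2 * len(x_blocks[0])
    return result, ops_per_proc

def block_dot(x_blocks, y_blocks):
    """Local scalar products, then accumulation at a root and a broadcast."""
    partial = []
    for xb, yb in zip(x_blocks, y_blocks):
        s = xb[0] * yb[0]
        for xi, yi in zip(xb[1:], yb[1:]):
            s += xi * yi
        partial.append(s)
    m = len(x_blocks[0])
    ops_per_proc = m + (m - 1)           # m multiplications, m - 1 additions
    total = sum(partial)                 # stands in for single-accumulation with +
    on_each = [total] * len(x_blocks)    # stands in for single-broadcast
    return on_each, ops_per_proc

# Usage with made-up data: n = 8 components on p = 4 simulated processors.
n, p = 8, 4
g = split_blocks([float(i) for i in range(1, n + 1)], p)
w = split_blocks([1.0] * n, p)
alpha = 0.5                                        # arbitrary illustrative value
g_next, axpy_ops = block_axpy(alpha, w, g)         # g_next = g + alpha * w
gamma, dot_ops = block_dot(g_next, g_next)         # gamma = g_next^T g_next
print(axpy_ops, dot_ops, gamma[0])
```

With n = 8 and p = 4, each simulated processor holds two components, so the axpy-operation costs 4 local operations and the local scalar product costs 3, in line with 2 n/p and 2 n/p - 1.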
The computation of the local scalar products with $n/p$ components requires $n/p$ multiplications and $n/p - 1$ additions. The distribution of the result of the parallel scalar product, which is a scalar value, i.e., has size 1, needs the time of a single-broadcast operation $T_{sb}(p, 1)$. The matrix-vector multiplication needs time

$$T_{math\_vec\_mult} = 2 \cdot \frac{n^2}{p} \cdot t_{op},$$

since each processor computes $n/p$ scalar products. The total computation time of the CG method is

$$T_{CG} = T_{mb}\left(p, \frac{n}{p}\right) + T_{math\_vec\_mult} + 2 \cdot T_{scal\_prod} + 3 \cdot T_{axpy}.$$
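The execution-time model can be written down as a few small functions, which makes it easy to evaluate for concrete values of n and p. This is a sketch under assumptions: the function names are hypothetical, the matrix-vector term is taken as 2 n²/p arithmetic operations per processor, and the communication times for multi-broadcast, single-accumulation, and single-broadcast are passed in as callables `t_mb(p, m)`, `t_acc(p, m)`, and `t_sb(p, m)`, because the excerpt gives no formulas for them. The usage example sets all communication times to zero, so it shows only the computation part of the model.

```python
# Sketch of the per-iteration cost model; all function names are hypothetical.
# Communication costs are supplied by the caller as callables taking (p, m).

def t_axpy(n, p, t_op):
    """One multiplication and one addition for each of the n/p local components."""
    return 2 * (n / p) * t_op

def t_scal_prod(n, p, t_op, t_acc, t_sb):
    """Local scalar product, then accumulation and broadcast of one scalar."""
    return (2 * (n / p) - 1) * t_op + t_acc(p, 1) + t_sb(p, 1)

def t_mat_vec_mult(n, p, t_op):
    """n/p scalar products per processor, each counted as 2n operations (assumed)."""
    return 2 * (n * n / p) * t_op

def t_cg(n, p, t_op, t_mb, t_acc, t_sb):
    """One CG iteration: multi-broadcast, mat-vec, 2 scalar products, 3 axpys."""
    return (t_mb(p, n / p)
            + t_mat_vec_mult(n, p, t_op)
            + 2 * t_scal_prod(n, p, t_op, t_acc, t_sb)
            + 3 * t_axpy(n, p, t_op))

def no_comm(p, m):
    """Placeholder that ignores communication entirely (an assumption)."""
    return 0.0

# Usage: computation-only estimate in units of t_op for n = 1024 and p = 4.
print(t_cg(1024, 4, 1.0, no_comm, no_comm, no_comm))
```

Passing the communication terms as parameters keeps the sketch independent of any particular network model; a caller can substitute measured or modeled values without changing the formulas.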
