TAILIEUCHUNG - Parallel Programming: for Multicore and Cluster Systems- P42

Parallel Programming for Multicore and Cluster Systems, P42: Innovations in hardware architecture, like hyper-threading or multicore processors, mean that parallel computing resources are available for inexpensive desktop computers. In only a few years, many standard software products will be based on concepts of parallel programming implemented on such hardware, and the range of applications will be much broader than that of scientific computing, up to now the main application area for parallel computing.

Direct Methods for Linear Systems with Banded Structure

[...] for which the values $\tilde a_j^{(k-1)}, \tilde b_j^{(k-1)}, \tilde c_j^{(k-1)}, \tilde y_j^{(k-1)}$ from the previous step, computed by a different processor, are required. Thus there is a communication in each of the $\lceil \log p \rceil$ steps with a message size of four values. After step $N = \lceil \log p \rceil$, processor $P_i$ computes

$$\tilde x_i = \tilde y_i^{(N)} / \tilde b_i^{(N)}.$$

Phase 3: Parallel substitution of cyclic reduction. After the second phase, the values $\tilde x_i = x_{i \cdot q}$ are already computed. In this phase, each processor $P_i$, $i = 1, \ldots, p$, computes the values $x_j$ with $j = (i-1)q + 1, \ldots, iq - 1$ in several steps according to the substitution equation of cyclic reduction. In step $k$, $k = Q-1, \ldots, 0$, the elements $x_j$, $j = 2^k, \ldots, n$, with step size $2^{k+1}$ are computed. Processor $P_i$ computes $x_j$ with $j \ \mathrm{div}\ q + 1 = i$, for which the values $\tilde x_{i-1} = x_{(i-1)q}$ and $\tilde x_{i+1} = x_{(i+1)q}$ computed by processors $P_{i-1}$ and $P_{i+1}$ are needed. The figure illustrates the parallel algorithm for $p = 2$ and $n = 8$.

[Figure: phase 1 ($Q$ steps), phase 2 ($\log p$ steps), phase 3 ($Q$ steps).]
Fig. Illustration of the parallel algorithm for the cyclic reduction for $n = 8$ equations and $p = 2$ processors. Each of the processors is responsible for $q = 4$ equations; we have $Q = 2$. The first and the third phases of the computation have $\log q = 2$ steps. The second phase has $\log p = 1$ step. As recursive doubling is used in the second phase, there are more components of the solution to be computed in the second phase compared with the computation shown in the earlier figure.
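The index ranges of phases 1 and 3 can be made concrete with a small sketch. It only enumerates which equation indices each processor touches in each step; it does not perform the arithmetic of cyclic reduction. The function names `phase1_rows` and `phase3_rows` are hypothetical, and the sketch assumes that n and p are powers of two with p dividing n, so that q = n/p and Q = log q are integers. The phase 1 range (rows (i-1)q + 2^k up to iq with step size 2^k) is the usual cyclic reduction pattern restricted to one block and is an assumption here, since that part of the text is not reproduced above.

```python
def phase1_rows(n, p):
    """Rows (1-based) each processor updates in phase 1.

    Assumption: n and p are powers of two and p divides n.
    Returns a dict mapping (processor i, step k) to a list of row indices.
    """
    q = n // p                    # equations per processor
    Q = q.bit_length() - 1        # Q = log2(q), exact for powers of two
    schedule = {}
    for i in range(1, p + 1):
        for k in range(1, Q + 1):
            # rows (i-1)*q + 2^k, ..., i*q with step size 2^k
            schedule[(i, k)] = list(range((i - 1) * q + 2**k, i * q + 1, 2**k))
    return schedule


def phase3_rows(n, p):
    """Solution components x_j each processor computes in phase 3.

    Step k runs from Q-1 down to 0; in step k the indices
    j = 2^k, ..., n with step size 2^(k+1) are computed, and the
    owner of x_j is processor (j div q) + 1.
    """
    q = n // p
    Q = q.bit_length() - 1
    schedule = {}
    for k in range(Q - 1, -1, -1):
        for j in range(2**k, n + 1, 2 ** (k + 1)):
            owner = j // q + 1
            schedule.setdefault((owner, k), []).append(j)
    return schedule


if __name__ == "__main__":
    for key, rows in sorted(phase1_rows(8, 2).items()):
        print("phase 1, (processor, step) =", key, "rows", rows)
    for key, rows in sorted(phase3_rows(8, 2).items()):
        print("phase 3, (processor, step) =", key, "components", rows)
```

For n = 8 and p = 2, phase 1 leaves row 4 on the first processor and row 8 on the second, which are the two equations handed to phase 2. Phase 3 then fills in x_2 and x_6 in step k = 1, and x_1, x_3, x_5, x_7 in step k = 0.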
From Chapter 7, Algorithms for Systems of Linear Equations:

Parallel Execution Time

The execution time of the parallel algorithm can be modeled by the following runtime functions. Phase 1 executes

$$Q = \log q = \log \frac{n}{p} = \log n - \log p$$

steps, where in step $k$ with $1 \le k \le Q$ each processor computes at most $q/2^k$ coefficient blocks of 4 values each. Each coefficient block requires 14 arithmetic operations according to the reduction equation. The computation time of phase 1 can therefore be estimated as

$$T_1(n,p) = 14\, t_{op} \cdot \sum_{k=1}^{Q} \frac{q}{2^k} \le 14\, \frac{n}{p} \cdot t_{op}.$$

Moreover, each processor exchanges in each of the $Q$ steps two messages of …
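The phase 1 estimate can be checked numerically with a short sketch. It sums the at most q/2^k coefficient blocks per step at 14 operations each and compares the result with the bound 14 (n/p) t_op. The function names and the default value `t_op=1.0` are hypothetical placeholders; the sketch assumes n and p are powers of two with p dividing n, and it covers only the computation time, not the communication.

```python
def phase1_compute_time(n, p, t_op=1.0):
    """Modeled computation time of phase 1.

    Assumption: n and p are powers of two and p divides n.
    t_op is the (hypothetical) time for one arithmetic operation.
    """
    q = n // p                    # equations per processor
    Q = q.bit_length() - 1        # Q = log2(q) = log2(n) - log2(p)
    # at most q / 2^k coefficient blocks in step k, 14 operations per block
    blocks = sum(q // 2**k for k in range(1, Q + 1))
    return 14 * t_op * blocks


def phase1_upper_bound(n, p, t_op=1.0):
    """Upper bound 14 * (n / p) * t_op from the runtime model."""
    return 14 * (n / p) * t_op


if __name__ == "__main__":
    for n, p in [(8, 2), (1024, 8)]:
        print(n, p, phase1_compute_time(n, p), phase1_upper_bound(n, p))
```

Because the sum of q/2^k for k = 1, ..., Q equals q - 1 when q is a power of two, the modeled time stays one block (14 operations) below the bound in these cases.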
