TAILIEUCHUNG - Parallel Programming: for Multicore and Cluster Systems- P21

Innovations in hardware architecture, like hyper-threading or multicore processors, mean that parallel computing resources are available for inexpensive desktop computers. In only a few years, many standard software products will be based on concepts of parallel programming implemented on such hardware, and the range of applications will be much broader than that of scientific computing, up to now the main application area for parallel computing.

4 Performance Analysis of Parallel Programs

[Figure: Illustration of the parameters of the LogP model]

The figure illustrates the meaning of these parameters [33]. All parameters except P are measured in time units or as multiples of the machine cycle time. Furthermore, the network is assumed to have a finite capacity: between any pair of processors, at most $\lceil L/g \rceil$ messages may be in transmission at any time. If a processor tries to send a message that would exceed this limit, it is blocked until the message can be transmitted without exceeding the limit.

The LogP model assumes that the processors exchange small messages that do not exceed a predefined size; larger messages must be split into several smaller messages. The processors work asynchronously with respect to each other. The latency of a single message cannot be predicted in advance, but it is bounded by L if there is no blocking caused by the finite capacity. This also implies that messages do not necessarily arrive in the order in which they were sent. The values of the parameters L, o, and g depend not only on the hardware characteristics of the network but also on the communication library and its implementation. The execution time of an algorithm in the LogP model is determined by the maximum of the execution times of the participating processors.
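To make the capacity rule, the message-splitting requirement, and the max-based execution time concrete, here is a minimal Python sketch. It assumes a hypothetical fixed maximum message size `max_msg_bytes` (the model only says the size is predefined) and reduces the blocking rule to a counter check; all function names are illustrative and not taken from the book.

```python
import math


def max_in_flight(L: float, g: float) -> int:
    """Capacity bound: at most ceil(L/g) messages may be in transit
    between a pair of processors at any time."""
    return math.ceil(L / g)


def send_blocks(in_flight: int, L: float, g: float) -> bool:
    """Hypothetical check: a send blocks if one more message would
    exceed the capacity bound."""
    return in_flight + 1 > max_in_flight(L, g)


def split_message(num_bytes: int, max_msg_bytes: int) -> int:
    """Number of small messages needed for a larger message, assuming
    a fixed (hypothetical) maximum message size max_msg_bytes."""
    return math.ceil(num_bytes / max_msg_bytes)


def program_time(per_processor_times: list[float]) -> float:
    """LogP execution time: the maximum over all participating processors."""
    return max(per_processor_times)


if __name__ == "__main__":
    L, g = 6, 4  # assumed example values in machine cycles
    print(max_in_flight(L, g))               # 2
    print(send_blocks(2, L, g))              # True: a third message would block
    print(split_message(1000, 64))           # 16
    print(program_time([10.0, 12.5, 9.0]))   # 12.5
```

For example, with L = 6 and g = 4 cycles, at most two messages can be in transit, so a third send blocks until one of them has been delivered.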
An access by a processor P1 to a data element that is stored in the local memory of another processor P2 takes time $2L + 4o$, since two messages are involved and each costs a send overhead o, the latency L, and a receive overhead o: half of this time is needed to bring the data element from P2 to P1, and the other half is needed to bring the data element from P1 back to P2. A sequence of n messages can be transmitted in time $L + 2o + (n-1)g$ (see the figure below): consecutive messages are injected g time units apart, so the last message is sent $(n-1)g$ after the first one and then needs $L + 2o$ to be delivered. A drawback of the original LogP model is that it assumes that messages are small and that only point-to-point messages are allowed; more complex communication patterns must be assembled from point-to-point messages.

[Figure: Transmission of a larger message as a sequence of n messages]
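The two cost formulas can be checked against a small event-by-event model. This Python sketch assumes g ≥ o (otherwise the send overhead, not the gap, would limit how fast messages can be injected) and simulates a sender that starts a new message every g time units; the function names are hypothetical.

```python
def remote_access_time(L: float, o: float) -> float:
    """Remote access: two messages, each costing o (send) + L + o (receive)."""
    return 2 * (o + L + o)


def sequence_time_formula(n: int, L: float, o: float, g: float) -> float:
    """Closed form for n consecutive messages: L + 2o + (n - 1)g."""
    return L + 2 * o + (n - 1) * g


def sequence_time_simulated(n: int, L: float, o: float, g: float) -> float:
    """Event-by-event model of the same transfer, assuming g >= o."""
    if g < o:
        raise ValueError("this sketch assumes g >= o")
    finish = 0.0
    for i in range(n):
        send_start = i * g            # a new message every g time units
        arrival = send_start + o + L  # send overhead, then network latency
        done = arrival + o            # receive overhead; arrivals are g >= o apart,
                                      # so the receiver never has to queue
        finish = max(finish, done)
    return finish


if __name__ == "__main__":
    L, o, g = 10, 2, 3  # assumed example values
    print(remote_access_time(L, o))             # 28
    print(sequence_time_formula(5, L, o, g))    # 26
    print(sequence_time_simulated(5, L, o, g))  # 26
```

With L = 10, o = 2, and g = 3, a remote access costs 28 time units, and a sequence of five messages completes after 10 + 4 + 4·3 = 26 time units, which matches the simulated schedule.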