Parallel Programming: for Multicore and Cluster Systems - P3

Innovations in hardware architecture, such as hyper-threading and multicore processors, mean that parallel computing resources are now available in inexpensive desktop computers. In only a few years, many standard software products will be based on concepts of parallel programming implemented on such hardware, and the range of applications will be much broader than scientific computing, which has so far been the main application area for parallel computing.

2 Parallel Computer Architecture

… functional unit. But using even more functional units provides little additional gain [35, 99] because of dependencies between instructions and branching of control flow.

4. Parallelism at process or thread level

The three techniques described so far assume a single sequential control flow, which is provided by the compiler and which determines the execution order if there are dependencies between instructions. For the programmer, this has the advantage that a sequential programming language can be used while still obtaining a parallel execution of instructions. However, the degree of parallelism obtained by pipelining and multiple functional units is limited, and for typical processors this limit was reached some time ago. But, according to Moore's law, more and more transistors are available per processor chip. They can be used to integrate larger caches on the chip, but cache sizes cannot be increased arbitrarily either, since larger caches lead to longer access times (see Sect. …). An alternative approach to using the increasing number of transistors on a chip is to put multiple independent processor cores onto a single processor chip. This approach has been used for typical desktop processors since 2005. The resulting processor chips are called multicore processors. Each of the cores of a multicore processor must obtain a separate flow of control, i.e., parallel programming techniques must be used.
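The following is a minimal sketch of what one flow of control per core looks like in code. It uses Python's standard `threading` module. The function `parallel_sum` and its `num_threads` parameter are hypothetical names chosen for illustration.

Each thread sums its own disjoint chunk of the input, which needs no coordination. It then adds its partial result to a shared total under a lock, because all threads see the same memory.

One caveat applies. In the standard CPython build, the global interpreter lock lets only one thread execute Python bytecode at a time. The sketch therefore shows the programming pattern (create threads, let them work, coordinate shared updates, join them) rather than a multicore speedup. A C program using a thread library such as POSIX threads follows the same structure.

```python
import threading


def parallel_sum(values, num_threads=4):
    """Sum `values` with up to `num_threads` threads (hypothetical helper)."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")

    total = 0                 # shared variable, visible to all threads
    lock = threading.Lock()   # coordinates updates of `total`

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)  # private work on a disjoint chunk: no lock needed
        with lock:            # only one thread at a time may update `total`
            total += partial

    # Split the input into at most num_threads disjoint, contiguous chunks.
    size = max(1, (len(values) + num_threads - 1) // num_threads)
    threads = [
        threading.Thread(target=worker, args=(values[i:i + size],))
        for i in range(0, len(values), size)
    ]

    for t in threads:
        t.start()             # each thread is a separate flow of control
    for t in threads:
        t.join()              # wait until all threads have finished
    return total
```

For example, `parallel_sum(list(range(1, 101)))` returns 5050, the same result as the sequential `sum`. Keeping the per-thread work private limits the coordination cost: each thread touches the shared variable only once.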
The cores of a processor chip access the same memory and may even share caches. Therefore, memory accesses of the cores must be coordinated. The coordination and synchronization techniques required are described in later chapters. A more detailed description of parallelism by multiple functional units can be found in [35, 84, 137, 164]. Section … describes techniques like simultaneous multithreading and multicore processors, which require an explicit specification of parallelism.

Flynn's Taxonomy of Parallel Architectures

Parallel …