Parallel Programming: for Multicore and Cluster Systems - P11

Innovations in hardware architecture, like hyper-threading or multicore processors, mean that parallel computing resources are available for inexpensive desktop computers. In only a few years, many standard software products will be based on concepts of parallel programming implemented on such hardware, and the range of applications will be much broader than that of scientific computing, up to now the main application area for parallel computing.

2 Parallel Computer Architecture, Exercises for Chap. 2 (p. 90)

[...] the requested cache block sends it to both the directory controller and the requesting processor. Instead, the owning processor could send the cache block to the directory controller, which could then forward it to the requesting processor. Specify the details of this protocol.

Exercise. Consider the following sequence of memory accesses:

    2 3 11 16 21 13 64 48 19 11 3 22 4 27 6 11

Consider a cache of size 16 bytes. For each of the following cache configurations, determine for each memory access in the sequence whether it leads to a cache hit or a cache miss. Show the cache state after each access, that is, the memory locations currently held in the cache. Determine the resulting miss rate.

(a) direct-mapped cache with block size 1
(b) direct-mapped cache with block size 4
(c) two-way set-associative cache with block size 1, LRU replacement strategy
(d) two-way set-associative cache with block size 4, LRU replacement strategy
(e) fully associative cache with block size 1, LRU replacement
(f) fully associative cache with block size 4, LRU replacement

Exercise. Consider the MSI protocol from the figure on p. 79 for a bus-based system with three processors P1, P2, P3. Each processor has a direct-mapped cache. The following sequence of memory operations accesses two memory locations A and B, which are mapped to the same cache line:

    Processor   Action
    P1          write A 4
    P3          write B 8
    P2          read A
    P3          read A
    P3          write A B
    P2          read A
    P1          read B
    P1          write B 10

We assume that the variables are initialized to A = 3 and B = 3 and that the caches are initially empty. For each memory access, determine:

- the cache state of each processor after the memory operation,
- the content of the cache and of the memory locations for A and B,
- the processor actions (PrWr, PrRd) caused by the access, and
- the bus operations (BusRd, BusRdEx, flush) caused by the MSI protocol.

Exercise. Consider the following memory accesses of three processors P1, P2, P3 [...]
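For the cache exercise above (the access sequence 2 3 11 ... 11 and the six configurations), a hand-computed hit/miss trace can be checked with a small simulator. The following is a minimal sketch, not a reference solution. It assumes that the accesses are byte addresses, that the block size is given in bytes, that block number b maps to set b mod (number of sets), and that LRU is applied within each set. The names `simulate`, `ACCESSES`, and `CONFIGS` are hypothetical and chosen only for this illustration.

```python
from collections import OrderedDict

# Access sequence and cache size as stated in the exercise.
ACCESSES = [2, 3, 11, 16, 21, 13, 64, 48, 19, 11, 3, 22, 4, 27, 6, 11]
CACHE_SIZE = 16  # assumed to be in the same unit as the addresses (bytes)


def simulate(accesses, block_size, ways=None, cache_size=CACHE_SIZE):
    """Return (trace, miss_rate) for a set-associative cache with LRU.

    ways=1 models a direct-mapped cache; ways=None models a fully
    associative cache (a single set that holds every block).
    """
    num_blocks = cache_size // block_size
    if ways is None:
        ways = num_blocks
    num_sets = num_blocks // ways
    # One OrderedDict per set; insertion order doubles as LRU order
    # (oldest entry first, most recently used entry last).
    sets = [OrderedDict() for _ in range(num_sets)]
    trace = []
    for addr in accesses:
        block = addr // block_size        # memory block holding this address
        lines = sets[block % num_sets]    # assumed modulo mapping to a set
        hit = block in lines
        if hit:
            lines.move_to_end(block)      # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
        # Snapshot of the block numbers held in each set after this access.
        contents = [sorted(s) for s in sets]
        trace.append((addr, hit, contents))
    miss_rate = sum(not hit for _, hit, _ in trace) / len(trace)
    return trace, miss_rate


# (block size, ways) for configurations (a) to (f); None = fully associative.
CONFIGS = {"a": (1, 1), "b": (4, 1), "c": (1, 2),
           "d": (4, 2), "e": (1, None), "f": (4, None)}

for label, (block_size, ways) in CONFIGS.items():
    trace, miss_rate = simulate(ACCESSES, block_size, ways)
    outcome = " ".join("H" if hit else "M" for _, hit, _ in trace)
    print(f"({label}) {outcome}  miss rate = {miss_rate:.4f}")
```

Each trace entry also carries the block numbers held per set after the access, which corresponds to the "cache state after each access" asked for in the exercise. With block size 4, a block number b stands for the addresses 4b to 4b + 3.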
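For the MSI exercise above (three processors, with A and B mapped to the same cache line), the bookkeeping of states, cache contents, and bus operations can be sketched in a few lines. This is an illustrative model under stated assumptions, not the protocol definition from the book's figure. It assumes a write-back, write-allocate cache with a single line per processor, a BusRdEx on every write that does not hit in state M, and a flush to memory whenever a modified block is snooped or replaced. The class `MSIBus` and its methods are hypothetical names. Only the first four operations of the table are replayed, because the operand of `write A B` is ambiguous in the source.

```python
class MSIBus:
    """Toy bus-based MSI model with one cache line per processor."""

    def __init__(self, n_procs, memory):
        self.memory = dict(memory)
        # Each line holds a block name, an MSI state and the cached value.
        self.lines = [{"block": None, "state": "I", "value": None}
                      for _ in range(n_procs)]
        self.log = []  # bus operations in the order they occur

    def _evict(self, p, new_block):
        """Make room for new_block; write back a modified victim first."""
        line = self.lines[p]
        if line["block"] not in (None, new_block) and line["state"] == "M":
            self.memory[line["block"]] = line["value"]
            self.log.append(f"P{p + 1} flush {line['block']} (replacement)")
        if line["block"] != new_block:
            line.update(block=new_block, state="I", value=None)

    def _snoop(self, p, block, exclusive):
        """Let all other caches react to a BusRd or BusRdEx for block."""
        for q, other in enumerate(self.lines):
            if q == p or other["block"] != block or other["state"] == "I":
                continue
            if other["state"] == "M":
                self.memory[block] = other["value"]  # owner flushes the block
                self.log.append(f"P{q + 1} flush {block}")
            other["state"] = "I" if exclusive else "S"

    def read(self, p, block):
        """PrRd by processor index p (0-based); returns the value read."""
        line = self.lines[p]
        if line["block"] == block and line["state"] in ("M", "S"):
            return line["value"]  # read hit, no bus traffic
        self._evict(p, block)
        self.log.append(f"P{p + 1} BusRd {block}")
        self._snoop(p, block, exclusive=False)
        line.update(state="S", value=self.memory[block])
        return line["value"]

    def write(self, p, block, value):
        """PrWr by processor index p (0-based)."""
        line = self.lines[p]
        if not (line["block"] == block and line["state"] == "M"):
            self._evict(p, block)
            self.log.append(f"P{p + 1} BusRdEx {block}")
            self._snoop(p, block, exclusive=True)
        line.update(block=block, state="M", value=value)


# Initial values A = 3 and B = 3, caches empty, as stated in the exercise.
bus = MSIBus(3, {"A": 3, "B": 3})
bus.write(0, "A", 4)   # P1 write A 4
bus.write(2, "B", 8)   # P3 write B 8
bus.read(1, "A")       # P2 read A
bus.read(2, "A")       # P3 read A

for i, line in enumerate(bus.lines, start=1):
    print(f"P{i}: {line}")
print("memory:", bus.memory)
print("bus log:", bus.log)
```

The remaining operations of the table can be replayed the same way once the reading of `write A B` is fixed. If it means "write the current value of B to A", one possible modelling is a `read` of B followed by a `write` of A, but that interpretation is an assumption and not something the source states.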