Fault Tolerant Computer Architecture-P9

For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore’s law into remarkable increases in performance. Recently, however, the bounty provided by Moore’s law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults.

Forward Error Recovery. During error-free execution, most FER schemes incur a slight performance penalty for error detection. Because FER schemes cannot recover to a prior state, they cannot commit any operation until it has been determined to be error-free. Effectively, in a system with FER every operation is an output operation and is therefore subject to the output commit problem. Error detection is thus on the critical path for FER. When an error does occur, FER incurs little additional performance penalty to correct it.

Backward Error Recovery. During error-free execution, most BER schemes incur a slight performance penalty for saving state. This penalty is a function of how often state is saved and how long each save takes. In the absence of output operations, BER schemes can often take error detection off the critical path: even if an error is detected only after the erroneous operation has been allowed to proceed, the processor can still recover to a pre-error checkpoint. Overlapping the latency of error detection in this way requires pipelined checkpointing, as described in "When to Deallocate a Recovery Point" in Section 3.1.2. When an error occurs, BER incurs a relatively large penalty to restore the recovery point and to replay the lost work, i.e., everything performed since that recovery point.

3.2 MICROPROCESSOR CORES

Both FER and BER approaches exist for microprocessor cores.

3.2.1 FER for Cores

The only common FER scheme for an entire core is TMR.
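In TMR, correction happens entirely in the voter that compares the three cores' outputs. The following is a minimal sketch, assuming each core's result can be compared as a plain Python value or integer word; the helper names `tmr_vote` and `bitwise_vote` are hypothetical and do not come from any particular design. The first function votes on whole results. The second computes the per-bit majority (a AND b) OR (b AND c) OR (a AND c), which a hardware voter can implement with simple logic gates.

```python
from collections import Counter
from typing import Hashable, Sequence


def tmr_vote(results: Sequence[Hashable]) -> Hashable:
    """Return the majority result of three replicated cores (hypothetical helper).

    A single faulty core is outvoted by the two fault-free cores, so its error
    is corrected without rolling back. If all three results differ, no majority
    exists and the error can only be detected, not corrected.
    """
    if len(results) != 3:
        raise ValueError("TMR votes on exactly three results")
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: uncorrectable multi-core disagreement")
    return value


def bitwise_vote(a: int, b: int, c: int) -> int:
    """Per-bit majority of three integer words, as a gate-level voter computes it."""
    return (a & b) | (b & c) | (a & c)


# Core 2 suffers a fault and produces 17 instead of 42; it is outvoted.
print(tmr_vote([42, 17, 42]))  # 42
# A single flipped bit (bit 3) in one word is masked by the other two words.
print(bin(bitwise_vote(0b1011, 0b0011, 0b1011)))  # 0b1011
```

The two voters differ in one respect. If each core's word is wrong in a different bit position, a whole-word vote finds no majority. The bitwise vote still recovers the correct word, because at every bit position at most one core disagrees.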
With three cores and a voter, an error in a single core is corrected when that core's result is outvoted by the other two cores. Within a core, TMR can be applied to specific units, although this is rare in commodity cores because of TMR's hardware and power costs. A more common approach to FER within a core is the use of ECC. By protecting storage (e.g., the register file) or a bus with ECC, the core can correct errors without needing to restore a previous state. However, even ECC may be infeasible in many situations because it is on the .
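The text does not name a specific ECC. As an illustration only, the sketch below substitutes the classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits and corrects any single flipped bit in place, so no earlier state is needed. The helper names and the representation of a stored word as a list of bits are assumptions made for this example. Production register files and buses typically use wider codes, often with an extra bit for double-error detection (SEC-DED).

```python
def hamming74_encode(data: int) -> list[int]:
    """Encode a 4-bit value as a Hamming(7,4) codeword (hypothetical helper).

    Index i of the returned list holds bit position i + 1. Parity bits sit at
    the power-of-two positions 1, 2 and 4; data bits fill positions 3, 5, 6, 7.
    """
    d0, d1, d2, d3 = ((data >> i) & 1 for i in range(4))
    p1 = d0 ^ d1 ^ d3  # covers positions 1, 3, 5, 7
    p2 = d0 ^ d2 ^ d3  # covers positions 2, 3, 6, 7
    p4 = d1 ^ d2 ^ d3  # covers positions 4, 5, 6, 7
    return [p1, p2, d0, p4, d1, d2, d3]


def hamming74_decode(code: list[int]) -> tuple[int, int]:
    """Correct at most one flipped bit; return (data, error_position).

    The syndrome is the XOR of the positions of all set bits. It is 0 for a
    valid codeword and equals the flipped position after a single-bit error,
    so correction needs no earlier copy of the data (forward recovery).
    """
    bits = list(code)
    syndrome = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the erroneous bit back
    data = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return data, syndrome


stored = hamming74_encode(0b1011)
stored[5] ^= 1  # a fault flips bit position 6 of the stored word
print(hamming74_decode(stored))  # (11, 6)
```

The decode step is extra logic on every read of protected storage, which is where ECC's latency cost comes from. This code also has a limit: with two flipped bits, the syndrome points at a third, innocent position and the decoder silently miscorrects. That limitation is the reason SEC-DED variants add one more parity bit.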