Speakers
Description
Safety-critical embedded computer systems use fault-tolerance techniques to cope with hardware errors caused by extreme environmental conditions. Software approaches achieve reliable execution through replication, but the redundant code introduces significant performance overhead. Embedded systems that use GPU accelerators for computationally demanding safety-critical workloads require both performance and reliability. It is therefore important to evaluate the behavior of redundant execution before making a design choice. In this paper, we implement software-based redundancy techniques for CUDA programs from a safety-critical domain, targeting the GPU resources of the NVIDIA Jetson Xavier NX embedded device, and perform a performance-reliability tradeoff analysis of alternative execution scenarios.
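As a rough illustration of the replicate-and-compare idea behind software-based redundancy, the sketch below runs a computation several times on the same input and majority-votes the results. It is a CPU-side Python stand-in, not the CUDA implementation described in the paper; the names `run_redundant`, `make_faulty`, `saxpy`, and `timed` are hypothetical, and the bit flip is a simulated fault rather than a real hardware error.

```python
from collections import Counter
import time


def run_redundant(kernel, data, copies=3):
    """Run `kernel` `copies` times on the same input and majority-vote the outputs.

    Returns (result, all_agreed). Raises RuntimeError when no output
    holds a strict majority, i.e. the fault is detected but not correctable.
    """
    outputs = [tuple(kernel(data)) for _ in range(copies)]
    winner, votes = Counter(outputs).most_common(1)[0]
    if votes <= copies // 2:
        raise RuntimeError("no majority among redundant outputs")
    return list(winner), votes == copies


def saxpy(data):
    # Stand-in for a GPU kernel (assumption): y = 2*x + 1, element-wise.
    return [2 * x + 1 for x in data]


def make_faulty(kernel, faulty_call):
    """Wrap `kernel` so that call number `faulty_call` (0-based) flips one bit."""
    state = {"calls": 0}

    def wrapped(data):
        out = kernel(data)
        if state["calls"] == faulty_call:
            out[0] ^= 1  # simulated single-bit flip in the first element
        state["calls"] += 1
        return out

    return wrapped


def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) so redundancy overhead can be compared."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

With three copies, a single corrupted run is outvoted and the correct result is still returned; with two copies, the mismatch is detected but cannot be corrected, so the call raises. Wrapping the calls in `timed` with different `copies` values gives a first impression of how execution time grows with the replication level, which is the performance side of the tradeoff.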