Description
GPUs have become the leading technology in high-performance systems used for artificial intelligence and scientific computing. A significant portion of these workloads consists of traversing large graphs and performing computations over them. Graphs with billions of vertices are used to model a wide range of real-world problems. Consequently, accelerating graph processing algorithms is of great interest in many fields.
Previous studies have proposed kernel fusion, which merges the computational kernels of parallel jobs, as a way to improve their performance. Because the core idea of kernel fusion is to let multiple jobs share resource accesses and thereby increase utilization, it is even better suited to GPUs, which run hundreds of jobs in parallel, than to CPUs, which run only a few.
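To make the shared-access idea concrete, the following is a minimal CPU-side C++ sketch, not the framework's GPU algorithm. It assumes a graph stored in compressed sparse row (CSR) form and several breadth-first search (BFS) jobs started from different sources. The unfused version runs each job separately; the fused version performs one sweep per BFS level and lets every job reuse the same load of a vertex's adjacency list. The names `CsrGraph`, `bfs_unfused` and `bfs_fused`, and the `edge_reads` counter used to measure sharing, are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CSR graph: the neighbours of vertex v are col[row[v]] .. col[row[v + 1] - 1].
struct CsrGraph {
    std::vector<int> row;
    std::vector<int> col;
    int num_vertices() const {
        assert(!row.empty());  // CSR needs at least the leading 0 offset
        return static_cast<int>(row.size()) - 1;
    }
};

constexpr int kUnvisited = -1;

// Unfused: each BFS job sweeps the graph on its own, level by level.
// edge_reads counts every adjacency entry that is loaded.
std::vector<std::vector<int>> bfs_unfused(const CsrGraph& g,
                                          const std::vector<int>& sources,
                                          long& edge_reads) {
    const int n = g.num_vertices();
    std::vector<std::vector<int>> depth(sources.size(), std::vector<int>(n, kUnvisited));
    for (std::size_t j = 0; j < sources.size(); ++j) {
        depth[j][sources[j]] = 0;
        for (int level = 0;; ++level) {
            bool changed = false;
            for (int v = 0; v < n; ++v) {
                if (depth[j][v] != level) continue;  // v not in this job's frontier
                for (int e = g.row[v]; e < g.row[v + 1]; ++e) {
                    ++edge_reads;
                    int u = g.col[e];
                    if (depth[j][u] == kUnvisited) {
                        depth[j][u] = level + 1;
                        changed = true;
                    }
                }
            }
            if (!changed) break;  // no new vertices discovered: job finished
        }
    }
    return depth;
}

// Fused: one sweep per level serves all jobs. A vertex's adjacency list is
// loaded once and reused by every job that has the vertex in its frontier.
std::vector<std::vector<int>> bfs_fused(const CsrGraph& g,
                                        const std::vector<int>& sources,
                                        long& edge_reads) {
    const int n = g.num_vertices();
    const std::size_t k = sources.size();
    std::vector<std::vector<int>> depth(k, std::vector<int>(n, kUnvisited));
    for (std::size_t j = 0; j < k; ++j) depth[j][sources[j]] = 0;
    for (int level = 0;; ++level) {
        bool changed = false;
        for (int v = 0; v < n; ++v) {
            bool active = false;  // is v in any job's current frontier?
            for (std::size_t j = 0; j < k && !active; ++j) active = depth[j][v] == level;
            if (!active) continue;
            for (int e = g.row[v]; e < g.row[v + 1]; ++e) {
                ++edge_reads;  // a single load shared by all jobs
                int u = g.col[e];
                for (std::size_t j = 0; j < k; ++j) {
                    if (depth[j][v] == level && depth[j][u] == kUnvisited) {
                        depth[j][u] = level + 1;
                        changed = true;
                    }
                }
            }
        }
        if (!changed) break;  // no job discovered anything new
    }
    return depth;
}
```

Both functions compute the same BFS depths. On a star graph with centre 0 and leaves 1 to 4 (row offsets `{0,4,5,6,7,8}`, columns `{1,2,3,4,0,0,0,0}`), with jobs started from vertices 1 and 2, the unfused version loads 16 adjacency entries and the fused version loads 10. The hub's four edges are read once for both jobs instead of once per job. On a GPU, the body of the fused vertex loop would roughly correspond to the work of one fused kernel.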
However, kernel fusion is itself computationally expensive: it changes control flow and reorders the execution of computational jobs. Because GPU threads execute in lockstep groups, such control-flow changes are costlier there, which makes kernel fusion harder to apply on GPUs than on CPUs.
This work presents a streamlined kernel fusion framework for concurrent graph processing on GPUs. The framework simplifies defining and implementing graph jobs that run in parallel on the GPU. A kernel fusion algorithm integrated into the framework automatically combines these jobs to improve performance. The framework also introduces new data-handling structures and a meta-compiler that enables static polymorphism. This allows multiple different jobs to run efficiently as a job queue on the GPU without the runtime overhead typically associated with dynamic polymorphism.
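The abstract does not describe how the meta-compiler realizes static polymorphism. As an illustration of the general idea only, the sketch below uses C++17 `std::variant` and `std::visit`, a common host-side technique, to hold a queue of different job types that run in a single sweep over the vertices. The jobs share no base class and use no virtual functions, because the set of job types is fixed at compile time. The job types `MaxOutDegreeJob` and `InDegreeJob`, the member function `process_vertex`, and `run_queue` are all hypothetical names. A generated GPU version could instead use something like a compiled switch over job types, but that is an assumption.

```cpp
#include <cassert>
#include <variant>
#include <vector>

// CSR graph: the neighbours of vertex v are col[row[v]] .. col[row[v + 1] - 1].
struct CsrGraph {
    std::vector<int> row;
    std::vector<int> col;
    int num_vertices() const {
        assert(!row.empty());  // CSR needs at least the leading 0 offset
        return static_cast<int>(row.size()) - 1;
    }
};

// Hypothetical job types: same member function name, no common base class,
// no virtual functions.
struct MaxOutDegreeJob {
    int result = 0;
    void process_vertex(const CsrGraph& g, int v) {
        int d = g.row[v + 1] - g.row[v];
        if (d > result) result = d;
    }
};

struct InDegreeJob {
    int target;
    int result = 0;
    void process_vertex(const CsrGraph& g, int v) {
        for (int e = g.row[v]; e < g.row[v + 1]; ++e)
            if (g.col[e] == target) ++result;  // edge v -> target found
    }
};

// The closed set of job types, known at compile time.
using Job = std::variant<MaxOutDegreeJob, InDegreeJob>;

// Runs every queued job in one sweep over the vertices. std::visit dispatches
// on the stored alternative without a vtable lookup, and the generic lambda is
// instantiated once per job type, so each call can be inlined.
void run_queue(const CsrGraph& g, std::vector<Job>& queue) {
    for (int v = 0; v < g.num_vertices(); ++v)
        for (Job& job : queue)
            std::visit([&](auto& j) { j.process_vertex(g, v); }, job);
}
```

For example, `std::vector<Job> q{MaxOutDegreeJob{}, InDegreeJob{1}};` followed by `run_queue(g, q);` computes both results during the same traversal. Adding a new job type means extending the `Job` alias, which fits a code-generation step that knows all job types in advance.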
The framework is evaluated by defining four common graph jobs and running up to 100 parallel instances of these jobs in homogeneous workloads, both with and without kernel fusion. Employing kernel fusion yields performance improvements ranging from 2% to 10%.