Speakers
Description
The softmax function is an activation function frequently used in the output layer of a neural network. In large neural networks, computing the softmax function can become expensive, and parallelizing it can provide performance benefits. In this work, we investigate the performance of a softmax implementation in shared- and distributed-memory non-uniform computing environments.
We then propose a sophisticated workload distribution scheme to improve parallel performance.
The shared-memory system contains a heterogeneous, asymmetric multicore Intel CPU with eight performance cores and eight efficiency cores. The distributed-memory system contains nodes with different types of quad-core CPUs. The experimental results show that our proposed workload distribution scheme significantly outperforms uniform workload distribution.
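
To make the idea of non-uniform workload distribution concrete, the sketch below splits a softmax computation into contiguous chunks whose sizes are proportional to a per-worker speed weight. It is only an illustration under stated assumptions, not the scheme proposed in the talk, which the abstract does not detail. The function names (`weighted_chunks`, `partitioned_softmax`) and the 2:1 performance-to-efficiency core weights are hypothetical, and the workers are simulated sequentially rather than run in parallel.

```python
import math


def softmax(xs):
    """Reference softmax; subtracting the maximum avoids overflow in exp."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_chunks(n, weights):
    """Split range(n) into contiguous (start, end) chunks, one per worker,
    with sizes proportional to the given (hypothetical) speed weights."""
    total = float(sum(weights))
    bounds = [0]
    acc = 0.0
    for w in weights:
        acc += w
        bounds.append(round(n * acc / total))
    bounds[-1] = n  # guard against floating-point drift on the last bound
    return [(bounds[i], bounds[i + 1]) for i in range(len(weights))]


def partitioned_softmax(xs, weights):
    """Softmax computed chunk by chunk, as each worker would in parallel."""
    chunks = weighted_chunks(len(xs), weights)
    # Phase 1: every worker reduces its chunk to a local maximum,
    # and the local maxima are combined into a global maximum.
    global_max = max(max(xs[a:b]) for a, b in chunks if b > a)
    # Phase 2: every worker sums the shifted exponentials of its chunk,
    # and the partial sums are combined into the normalizer.
    partial = [sum(math.exp(x - global_max) for x in xs[a:b]) for a, b in chunks]
    total = sum(partial)
    # Phase 3: every worker normalizes its own chunk.
    out = []
    for a, b in chunks:
        out.extend(math.exp(x - global_max) / total for x in xs[a:b])
    return out


if __name__ == "__main__":
    # Assumed weights: 8 performance cores at 2.0, 8 efficiency cores at 1.0.
    weights = [2.0] * 8 + [1.0] * 8
    data = [0.01 * i for i in range(2400)]
    result = partitioned_softmax(data, weights)
    print(weighted_chunks(len(data), weights)[:2], sum(result))
```

In practice the weights would come from measuring each core or node type, and the two reductions (maximum and sum) would map to whatever reduction primitive the parallel runtime offers. A uniform split is the special case where all weights are equal.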