Speakers
Description
In modern multi-core architectures with distributed directory-based cache coherence, each memory address is overseen by a distributed directory unit, known as a Caching/Home Agent (CHA), that tracks the state and location of cache lines. Neither CHA locations nor core locations within a processor are directly exposed to the programmer. In this work, we first analyze and compare methodologies for uncovering both the CHA and core topology of Intel Xeon Scalable processors, as well as methods for revealing the mapping of memory addresses to CHAs. Leveraging the topology and address mapping information, we investigate the impact of spatial proximity between communicating cores and CHAs on application performance, and propose a thread mapping heuristic that assigns threads to cores by taking cache coherence traffic into account. We evaluate our heuristic on an N-body simulation application that exhibits a high volume of on-chip communication traffic. Our heuristic achieves a speedup of up to ~8%, with an average of 4.4%, over compact placement.