Balancing Shared and Distributed Heaps on NUMA Architectures

Malak Aljabri, Hans-Wolfgang Loidl, Phil Trinder
To appear in TFP'14, the Symposium on Trends in Functional Programming 2014, Utrecht, Netherlands, May 2014.

Due to the varying latencies between memory banks, efficient shared memory access is challenging on modern NUMA architectures. This has a major impact on the shared memory performance of parallel programs, particularly those written in languages with automatic memory management. This paper presents a performance evaluation of distributed and shared heap implementations of parallel Haskell on a state-of-the-art physical shared memory NUMA machine. The evaluation exposes bottlenecks in shared-memory management that limit scalability beyond 25 of the 48 cores.

We demonstrate that a hybrid system, GUMSMP, that combines both distributed and shared heap abstractions consistently outperforms the shared memory GHC implementation on seven benchmarks by a factor of 3.3 on average. Specifically, we show that the best results are obtained when sharing memory only within a single NUMA region, and using distributed memory system abstractions across the regions.

Available in: pdf

The benchmarks for this paper are: Benchmark tarball (.tar.bz2)
