Distributed memory reduction operations in presence of process desynchronization

Presenter
Mr Petar Marendic - ETRO, Vrije Universiteit Brussel

Abstract
Decades of exponential growth in computational power have resulted in computers that can perform several million billion arithmetic operations per second. However, problems like space weather and genome analysis demand even more powerful computers. To that end, researchers around the world are working on the development of an exascale computer that can perform a billion billion arithmetic operations per second. Due to technological limitations, modern supercomputers owe their power to ever-increasing levels of parallelization. Because of this, the first exascale computer is estimated to consist of more than 10 million processors that will work in unison to solve the grand computational problems of the near future. However, the high degree of parallelization brings new problems. Writing programs that efficiently use the available computational power has turned out to be a major challenge.
Modern high-performance computing (HPC) applications typically use less than half of the available computational power of today's supercomputers. This is expected to drop to less than 10% in the case of exascale supercomputers. This massive performance shortfall is the result of myriad factors, each chipping away at the available computational resources. One such factor is process desynchronization at collective operations. This happens whenever some processors in a collective task start their work later than others. As a result, the task may take longer to execute, because some processors have to wait for others before they can complete their work. Until now, process desynchronization has received little attention from the scientific community.
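The waiting effect described above can be made concrete with a small cost model. The following is a minimal sketch, not the method of the dissertation: it assumes a fixed binomial-tree reduction schedule rooted at rank 0, a uniform cost per combine step, and a hypothetical function name `binomial_reduce_finish`. Each rank may combine a partner's partial result only once both are ready, so a late arrival propagates up the tree.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical cost model (an assumption for illustration only).
// ready[i] is the time at which rank i enters the reduction.
// step_cost is the assumed cost of one receive-and-combine step.
// Returns the time at which rank 0 holds the fully reduced result.
double binomial_reduce_finish(std::vector<double> ready, double step_cost) {
    const std::size_t p = ready.size();
    // In round k, rank r combines the partial result of rank r + 2^k.
    for (std::size_t dist = 1; dist < p; dist *= 2) {
        for (std::size_t r = 0; r + dist < p; r += 2 * dist) {
            // The receiver must wait until both itself and its partner are ready.
            ready[r] = std::max(ready[r], ready[r + dist]) + step_cost;
        }
    }
    return p == 0 ? 0.0 : ready[0];
}
```

Under these assumptions, eight synchronized ranks with a unit step cost finish at time 3, whereas delaying a single rank by 5 time units pushes the finish time to 8: with a fixed schedule, the full delay reaches the root.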
Our contribution to the HPC domain consists of a set of algorithms designed to mitigate the impact of process desynchronization at collective operation call sites. The need for such algorithms arose from our work on the design of image compositing algorithms for in-situ visualization of a space weather application, under the auspices of the Intel Exascience Lab in Leuven. Yet the results of our work generalize readily to other application domains where process desynchronization is a common occurrence.
This dissertation proposes three new reduction algorithms, implemented in C++ and MPI, all robust to process desynchronization. Two of the algorithms use side information to pre-construct optimized reduction schedules, while the third re-orders the reduction schedule at runtime. Our experimental evaluation, conducted on several HPC systems, has found that these algorithms constitute a notable improvement over the state of the art.

Short CV
Master of Science in Computer Science, 2006, University of Split