Finding, analysing and solving MPI communication bottlenecks in Earth System models
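by Oriol Tintó Prims, Miguel Castrillo, Mario C. Acosta, Oriol Mula-Valls, Alicia Sanchez Lorente, Kim Serradell, Ana Cortés, Francisco J. Doblas-Reyes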
Abstract:
There is consensus that the ability to use current and future high-performance computing systems efficiently is crucial for science; however, the performance currently achieved by most parallel scientific applications is far from what is desired. Although inter-process communication has already been studied in many works, their recommendations are not taken into account in most computational model development processes, at least in the case of Earth science. This work presents a methodology that aims to help scientists who work with computational models that use inter-process communication deal with the difficulties they face when trying to understand their applications' behaviour. By following the series of steps presented here, both users and developers will learn how to identify performance issues by characterizing an application's scalability, identifying which parts perform poorly, and understanding the role that inter-process communication plays. In this work, the Nucleus for European Modelling of the Ocean (NEMO), the state-of-the-art European global ocean circulation model, is used as a successful example. NEMO is a community code widely used in Europe: experiments involving it consume more than a hundred million core hours every year. The analysis exercise shows how to answer the questions of where, why and what is degrading the model's scalability, and how this information can help developers find solutions that mitigate the issues they encounter. This document also demonstrates how performance analysis carried out with small experiments, using limited resources, can lead to optimizations that benefit larger experiments running on thousands of cores, making it easier to address the exascale challenge.
Reference:
Finding, analysing and solving MPI communication bottlenecks in Earth System models (Oriol Tintó Prims, Miguel Castrillo, Mario C. Acosta, Oriol Mula-Valls, Alicia Sanchez Lorente, Kim Serradell, Ana Cortés, Francisco J. Doblas-Reyes), In Journal of Computational Science, 2018.
Bibtex Entry:
@Article{	  prims.ea_2018,
  title		= "Finding, analysing and solving MPI communication
		  bottlenecks in Earth System models",
  journal	= "Journal of Computational Science",
  year		= "2018",
  issn		= "1877-7503",
  doi		= "10.1016/j.jocs.2018.04.015",
  url		= "http://www.sciencedirect.com/science/article/pii/S1877750318304150",
  author	= "Oriol Tintó Prims and Miguel Castrillo and Mario C.
		  Acosta and Oriol Mula-Valls and Alicia Sanchez Lorente and
		  Kim Serradell and Ana Cortés and Francisco J.
		  Doblas-Reyes",
  keywords	= "Earth System modelling, Ocean modelling, Performance
		  analysis, Performance optimization, MPI optimization",
  abstract	= "There is consensus that the ability to use current and
		  future high-performance computing systems efficiently is
		  crucial for science; however, the performance currently
		  achieved by most parallel scientific applications is far
		  from what is desired. Although inter-process communication
		  has already been studied in many works, their
		  recommendations are not taken into account in most
		  computational model development processes, at least in the
		  case of Earth science. This work presents a methodology
		  that aims to help scientists who work with computational
		  models that use inter-process communication deal with the
		  difficulties they face when trying to understand their
		  applications' behaviour. By following the series of steps
		  presented here, both users and developers will learn how to
		  identify performance issues by characterizing an
		  application's scalability, identifying which parts perform
		  poorly, and understanding the role that inter-process
		  communication plays. In this work, the Nucleus for European
		  Modelling of the Ocean (NEMO), the state-of-the-art
		  European global ocean circulation model, is used as a
		  successful example. NEMO is a community code widely used in
		  Europe: experiments involving it consume more than a
		  hundred million core hours every year. The analysis
		  exercise shows how to answer the questions of where, why
		  and what is degrading the model's scalability, and how this
		  information can help developers find solutions that
		  mitigate the issues they encounter. This document also
		  demonstrates how performance analysis carried out with
		  small experiments, using limited resources, can lead to
		  optimizations that benefit larger experiments running on
		  thousands of cores, making it easier to address the
		  exascale challenge."
}