Accuracy and Reproducibility (lib_fortran.F90)

Issues with intrinsinc SIGN function (key_ nosignedzero)

The SIGN(A, B) is the FORTRAN intrinsic function delivers the magnitude of A with the sign of B. For example, SIGN(-3.0,2.0) has the value 3.0. The problematic case is when the second argument is zero, because, on platforms that support IEEE arithmetic, zero is actually a signed number. There is a positive zero and a negative zero.

In FORTRAN 90, the processor was required always to deliver a positive result for SIGN(A, B) if B was zero. Nevertheless, in FORTRAN 95, the processor is allowed to do the correct thing and deliver ABS(A) when B is a positive zero and -ABS(A) when B is a negative zero. This change in the specification becomes apparent only when B is of type real, and is zero, and the processor is capable of distinguishing between positive and negative zero, and B is negative real zero. Then SIGN delivers a negative result where, under FORTRAN 90 rules, it used to return a positive result. This change may be especially sensitive for the ice model, so we overwrite the intrinsinc function with our own function simply performing :
IF( B >= 0.e0 ) THEN ; SIGN(A,B) = ABS(A)
This feature can be found in lib_fortran.F90 module and is effective when key_ nosignedzero is defined. We use a CPP key as the overwritting of a intrinsic function can present performance issues with some computers/compilers.

MPP reproducibility

The numerical reproducibility of simulations on distributed memory parallel computers is a critical issue. In particular, within NEMO global summation of distributed arrays is most susceptible to rounding errors, and their propagation and accumulation cause uncertainty in final simulation reproducibility on different numbers of processors. To avoid so, based on He and Ding [2001] review of different technics, we use a so called self-compensated summation method. The idea is to estimate the roundoff error, store it in a buffer, and then add it back in the next addition.

Suppose we need to calculate $ b = a_1 + a_2 + a_3$. The following algorithm will allow to split the sum in two ( $ sum_1 = a_{1} + a_{2}$ and $ b = sum_2 = sum_1 + a_3$) with exactly the same rounding errors as the sum performed all at once.

$\displaystyle sum_1   $ $\displaystyle = a_1 + a_2$    
$\displaystyle error_1$ $\displaystyle = a_2 + ( a_1 - sum_1 )$    
$\displaystyle sum_2   $ $\displaystyle = sum_1 + a_3 + error_1$    
$\displaystyle error_2$ $\displaystyle = a_3 + error_1 + ( sum_1 - sum_2 )$    
$\displaystyle b \qquad  $ $\displaystyle = sum_2$    

This feature can be found in lib_fortran.F90 module and is effective when key_ mpp_rep. In that case, all calls to glob_sum function (summation over the entire basin excluding duplicated rows and columns due to cyclic or north fold boundary condition as well as overlap MPP areas). Note this implementation may be sensitive to the optimization level.

MPP scalability

The default method of communicating values across the north-fold in distributed memory applications (key_ mpp_mpi) uses a MPI_ALLGATHER function to exchange values from each processing region in the northern row with every other processing region in the northern row. This enables a global width array containing the top 4 rows to be collated on every northern row processor and then folded with a simple algorithm. Although conceptually simple, this "All to All" communication will hamper performance scalability for large numbers of northern row processors. From version 3.4 onwards an alternative method is available which only performs direct "Peer to Peer" communications between each processor and its immediate "neighbours" across the fold line. This is achieved by using the default MPI_ALLGATHER method during initialisation to help identify the "active" neighbours. Stored lists of these neighbours are then used in all subsequent north-fold exchanges to restrict exchanges to those between associated regions. The collated global width array for each region is thus only partially filled but is guaranteed to be set at all the locations actually required by each individual for the fold operation. This alternative method should give identical results to the default ALLGATHER method and is recommended for large values of jpni. The new method is activated by setting ln_nnogather to be true (nammpp). The reproducibility of results using the two methods should be confirmed for each new, non-reference configuration.

Gurvan Madec and the NEMO Team
NEMO European Consortium2017-02-17