The SIGN(A, B) is the FORTRAN intrinsic function delivers the magnitude of A with the sign of B. For example, SIGN(-3.0,2.0) has the value 3.0. The problematic case is when the second argument is zero, because, on platforms that support IEEE arithmetic, zero is actually a signed number. There is a positive zero and a negative zero.
In FORTRAN 90, the processor was required always to deliver a positive result for SIGN(A, B)
if B was zero. Nevertheless, in FORTRAN 95, the processor is allowed to do the correct thing
and deliver ABS(A) when B is a positive zero and -ABS(A) when B is a negative zero.
This change in the specification becomes apparent only when B is of type real, and is zero,
and the processor is capable of distinguishing between positive and negative zero,
and B is negative real zero. Then SIGN delivers a negative result where, under FORTRAN 90
rules, it used to return a positive result.
This change may be especially sensitive for the ice model, so we overwrite the intrinsinc
function with our own function simply performing :
IF( B >= 0.e0 ) THEN ; SIGN(A,B) = ABS(A)
ELSE ; SIGN(A,B) =-ABS(A)
This feature can be found in lib_fortran.F90 module and is effective when key_ nosignedzero is defined. We use a CPP key as the overwritting of a intrinsic function can present performance issues with some computers/compilers.
The numerical reproducibility of simulations on distributed memory parallel computers is a critical issue. In particular, within NEMO global summation of distributed arrays is most susceptible to rounding errors, and their propagation and accumulation cause uncertainty in final simulation reproducibility on different numbers of processors. To avoid so, based on He and Ding  review of different technics, we use a so called self-compensated summation method. The idea is to estimate the roundoff error, store it in a buffer, and then add it back in the next addition.
Suppose we need to calculate . The following algorithm will allow to split the sum in two ( and ) with exactly the same rounding errors as the sum performed all at once.
The default method of communicating values across the north-fold in distributed memory applications (key_ mpp_mpi) uses a MPI_ALLGATHER function to exchange values from each processing region in the northern row with every other processing region in the northern row. This enables a global width array containing the top 4 rows to be collated on every northern row processor and then folded with a simple algorithm. Although conceptually simple, this "All to All" communication will hamper performance scalability for large numbers of northern row processors. From version 3.4 onwards an alternative method is available which only performs direct "Peer to Peer" communications between each processor and its immediate "neighbours" across the fold line. This is achieved by using the default MPI_ALLGATHER method during initialisation to help identify the "active" neighbours. Stored lists of these neighbours are then used in all subsequent north-fold exchanges to restrict exchanges to those between associated regions. The collated global width array for each region is thus only partially filled but is guaranteed to be set at all the locations actually required by each individual for the fold operation. This alternative method should give identical results to the default ALLGATHER method and is recommended for large values of jpni. The new method is activated by setting ln_nnogather to be true (nammpp). The reproducibility of results using the two methods should be confirmed for each new, non-reference configuration.
Gurvan Madec and the NEMO Team
NEMO European Consortium2017-02-17