abstract
| - We propose a set of new Fortran reference implementations, based on an algorithm proposed by Kahan,
for the Level 1 BLAS routines *NRM2 that compute the Euclidean norm of a real or complex input vector.
The principal advantage of these routines over the current offerings is that, rather than losing accuracy
as the length of the vector increases, they generate results that are accurate to almost machine precision
for vectors of length N < Nmax, where Nmax depends upon the precision of the floating-point arithmetic
being used. In addition, we make use of intrinsic modules, introduced in the latest Fortran standards, to
detect occurrences of non-finite numbers in the input data and return suitable values as well as setting
IEEE floating-point status flags as appropriate. A set of C interface routines is also provided to allow simple,
portable access to the new routines.
To improve execution speed, we advocate a hybrid algorithm: a simple loop is used first, and we fall back on
Kahan’s algorithm only if IEEE floating-point exception flags are raised. Since most input vectors are
‘easy’, i.e., they do not require the sophistication of Kahan’s algorithm, the simple loop improves performance
while the use of compensated summation ensures high accuracy.
We also report on a comprehensive suite of test problems that has been developed to test both our new
implementation and existing codes for both accuracy and the appropriate settings of the IEEE arithmetic
status flags.
|
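
The hybrid strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' Fortran implementation: the function name `nrm2_hybrid` and the `tiny` threshold are hypothetical, and because Python does not expose the IEEE status flags, the sketch inspects the result of the simple loop (non-finite or suspiciously small) as a stand-in for testing the overflow and underflow flags. The fallback scales by the largest magnitude and uses Kahan-style compensated summation, which is only an approximation of the algorithm the abstract refers to.

```python
import math
import sys


def nrm2_hybrid(x):
    """Euclidean norm: simple loop first, scaled compensated fallback if needed."""
    # Fast path: plain sum of squares, adequate for 'easy' vectors.
    s = 0.0
    for v in x:
        s += v * v

    # Stand-in for checking IEEE exception flags (an assumption of this sketch):
    # a non-finite sum suggests overflow or non-finite input, and a very small
    # sum suggests that some squares may have underflowed.
    tiny = sys.float_info.min / sys.float_info.epsilon
    if math.isfinite(s) and s >= tiny:
        return math.sqrt(s)

    # Non-finite inputs: NaN takes precedence over infinity.
    if any(math.isnan(v) for v in x):
        return math.nan
    if any(math.isinf(v) for v in x):
        return math.inf

    # Slow path: scale by the largest magnitude so squares cannot overflow.
    scale = max((abs(v) for v in x), default=0.0)
    if scale == 0.0:
        return 0.0

    total = 0.0  # running compensated sum of scaled squares
    comp = 0.0   # running compensation for lost low-order bits
    for v in x:
        t = v / scale
        y = t * t - comp
        new_total = total + y
        comp = (new_total - total) - y
        total = new_total
    return scale * math.sqrt(total)
```

For example, `nrm2_hybrid([3.0, 4.0])` stays on the fast path, whereas `nrm2_hybrid([3e200, 4e200])` overflows in the simple loop and is recomputed by the scaled fallback.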