Class LongElementaryModMath

Direct Known Subclasses:

public class LongElementaryModMath extends Object
Elementary modulo arithmetic functions for long data.

Modular addition and subtraction are trivial, when the modulus is less than 263 and overflow can be detected easily.

Modular multiplication is more complicated, and since it is usually the single most time consuming operation in the whole program execution, the very core of the Number Theoretic Transform (NTT), it should be carefully optimized.

The algorithm for multiplying two longs and taking the remainder is not entirely obvious. The basic problem is to get the full 128-bit result of multiplying two 64-bit integers. It would be possible to do this by splitting the arguments to high and low 32-bit words and performing four multiplications. The performance of this solution would be not very good.

Another approach is to use longs only for getting the lowest 64 bits of the result. Casting the operands to double and multiplying as floating-point numbers, we can get the highest (roughly) 52 bits of the result. However since only 116 bits can be acquired this way, it would be possible to only use 58 bits in each of the multiplication operands (not the full 64 or 63 bits). Furthermore, round-off errors in the floating-point multiplications, as allowed by the IEEE specification, actually prevent getting even 52 of the top bits accurately, and actually only 57 bits can be used in the multiplication operands. This is the approach chosen in this implementation.

The first observation is that since the modulus is practically constant, it should be more efficient to calculate (once) the inverse of the modulus, and then subsequently multiply by the inverse modulus instead of dividing by the modulus.

The second observation is that to get the remainder of the division, we don't necessarily need the actual result of the division (we just want the remainder). So, we should discard the topmost 50 bits of the full 114-bit result whenever possible, to save a few operations.

The basic approach is to get an approximation of a * b / modulus (using floating-point operands, that is doubles). The approximation should be within +1 or -1 of the correct result. We first calculate a * b - approximateDivision * modulus to get the initial remainder. This calculation can use the lowest 64 bits only and is done using longs. It is enough to use a double to do the approximate division, as it eliminates at least 51 bits from the top of the 114-bit multiplication result, leaving at most 63 bits in the remainder. The calculation result - approximateDivision * modulus must then be done once more to reduce the remainder since the original multiplication operands are only 57-bit numbers. The second reduction reduces the results to the correct value ±modulus. It is then easy to detect the case when the approximate division was off by one (and the remainder is ±modulus off) as the final step of the algorithm.

Mikko Tommila