Chapter 3. The FPU
] is a small double precision floating point arithmetic unit design that is
well suited to such implementations, as it focuses mainly on area optimization and cost
reduction. The design supports exception flags and overflow and underflow checks, which are
provided as outputs to be handled by the higher-level design, in our case the processor.
The implementation of the double precision FPU is our own and follows the published
work of Paschalakis et al. [], with some alterations. For example, the multiplier was
altered so that an instruction completes in 1 cycle instead of 10, at a cost of only
20% more area.
3.1 Floating point addition-subtraction
Adding and subtracting two floating point numbers is more complicated than the
corresponding operation on integers. This is due to the representation of floating
point numbers. Suppose we wish to add two floating point numbers, A and B, where
$A = \pm S_A \cdot 2^{E_A}$ and $B = \pm S_B \cdot 2^{E_B}$. To add the two numbers, the
first step is to bring their exponents to the same value. This is done by calculating
the absolute difference $|E_A - E_B|$ of the two exponents and adjusting the significand S
of the operand with the smaller exponent accordingly. This is possible because the
significand is stored in binary and can therefore be divided by 2 with a shift. The
following example displays this adjustment.
$$A = 16 \cdot 2^{7}, \qquad B = 32 \cdot 2^{6}$$
The absolute difference of the two exponents is 1, so the operand with the smaller
exponent, B, must be adjusted.
$$B = 32 \cdot 2^{6} = 32 \cdot 2^{-1} \cdot 2^{6} \cdot 2^{+1} = 16 \cdot 2^{7}$$
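The alignment step can be sketched in software as follows. This is a minimal illustration, not the hardware design described in this chapter: the function name `align_and_add` is hypothetical, both operands are assumed positive, significands are plain integers, and normalization and rounding are left out, so bits shifted out during alignment are simply dropped.

```python
def align_and_add(sig_a, exp_a, sig_b, exp_b):
    """Add two positive values of the form sig * 2**exp after aligning exponents.

    Hypothetical helper for illustration only: no sign handling,
    no normalization, and no rounding of the bits shifted out.
    """
    # The absolute exponent difference |E_A - E_B| gives the shift amount.
    shift = abs(exp_a - exp_b)
    if exp_a >= exp_b:
        # B has the smaller exponent: shift its significand right.
        sig_b >>= shift
        exp = exp_a
    else:
        # A has the smaller exponent: shift its significand right.
        sig_a >>= shift
        exp = exp_b
    # With equal exponents the significands can be added directly.
    return sig_a + sig_b, exp


# The example from the text: A = 16 * 2**7, B = 32 * 2**6.
print(align_and_add(16, 7, 32, 6))  # (32, 7)
```

For the operands of the example, B is shifted right by one position to become 16 * 2^7, and the sum is 32 * 2^7, which equals 4096, the same value as 16 * 2^7 + 32 * 2^6.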
The alignment works because the significand is stored in binary, so dividing it by a
power of two is a simple right shift by the corresponding amount. In this example no
bits are lost; in general, low-order bits shifted out are lost unless extra bits are
kept for rounding. The next step is to simply add or subtract the significands and
normalize the result. This procedure is standard, but the actual hardware
implementations vary. The schematic