Design and Synthesis of Single Precision Floating Point Division based on Newton-Raphson Algorithm on FPGA

This paper describes a single precision floating point divider based on the Newton-Raphson computational division algorithm. The algorithm is implemented using a 32-bit floating point multiplier and subtractor. The salient feature of the proposed design is that the module for computing the mantissa in the 32-bit floating point multiplier is designed using a 24-bit Vedic multiplication (Urdhva-triyakbhyam-sutra) technique. The 32-bit floating point multiplier designed using the Vedic multiplication technique yields a higher computational speed and is therefore used efficiently in the floating point divider. Another important feature is the efficient use of device utilization parameters and reduced power consumption. An advantage of the Newton-Raphson algorithm is its higher versatility and precision. The IEEE 754 standard format is used to represent 32-bit floating point numbers. The ISim simulator is used for simulation. The proposed floating point divider is designed using the Verilog Hardware Description Language (HDL) and is verified on the Xilinx Spartan 6 SP605 Evaluation Platform FPGA.


Introduction
The role of reconfigurable processors in embedded system design has increased greatly over the past decades. Due to the advancement of field programmable gate arrays (FPGAs), we have reached a point where the architecture of processors can be modified instantaneously. Reconfigurable computing provides very versatile high-speed computing. The enhanced features of Spartan-6 reduce the cost per logic cell designed.

The Newton-Raphson computational algorithm requires mathematical operations such as multiplication and subtraction. Here, these operations are used to find the reciprocal of the denominator (D), and that reciprocal is multiplied by the numerator (N) to find the final quotient (Q). The algorithm starts from an approximation close to the final value and produces twice as many correct digits after each iteration. Division is an iterative, complex operation used in many signal-processing algorithms, where high precision must be maintained over very large data intervals. This can be achieved by designing and implementing floating point division using the Newton-Raphson algorithm. Several different algorithms are described in the literature [1][2][3]. The Newton-Raphson algorithm computes three multiplicative inverses at the same time to provide high throughput [4]. To design and implement 32-bit floating point division based on the Newton-Raphson computational algorithm, 32-bit floating point multiplier and 32-bit floating point subtraction modules are used [5,6]. For efficient implementation of the floating point multiplier, Vedic multiplication is used for calculating the mantissa part [7,8].

The formats for representing 32-bit and 64-bit floating point numbers are provided by the IEEE 754 standard [9,10]. IEEE 754 uses a fixed number of bits for representing a 32-bit floating point number. The representation format is divided into three parts: sign (b), exponent (e) and mantissa (s). Table 1 shows the structure of the IEEE 754 formats and describes single and double precision. In the IEEE 754 single precision format, the mantissa is represented by 23 bits, the exponent by 8 bits, and the MSB is the sign bit. The number is positive when the MSB is 0 and negative when the MSB is 1.

The paper is organized as follows. Section 2 explains the architecture of the floating point multiplier using Vedic multiplication. Section 3 describes the 32-bit floating point subtractor. Section 4 describes the Newton-Raphson computational algorithm. Section 5 presents the simulation results of floating point division using the Newton-Raphson algorithm. The conclusion and references are presented in the final section.
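The field layout described above can be illustrated with a short software sketch. It is not part of the proposed Verilog design; the helper names `unpack_float32` and `pack_float32` are hypothetical and only show how a single precision value splits into a 1-bit sign, an 8-bit exponent and a 23-bit mantissa.

```python
import struct

def unpack_float32(x):
    """Round x to single precision and return (sign, exponent, mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # bit 31 (MSB)
    exponent = (bits >> 23) & 0xFF   # bits 30..23, biased by 127
    mantissa = bits & 0x7FFFFF       # bits 22..0, stored fraction
    return sign, exponent, mantissa

def pack_float32(sign, exponent, mantissa):
    """Reassemble the three fields into a Python float."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(unpack_float32(-6.5))  # (1, 129, 5242880)
```

Here -6.5 equals -1.625 x 2^2, so the sign bit is 1, the biased exponent is 2 + 127 = 129, and the stored fraction is 0.625 x 2^23.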

Floating point multiplier
Figure 1 shows the complete architecture of the proposed 32-bit floating point multiplier. This multiplier module is designed using a Vedic multiplication technique, where the mantissa calculation is done using a 24x24 bit Vedic multiplier. The main purpose of using the Vedic multiplier is to improve the overall performance of the 32-bit floating point multiplier. The IEEE 754 format uses a fixed number of bits for representing the sign, exponent and mantissa. The inputs given to the floating point multiplier

Mantissa unit
In the mantissa unit, a 24x24 bit Vedic multiplier is used to calculate the mantissa with higher throughput. The lower bits of the inputs, m1 (A[22-0]) and m2 (B[22-0]), are given to the 24-bit Vedic multiplier, which produces a 24-bit normalized output whose MSB is a leading one.
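The vertical-and-crosswise idea behind Urdhva-Tiryakbhyam multiplication can be sketched in software as a column-by-column sum of partial products. This is only a behavioral illustration, not the authors' hardware structure; the function name `urdhva_multiply` is hypothetical, and the example operands assume that each 23-bit stored fraction has been extended to 24 bits with the implicit leading one.

```python
def urdhva_multiply(a, b, width=24):
    """Multiply two unsigned width-bit integers one result column at a time."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result = 0
    carry = 0
    for k in range(2 * width - 1):
        # Crosswise step: add every partial product a_i * b_j with i + j == k.
        lo = max(0, k - width + 1)
        hi = min(k, width - 1)
        column = carry + sum(a_bits[i] & b_bits[k - i] for i in range(lo, hi + 1))
        result |= (column & 1) << k   # low bit of the column sum is result bit k
        carry = column >> 1           # the rest is carried into the next column
    result |= carry << (2 * width - 1)
    return result

# 1.5 x 1.5 with 24-bit significands (hidden one already attached)
product = urdhva_multiply(0xC00000, 0xC00000)
print(hex(product))
```

In hardware all column sums can be formed in parallel, which is the property usually cited for the speed of this technique; the loop above computes them sequentially only for clarity.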

Exponent unit
In the exponent unit, the exponent is calculated using a ripple carry adder. The inputs e1 (A[30-23]) and e2 (B[30-23]) are given to the 8-bit ripple carry adder unit, and the result is biased to 127. Overflow and underflow cases are carefully handled.
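A minimal software sketch of the exponent path follows. It assumes that "biased to 127" means one copy of the bias is subtracted from the sum of the two biased exponents, and that results outside 1..254 are flagged as overflow or underflow; the names `ripple_carry_add` and `add_exponents` and the status strings are hypothetical.

```python
BIAS = 127

def ripple_carry_add(a, b, width=8):
    """Add two width-bit values with a chain of full adders; return (sum, carry_out)."""
    carry = 0
    total = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i          # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))    # full-adder carry out
    return total, carry

def add_exponents(e1, e2):
    """Return (biased exponent, status) for the product of two normal numbers."""
    low, carry_out = ripple_carry_add(e1, e2)
    raw = (carry_out << 8) | low   # 9-bit sum of the two biased exponents
    result = raw - BIAS            # both inputs carry the bias, so remove one copy
    if result >= 255:
        return 255, "overflow"
    if result <= 0:
        return 0, "underflow"
    return result, "ok"

print(add_exponents(128, 129))  # (130, 'ok')
```

For example, exponents 128 and 129 correspond to 2^1 and 2^2, and the result 130 corresponds to 2^3.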

Newton-Raphson division algorithm
The Newton-Raphson computational division algorithm computes the multiplicative inverse of the divisor by an iterative process. The calculated multiplicative inverse is then multiplied by the dividend to compute the final quotient (Q). In the proposed design, the algorithm is built from a 32-bit floating point multiplier module and a subtractor module. The maximum relative error is minimized by scaling the divisor (D) into the interval (0.5, 1); the scaling is done by a shift operation. More iterations are required to produce a more precise result, and fast division algorithms are developed for this purpose. The Newton-Raphson algorithm converges quickly, but each iteration requires several multiplication and subtraction operations. Figure 3 shows the flowchart of floating point division using the Newton-Raphson computational algorithm, in which two multipliers and one subtraction module are used to produce one iteration. The design produces an optimized result by computing three iterations in one cycle; more iterations can be performed to refine the multiplicative inverse. The multiplicative inverse of the divisor is calculated by equation (1), where Z_i is the multiplicative inverse at iteration i:

$$Z_{i+1} = Z_i\,(2 - D\,Z_i) \qquad (1)$$

The Newton-Raphson division algorithm needs a carefully chosen initial value for the iterations. To minimize the maximum relative error over the interval (0.5, 1), the initial value Z_0 is set as follows:

$$Z_0 = \frac{48}{17} - \frac{32}{17}\,D$$
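The iteration can be tried out with a small double-precision sketch. It assumes the standard recurrence Z_{i+1} = Z_i (2 - D Z_i) and the standard linear starting value Z_0 = 48/17 - (32/17) D for a divisor scaled into [0.5, 1); the function name `nr_divide` is hypothetical, and Python floats stand in for the 32-bit hardware modules.

```python
import math

def nr_divide(n, d, iterations=3):
    """Approximate n / d by Newton-Raphson reciprocal iteration.

    Returns (quotient, history), where history holds Z_0 .. Z_iterations
    for the scaled divisor.
    """
    if d == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    # Scale |d| into [0.5, 1): |d| == m * 2**e (a shift in hardware).
    m, e = math.frexp(abs(d))
    z = 48.0 / 17.0 - (32.0 / 17.0) * m   # assumed initial estimate Z_0
    history = [z]
    for _ in range(iterations):
        z = z * (2.0 - m * z)             # two multiplications, one subtraction
        history.append(z)
    q = math.ldexp(n * z, -e)             # undo the scaling: n * (1/m) * 2**-e
    return (-q if d < 0 else q), history

q, zs = nr_divide(10.0, 4.0)
for i, z in enumerate(zs):
    print(i, z, abs(1.0 - 0.5 * z))       # relative error of Z_i for m = 0.5
print(q)
```

With this starting value the relative error is at most 1/17 and is squared by every iteration, so three iterations bring it to about (1/17)^8, roughly 1.4e-10, which is below the resolution of a 24-bit significand.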

Simulation Results
The simulation was performed on ISim. Figure 4 shows the simulation results of the proposed 32-bit floating point division of two numbers using the Newton-Raphson method. Table 3 shows the device utilization of the Xilinx Spartan 6 SP605 Evaluation Platform FPGA. Table 4 shows the Xilinx Power Estimator (XPE) 14.3 device summary report for the proposed 32-bit floating point division on the Spartan-6 SP605 Evaluation Platform FPGA. In Figure 4, N represents the numerator, D represents the denominator, Z0 is the initial value, Z1 is the first iteration result, Z2 is the second iteration result, Z3 is the final iteration result and Q represents the quotient. The two inputs N and D are given to the divider in IEEE 754 standard format, as shown in Table 2.

Conclusion
Single precision floating point division using the Newton-Raphson computational algorithm is designed and synthesized on an FPGA. This computational technique provides high computation speed and throughput by computing one multiplicative inverse in one iteration.

Sign unit
In the sign unit, the sign bit is computed by XORing bit 31 of the floating point inputs, s1 (A[31]) and s2 (B[31]). The output of the XOR gate represents the sign of the floating point product. The Vedic multiplication technique is used efficiently for high computational speed and throughput. The complete architecture of the proposed 32-bit floating point multiplier is shown in Figure 1. This multiplier module, designed using the Vedic multiplier, is used in the Newton-Raphson computational algorithm to perform the iteration process.
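How the sign, exponent and mantissa units could fit together is sketched below as a behavioral model. It makes several assumptions that the text does not state: inputs are normal numbers, the hidden leading one is attached before multiplication, and the result is truncated rather than rounded. The names `float_to_bits`, `bits_to_float` and `fp32_multiply` are hypothetical, and the built-in `*` stands in for the 24x24 Vedic multiplier.

```python
import struct

def float_to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(b):
    return struct.unpack(">f", struct.pack(">I", b))[0]

def fp32_multiply(a_bits, b_bits):
    """Multiply two normal float32 bit patterns (truncating, no special cases)."""
    sign = (a_bits >> 31) ^ (b_bits >> 31)              # sign unit: XOR of bit 31
    exp = ((a_bits >> 23) & 0xFF) + ((b_bits >> 23) & 0xFF) - 127
    m1 = (a_bits & 0x7FFFFF) | 0x800000                 # attach the hidden one
    m2 = (b_bits & 0x7FFFFF) | 0x800000
    product = m1 * m2                                   # 48-bit significand product
    if product >> 47:                                   # product in [2, 4): renormalize
        product >>= 1
        exp += 1
    mantissa = (product >> 23) & 0x7FFFFF               # keep 23 fraction bits
    if not 0 < exp < 255:
        raise OverflowError("exponent out of range for a normal number")
    return (sign << 31) | (exp << 23) | mantissa

r = fp32_multiply(float_to_bits(-1.5), float_to_bits(1.5))
print(bits_to_float(r))  # -2.25
```

Rounding modes, zeros, infinities, NaNs and subnormal numbers are deliberately left out to keep the data path visible.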

Figure 1. The proposed architecture for the 32-bit floating point multiplier.

Floating point subtractor
In the subtractor module, X[31-0] and Y[31-0] are given as inputs to the floating point subtractor. The sign (s), exponent (e) and mantissa (m) are represented in IEEE 754 format. The 32-bit floating point subtraction is done in a stepwise manner, as explained further. First of all, the floating point numbers are unpacked. After unpacking, the sign, exponent and mantissa are identified

Figure 3. A flowchart of the 32-bit floating point division using the Newton-Raphson method.

Figure 4. Simulation result of floating point division.

Table 2. Sample inputs and their outputs for simulation.

Table 3. Device utilization of the Xilinx Spartan 6 SP605 Evaluation Platform.