Array Multiplier and CIA-Based FIR Filter for DSP Applications

In the Field Programmable Gate Array (FPGA) platform, the Finite Impulse Response (FIR) filter is one of the important applications in Digital Signal Processing (DSP). A traditional FIR filter is designed using a number of adders and multipliers, which enlarge the area of the filter architecture. In general, multipliers and adders are required to design an FIR filter. In this research, an Array Multiplier (AM) is used in the Processing Element (PE) to multiply the filter inputs by the coefficients, and a Carry Increment Adder (CIA) is employed in the accumulator to add the outputs of the PE. The proposed method is named the AM-CIA-FIR filter. Owing to the use of AM and CIA, the hardware utilization of the proposed design is improved. The AM-CIA-FIR filter is implemented in Verilog using the Xilinx ISE software on different Virtex devices, namely Virtex-4, Virtex-5, and Virtex-6. The experimental results show that the AM-CIA-FIR filter reduces FPGA utilization by 14.01% compared with the PSA-FIR filter design.


Introduction
FIR filtering is one of the basic steps in several DSP applications, such as wireless communication, video processing, and image processing [1]. FIR filters are used extensively in mobile communication systems because of their stability and linear-phase properties [2], [3], [4]. Over the past decades, several efficient techniques and structures have been analysed that are widely known to the academic and industrial communities [5]. Nowadays, low-area, low-power FIR filters with different numbers of taps are important in DSP research for efficient signal processing applications [6], [7]. Moreover, the structural adders employed in the FIR filter are expensive [8]. A sensitivity-driven algorithm has been used in FIR filter design to quantify the contribution of each non-zero digit of the coefficient set to the frequency-response characteristic, combined with enhanced weight-two subexpressions; however, this algorithm increases computational complexity [9]. Fixed FIR filters have also been designed using the Multiple Constant Multiplication (MCM) scheme, where deriving the flow graph of the transposed-form FIR filter block reduces register complexity, even though the mathematical analysis of the MCM scheme is complex [10]. In this research, the AM-CIA-FIR filter is designed to evaluate the number of Lookup Tables (LUTs) … In AM, the signs of all partial-product bits are positive, which requires the complement of each multiplier and multiplicand bit. AM is used in the High Performance Multiplier (HPM) tree, which inherits the regular, repetitive structure of multipliers such as the Baugh-Wooley and Wallace tree multipliers. Because AM contains fewer logic elements, the power consumption, area, and delay of the overall architecture are improved. AM can also multiply signed values. Moreover, CIA reduces the propagation delay and performs addition quickly.
Using CIA in the architecture reduces the time needed to determine the carry bit, so the sum output does not have to wait for the carry. With the help of AM and CIA, all FPGA performance metrics of the AM-CIA-FIR filter are improved compared with the conventional FIR filter. The rest of this paper is organized as follows. Section 2 reviews FIR filter survey papers. Section 3 explains the proposed architecture and its internal modules. Section 4 presents the setup and results of the existing and proposed AM-CIA-FIR designs, and Section 5 concludes the work.

Literature review
Pramod Patali and Shahana Thottathikkulam Kassim [11] proposed two efficient FIR filter structures with increased throughput and reduced latency and hardware complexity. Two modified carry select adder (CSLA) modules, a linear CSLA and a square-root CSLA, were obtained by combining the ideas of improved carry-select and carry-skip adders. Critical path delay (CPD) analysis of the two CSLA modules showed that the square-root CSLA achieved the minimum CPD, about 0.23 ns, compared with the linear CSLA module. Comparison results showed that the CPD, power, power-delay product (PDP), and area-delay product (ADP) of filter 2 were reduced by 71%, 38%, 82%, and 78%, respectively, relative to filter 1. Although filter 2 achieved a delay improvement, its area and power cost increased because the delay-efficient multiplier was suitable only for the time-critical path. Thiruvenkadam Krishnan et al. [12] designed a high-speed, area-efficient RCA-based 2-D bypassing multiplier for FIR implementation. It eliminates the carry multiplexer that the bypassing technique uses in all logic cells and applies a divide-and-conquer principle to shorten the delay. For example, a 4x4 multiplication is divided into two 4x2 bypassing multiplications whose partial sum and carry outputs are computed simultaneously, reducing delay. A 4-tap FIR filter was designed with this technique and implemented in the Altera Quartus II tool on a Cyclone EP1C12F324C6. The results showed a 15% reduction in LUTs, a 15% reduction in power, and a 10% increase in speed. However, the divide-and-conquer principle is not suitable for fast adders such as the CSA. Radha Rammohan S et al. [13] developed an approximate 4:2 compressor adder for a memoryless distributed arithmetic (DA) based FIR filter architecture. The main emphasis of this design was to reduce area and power consumption for hearing aid applications. The memoryless DA architecture was designed using compressor adders because ROM area increases significantly with filter order. The design was reconfigurable, so the filter coefficients can be changed at run time.
Synthesized with the Synopsys ASIC Design Compiler in 90 nm technology, the design showed a minimum area of 14445 µm², an ADP of 20011 µm² × ns, an MSP of 1.32 ns, an MSF of 648 MHz, and a PDP of 11.48 mW × ns. A drawback is that the approximate compressor produces only approximate, not exact, values, which affects filtering performance. Samyuktha S and Chaitanya DL [14] proposed an effective FIR filter using the Vedic mathematics multiplication principle and a ripple carry adder. Single Constant Multiplication (SCM) and Multiple Constant Multiplication (MCM) are frequently used in FIR implementation, but time and efficiency conflict when configuring an effective FIR filter. The Vedic multiplier was therefore used to resolve this conflict, halving the time lost. Although the conflict was resolved, cross-checking the results was difficult, and identifying the applicable Vedic mathematics classification was also a problem. Shinwoong Park et al. [13][14][15] developed an analog FIR (AFIR) filter system designed for full bandwidth utilization in communication using split capacitive DACs. The split DACs act as coefficient multipliers controlled by 7-bit codes to provide high linearity over the full frequency range. The noise and the effect of 5-channel time-interleaved operation were also analysed. The AFIR filter was implemented in 32 nm SOI CMOS technology and achieved 11 dBm IIP over the frequency range with a 0.9 V supply, giving better filtering performance. However, aliasing occurred for higher-order filters, together with increased noise, power consumption (10.6 mW), and clock-distribution complexity.
The major problems of FIR filters are as follows:
• Designing an FIR filter normally requires many logic blocks, which increases area and power.
• Filter operation takes more time because of unnecessary blocks.
• Coefficients and inputs are difficult to store in the allocated memory.
• Conventional addition occupies a large area.
Solution: To overcome the above problems, an efficient AM-CIA-FIR filter is designed to improve the performance of the architecture. AM performs multiplication at high speed, and CIA performs addition with less area. In this research work, the adder is used in the accumulator module, so addition is especially important; hence, CIA is used instead of a conventional adder [16,17].

AM -CIA-FIR Filter Design
Multipliers and adders are the key components of digital FIR filters, and these arithmetic circuits are largely responsible for area consumption. The multiplier must therefore be capable of high-speed, low-power, low-area, and compact VLSI implementation. For this reason, the FIR filter in this research is designed using AM and CIA.

A. Proposed FIR Filter Design
FIR filters are non-recursive filters [17] that multiply the input samples by constant coefficients and add the results. In DSP, FIR filtering is a convolution, represented by Eq. (1).

$$y(n) = \sum_{i=0}^{N-1} b_i \, x(n-i) \qquad (1)$$

Here, N denotes the number of taps of the filter structure, y(n) represents the filter output, b_i is the i-th coefficient of the FIR filter of length N, and x(n-i) represents the input sequence delayed by i samples. The block diagram of the AM-CIA-FIR filter design is shown in Fig. 1. The filter coefficients are stored in ROM and the filter inputs are stored in RAM. The address generator creates the data addresses used to read the input data and coefficient data from RAM and ROM. The data reader module supplies the input data, and the following operations are performed continuously. First, the memory address of the new data is computed, the RAM is enabled, and the data are stored in RAM at that address. The coefficients stored in ROM are then processed in the PE together with the RAM data. The PE output is connected to the accumulator block to produce the FIR filter output. AM and CIA are used in the PE module and accumulator module, respectively, to improve the performance of the architecture.
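The data flow above (coefficients in ROM, inputs in RAM, an address generator, a PE multiply, and an accumulator add) can be sketched behaviorally in Python. This is a minimal model under stated assumptions, not the authors' Verilog: the function name `fir_filter`, the circular-buffer addressing of the RAM, and unbounded integer arithmetic are all assumptions for illustration, since the text does not specify the address order or the word widths.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y(n) = sum_i b_i * x(n - i), with x(n) = 0 for n < 0."""
    taps = len(coeffs)
    rom = list(coeffs)       # coefficient memory (stand-in for the ROM)
    ram = [0] * taps         # circular input buffer (stand-in for the RAM)
    write_addr = 0           # address generator: where the next sample is stored
    outputs = []
    for x in samples:
        ram[write_addr] = x                      # store the new sample x(n)
        acc = 0                                  # accumulator cleared for each output
        for i in range(taps):
            read_addr = (write_addr - i) % taps  # address of x(n - i)
            acc += rom[i] * ram[read_addr]       # PE multiply, accumulator add
        outputs.append(acc)
        write_addr = (write_addr + 1) % taps     # advance to the oldest slot
    return outputs
```

For example, `fir_filter([1, 0, 0, 0, 0], [3, 2, 1])` returns `[3, 2, 1, 0, 0]`: an impulse input reproduces the coefficients, a quick check that the model computes Eq. (1).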

Fig. 1 AM-CIA-FIR filter architecture.

B. Array Multiplier
In DSP applications, advances in current technologies have focused design targets on better performance, and the critical path delay and overall performance of the processor depend on the multiplier block. The array multiplier is a simple, suitable architecture because of its low design and time complexity, and it can perform fast multiplication in a pipelined manner. It is a combinational digital circuit that multiplies two n-bit numbers using the add-and-shift algorithm.
The step-by-step process is shown as a block diagram in Fig. 2.

Fig. 2 Block diagram of the multiplier
An n x n array multiplier requires n*n AND gates, n*(n-2) full adders (FA), and n half adders (HA). The general hierarchical structure of the 4x4 array multiplier is shown in Fig. 3. Taking A as the multiplicand and B as the multiplier, the product P is given by Eq. (2).

$$P = A \times B \qquad (2)$$

The first step in the design is partial product generation: each bit of the multiplicand is ANDed with each bit of the multiplier to generate n² partial products A_j·B_k. The partial products are shifted according to their bit order and added, so the product bits are formed by the adders in each column i = j + k. The adders are arranged in carry-save fashion, in which carry-out bits are fed to the next available adder in the column to the left. The final product is obtained from the final adder, which improves delay and area. In terms of speed, the array multiplier, being a parallel multiplier, outperforms serial multiplication schemes.
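Expanding Eq. (2) bit by bit shows why the product bits are formed column by column with i = j + k. Writing A and B in binary:

$$P = \Big(\sum_{j=0}^{n-1} A_j 2^{j}\Big)\Big(\sum_{k=0}^{n-1} B_k 2^{k}\Big) = \sum_{j=0}^{n-1}\sum_{k=0}^{n-1} A_j B_k \, 2^{j+k} = \sum_{i=0}^{2n-2} 2^{i} \sum_{j+k=i} A_j B_k$$

Each AND term A_j B_k carries the weight 2^{j+k}, so column i collects every partial product with j + k = i; for n = 4, column 3 holds A_0B_3, A_1B_2, A_2B_1, and A_3B_0. The column sums, together with the carries passed to the left, produce the 2n product bits P_0 … P_{2n-1}.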
For a 4x4 array multiplier, 16 AND gates, 8 full adders, and 4 half adders are required; similarly, an 8x8 array multiplier requires 64 AND gates, 48 full adders, and 8 half adders.
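A bit-level Python model can check both the arithmetic and the gate counts quoted above. It is a sketch under one assumption worth stating plainly: it wires the array row by row, each row being a ripple adder that adds the next partial-product row to the running sum (a common textbook arrangement), rather than the carry-save routing described earlier. The function names are hypothetical. In this wiring the adder tally comes out to n(n-2) full adders and n half adders, matching the counts given for the 4x4 and 8x8 cases.

```python
def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def array_multiply(a, b, n):
    """Multiply two unsigned n-bit integers with a row-by-row array model.

    Returns (product, and_gates, full_adders, half_adders).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    a_bits = [(a >> j) & 1 for j in range(n)]
    b_bits = [(b >> k) & 1 for k in range(n)]
    # Partial-product generation: pp[k][j] = A_j AND B_k, i.e. n*n AND gates.
    pp = [[a_bits[j] & b_bits[k] for j in range(n)] for k in range(n)]
    and_gates = n * n
    full_adders = half_adders = 0

    product = [0] * (2 * n)
    product[0] = pp[0][0]
    # Upper n bits of the running sum (row 0 shifted right by one, top bit 0).
    upper = pp[0][1:] + [0]
    for k in range(1, n):
        carry = None
        row_sum = []
        for j in range(n):
            x, y = pp[k][j], upper[j]
            if carry is None:
                # Least significant adder in the row has no carry-in.
                s, carry = half_adder(x, y)
                half_adders += 1
            elif k == 1 and j == n - 1:
                # In the first row the top input is the constant 0.
                s, carry = half_adder(x, carry)
                half_adders += 1
            else:
                s, carry = full_adder(x, y, carry)
                full_adders += 1
            row_sum.append(s)
        product[k] = row_sum[0]          # this product bit is now final
        upper = row_sum[1:] + [carry]    # remaining bits feed the next row
    product[n:] = upper                  # last row supplies the top n bits
    value = sum(bit << i for i, bit in enumerate(product))
    return value, and_gates, full_adders, half_adders
```

For example, `array_multiply(13, 11, 4)` returns `(143, 16, 8, 4)`: the product 143 together with 16 AND gates, 8 full adders, and 4 half adders.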

C. Carry Increment Adder
The adder is another basic building block of a DSP processor, in which multiplication is carried out as a series of repeated additions. Hence, to speed up the multiplier operation, the addition speed must be increased. Fast adders are therefore preferred over conventional adders such as the RCA and CLA. Among fast adders, the Carry Increment Adder (CIA) has good delay performance, an important attribute in high-speed devices.

Fig.3 Logical design of 4X4 array multiplier
The Carry Increment Adder (CIA) contains two essential blocks: an RCA and an incrementer circuit. The incrementer is built sequentially from half adders connected in a ripple-carry chain. The addition is carried out by several RCAs after splitting the total number of bits into 4-bit groups. A Ripple Carry Adder (RCA) is a cascade of full adders for an n-bit input, and each full adder generates a carry. The carry output of the first stage ripples to the second-stage full adder as its carry input, and the process continues up to the last stage, as shown in Fig. 4.
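The rippling carry can be made visible with a small Python model of an n-bit RCA. This is an illustrative sketch (the function name `ripple_carry_add` is hypothetical); it returns the carry out of every full-adder stage so that the chain described above can be inspected.

```python
def ripple_carry_add(a, b, width):
    """Add two unsigned width-bit integers through a chain of full adders.

    Returns (sum modulo 2**width, list of carry-outs, one per full-adder stage).
    """
    carry = 0                                   # carry into the first stage
    sum_bits, carries = [], []
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        s = x ^ y ^ carry                       # full-adder sum
        carry = (x & y) | (carry & (x ^ y))     # full-adder carry, fed to the next stage
        sum_bits.append(s)
        carries.append(carry)
    total = sum(bit << i for i, bit in enumerate(sum_bits))
    return total, carries
```

Adding 0111 and 0001 gives the sum 1000 with carries `[1, 1, 1, 0]`: the carry generated in stage 0 ripples through stages 1 and 2 before the sum is complete, which is the serial dependency that CIA and other fast adders aim to shorten.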

Fig.4 Logical diagram of RCA
In the carry-select scheme, by contrast, two partial sums are computed and the correct one is selected by multiplexers, which increases area and affects speed. The incrementer circuit of the CIA replaces the second adder and the multiplexer block: it calculates only one partial sum and increments it when necessary. The block diagram of the 8-bit CIA is shown in Fig. 5.
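The carry-increment idea can be sketched in Python for the 8-bit, 4-bit-group case described above. Assumptions: operands are unsigned, each group's RCA is evaluated with a carry-in of 0, and the group carry-out is the OR of the RCA carry and the incrementer carry (the two cannot both be 1, because an RCA that overflows leaves a partial sum of at most 2^g - 2, which cannot overflow when incremented). The function names are hypothetical.

```python
def ripple_add(a_bits, b_bits):
    """Ripple-carry addition of two little-endian bit lists with carry-in 0."""
    out, carry = [], 0
    for x, y in zip(a_bits, b_bits):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return out, carry


def increment(bits, cin):
    """Half-adder chain that adds a single carry bit to a bit list."""
    out, carry = [], cin
    for x in bits:
        out.append(x ^ carry)   # half-adder sum
        carry = x & carry       # half-adder carry to the next bit
    return out, carry


def carry_increment_add(a, b, width=8, group=4):
    """Add two unsigned integers with a carry-increment adder model.

    Returns (sum modulo 2**width, carry_out).
    """
    if width % group != 0:
        raise ValueError("width must be a multiple of group")
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    sum_bits, carry = [], 0
    for g in range(0, width, group):
        # Each group's RCA assumes carry-in 0, so all groups can add in parallel.
        partial, c_rca = ripple_add(a_bits[g:g + group], b_bits[g:g + group])
        # The incrementer then adds the real incoming carry to the partial sum.
        final, c_inc = increment(partial, carry)
        sum_bits.extend(final)
        carry = c_rca | c_inc   # at most one of the two can be 1
    total = sum(bit << i for i, bit in enumerate(sum_bits))
    return total, carry
```

Because every group's RCA starts with a carry-in of 0, the groups can add independently; only the incrementers depend on the carry from the group below. This is the delay advantage claimed for CIA over a plain RCA, obtained without the duplicate adder and multiplexer of the carry-select scheme.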

D. DSP applications
In DSP applications, filtering is one of the important processes for eliminating unwanted information present in the input data, and this filtering can be implemented with the proposed design. Filtering is commonly used to remove noise (salt-and-pepper or flicker noise) from images and from biomedical signals such as the Electrocardiogram (ECG), Electroencephalogram (EEG), and Electromyogram (EMG). Using the AM and CIA architecture, the FIR filter can reduce the noise present in such images or signals. For noise reduction, the noisy input images or biomedical signals are read in MATLAB and converted into binary values, which are stored in the ROM for the processing element. The proposed FIR architecture removes the noisy pixel values and generates a noise-free image or signal at the output terminal. Hence, the proposed FIR filter architecture can also be used in such DSP applications.
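The conversion step above (reading a noisy signal and turning it into binary values for on-chip memory) was done in MATLAB; an equivalent Python sketch is shown here. It assumes samples normalized to [0.0, 1.0] and 8-bit unsigned words to match the 8-bit filter inputs; the function name `to_binary_words` is hypothetical. Writing one word per line to a text file gives the kind of input that Verilog's `$readmemb` system task reads to initialize a memory.

```python
def to_binary_words(samples, bits=8):
    """Quantize samples in [0.0, 1.0] to unsigned integers and format each as a binary word."""
    full_scale = (1 << bits) - 1
    words = []
    for v in samples:
        q = round(v * full_scale)            # scale to the integer range
        q = min(full_scale, max(0, q))       # clip values outside [0.0, 1.0]
        words.append(format(q, f"0{bits}b")) # zero-padded binary string
    return words
```

For example, `to_binary_words([0.0, 0.5, 1.0])` returns `['00000000', '10000000', '11111111']`, and `"\n".join(...)` of that list gives the file contents.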

Results and Discussion
The proposed AM-CIA-FIR filter is designed and analyzed in different FPGA devices.
The architecture is simulated in ModelSim 10.5 to verify the output waveforms, and Xilinx ISE 14.7 is used to obtain the FPGA performance metrics. Both the AM-CIA-FIR and the existing FIR filter designs are implemented in the Xilinx tool, and the results are tabulated in Table 1. The numbers of LUTs, slices, flip-flops, and Input-Output Blocks (IOBs), as well as the frequency, are analyzed through the FPGA implementation. The performance of the AM-CIA-FIR filter is analyzed on Virtex-4 xc4vfx12, Virtex-5 xc5vlx20T, and Virtex-6 xc6vcx75t. As Table 1 shows, Virtex-6 gives the best results compared with Virtex-4 and Virtex-5. On Virtex-6, the AM-CIA-FIR filter reduces LUTs by 3.81%, flip-flops by 9.9%, and slices by 29.81% compared with the PSA-FIR filter design. It consumes 0.166 W of power at a frequency of 247.78 MHz.

TABLE 1. EXPERIMENTAL RESULTS OF FPGA PERFORMANCE FOR EXISTING AND AM-CIA-FIR FILTER DESIGNS
The filter is designed with different numbers of taps: 8, 16, and 32. Figs. 6, 7, and 8 compare the LUT, flip-flop, and slice performance of the PSA-FIR [16] and AM-CIA-FIR filters on Virtex-6. These graphs clearly show that the proposed method has better FPGA performance than the conventional designs. Fig. 9 … initially. After the initial clock of the FIR filter, the output is added to y and stored back in the same y register. Finally, the accumulator produces the FIR filter outputs. The number of output bits depends on the number of input bits. The AM-CIA-FIR filter is designed for 8-bit inputs and 8-bit coefficients and produces a 16-bit filter output. In this research, the FIR filter is designed using a 16-bit AM and a 16-bit CIA.
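The accumulator behavior described above, where each clock adds a new product to y and writes the result back to the same register, can be sketched as follows. This assumes unsigned products and a y register exactly 16 bits wide that wraps on overflow; the text does not say how overflow is handled, so the wrap is an assumption. The name `accumulate` is hypothetical.

```python
def accumulate(products, width=16):
    """Clock-by-clock accumulator: y <- (y + product) mod 2**width.

    Returns the value of the y register after each clock.
    """
    mask = (1 << width) - 1
    y = 0                       # y register cleared before the first clock
    history = []
    for p in products:          # one PE product per clock cycle
        y = (y + p) & mask      # add to y and store back in the same register
        history.append(y)
    return history
```

With 8-bit inputs and coefficients each product fits in 16 bits (at most 255 × 255 = 65025), but a sum over several taps can exceed 2^16 - 1, so how overflow is handled matters in a real implementation.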

Conclusion
In this work, the array multiplier (AM) and CIA have been used to improve the FIR filter operation. The multiplier and adder are used in the processing element and accumulator modules, respectively. The proposed AM-CIA-FIR filter was implemented on different FPGA devices; the FPGA has become the platform of choice for efficient and fast realization of computation-intensive applications. The main aim of the proposed method is to mitigate the area complexity of the product accumulation block using a 16-bit CIA. The proposed architecture was designed for different numbers of taps: 8, 16, and 32. For the 8-tap filter on Virtex-6, LUTs, flip-flops, and slices are reduced by 3.81%, 9.9%, and 29.81%, respectively, compared with the PSA-FIR filter design. The proposed FIR filter is well suited to DSP applications because it requires less area. In future work, the FIR architecture will be implemented with other optimal multipliers and adders to further improve FPGA performance.