In this paper, we present the design optimization of one- and two-dimensional fully pipelined computing structures for area-delay-power-efficient implementation of finite-impulse-response (FIR) filter by systolic decomposition of distributed arithmetic (DA)-based inner-product computation. The systolic decomposition scheme is found to offer a flexible choice of the address length of the lookup tables (LUT) for DA-based computation to decide on suitable area time tradeoff. It is observed that by using smaller address lengths for DA-based computing units, it is possible to reduce the memory size, but on the other hand that leads to increase of adder complexity and the latency. For efficient DA-based realization of FIR filters of different orders, the flexible linear systolic design is implemented on a Xilinx Virtex-E XCV2000E FPGA using a hybrid combination of Handel-C and parameterizable VHDL cores. Various key performance metrics such as number of slices, maximum usable frequency, dynamic power consumption, energy density, and energy throughput are estimated for different filter orders and address lengths. Analysis of the results obtained indicate that performance metrics of the proposed implementation is broadly in line with theoretical expectations. It is found that the choice of address length yields the best of area-delay-power-efficient realizations of the FIR filter for various filter orders. Moreover, the proposed FPGA implementation is found to involve significantly less area-delay complexity compared with the existing DA-based implementations of FIR filter.