In deep-submicron technology, wires are as important as logic components, or more so, because wire-related problems such as crosstalk noise are critical in system-on-chip design. Recently, a method was proposed for generating a timing-optimal partial product reduction tree from bit-level adders to implement arithmetic circuits; it outperforms the current best designs. However, conventional approaches do not treat interconnects as primary components to be optimized in the synthesis of arithmetic circuits, mainly because of integration complexity or unpredictable wire effects, and the resulting layouts are unsatisfactory, with long and messy wire connections. To overcome this limitation, we propose a new module generation/synthesis algorithm for arithmetic circuits that uses carry-save-adder (CSA) modules; it not only optimizes circuit timing but also generates a much more regular interconnect topology in the final circuits. Specifically, the algorithm has two steps. In Phase 1 (CSA module generation), we apply an optimal-timing CSA module generation algorithm to an arithmetic expression under a general CSA timing model. In Phase 2 (bit-level interconnect refinement), we optimally refine the interconnects between the CSA modules while retaining the global CSA-tree structure produced by Phase 1. We show that the timing of the circuits produced by our approach is equal or very close to that of the bit-level approach in most test cases (even without including the interconnect delay), and that, at the same time, the interconnects in the layout are short and regular.
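To make the idea of a timing-driven CSA tree concrete, the following is a minimal Python sketch. It is not the algorithm proposed here. It assumes a well-known greedy heuristic (repeatedly feed the three earliest-arriving operands into one CSA) and a simplified timing model in which both CSA outputs become available a fixed `csa_delay` after the latest input. The names `csa`, `build_csa_tree`, and `csa_delay` are hypothetical and chosen for illustration only.

```python
import heapq
from itertools import count


def csa(a, b, c):
    """3:2 carry-save compression of non-negative integers.

    Returns (s, carry) such that a + b + c == s + carry.
    """
    s = a ^ b ^ c                                # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted left
    return s, carry


def build_csa_tree(operands, csa_delay=1):
    """Greedy, timing-driven CSA tree construction (illustrative assumption).

    operands: list of (arrival_time, value) pairs.
    Returns (total, ready_time, n_csa), where ready_time is the time at which
    the last two vectors are available for the final carry-propagate adder.
    """
    tie = count()  # tie-breaker so the heap never compares beyond the counter
    heap = [(t, next(tie), v) for t, v in operands]
    heapq.heapify(heap)
    n_csa = 0
    while len(heap) > 2:
        # Take the three earliest-arriving operands.
        t1, _, a = heapq.heappop(heap)
        t2, _, b = heapq.heappop(heap)
        t3, _, c = heapq.heappop(heap)
        s, carry = csa(a, b, c)
        out_time = max(t1, t2, t3) + csa_delay  # assumed uniform CSA delay
        heapq.heappush(heap, (out_time, next(tie), s))
        heapq.heappush(heap, (out_time, next(tie), carry))
        n_csa += 1
    total = sum(v for _, _, v in heap)            # stands in for the final adder
    ready_time = max((t for t, _, _ in heap), default=0)
    return total, ready_time, n_csa


if __name__ == "__main__":
    # Four operands with different arrival times.
    print(build_csa_tree([(0, 3), (0, 5), (2, 7), (1, 9)]))
```

In this sketch, late-arriving operands are deferred to CSAs closer to the output, which is the basic intuition behind timing-driven CSA allocation. The sketch operates on whole words and says nothing about bit-level wiring, so it illustrates only the module-level view that Phase 1 is concerned with.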