This paper presents the design and analysis of a power- and area-efficient transpose memory structure for adaptive signal processing systems. The proposed architecture significantly improves system throughput over competing designs. We demonstrate its throughput on both FPGA and ASIC implementations. We also integrated the memory into a previously proposed watermarking architecture. Compared with previously published work, the new memory design yields a 2X speedup for the watermarking algorithm and up to a 10X speedup for the 2D DCT and IDCT algorithms, while consuming significantly less power and area.