This paper presents an on-chip, stack-based memory organization that reduces energy dissipation in programmable embedded system architectures. Most embedded systems use a stack to implement function calls. However, this stack data is stored in the processor address space, typically in main memory, and is accessed through the caches. Our analysis of several benchmarks shows that callee-saved registers and return addresses for function calls constitute a significant portion of the total memory accesses. We propose a separate stack-based memory organization to store these registers and return addresses. Our experimental results show that effective use of such stack-based memories yields significant reductions in system energy while also improving system performance. Applying our approach to the SPECint95 and MediaBench benchmark suites shows up to a 32.5% reduction in L1 data cache energy, with marginal improvements in system performance.
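The access-routing idea can be illustrated with a minimal sketch. It is a toy model, not the organization evaluated in the paper: the function `simulate`, the event names, and the example trace are all hypothetical, and the model counts accesses only, not energy. It assumes every call saves one return address plus a given number of callee-saved registers, and that the matching return restores the same words.

```python
def simulate(trace, use_stack_memory):
    """Count accesses reaching the L1 data cache and a dedicated stack memory.

    trace is a list of hypothetical events:
      ("mem",)          an ordinary data load or store
      ("call", n_regs)  a call that saves n_regs callee-saved registers
      ("ret",)          a return that restores the matching frame
    """
    l1_accesses = 0
    stack_mem_accesses = 0
    frames = []  # LIFO of frame sizes, in words

    for event in trace:
        kind = event[0]
        if kind == "mem":
            # Ordinary data always goes through the L1 data cache.
            l1_accesses += 1
        elif kind in ("call", "ret"):
            if kind == "call":
                words = 1 + event[1]  # return address + callee-saved registers
                frames.append(words)
            else:
                words = frames.pop()  # restore what the matching call saved
            if use_stack_memory:
                stack_mem_accesses += words
            else:
                l1_accesses += words
        else:
            raise ValueError(f"unknown event: {kind}")

    return l1_accesses, stack_mem_accesses


# Hypothetical trace: two nested calls interleaved with ordinary accesses.
trace = [
    ("mem",), ("call", 3), ("mem",), ("mem",),
    ("call", 1), ("mem",), ("ret",), ("ret",), ("mem",),
]

baseline = simulate(trace, use_stack_memory=False)
proposed = simulate(trace, use_stack_memory=True)
print("baseline (L1, stack memory):", baseline)
print("proposed (L1, stack memory):", proposed)
```

In this toy trace, the save and restore traffic moves from the L1 data cache to the separate stack memory, while ordinary data accesses are unaffected. The counts depend entirely on the made-up trace and say nothing about the measured benchmark results.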