We propose a code scratchpad memory (SPM) management technique with demand paging for embedded systems that have no memory management unit. Based on profiling information, a postpass optimizer analyzes and optimizes application binaries in a fully automated process. It classifies the code of the application including libraries into three classes based on a mixed integer linear programming formulation: External code is executed directly from the external memory. Pinned code is loaded into the SPM when the application starts and stays in the SPM. Paged code is loaded into/unloaded from the SPM on demand. We evaluate the proposed technique by running 14 embedded applications both on a cycle-accurate ARM processor simulator and an ARM1136JF-S core. On the simulator, the reference case, a four-way set-associative cache, is compared to a direct-mapped cache and an SPM of comparable die area. On average, we observe an improvement of 12 percent in runtime performance and a 21 percent reduction in energy consumption. On the ARM11 board, the reference case run on the 16-KB four-way set-associative cache is compared to the demand paging solution on the 16-KB SPM, optionally supported by the cache. The measured results show both a runtime performance improvement and a reduction of the energy consumption by 23 percent, on average.