Mobile and wireless embedded systems running multimedia applications are growing in popularity, and MPEG4 is an important and demanding application in this class. As CPU performance improves, the memory subsystem becomes the major barrier to improving overall system performance. Studies show that video data values are reused often enough for caching to significantly reduce the raw memory bandwidth required. Even so, decoding MPEG4 video in software generates many times more cache-memory traffic than is strictly required. A proper understanding of the decoding algorithm and the composition of its data set is therefore essential to improving the performance of such a system. The focus of this paper is to enhance MPEG4 decoding performance on a mobile device through cache optimization. The architecture we simulate includes a digital signal processor (DSP) that runs the decoding algorithm and a two-level cache system. The level-1 cache is split into data (D1) and instruction (I1) caches, and the level-2 cache (CL2) is unified. We use the Cachegrind and VisualSim simulation tools to optimize cache size, line size, associativity, and the number of cache levels for a wireless device decoding MPEG4 video.
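To make the simulated hierarchy concrete, the following is a minimal sketch of a split level-1 (I1, D1) and unified level-2 (CL2) cache model in Python. It is not the Cachegrind or VisualSim model used in this work; it assumes LRU replacement, treats reads and writes alike, and ignores timing. The names `Cache`, `build_hierarchy`, and `sweep` are hypothetical and introduced only for illustration.

```python
from collections import OrderedDict


class Cache:
    """Set-associative cache with LRU replacement (illustrative model only)."""

    def __init__(self, size_bytes, line_bytes, assoc, next_level=None):
        # The cache must divide evenly into sets of `assoc` lines.
        assert size_bytes % (line_bytes * assoc) == 0
        self.line_bytes = line_bytes
        self.assoc = assoc
        self.num_sets = size_bytes // (line_bytes * assoc)
        # One ordered dict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.next_level = next_level
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up a byte address; return True on a hit, False on a miss."""
        block = addr // self.line_bytes
        lines = self.sets[block % self.num_sets]
        if block in lines:
            lines.move_to_end(block)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if self.next_level is not None:
            self.next_level.access(addr)  # forward the miss to the next level
        if len(lines) >= self.assoc:
            lines.popitem(last=False)  # evict the least recently used line
        lines[block] = True
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def build_hierarchy(l1_size, l1_line, l1_assoc, l2_size, l2_line, l2_assoc):
    """Return (i1, d1, cl2): split L1 caches sharing one unified L2."""
    cl2 = Cache(l2_size, l2_line, l2_assoc)
    i1 = Cache(l1_size, l1_line, l1_assoc, next_level=cl2)
    d1 = Cache(l1_size, l1_line, l1_assoc, next_level=cl2)
    return i1, d1, cl2


def sweep(cache, base, nbytes, stride, passes):
    """Synthetic trace: walk a buffer sequentially `passes` times."""
    for _ in range(passes):
        for offset in range(0, nbytes, stride):
            cache.access(base + offset)
```

Sweeping a buffer through `d1` under different size, line-size, and associativity settings and comparing `miss_rate()` gives a rough feel for the parameter space explored here. A real study would replace the synthetic sweep with address traces from the decoder.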