A 3D graphics system that integrates two symmetric unified shader cores for mobile applications is presented. To exploit instruction-, data-, and task-level parallelism, the design adopts a dual-core, dual-issue VLIW architecture with multithreading. For efficient processing, three techniques are proposed: an IEEE-754 compliant fast 4D vector inner product arithmetic unit for matrix multiplication, an internal bus system, and a configurable texture cache technique that reduces power consumption in the texture unit. With these techniques, the proposed processor achieves 143 Mvertices/s and 2.3 Gtexels/s while consuming 367 mW. It also achieves a 45% performance improvement and a 26% increase in performance per unit power.
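To illustrate why a 4D vector inner product is the natural primitive for matrix multiplication in vertex processing, the following is a minimal software sketch, not the proposed hardware unit. The function names `dot4`, `mat4_mul_vec4`, and `mat4_mul` are hypothetical, and Python floats do not model the unit's internal datapath or rounding behavior. It only shows that a 4x4 transform decomposes into four 4D inner products, and a 4x4 matrix product into sixteen.

```python
def dot4(a, b):
    """Inner product of two 4-component vectors (hypothetical helper)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def mat4_mul_vec4(m, v):
    """Transform a 4D vector by a 4x4 row-major matrix: one dot4 per row."""
    return [dot4(row, v) for row in m]


def mat4_mul(a, b):
    """Multiply two 4x4 row-major matrices: one dot4 per output element."""
    cols = list(zip(*b))  # columns of b, so each output is row(a) . col(b)
    return [[dot4(row, col) for col in cols] for row in a]


# Assumed example data: a translation by (2, 3, 4) in homogeneous coordinates.
translate = [
    [1, 0, 0, 2],
    [0, 1, 0, 3],
    [0, 0, 1, 4],
    [0, 0, 0, 1],
]
point = [1, 1, 1, 1]
moved = mat4_mul_vec4(translate, point)
```

Under this decomposition, the throughput of the inner product operation directly bounds the rate of vertex transformation, which is presumably why a dedicated unit is attractive.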