The register file (RF) in modern embedded processors accounts for a substantial share of energy consumption because of its large switching capacitance and long active time. In embedded processors, on average 25% of the registers account for 83% of RF access time. This motivates us to partition the RF into hot and cold regions, placing the most frequently used registers in the hot region and the rarely accessed ones in the cold region. We employ bit-line splitting and drowsy register cells to reduce the overall RF access power. We propose a novel approach that partitions the RF so as to maximize the power saving. We formulate RF partitioning as a graph partitioning problem and apply an effective algorithm to obtain the optimal result. We evaluate our algorithm on MiBench and SPEC2000 applications for the SimpleScalar PISA system and achieve average savings of 58.3% and 54.4%, respectively, over the access power of the non-partitioned RF. The area overhead is negligible, and the execution time overhead is acceptable.
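To make the hot/cold idea concrete, the following is a minimal sketch of a frequency-based split, not the graph partitioning formulation proposed here. It assumes per-register access counts are available from a profile; the function name `split_hot_cold`, the `hot_fraction` parameter, and the counts in the usage example are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def split_hot_cold(
    access_counts: Dict[str, int], hot_fraction: float = 0.25
) -> Tuple[List[str], List[str], float]:
    """Split registers into hot and cold regions by access frequency.

    Illustrative heuristic only (an assumption, not the paper's algorithm):
    the most frequently accessed `hot_fraction` of registers form the hot
    region, and the rest form the cold region. Also returns the fraction
    of all accesses that fall in the hot region.
    """
    if not 0.0 < hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be in (0, 1]")

    # Rank by descending access count; break ties by register name.
    ranked = sorted(access_counts, key=lambda r: (-access_counts[r], r))

    # Keep at least one register in the hot region.
    hot_size = max(1, round(len(ranked) * hot_fraction))
    hot, cold = ranked[:hot_size], ranked[hot_size:]

    total = sum(access_counts.values())
    coverage = sum(access_counts[r] for r in hot) / total if total else 0.0
    return hot, cold, coverage


if __name__ == "__main__":
    # Hypothetical profile for an 8-register file (invented numbers).
    profile = {
        "r0": 500, "r1": 330, "r2": 50, "r3": 40,
        "r4": 30, "r5": 25, "r6": 15, "r7": 10,
    }
    hot, cold, coverage = split_hot_cold(profile)
    print("hot region:", hot)
    print("cold region:", cold)
    print(f"share of accesses in hot region: {coverage:.0%}")
```

A plain frequency ranking like this ignores how registers interact across instructions, which is presumably why the partitioning is cast as a graph problem rather than a simple sort.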