A rising horizon in chip fabrication is the 3D integration technology. It stacks two or more dies vertically with a dense, high-speed interface to increase the device density and reduce the delay of interconnects significantly across the dies. However, a major challenge in 3D technology is the increased power density, which gives rise to the concern of heat dissipation within the processor. High temperatures trigger voltage and frequency throttlings in hardware, which degrade the chip performance. Moreover, high temperatures impair the processor's reliability and reduce its lifetime. To alleviate this problem, we propose in this paper an OS-level scheduling algorithm that performs thermal-aware task scheduling on a 3D chip. Our algorithm leverages the inherent thermal variations within and across different tasks, and schedules them to keep the chip temperature low. We observed that vertically adjacent dies have strong thermal correlations and the scheduler should consider them jointly. Compared with other intuitive algorithms such as a Random and a Round-Robin algorithm, our proposed algorithm brings lower peak temperature and average temperature on-chip. Moreover, it can remove, on average, 46 percent of thermal emergency time and result in 5.11 percent (4.78 percent) performance improvement over the base case on thermally homogeneous (heterogeneous) floorplans.