In recent years, multi-modal imaging sensors have been employed in advanced surveillance systems. The performance of such systems can be enhanced by using information beyond the visible spectrum, for example, infrared imaging. To ensure the correctness of low- and high-level processing, multi-modal imagers must be fully calibrated or registered. In this paper, an algorithm is proposed to register video sequences acquired by an infrared camera and an electro-optical (CCD) camera. The registration method is based on silhouettes extracted by differencing adjacent frames, where the difference is computed with an image structural similarity measure. Initial registration is obtained by tracing the top head points in consecutive frames. Finally, an optimization procedure that maximizes mutual information is employed to refine the registration results.
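As a rough illustration of the final refinement step, the following Python sketch estimates mutual information from a joint intensity histogram and searches a small window of integer translations around an initial estimate. The function names `mutual_information` and `refine_shift`, the bin count, the search radius, and the restriction to pure translation with exhaustive search are all illustrative assumptions; they are not taken from the proposed method, whose transformation model and optimizer are not described in this abstract.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=16, max_value=255):
    """Estimate mutual information (in bits) between two equally sized
    sequences of pixel intensities using a joint histogram."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and equally sized")
    width = (max_value + 1) / bins
    # Quantize intensities into `bins` histogram bins.
    qa = [min(int(v / width), bins - 1) for v in a]
    qb = [min(int(v / width), bins - 1) for v in b]
    n = len(qa)
    joint = Counter(zip(qa, qb))
    pa = Counter(qa)
    pb = Counter(qb)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i, j) * log2(p(i, j) / (p(i) * p(j))), written with raw counts.
        mi += (c / n) * math.log2(c * n / (pa[i] * pb[j]))
    return mi


def refine_shift(ref, mov, start=(0, 0), radius=2):
    """Search integer shifts (dx, dy) within `radius` of `start` and return
    the one that maximizes mutual information over the overlapping region.

    Convention (an assumption of this sketch): ref[y][x] corresponds to
    mov[y + dy][x + dx]. Both images are lists of rows of equal size.
    """
    h, w = len(ref), len(ref[0])
    best_score, best_shift = float("-inf"), start
    for dy in range(start[1] - radius, start[1] + radius + 1):
        for dx in range(start[0] - radius, start[0] + radius + 1):
            a, b = [], []
            # Visit only pixels for which the shifted index stays in bounds.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    a.append(ref[y][x])
                    b.append(mov[y + dy][x + dx])
            if a:
                score = mutual_information(a, b)
                if score > best_score:
                    best_score, best_shift = score, (dx, dy)
    return best_shift
```

Mutual information is a natural criterion here because it rewards a consistent statistical relationship between intensities rather than equal intensities, so a warm silhouette in the infrared frame can align with a differently shaded silhouette in the CCD frame. In practice the initial estimate passed as `start` would come from the coarse registration stage.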