Successfully tracking targets in a video sequence has many important applications, including unmanned aerial vehicle (UAV) surveillance. This paper presents a robust and efficient video tracking algorithm based on the continuous wavelet transform (CWT), which has been shown to capture motion information over multiple frames effectively and to offer excellent velocity selectivity. The CWT converts target trajectories in the spatio-temporal domain into target energy volumes in the wavenumber-frequency domain. By integrating over different motion parameters, three target energy densities are obtained, which then serve as cost functions for estimating target trajectories and sizes. Because of this velocity selectivity, the energy-based tracker can detect and track targets within a particular velocity range. To handle interference among multiple nearby or crossing targets, a novel joint processing technique using expectation-maximization-based Gaussian mixture estimation is developed. A global nearest neighbour algorithm is employed to perform data association and maintain continuous kinematic trajectories. In addition to computer simulations, the developed energy-based algorithm is applied to a UAV surveillance application in which multiple vehicles move close to one another on a multi-lane road.
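As an illustration of the data association step, the following is a minimal sketch of global nearest neighbour assignment, not the implementation used in this work. It assumes that tracks and detections are given as 2-D positions, that the cost is the Euclidean distance, and that there are at least as many detections as tracks; the function name `gnn_associate` and its inputs are hypothetical. The optimum is found by exhaustive search, which is only practical for a handful of targets; a practical tracker would likely use an assignment solver such as the Hungarian algorithm.

```python
from itertools import permutations
from math import hypot


def gnn_associate(tracks, detections):
    """Global nearest neighbour association by exhaustive search.

    tracks, detections: lists of (x, y) positions (hypothetical inputs).
    Returns a dict mapping each track index to one detection index such
    that the summed Euclidean distance over all pairs is minimal.
    Assumes len(tracks) <= len(detections).
    """
    if len(tracks) > len(detections):
        raise ValueError("this sketch needs at least as many detections as tracks")

    best_cost = float("inf")
    best_assignment = ()
    # Each permutation assigns a distinct detection to every track.
    for perm in permutations(range(len(detections)), len(tracks)):
        cost = 0.0
        for t, d in enumerate(perm):
            cost += hypot(tracks[t][0] - detections[d][0],
                          tracks[t][1] - detections[d][1])
        if cost < best_cost:
            best_cost, best_assignment = cost, perm
    return dict(enumerate(best_assignment))


# Two predicted track positions and three candidate detections.
tracks = [(0.0, 0.0), (10.0, 0.0)]
detections = [(9.0, 1.0), (1.0, 1.0), (20.0, 20.0)]
assignment = gnn_associate(tracks, detections)
print(assignment)
```

In this example track 0 is paired with detection 1 and track 1 with detection 0, while the distant third detection is left unassigned. Minimising the total cost jointly, rather than letting each track pick its own nearest detection, is what keeps two closely spaced targets from claiming the same detection.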