We propose a general framework for aligning continuous (oblique) video to 3D sensor data. A point cloud computed from the video is aligned to the point cloud obtained directly from a 3D sensor. This contrasts with existing techniques, in which 2D images are aligned to a 3D model derived from the 3D sensor data. Working with point clouds enables alignment in scenes full of objects that are difficult to model, such as trees. To compute 3D point clouds from video, we use motion stereo together with a state-of-the-art algorithm for camera pose estimation. Experiments with real data demonstrate the advantages of the proposed registration algorithm for texturing models in large-scale semi-urban environments. The ability to align video before a 3D model is built from the 3D sensor data also offers new practical opportunities for 3D modeling: we introduce a novel modeling-through-registration approach that fuses 3D information from both the 3D sensor and the video. Initial experiments with real data illustrate the potential of this approach.
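
To make the idea of point-cloud-to-point-cloud alignment concrete, the following is a minimal sketch of a generic rigid registration loop (iterative closest point with a Kabsch least-squares step). It is an illustration only and is not claimed to be the registration algorithm proposed here. The function names `best_rigid_transform` and `icp`, the brute-force nearest-neighbor search, and the assumptions that both clouds share the same scale and start roughly aligned are all hypothetical choices made for the example.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src[i] + t ~= dst[i] (Kabsch method).

    src and dst are (N, 3) arrays of corresponding points.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection: force det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(video_pts, sensor_pts, iterations=30):
    """Illustrative ICP: align video_pts (N, 3) to sensor_pts (M, 3).

    Assumes (hypothetically) that both clouds have the same scale and a
    reasonable initial alignment. Returns R, t mapping video_pts into the
    sensor frame.
    """
    R_total = np.eye(3)
    t_total = np.zeros(3)
    moved = video_pts.copy()
    for _ in range(iterations):
        # Brute-force nearest neighbor in the sensor cloud, O(N * M) memory.
        d2 = ((moved[:, None, :] - sensor_pts[None, :, :]) ** 2).sum(axis=2)
        nearest = sensor_pts[d2.argmin(axis=1)]
        # Best rigid motion for the current correspondences.
        R, t = best_rigid_transform(moved, nearest)
        moved = moved @ R.T + t
        # Compose the incremental motion with the accumulated one.
        R_total = R @ R_total
        t_total = R @ t_total + t
    return R_total, t_total
```

Given two `(N, 3)` arrays, `R, t = icp(video_pts, sensor_pts)` returns a rotation and translation, and `video_pts @ R.T + t` expresses the video-derived points in the sensor frame. A practical system would likely replace the brute-force search with a spatial index and add outlier rejection, since vegetation and partial overlap produce many spurious matches.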