Cameras face a fundamental trade-off between spatial and temporal resolution. Digital still cameras can capture images with high spatial resolution, but most high-speed video cameras have relatively low spatial resolution. It is hard to overcome this trade-off without incurring a significant increase in hardware costs. In this paper, we propose techniques for sampling, representing, and reconstructing the space-time volume to overcome this trade-off. Our approach has two important distinctions compared to previous works: 1) We achieve sparse representation of videos by learning an overcomplete dictionary on video patches, and 2) we adhere to practical hardware constraints on sampling schemes imposed by architectures of current image sensors, which means that our sampling function can be implemented on CMOS image sensors with modified control units in the future. We evaluate components of our approach, sampling function and sparse representation, by comparing them to several existing approaches. We also implement a prototype imaging system with pixel-wise coded exposure control using a liquid crystal on silicon device. System characteristics such as field of view and modulation transfer function are evaluated for our imaging system. Both simulations and experiments on a wide range of scenes show that our method can effectively reconstruct a video from a single coded image while maintaining high spatial resolution.