We explore two fundamental questions at the intersection of sampling theory and information theory: how channel capacity is affected by sampling below the channel's Nyquist rate, and what sub-Nyquist sampling strategy should be employed to maximize capacity. In particular, we derive the capacity of sampled analog channels for three prevalent sampling strategies: sampling with filtering, sampling with filter banks, and sampling with modulation and filter banks. These sampling mechanisms subsume most nonuniform sampling techniques applied in practice. Our analyses illuminate interesting connections between undersampled channels and multiple-input multiple-output channels. The optimal sampling structures are shown to extract the frequencies with the highest SNR from each aliased frequency set, while suppressing aliasing and out-of-band noise. We also highlight connections between undersampled channel capacity and minimum mean-squared error (MSE) estimation from sampled data. Specifically, we show that the filters maximizing capacity and the filters minimizing MSE are equivalent under both the filtering and the filter-bank sampling strategies. These results demonstrate how sub-Nyquist sampling techniques affect channel capacity, and characterize the tradeoff between information rate and sampling rate.
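As an informal illustration of the statement that the optimal structures keep the highest-SNR frequency from each aliased frequency set, the following minimal sketch selects, for each base frequency f in [-fs/2, fs/2), the strongest member of the set {f + k fs}. It is not the derivation used in the paper: the function name `best_alias_snr`, the toy SNR profile `toy_snr`, the truncation to a finite number of aliases, and all numerical values are assumptions made only for this example.

```python
import numpy as np

def best_alias_snr(snr, fs, base_freqs, num_aliases):
    """For each base frequency f, search the truncated aliased set
    {f + k*fs : |k| <= num_aliases} and return the largest SNR found
    together with the frequency that attains it."""
    k = np.arange(-num_aliases, num_aliases + 1)          # alias indices
    alias_freqs = base_freqs[:, None] + k[None, :] * fs   # shape (F, K)
    alias_snr = snr(alias_freqs)                          # SNR at every alias
    best = np.argmax(alias_snr, axis=1)                   # strongest alias per set
    rows = np.arange(len(base_freqs))
    return alias_snr[rows, best], alias_freqs[rows, best]

# Hypothetical SNR profile (an assumption): strongest near |f| = 3.
def toy_snr(f):
    return 10.0 * np.exp(-(np.abs(f) - 3.0) ** 2)

fs = 2.0  # assumed sampling rate, well below the band occupied by toy_snr
base_freqs = np.linspace(-fs / 2, fs / 2, 5, endpoint=False)
best_snr, best_freq = best_alias_snr(toy_snr, fs, base_freqs, num_aliases=3)
```

In this toy setting the selected frequencies cluster around |f| = 3, where the assumed SNR is largest, rather than in the baseband. Computing a capacity from the selected SNR values would additionally require a power allocation across frequencies, which this sketch does not attempt.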