Different kinds of high-dimensional visual features can be extracted from a single image. Images can thus be treated as multiview data, with each type of extracted high-dimensional visual feature regarded as a particular view of the image. In this paper, we propose a framework of sparse unsupervised dimensionality reduction for multiview data. The goal of our framework is to find a low-dimensional optimal consensus representation from multiple heterogeneous features through multiview learning. In this framework, we first learn low-dimensional patterns from each view individually, taking into account the specific statistical properties of that view. We then construct a low-dimensional optimal consensus representation from the learned patterns so as to leverage the complementary nature of the multiple views. We formulate this construction as approximating the matrix of patterns by the product of a low-dimensional consensus base matrix and a loading matrix. To select the most discriminative features for the spectral embedding of multiple views, we impose an l1-norm penalty on the columns of the loading matrix and orthogonality constraints on the base matrix. We develop a new alternating algorithm, spectral sparse multiview embedding, to obtain the solution efficiently. Moreover, each row of the loading matrix encodes structured information corresponding to multiple patterns. To gain flexibility in sharing information across subsets of the views, we impose a novel structured sparsity-inducing norm penalty on the rows of the loading matrix. This penalty makes the loading coefficients adaptively share information across subsets of the learned patterns. We call this method structured sparse multiview dimensionality reduction. Experiments on a toy benchmark image data set and two real-world Web image data sets demonstrate the effectiveness of the proposed algorithms.
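The core formulation summarized above, approximating a pattern matrix by an orthogonal base matrix times a sparse loading matrix, can be sketched as an alternating scheme. The sketch below is illustrative, not the paper's actual algorithm: it assumes the objective min ||X - BA||_F^2 + lam*||A||_1 subject to B'B = I, and alternates a soft-thresholding update for the loading matrix A with an orthogonal Procrustes (SVD) update for the base matrix B. All function names and the penalty weight `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(Z, tau):
    """Elementwise soft-thresholding: the proximal operator of the l1-norm."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def sparse_consensus_factorization(X, k, lam=0.1, n_iter=50, seed=0):
    """Illustrative alternating minimization of
        ||X - B A||_F^2 + lam * ||A||_1   s.t.  B' B = I,
    where X (d x n) stacks the low-dimensional patterns learned per view,
    B (d x k) is the consensus base matrix, and A (k x n) the sparse loading.
    This is a generic sketch of the structure, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    # Initialize B with orthonormal columns.
    B = np.linalg.qr(rng.standard_normal((d, k)))[0]
    for _ in range(n_iter):
        # With B'B = I, the l1-penalized loading update has the closed form
        # A = soft_threshold(B' X, lam / 2).
        A = soft_threshold(B.T @ X, lam / 2.0)
        # Fixing A, the orthogonality-constrained base update is a
        # Procrustes problem solved by the SVD of X A'.
        U, _, Vt = np.linalg.svd(X @ A.T, full_matrices=False)
        B = U @ Vt
    return B, A
```

The orthogonality constraint is what makes both subproblems cheap here: it decouples the l1-penalized columns of the loading matrix (giving the soft-thresholding form) and reduces the base update to a single small SVD.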