An Accelerated Procrustean Markov Process Model With Coherent Constraint for Non-Rigid Structure From Motion

Non-Rigid Structure from Motion (NRSfM) is the task of reconstructing the 3D point set of a non-rigid object from an ensemble of images with 2D correspondences, which has been a long-lasting challenging research topic. Compared to the state-of-the-art methods for NRSfM, the Procrustean Markov Process (PMP) model has obtained a relatively good performance. However, the estimation error and the convergence time of the PMP model will increase simultaneously when noise is present. To address this problem, in this paper, a coherent constraint is constructed to suppress the noise in the initialization step of the PMP algorithm. Moreover, an Accelerated Expectation Maximization (AEM) algorithm is devised to optimize the PMP estimation model. Experimental results on several widely used sequences demonstrate that our proposed algorithm achieves state-of-the-art performance, as well as its effectiveness and feasibility.


I. INTRODUCTION
Reconstructing the 3D object shapes from a set of 2D images has become a valuable approach to enhance the tasks in computer vision, such as virtual reality [1], object recognition [2], biometrics [3], human-computer interaction [4], [5], etc. Non-Rigid Structure from Motion (NRSfM) provides a useful approach to simultaneously estimate the 3D time-varying deformed object and the relative camera motion from the corresponding 2D observation points in a sequence of images. Although many effective algorithms have been proposed for NRSfM in the past few decades, it is still a very complex and ill-posed problem due to the lack of prior information about the 3D structure.
The associate editor coordinating the review of this manuscript and approving it for publication was Ziyan Wu .
In order to solve the uncertainty in NRSfM, many different a priori information, assumptions and constraints have been utilized in reconstructing the 3D shapes. Inspired by the factorization technique for Structure from Motion (SfM) [6], a low-rank constraint was proposed in [7] to model the unknown time-varying deformable 3D shapes, represented as a linear combination of a small number of 3D shape bases. In the matrix factorization method, the 2D observed matrix was factored into a 3D pose matrix and a 3D shape basis matrix. Subsequently, many works have been proposed based on the low-rank shape model. In [8], a closed-form solution was reported, which considers both the low-rank constraint and the rotation constraint. An approximate rank-3K solution was derived in [9] by utilizing a Gaussian prior and a probabilistic principal component analysis shape model. In [10], an approximate rank-3 solution was proposed to VOLUME 7, 2019 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/ solve NRSfM by considering very small semi-definite programming and a nuclear-norm minimization problem. Furthermore, a multilinear factorization algorithm was presented in [11], which incorporates the shape basis assumption and a time-independent latent smoothness characteristic of the unknown 3D non-rigid shapes.
A dual approach to the shape basis representation was proposed in [12] to reduce the number of unknown parameters. The dual approach assumes that the 3D point trajectories are constrained to lie in a linear trajectory space. The linear space is compactly spanned by 3K predefined independent basis trajectories, obtained via the Discrete Cosine Transform (DCT). Nevertheless, the rank-3K constraint has limited capability to model high-frequency deformation, represented by trajectories. In [13] and [14], a better reconstruction of high-frequency deformation was achieved without relaxing the rank-3K constraint, by modeling a smoothly deforming 3D shape as a single point moving along a smooth timetrajectory within a linear shape space. The predefined DCT was applied to represent the coefficients of shape basis.
Based on the trajectory representation, a scalable monocular surface reconstruction method was proposed in [15] to solve the NRSfM problem, for both sparse and dense data. The optimized solution was obtained through singular value thresholding, proximal gradient and alternating direction method of multipliers. In [16], the dense NRSfM problem with complex non-rigid deformations was solved based on the Grassmann manifold. The complex non-rigid deformation was assumed lying on a union of local linear subspaces, both spatially and temporally. In addition, a scalable, efficient, and accurate solution was proposed in [17] to solve the NRSfM problem, by combining the existing point-trajectory low-rank models with a probabilistic framework for matrix normal distributions.
For the trajectory-based methods, how to determine the optimal number of shape bases is a difficult problem. In [18], a Procrustean Normal Distribution (PND) was proposed to represent the distribution of shape deformations by strictly separating the motion and deformation components. The 3D structure can be accurately reconstructed via an EM algorithm, without requiring any additional constraints or prior knowledge. Although [18] and the improved version (PND2) [19] have achieved a relatively good reconstruction performance on most commonly used datasets, they do not work well for shapes with some large drastic deformations and noise, due to the lack of smoothing constraints.
In [20], a Procrustean Markov Process (PMP) model was proposed to enforce the smoothness constraint between two adjacent frames. The sequence of 3D shapes is considered as a simple stationary Markov process based on Procrustes alignment. Nevertheless, the convergence of the EM algorithm is relatively slow. Moreover, the PMP model is sensitive to noise. In this paper, an accelerated PMP model with a coherent constraint is proposed to improve the robustness and the convergence speed of the EM algorithm. Experimental results on several commonly used sequences verify the effectiveness and feasibility of the proposed algorithm.
The key contributions of the proposed approach are two aspects, as follows: • In order to suppress the noise, a coherent constraint, corresponding to a displacement function, is proposed to preserve the global structure of each shape by constraining adjacent points to move coherently.
• An Accelerated Expectation Maximization (AEM) algorithm is proposed to achieve faster convergence, when optimizing the PMP estimation model.
The remainder of the paper is organized as follows. A detailed description of the proposed method is presented in Section II. Experimental results are given in Section III. Finally, conclusions are made in Section IV.

II. METHODOLOGY
The proposed algorithm is composed of three main components: formulation of the PMP model [20], initialization of the PMP model with a coherent constraint, and the optimization of the PMP model using the proposed accelerated EM algorithm.

A. FORMULATION OF THE PMP MODEL
For the i th (i = 1, · · · , n s ) frame in an image sequence, the observed 3D structure D i can be represented as a collection of 3D coordinates (x, y, z) of n p feature points, i.e.
Under the orthographic projection model, the z coordinates are unknown for D i . Define a 3×n p binary weight matrix W i , whose elements in the first two rows and the third row are all ones and zeros, respectively. Then, (1) can be represented as, where the 3×n p hidden variable X i denotes the true 3D shape of the i th frame, t i ∈ R 3×1 is the translation, 1 ∈ R n p ×1 is a vector with elements of one, and m i ∈ R 3×n p is a zero-mean Gaussian noise with standard deviation σ . The operator denotes the Hadamard product. As in [20], given the scale s i and the rotation matrix R i ∈ R 3×3 , X i can be aligned to Y i ∈ R 3×n p , as follows: For Y i , the first-order linear Markov process can be given as follows: vec where vec(Y i ) is a vectorization form of Y i . Y i−1 and Y ∈ R 3×n p are the (i − 1) th aligned 3D shape and the mean shape of Y i (i = 1, · · · , n s ), respectively [20]. The smoothness parameter α is the transition probability of Y i moving through the successive time periods. The noise term n i ∈ R 3n p ×1 is a Gaussian random vector with independent and identical distribution. In (4), the smoothness assumption can effectively reduce the effect of large deformation. The aligned 3D shapes Y i obeys the procrustean normal distribution [18], i.e.
where the symbol N P (·, ·) denotes the procrustean normal distribution, and R ∈ R 3n p ×3n p denotes the covariance matrix of Y i (i = 1, · · · , n s ). As done in [18], in order to solve the singular problem, R is decomposed as, where Q ∈ R 3n p ×(3n p −7) and ∈ R (3n p −7)×(3n p −7) are an orthogonal matrix and a non-singular positive definite symmetric matrix, respectively.
Combining (4) and (5), the distribution of n i is given as follows: where the symbol N (·, ·) denotes the normal distribution, and is an unknown positive definite symmetric matrix [20]. Furthermore, the mean can be computed as follows: Considering (5), (8) and (9), the probability p({Y i }) (i = {1, 2, ..., n s }) can be given as, Referring to [20], the unknown parameters can be estimated by maximizing the following log-likelihood function, Considering the coherent constraint, a good initial value for can be obtained for PMP via the following optimization model, where and I 3 is a 3 × 3 identity matrix. The matrix G i ∈ R n p ×n p is a kernel matrix, whose element g mn is computed as follows: (13) where X i,m and X i,n are the m th and the n th point in X i , respectively; and β defines the width of the Gaussian kernel function [25]. The transformation C i G i is assumed to be a displacement function. The regularization term tr(C i G i C T i ) is a global structure constraint, following the motion coherence theory [23], [24], which can constrain the smoothness of C i G i [25]. The parameter λ makes a trade-off between Procrustes alignment and regularization.
First, X and t i can be obtained by combining (12) and the constraint term X1 = 0, Substitute (15) into (12) and let B = I n p − 1 n p 11 T , where B ∈ R n p ×n p , we can get For (16), considering the orthogonal Procrustes problem, we have where ] T , and svd(·) represents the singular value decomposition [26].
As each shape variation is assumed to be orthogonal to the mean shape [18], we have Considering According to (19), s i can be computed as follows As in [20], the true 3D shape, X i , can be decomposed as follows: where L(z i ) transforms z i into a 3 × n p matrix, in which the elements of the first two rows are zeros and the elements of the third row are z i . Furthermore, vec(L(z i )) = Wz i , where W is a truncated version of (I−diag(vec(W))) removing all-zero columns.

VOLUME 7, 2019
Considering (15) and (21), (12) can be rewritten as follows: Then, we compute the one-order partial derivative of (22) with respect to C i and z i , respectively. As a result, C i and z i can be derived by setting these two partial derivatives to zeros, as follows: where the operators ⊗ and † denote the Kronecker product and the pseudo inverse, respectively.

C. THE PMP MODEL OPTIMIZATION USING AN ACCELERATED EM ALGORITHM
For the model (11), an accelerated EM algorithm is proposed to derive the solutions. Let µ i|n s and C i|n s be the mean and covariance of p(X i |D i , ..., D n s ), respectively. The cross-covariance of vec(X i ) and vec(X i+1 ) is denoted as C i,i+1|n s . The variables µ i|n s , C i|n s and C i,i+1|n s can be computed by the Kalman smoothing in the E-step [20].
In the M-step, all the unknown parameters in (11) are updated by maximizing the expectation of (11), i.e.
Then, each element of t at the (t + 1) th iteration can be obtained as follows: The E-step and M-step of the original EM algorithm are repeated to produce a series of estimates ( t+1 , t , t−1 ). Denote φ as a vectorized variable of . Referring to [27], for the accelerated EM algorithm, φ can be updated as follows: where the operation [·] −1 is defined as follows: In (27), a problem is addressed here. In (11), H and are both required to be positive definite symmetric matrices. In order to satisfy this condition, the upper or lower triangular part of H t and t are first extracted and vectorized. After updated by (27), they are transformed into the corresponding positive definite symmetric matrices. As a result, we can obtain a set of new variables t new according to (26) and (27).
The iterations are repeated until, where τ is the maximum number of iterations. The pseudocode of the PMP-CAEM algorithm is given in Algorithm 1.

III. EXPERIMENTS A. EXPERIMENT DATASETS AND SET-UP
The performance of the proposed method is evaluated on eleven widely used motion sequences: walking, jaws, dance, face1, face2, pickup, stretch, yoga, drink, Face Recognition Grand Challenge (FRGC), and capoeira. These sequences are publicly available from [9], [12], [18], [21]. Note that the FRGC is a 3D facial-landmark dataset from [18], by adding random rotation and scaling to the original FRGC 2.0 database without temporal dependence [22]. For these sequences, the corresponding number of frames (n s ) and the number of points tracked (n p ) are listed in Table 1.  The 3D reconstruction error ε of eleven sequences without noise for six different methods, and the corresponding mean and standard deviation (µ ± σ ) for each method on all the sequences. Figure 1 shows one frame of these eleven image sequences. All simulations were conducted using MATLAB, running on an ordinary personal computer.
In order to evaluate the reconstruction performance, the normalized error ε of the 3D coordinates between the estimated 3D shape (X i ) and the ground-true 3D shape (X i ) is used as the performance index, i.e., Smaller ε means that the estimation is more accurate.

B. COMPARISON TO RECENTLY REPORTED RESULTS
In order to evaluate the effectiveness of the proposed method, denoted as PMP-CAEM, we compare it with several state-of-the-art NRSfM algorithms, including the well-known block matrix method (denoted as BMM) [10], the column-space-fitting method (denoted as CSF) [13], the CSF2 method [14], the procrustean normal distribution method (denoted as PND2) [19] and the procrustean Markov process method (denoted as PMP) [20].  Among these methods, the low-rank parameter K has a significant influence on the final estimation performance for CSF, CSF2 and BMM. For a fair comparison, the parameter K is successively set at {1, 2, · · · 13} for these three methods. The parameter value corresponding to the smallest estimation error is selected as the approximate optimum parameter value of K . Table 2 shows the 3D reconstruction errors ε on eleven sequences without noise for the six methods. In order to easily compare the performances of different algorithms, the best result and the second-best result in Table 2 are highlighted in red and blue, respectively. The reconstruction errors of PND2, PMP and PMP-CAEM are generally lower than that of other methods for most sequences. Moreover, the reconstruction errors of PMP-CAEM are close to that of PMP. Table 3 shows the computation times (seconds) of the different methods on the eleven sequences without noise. We can see that the computation runtimes of PND2, PMP and PMP-CAEM are obviously longer than that of other methods. However, the computation times of PMP-CAEM are significantly lower than that of PMP. This shows that the computation runtimes required by PMP can be greatly reduced by the use of the accelerated expectation maximization algorithm. In order to investigate the robustness to noise, we conducted the experiments with the addition of the Gaussian noise on the original sequences. The standard deviation or level of the Gaussian noise is set as σ noise = a noise max i,j,k {|d ijk |}, where the noise rate a noise is set at {0.2, 0.22, 0.24, 0.26, 0.28}, respectively, and d ijk is the (j, k) th elements of D i , where i = 1, · · · , n s and j = 1, 2, 3; k = 1, · · · , n p . Figure 2 shows one frame of the sequences jaws and stretch with and without noise. The symbols '•' and '+' represent the points of the ground truth and points with noise, respectively. We can see that the positions of the points are randomly changed when noise is added to the original data.
As an example, Table 4 shows the 3D reconstruction errors ε of the six methods on the eleven sequences for the six methods when a noise is set at 0.26. We can see from Tables 2 and 4 that the reconstruction errors of the various algorithms are significantly increased when noises are added. The reconstruction errors of PMP and PMP-CAEM are obviously lower than that of other methods for most of the sequences. For CSF, CSF2, and BMM, a 3D shape is assumed to be composed by a linear combination of K shape bases. Such a model cannot achieve a satisfactory result because the deformation and translation caused by the noise are random and irregular for the different points. For PMP and PMP-CAEM, the smoothing constraint can suppress the partial deformation and deviation caused by noise. Different from PMP and PMP-CAEM, the smoothing constraint is not considered in the PND2 model. Therefore, the noise has a more serious effect on its final estimation results. From Table 4, it can be seen that the performance of PND2 is not yet as good as PMP and PMP-CAEM. Moreover, the reconstruction errors of PMP-CAEM are lower than that of PMP. Table 5 shows the number of iterations for the EM algorithm used in PMP and the accelerated EM used in PMP-CAEM, when a noise is set at 0.26. The number of iterations of PMP-CAEM is obviously lower than that of PMP. Therefore, the accelerated expectation maximization algorithm can significantly decrease the convergence time of PMP. Figure 3 shows the 3D reconstruction errors ε of the six methods on the eleven sequences, when a noise is set at different values. Table 6 tabulates the corresponding mean and standard deviation (µ ± σ ) of the 3D reconstruction errors for different noise rates. The reconstruction errors of PMP and PMP-CAEM are obviously lower than that of other methods for most sequences. Moreover, the reconstruction errors of PMP-CAEM are lower than that of PMP, due to the use of the coherence constraint.

IV. CONCLUSION
In this paper, an accelerated PMP model, with a coherent constraint, is proposed for non-rigid structure from motion. The experimental results demonstrated that the proposed method can simultaneously decrease the estimation error and the convergence time of EM algorithm for PMP. He is currently a Professor with the School of Automation, Huazhong University of Science and Technology, and also with the Key Laboratory of Image Processing and Intelligent Control of the Education Ministry of China, Wuhan, China. He has authored or coauthored more than 100 international journal articles. His current research interests include the theory of functional differential equations and differential equations with discontinuous right-hand sides, and their applications to the dynamics of neural networks, memristive systems, and control systems. Dr