A Robust Approach to ARMA Factor Modeling

This paper deals with the dynamic factor analysis problem for an ARMA process. To robustly estimate the number of factors, we construct a confidence region, centered at a finite-sample estimate of the underlying model, which contains the true model with a prescribed probability. Within this confidence region, the problem, formulated as the rank minimization of a suitable spectral density, is efficiently approximated via a trace-norm convex relaxation. The latter is addressed by resorting to Lagrange duality theory, which allows us to prove the existence of solutions. Finally, a numerical algorithm to solve the dual problem is presented. The effectiveness of the proposed estimator is assessed through simulation studies with both synthetic and real data.


I. INTRODUCTION
We deal with the problem of constructing a dynamical model from a high-dimensional stream of data that are assumed to be noisy observations of a process depending on a small number of hidden variables. In the static case, this problem is known as factor analysis. Its origins can be traced back to the beginning of the last century, and the amount of literature produced on this topic is impressive: we refer the readers to the recent papers [1], [2], [3], [4] for an overview of the literature and a rich list of references. The solution of factor analysis problems may be obtained by decomposing the covariance matrix of the observed data as the sum of a diagonal positive definite matrix (accounting for the noise covariance) and a positive semidefinite matrix whose rank must be as small as possible, since it equals the number of hidden variables in the model. The main problem of this solution is that it is inherently fragile; in fact, even a minuscule variation in the covariance matrix of the observed data usually leads to a substantial variation in the number of hidden variables, which is the key feature of the modelling procedure. On the other hand, such a matrix must be estimated and is therefore subject to errors. To address this fragility issue, a robust method has recently been proposed [5], which has also been generalized, with good results, to the dynamic framework of learning latent-variable dynamic graphical models [6].
Dynamic Factor Analysis (DFA) has been addressed much more recently, with [7] apparently being the first contribution in this field. We refer to the surveys [8], [9] and to the recent paper [10] for an overview of the literature on this subject. In [11], an interesting generalization is applied to the modeling of dynamical systems.
In this paper, we address the dynamic autoregressive moving average (ARMA) case with the aim of extracting, from the observed data, a model featuring a small number of hidden variables. This is important both for model simplicity and to uncover the structure of the mechanism generating the data. The problem may be mathematically formulated as that of decomposing the spectral density of the process generating the data as the sum of a diagonal spectral density and a low-rank one. The fragility issue in this case is even more severe. We address the problem as follows:
• Given the observed data, we compute by standard methods (e.g. the truncated periodogram) a raw estimate Φ̂ of the spectral density Φ generating the data.
• We compute a neighbourhood N of Φ̂ that contains Φ with prescribed probability; clearly, the size of N depends on the sample size.
• We compute a refined estimate Φ• ∈ N by imposing that it admits an additive decomposition into a diagonal spectral density and a spectral density with the lowest possible rank. To this end, we set up an optimization problem that we address by resorting to duality. In particular, we prove the existence of solutions and provide a numerical algorithm to compute a solution.
Our work may be cast in the rich stream of literature devoted to learning dynamic models having a topological structure describing the presence or absence of interactions among the variables of the system; see the earlier works [12], [13], [14], as well as their extensions to reciprocal processes [15], [16], sparse plus low-rank graphical models [17], [18], [6], the Bayesian viewpoint proposed in [19], [20], and the case of oriented graphical models [21], [22].
The contribution of this paper is twofold: first, we propose a procedure to estimate the number of latent factors in dynamic ARMA factor models, which is the most delicate aspect of factor analysis problems; second, we derive an identification method to estimate the parameters of a factor model describing the observed data.
The outline of the paper is as follows. In Section II we introduce the DFA problem for moving average (MA) models. Section III shows that such a problem admits a solution by means of duality theory, while Section IV shows how to reconstruct the solution of the primal problem from the dual one. In Section V we propose an algorithm to compute the solution of the dual problem. In Section VI we extend the previous ideas to ARMA models. Section VII presents some numerical results. Finally, in Section VIII we draw the conclusions.

A. Notation
Given a matrix M, we denote its transpose by M⊤ and by M_(i,j) the element of M in the i-th row and j-th column. If M is a square matrix, tr(M), |M| and σ(M) denote its trace, determinant and spectrum, respectively. The symbol ‖·‖ stands for the Frobenius norm. For A, B ∈ R^{m×m}, we define their inner product as ⟨A, B⟩ := tr(A⊤B). Let Q_m be the space of real symmetric matrices of size m; if M ∈ Q_m is positive definite or positive semidefinite, we write M ≻ 0 or M ⪰ 0, respectively. We denote by (·)* the complex conjugate transpose. If a function Φ defined on the unit circle takes positive definite (positive semidefinite) values for all ϑ, we write Φ(e^{iϑ}) ≻ 0 (⪰ 0). Integrals are always taken from −π to π with respect to the normalized Lebesgue measure dϑ/2π.

II. IDENTIFICATION OF MA FACTOR MODELS
Consider the MA factor model of order n given in (1), whose spectral density admits the decomposition Φ = Φ_L + Φ_D in (2). By construction, rank(Φ_L) = r, where rank denotes the normal rank (i.e. the rank almost everywhere), and Φ_D is diagonal. Therefore, y represents a factor model if its spectral density can be decomposed as "low rank plus diagonal" as in (2).
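Since the displayed equations (1)-(2) did not survive extraction here, the following sketch illustrates the kind of model we have in mind: an m-dimensional MA process of order n driven by r latent factors, plus diagonal idiosyncratic noise. All dimensions and coefficients below are illustrative assumptions, not the paper's actual example.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, n, N = 6, 2, 2, 500          # cross-section size, factors, MA order, samples

# Random MA coefficient matrices: W_L(z) = sum_k WL[k] z^{-k}  (m x r, "tall"),
# and a diagonal W_D(z) made of m scalar MA filters.
WL = rng.normal(size=(n + 1, m, r))
wd = rng.normal(size=(n + 1, m))    # diagonal entries of W_D(z)

u = rng.normal(size=(N + n, r))     # latent factor noise
w = rng.normal(size=(N + n, m))     # idiosyncratic noise

# y(t) = W_L(z) u(t) + W_D(z) w(t): a rank-r "common" part plus a diagonal part.
y = np.zeros((N, m))
for t in range(N):
    for k in range(n + 1):
        y[t] += WL[k] @ u[t + n - k] + wd[k] * w[t + n - k]
```

The spectral density of such a process is exactly of the "low rank plus diagonal" form discussed above.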
Assume we collect a finite-length realization of y defined in (1), say y^N = {y(1), …, y(N)}, where the order n is known. We want to estimate the corresponding factor model, that is, the decomposition in (2) as well as the number of factors r. To this aim, given our data y^N, we first compute the sample covariance lags R̂_j; then, an estimate Φ̂ of Φ is obtained by the truncated periodogram (3). Notice that Φ̂ may fail to be positive definite for all ϑ; in that case, we can add εI_m to the right-hand side of (3), with the constant ε > 0 chosen so as to ensure the positivity of Φ̂. On the other hand, Φ̂ may not admit a low rank plus diagonal decomposition. Thus, we estimate directly the two terms Φ_L and Φ_D of the decomposition (2) by solving the optimization problem (4).

July 9, 2021 DRAFT

Here, the objective function promotes a solution for Φ_L having low rank, see [17]. The first three constraints impose that Φ_L and Φ_D provide a genuine spectral density decomposition of type (2). The last constraint, in which S_IS(Φ||Φ̂) is the Itakura-Saito divergence, imposes that Φ belongs to a set "centered" at the nominal spectral density Φ̂ with prescribed tolerance δ. Notice that Φ_D is uniquely determined by Φ and Φ_L; thus, Problem (4) can be rewritten by removing Φ_D, yielding Problem (5).

A. The Choice of δ

Before solving our problem, we deal with the choice of the tolerance parameter δ appearing in the constraint of (5). This choice should reflect the accuracy of the estimate Φ̂ of Φ. This can be accomplished by choosing a desired probability α ∈ (0, 1) and considering a ball of radius δ_α (in the Itakura-Saito topology) centered at Φ̂ and containing the true spectrum Φ with probability α. The estimation of δ_α is not an easy task because we do not know the true power spectral density Φ. Next, we propose a resampling-based method to estimate it.
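The displayed definition of the Itakura-Saito divergence was lost in this version of the text; in this line of work the multivariate Itakura-Saito divergence between spectral densities is usually defined as follows, which we state here as an assumption about the intended definition:

```latex
S_{IS}(\Phi\|\hat{\Phi}) := \int \Big[\operatorname{tr}\big(\Phi(e^{i\vartheta})\hat{\Phi}(e^{i\vartheta})^{-1}\big) - \log\det\big(\Phi(e^{i\vartheta})\hat{\Phi}(e^{i\vartheta})^{-1}\big) - m\Big]\,\frac{d\vartheta}{2\pi}.
```

With this definition, S_IS is nonnegative and vanishes if and only if Φ = Φ̂ almost everywhere, so the constraint S_IS(Φ||Φ̂) ≤ δ indeed describes a ball around Φ̂.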
The idea is to approximate Φ with Φ̂ and to use this model to perform a resampling operation. Let W be the minimum-phase spectral factor of Φ̂ and define the process ŷ = {ŷ(t), t ∈ Z} as ŷ(t) := W(e^{iϑ})e(t), where e(t) is an m-dimensional normalized white noise. Let Φ̂_r denote the truncated periodogram (understood as an estimator) based on a sample of the process ŷ of length N, where the subscript "r" stands for resampling, as this is the means by which we perform the resampling operation. By generating a realization ŷ^N = {ŷ(1), …, ŷ(N)} from Φ̂ (i.e. by resampling the data), we can easily obtain a realization of the random variable S_IS(Φ̂||Φ̂_r). Accordingly, it is possible to compute numerically δ_α such that Pr(S_IS(Φ̂||Φ̂_r) ≤ δ_α) = α by a standard Monte Carlo procedure. Numerical simulations show that this technique indeed provides a good estimate of δ.
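To make the resampling recipe concrete, here is a minimal scalar (m = 1) sketch of the Monte Carlo estimation of δ_α. The MA(1) toy model, the grid size, and the number M of resampled trajectories are illustrative assumptions; moreover, for simplicity we resample from the toy model itself rather than from a spectral factorization of Φ̂, as the paper prescribes.

```python
import numpy as np

rng = np.random.default_rng(1)

def is_div(phi, psi):
    """Scalar Itakura-Saito divergence between two spectra on a frequency grid."""
    q = phi / psi
    return np.mean(q - np.log(q) - 1.0)

def periodogram_ma(x, n, grid, eps=1e-6):
    """Truncated periodogram: sample covariances up to lag n, floored at eps
    (the epsilon fix mentioned in the text ensures positivity)."""
    N = len(x)
    c = np.array([x[k:] @ x[:N - k] / N for k in range(n + 1)])
    phi = c[0] + 2.0 * sum(c[k] * np.cos(k * grid) for k in range(1, n + 1))
    return np.maximum(phi, eps)

# "Nominal" spectrum estimated from data (here: a toy MA(1) sample).
N, n, alpha, M = 2000, 1, 0.5, 200
grid = np.linspace(-np.pi, np.pi, 512, endpoint=False)
e = rng.normal(size=N + 1)
y = e[1:] + 0.5 * e[:-1]
phi_hat = periodogram_ma(y, n, grid)

# Resample M trajectories and collect realizations of S_IS(phi_hat || phi_r);
# delta_alpha is the alpha-quantile of their empirical distribution.
divs = []
for _ in range(M):
    eb = rng.normal(size=N + 1)
    yb = eb[1:] + 0.5 * eb[:-1]   # simplification: resampled from the toy model
    divs.append(is_div(phi_hat, periodogram_ma(yb, n, grid)))
delta_alpha = np.quantile(divs, alpha)
```

The same recipe carries over to the matrix-valued case once a spectral factor of Φ̂ is available to generate ŷ.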
It is worth noting that if the chosen α is too large with respect to the data length N, the resulting δ_α may be too generous, yielding a diagonal Φ obeying S_IS(Φ||Φ̂) ≤ δ_α. In this case, Problem (5) admits the trivial solution Φ_L = 0 and Φ_D = Φ. To rule out this trivial case, δ in (5) must be strictly smaller than the upper bound δ_max obtained by minimizing S_IS(Φ||Φ̂) over the diagonal elements of S^m_+, where S^m_+ denotes the family of bounded and coercive functions defined on the unit circle and taking values in the cone of positive definite m × m Hermitian matrices. Since Φ must be diagonal, by denoting with φ_i and γ_i the i-th diagonal element of Φ and of Φ̂^{-1}, respectively, the divergence S_IS(Φ||Φ̂) decomposes in terms of the scalar divergences S_IS(φ_i||γ_i^{-1}), where diag_2(·) is the (orthogonal projection) operator mapping a square matrix M into a diagonal matrix of the same size having the same main diagonal as M. Therefore, since the Itakura-Saito divergence is nonnegative, the solution corresponds to φ_i^opt(e^{iϑ}) = (γ_i(e^{iϑ}))^{-1}, i = 1, …, m, for which S_IS(φ_i^opt||γ_i^{-1}) = 0. The derivation of this result is based on reasonings similar to [6, Section IV].
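The displayed formula for the upper bound is missing above; carrying out the minimization described in the text (diagonal Φ, with the optimum φ_i^opt = γ_i^{-1}) gives the following expression, reconstructed here under the standard definition of the multivariate Itakura-Saito divergence:

```latex
\delta_{\max} = \int \log \frac{\det \operatorname{diag}_2\!\big(\hat{\Phi}^{-1}\big)}{\det \hat{\Phi}^{-1}}\,\frac{d\vartheta}{2\pi},
```

which is nonnegative by Hadamard's inequality applied to Φ̂^{-1} ≻ 0.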
A more generous upper bound can be derived by assuming that Φ is the spectrum of an MA process of order n. However, numerical experiments showed that δ_max ≫ δ_α even when N is relatively small.

III. PROBLEM SOLUTION
In this section, we first provide a finite-dimensional matrix parametrization of Problem (5). The latter is then analyzed by resorting to Lagrange duality theory, which allows us to prove the existence of a solution.

A. Matricial Reparametrization of the Problem
To study Problem (5) it is convenient to introduce the matrix parametrization Φ = ∆X∆* and Φ_L = ∆L∆*, where ∆(e^{iθ}) := [I_m  e^{iθ}I_m  ⋯  e^{inθ}I_m] is the so-called shift operator, X and L are matrices in Q_{m(n+1)}, and X_{ij} denotes the block of X in position (i, j), with i, j = 0, …, n.
Moreover, M_{m,n} denotes the vector space of matrices of the form (9). The linear mapping T : M_{m,n} → Q_{m(n+1)} constructs a symmetric block-Toeplitz matrix from its first block row, so that if Y is given by (9), T(Y) is the symmetric block-Toeplitz matrix whose first block row is Y; the adjoint of T is the mapping T* defined accordingly. Next, the objective is to provide a more convenient formulation of Problem (5) in terms of X and L. To this end, we have to take into account the following points.
1) The Positivity Constraints: any spectral density of the form ∆P∆* with P ∈ Q_{m(n+1)} and P ⪰ 0 is positive semidefinite. Therefore, we replace the condition Φ_L ⪰ 0 with L ⪰ 0, and the condition Φ − Φ_L ⪰ 0 with X − L ⪰ 0. Note that these conditions only guarantee X ⪰ 0, and thus Φ to be positive semidefinite; however, we will show that this is sufficient to guarantee that Φ ≻ 0 a.e. at the optimum.
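Since the displays defining T and its adjoint are missing here, the following scalar (m = 1) sketch shows the structure we believe is intended: T builds a symmetric Toeplitz matrix from its first row, and the adjoint sums the entries along each pair of diagonals; the numerical check at the end verifies the adjoint identity. Dimensions are illustrative.

```python
import numpy as np

n = 3  # so the matrices are (n+1) x (n+1); scalar blocks (m = 1) for simplicity

def T(y):
    """Build a symmetric Toeplitz matrix from its first row y (length n+1)."""
    idx = np.abs(np.subtract.outer(np.arange(n + 1), np.arange(n + 1)))
    return y[idx]

def T_adj(X):
    """Adjoint of T w.r.t. <A,B> = tr(A^T B): sum X along each diagonal pair."""
    out = np.zeros(n + 1)
    for j in range(n + 1):
        out[j] = np.trace(X, offset=j) + (np.trace(X, offset=-j) if j > 0 else 0.0)
    return out

rng = np.random.default_rng(2)
y = rng.normal(size=n + 1)
X = rng.normal(size=(n + 1, n + 1))
# Numerical check of the adjoint identity <T(y), X> = <y, T_adj(X)>
lhs = np.sum(T(y) * X)
rhs = y @ T_adj(X)
```

The block case (m > 1) works the same way with m × m blocks in place of scalars.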
2) Constraint Φ − Φ_L diagonal: Let ofd : R^{m×m} → R^{m×m} denote the linear operator such that, given A ∈ R^{m×m}, ofd(A) is the matrix in which each off-diagonal element is equal to the corresponding element of A and each diagonal element is zero. We define the "block ofd" linear operator ofd_B by applying ofd to each block. It is not difficult to see that ofd_B is self-adjoint, since ofd is self-adjoint as well. Then, it is easy to see that the condition that Φ − Φ_L be diagonal translates into ofd_B(X − L) = 0.
3) The Low Rank Regularizer: We have ∫ tr(Φ_L) dϑ/2π = tr(L), where we exploited the fact that ∫ e^{ijϑ} dϑ/2π equals 1 if j = 0 and 0 otherwise.
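A minimal sketch of the ofd operator, with a numerical check of its self-adjointness (it is the orthogonal projection onto matrices with zero main diagonal):

```python
import numpy as np

def ofd(A):
    """Zero out the main diagonal of A, keep the off-diagonal entries."""
    return A - np.diag(np.diag(A))

rng = np.random.default_rng(3)
A, B = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))

# ofd is an orthogonal projection, hence self-adjoint and idempotent:
lhs = np.trace(ofd(A).T @ B)   # <ofd(A), B>
rhs = np.trace(A.T @ ofd(B))   # <A, ofd(B)>
```

The block operator ofd_B simply applies this map to each m × m block, and inherits self-adjointness blockwise.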

4) The Divergence Constraint:
A convenient matrix parameterization of the Itakura-Saito divergence S_IS(Φ||Φ̂) can be obtained by making use of the following facts.
First, since Φ = ∆X∆* with X ⪰ 0, there exists A ∈ R^{m×m(n+1)} such that X = A⊤A. Then, by using the Jensen-Kolmogorov formula we obtain an expression for ∫ log|Φ| dϑ/2π, which holds provided that X_00 ≻ 0 and Φ is coercive (i.e. |Φ| is bounded away from zero on the unit circle). We need to generalize this result to spectral densities that may be singular on the unit circle. This is possible because the zeros of a rational spectral density, if any, have finite multiplicity, so that the logarithm of the determinant of a rational spectral density Φ is integrable as long as the normal rank of Φ is full.
Lemma 3.1: Let X ∈ Q_{m(n+1)} be such that X ⪰ 0, X_00 ≻ 0, and Φ = ∆X∆*. Then the above expression for ∫ log|Φ| dϑ/2π remains valid. The proof is deferred to the appendix.
A second observation, needed to conveniently parameterize the Itakura-Saito divergence constraint, is that, by exploiting the cyclic property of the trace, the integral term involving Φ̂^{-1} can be expressed through the matrix P defined from the expansion of Φ̂^{-1}. Summing up, we get the matrix re-parametrization (11) of Problem (5). We remark once again that to prove the equivalence between (5) and (11) we still need to show that Φ ≻ 0 a.e. at the optimum: this fact will be established after the variational analysis.

B. The Dual Problem
We reformulate the constrained minimization problem in (11) as an unconstrained problem by means of Duality Theory.
Note that we have not included the constraint X_00 ≻ 0 because, as we will show later on, this condition is automatically met by the solution of the dual problem.
The dual function is defined as the infimum of the Lagrangian L over X and L. Thanks to the convexity of the Lagrangian, we can rely on standard variational methods to characterize the minimum.
• Partial minimization with respect to L: the Lagrangian depends on L only through a linear term. Thus, its partial infimum over L is finite under the corresponding multiplier condition, and equals −∞ otherwise.
• Partial minimization with respect to X: the terms in X_00 are bounded below only if (13) holds, and are minimized, when λ > 0 and (14) holds, by the X_00 given in (15). The Lagrangian is linear in the remaining variables X_lh, for (l, h) ≠ (0, 0), and is therefore bounded below only if (16) holds. Therefore, the minimization of the Lagrangian with respect to X and L is finite if and only if (13), (14), and (16) hold, in which case the dual function takes the corresponding finite value; otherwise the Lagrangian has no minimum and its infimum is −∞.
To simplify the notation, let us define the vector space O as the range of ofd_B (i.e. the matrices each of whose blocks has zero main diagonal); since Z always appears in the form ofd_B(Z), we can replace ofd_B(Z) with Z ∈ O. Then, we can formulate the dual problem (17) for the Lagrangian (12), with objective J and feasible set C given accordingly. Note that the constraints I − U + V − T(Z) = 0 and U ⪰ 0 are equivalent to the constraint I + V − T(Z) ⪰ 0. Thus, we can eliminate the redundant variable U; moreover, by changing the sign of the objective function J and observing that the objective depends on V only through V_00, we can rewrite (17) as the minimization problem (18), with the corresponding feasible set C.

C. Existence of solutions
The aim of this section is to show that (18) admits a solution. The set C is not compact, as it is neither closed nor bounded. We show that we can restrict the search for the minimum of J to a compact set. Then, since the objective function is continuous over C (and hence over the restricted compact set), we can use Weierstrass's theorem to conclude that the problem does admit a minimum.
The first step consists in showing that we can restrict C to a subset where λ ≥ ε, for some constant ε > 0.
Proposition: Let (λ_k, V_k, Z_k), k ∈ N, be a sequence of elements in C such that λ_k → 0. Then, such a sequence cannot be an infimizing sequence.
The proof is essentially the same as the proof of Proposition 6.1 in [6] and it is therefore omitted.
As a consequence, minimizing the dual functional over the set C is equivalent to minimizing it over the set C_1 := {(λ, V, Z) ∈ C : λ ≥ ε}. Next we show that we can restrict C_1 to a subset in which both T(Z) − V and λ cannot diverge.
Proposition: Let (λ_k, V_k, Z_k), k ∈ N, be a sequence of elements in C_1 such that the smallest eigenvalue of T(Z_k) − V_k tends to −∞, or λ_k → +∞, or both. Then, such a sequence cannot be an infimizing sequence.
The above result is obtained by following arguments similar to the proof of Proposition 6.2 in [6] with a few small differences; we refer the interested reader to [23, Appendix C] for the detailed proof.
It follows from the previous proposition that there exists β ∈ R with |β| < ∞ such that T(Z) − V ⪰ βI, and 0 < γ < ∞ such that λ ≤ γ. Therefore, the set C_1 can be further restricted to the set C_2 obtained by adding these bounds. In addition, it is not possible for V and Z to diverge while keeping the difference T(Z) − V finite. Accordingly, we can further restrict the search for the optimal solution to a subset C_3 in which neither V nor Z can diverge.
Proposition: Let (λ_k, V_k, Z_k), k ∈ N, be a sequence of elements in C_2 such that ‖V_k‖ → ∞, or ‖Z_k‖ → ∞, or both. Then, such a sequence cannot be an infimizing sequence.
The proof can be found in the appendix.
Proof. Equivalence of the two problems has already been proven by the previous arguments. Since C_3 is closed and bounded, hence compact, and J is continuous over C_3, by Weierstrass's theorem the minimum exists.

IV. SOLUTION OF THE PRIMAL PROBLEM
In this section, after proving that the primal problem (5) and its matrix reformulation (11) are equivalent, we show how to recover the solution of the primal problem.
Since X•_00 is positive definite, log|X•_00| is finite. By Lemma 3.1, at the optimum ∫ log|Φ| must be finite as well; this implies that Φ(e^{iϑ}), ϑ ∈ [−π, π], may be singular at most on a set of zero measure, or, in other terms, ∆X•∆* ≻ 0 a.e. This observation leads to the following proposition.
Proposition 4.1: Let (X•, L•) be a solution of (11). Then ∆X•∆* ≻ 0 a.e. Accordingly, (5) and (11) are equivalent.
Now we are ready to show how to recover the solution of the primal problem; to this aim we need the following result, see [24].
Since the duality gap between (11) and (18) is equal to zero, the complementary slackness condition V•X• = 0 holds. Moreover, there exists a full-row-rank matrix A ∈ R^{m×m(n+1)} such that X• = A⊤A. By (24), it follows that Im(A⊤) ⊆ ker(V•). Let Y_D ∈ R^{m(n+1)×l} denote the matrix whose columns form a basis of ker(V•). Note that the dimension l of the null space of V• is at least m, because Im(A⊤) ⊆ ker(V•) and rank(A⊤) = m; also l ≤ m, because rank(V•) ≥ mn.
Rewriting the matrix A⊤ as A⊤ = Y_D S with S ∈ R^{l×m}, from (25) we obtain (26), with Q_D ∈ Q_l unknown. In a similar fashion, by the zero duality gap between (11) and (18), the complementary slackness condition for the multiplier associated with the positive semidefiniteness of L reads ⟨U•, L•⟩ = 0, which in turn implies U•L• = 0. Repeating the same reasoning as before, it can be seen that, if the dimension of the null space of U• is r̄ with r̄ ≥ r and Y_L ∈ R^{m(n+1)×r̄} is a matrix whose columns form a basis of ker(U•), then L• can be written as in (27), with Q_L ∈ Q_r̄ unknown. Plugging (27) into (26), we then obtain (28). Assume now that each block of X• − L• is diagonal, i.e. that (29) holds.
Remark 1: We can make the previous assumption without loss of generality. Indeed, let (Φ•, Φ•_L) be the solution of Problem (5) and Φ•_D := Φ• − Φ•_L. We can always consider a different matrix parametrization (X̃, L̃, D̃) for Φ•, Φ•_L and Φ•_D as follows. First notice that there always exists a matrix D̃ with all diagonal blocks such that Φ•_D = ∆D̃∆*; in other words, given any D such that Φ•_D = ∆D∆*, we can always find δD ∈ Q_{m(n+1)} such that ∆δD∆* = 0 and D̃ := D + δD satisfies ofd(D̃_hk) = 0 for h, k = 0, …, n. Now, let δX ∈ Q_{m(n+1)} be such that ∆δX∆* = 0 and X̃ := X + δX satisfies (15). Define L̃ := X̃ − D̃ = X − D + δL with δL := δX − δD. It is easy to see that Φ• = ∆X̃∆* and Φ•_L = ∆L̃∆*. This means that (X̃, L̃) is still a solution of Problem (11), and it allows us to restrict to solutions of (11) for which (29) holds.
By applying the ofd operator to both sides of (28) and exploiting assumption (29), it is not difficult to obtain (30), which is a system of m(m−1)/2 linear equations in the r̄(r̄+1)/2 unknown entries of Q_L. Notice that X_00 is given by (15). Finally, once L• is computed, in order to retrieve Q_D we exploit (29) and the system (31) of m(m+1)/2 linear equations. Since both the dual and the primal problem admit solutions, the resulting systems of equations (29), (30) and (31) admit solutions as well.

V. THE PROPOSED ALGORITHM
In this section we propose an algorithm to solve the dual problem numerically. To start with, as observed in Section IV, we rewrite (18) in a different fashion by getting rid of the slack variable V. This is done by introducing a new variable W ∈ Q_m defined, similarly to (23), so that, as in (22), the variable V can be expressed through W. Accordingly, the dual problem (18) can be expressed in terms of the variables λ, W and Z, with a corresponding feasible set C. We can further simplify the problem as follows. First, we observe that the first matricial inequality constraint can be equivalently expressed as W ⪯ Q(λ, Z). In a similar fashion, the last matricial inequality constraint in C can be equivalently expressed as W ⪯ R(λ), where R(λ) := I + T_{0,0}(λP) − T_{0,1:n}(λP)[I + T_{1:n,1:n}(λP)]^{-1} T_{0,1:n}(λP)⊤.
Therefore, Problem (18) can be formulated as Problem (36). Solving Problem (36) simultaneously for λ, W and Z is not trivial, because the inequality constraints W ⪯ Q(λ, Z) and W ⪯ R(λ) both depend on λ. On the other hand, once we fix the dual variable λ to a positive constant λ̄ > 0, the resulting problem (37) can be efficiently solved by resorting to the ADMM algorithm [25]. To this aim, we rewrite (37) in a form suitable for splitting, where Q_m^+ denotes the cone of symmetric positive semidefinite matrices of size m × m. The augmented Lagrangian for (38) is L_ρ(W, Z, Y, M), where M ∈ Q_m is the Lagrange multiplier and ρ > 0 is the penalty parameter. Accordingly, given the initial guesses W^(0), Z^(0), Y^(0) and M^(0), the ADMM updates follow. Problem (39) does not admit a closed-form solution; therefore we approximate its optimal solution by a gradient projection step, where:
• ∇_W L_ρ(W, Z, Y, M) denotes the gradient of the augmented Lagrangian with respect to W;
• ∇_Z L_ρ(W, Z, Y, M) denotes the gradient of the augmented Lagrangian with respect to Z, where the omitted argument of the operators T_{0,1:n} and T_{1:n,1:n} is understood to be (Z + λ̄P);
• Π_O denotes the projection operator onto O;
• Π denotes the projection operator onto the convex cone {S ∈ Q_m : S ⪯ R(λ̄)}; it is not difficult to see that Π(S) = R(λ̄) − Π_+(R(λ̄) − S), where Π_+ is the projection operator onto the cone Q_m^+;
• the step-size t_k is determined at each step k in an iterative fashion: we start by setting t_k = 1 and decrease it progressively by a factor β, with 0 < β < 1, until the positivity conditions required for the updates to be well defined (in particular T_{1:n,1:n}(Z^(k+1) + λ̄P) ≻ 0) are met and Armijo's condition [26] is satisfied.
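The projection Π onto {S ∈ Q_m : S ⪯ R} can be computed from the projection Π_+ onto the positive semidefinite cone via one eigenvalue decomposition. The sketch below (with illustrative dimensions) implements the identity Π(S) = R − Π_+(R − S), which we believe is the omitted formula, and checks feasibility numerically.

```python
import numpy as np

def proj_psd(S):
    """Projection onto the positive semidefinite cone (Frobenius norm)."""
    w, V = np.linalg.eigh(S)
    return (V * np.maximum(w, 0.0)) @ V.T   # V diag(max(w,0)) V^T

def proj_below(S, R):
    """Projection onto {S : S <= R}, via Pi(S) = R - Pi_+(R - S)."""
    return R - proj_psd(R - S)

rng = np.random.default_rng(4)
m = 5
A = rng.normal(size=(m, m)); S = (A + A.T) / 2   # arbitrary symmetric matrix
B = rng.normal(size=(m, m)); R = B @ B.T         # stand-in for R(lambda_bar)
P = proj_below(S, R)
# Feasibility check: R - P must be positive semidefinite (up to round-off)
min_eig = np.linalg.eigvalsh(R - P).min()
```

The projection is idempotent, so applying it twice leaves the result unchanged.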
Problem (40) admits a closed-form solution, which can be easily computed. To define the stopping criterion, we introduce the primal and dual residuals; notice that the omitted argument of the operators T_{0,1:n} and T_{1:n,1:n} is understood to be (Z^(k+1) + λ̄P).
Then, the algorithm stops when the following conditions are met, where ε_ABS and ε_REL are the desired absolute and relative tolerances.
It remains to determine the optimal value λ• of λ which solves Problem (36). To this aim, we exploit the following result (see [26, pp. 87-88]): if f(x, y) is jointly convex in (x, y) and C is a convex non-empty set, then the function g(x) := inf_{y∈C} f(x, y) is convex in x, provided that g(x) > −∞ for some x. The domain of g is the projection of dom(f) onto its x-coordinates.
This result guarantees that the function g(λ) obtained by partially minimizing over W and Z is convex in λ. Hence, in order to determine λ• = arg min_{λ>0} g(λ), we can choose an initial interval of uncertainty [a, b] containing λ• and progressively reduce it by evaluating g(λ) at two points placed symmetrically within the interval, each at distance h > 0 from the midpoint. This is repeated until the width of the uncertainty interval is smaller than a certain tolerance l > 0.
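The interval-reduction scheme just described can be sketched as follows; the probing offset h and the tolerance are illustrative, and the quadratic test function merely stands in for g(λ):

```python
def dichotomous_search(g, a, b, h=1e-5, tol=1e-3):
    """Minimize a scalar convex function g over [a, b]: evaluate g at the two
    points mid - h and mid + h, placed symmetrically around the midpoint, and
    discard the half-interval that cannot contain the minimizer."""
    while b - a > tol:          # note: tol must exceed 2*h for termination
        mid = 0.5 * (a + b)
        if g(mid - h) < g(mid + h):
            b = mid + h         # for convex g, the minimizer lies in [a, mid + h]
        else:
            a = mid - h         # the minimizer lies in [mid - h, b]
    return 0.5 * (a + b)

# Stand-in for g(lambda): a convex function minimized at lambda = 1.3.
lam_opt = dichotomous_search(lambda x: (x - 1.3) ** 2 + 0.5, 0.0, 10.0)
```

In the actual algorithm each evaluation of g requires one run of the inner ADMM solver with λ fixed, so the number of probes directly controls the total cost.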
The overall procedure to solve the dual problem (36) is summarized in Algorithm 1.

VI. IDENTIFICATION OF ARMA FACTOR MODELS
In this section we extend the proposed approach to ARMA processes. Consider the ARMA factor model (42), where a(e^{iϑ}) = Σ_{k=0}^{p} a_k e^{−ikϑ}, a_k ∈ R, and W_L, W_D, u and w are defined analogously to (1). Notice that y_MA(t) := a(e^{iϑ})y(t) = W_L u(t) + W_D w(t) is an MA process of order n whose spectral density Φ = W_L W_L* + W_D W_D* ∈ Q_{m,n} admits a low rank plus diagonal decomposition. Finally, it is worth noting that it is not restrictive to assume that the autoregressive part in (42) is characterized by a scalar filter a; indeed, any ARMA factor model can be written in the form of (42).
Assume now that we collect a realization y^N = {y(1), …, y(N)} of length N of the process y. Our aim is to estimate the factor model (42) and the number of factors r. Before proceeding, the following observation is in order: the problem suffers from an identifiability issue.
Indeed, if we multiply a(z), W_L and W_D by an arbitrary non-zero real number c, the model remains the same. We can easily eliminate this uninteresting degree of freedom by normalizing the polynomial a(z), so that from now on we assume a_0 = 1.
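The normalization is immediate: dividing a(z), W_L and W_D by a_0 leaves the transfer functions a^{-1}W_L and a^{-1}W_D, and hence the model, unchanged. A toy sketch with hypothetical coefficients:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical coefficients (p = 2, n = 2, m = 4, r = 2), identifiable up to scale:
a = np.array([2.0, -0.8, 0.3])            # AR polynomial a(z) with a_0 != 1
WL = rng.normal(size=(3, 4, 2))           # MA coefficient matrices of W_L(z)
WD = rng.normal(size=(3, 4))              # diagonal MA coefficients of W_D(z)

c = a[0]
a_n, WL_n, WD_n = a / c, WL / c, WD / c   # now a_n[0] == 1; a^{-1}W_L unchanged
```

Any common non-zero scaling chosen instead of a_0 would describe the same process, which is exactly the degree of freedom being removed.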
The idea is to first estimate a, and then Φ_L and Φ_D by pre-processing y^N through a. In more detail, the proposed solution consists of the following two steps: 1) AR dynamics estimation. Given the realization y^N, we estimate the p parameters of the filter a by applying the maximum likelihood estimator proposed in [27, Section II.b].
In doing so, we are estimating an AR process whose spectral density is a −1 (a −1 ) * I m .
2) MA dynamic factor analysis. Let y^N_MA be the finite-length trajectory obtained by passing the trajectory y^N through the filter a•(e^{iϑ}) with zero initial conditions. After computing the truncated periodogram Φ̂ ∈ Q_{m,n} from y^N_MA, we solve Problem (5) with Φ̂ in order to recover the number of latent factors.
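The pre-processing in step 2 amounts to applying the FIR filter a•(z) to the data with zero initial conditions, i.e. y_MA(t) = Σ_k a_k y(t − k). A self-contained sketch (filter coefficients and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

m, N, p = 3, 1000, 2
a = np.array([1.0, -0.5, 0.2])   # estimated AR polynomial a(z), a_0 = 1 assumed
y = rng.normal(size=(N, m))      # stand-in for the observed trajectory y^N

# y_MA(t) = sum_k a_k y(t - k), with zero initial conditions (y(t) = 0 for t < 0).
y_ma = np.zeros_like(y)
for k in range(p + 1):
    y_ma[k:] += a[k] * y[:N - k]
```

The resulting trajectory y_ma is then fed to the truncated periodogram and to Problem (5) exactly as in the MA case.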
Although the above procedure is suboptimal, numerical simulations show that the resulting estimator of the number of factors performs well; see Section VII-B.

VII. NUMERICAL SIMULATIONS
In this section, we test the performance of the proposed approach both for MA and for ARMA factor models. In all the simulations, the parameter δ is computed according to the empirical procedure of Section II-A with α = 0.5. Then, Problem (36) is solved by applying Algorithm 1 with l = 7 and h = 3. In regard to the ADMM algorithm, we set ε_ABS = 10^{-4}, ε_REL = 10^{-4} and the penalty term ρ = 0.05.

A. Synthetic Example - MA factor models

We consider an MA factor model (1) with n = 2, m = 40 and r = 2, for which the idiosyncratic component is not negligible with respect to the latent one (the corresponding ratio equals 31.29). We generate from the model a sample y^N of length N = 5000 and we apply the proposed identification procedure to estimate the number of common factors. We define s_j as the integral over the unit circle of σ_j(Φ•_L(e^{iθ})), where σ_j(Φ•_L(e^{iθ})) denotes the j-th largest normalized eigenvalue of Φ•_L at frequency θ; that is, s_j represents the integral of the j-th largest normalized singular value of Φ•_L over the unit circle. The quantities s_j are plotted in Figure 1; we can notice a knee at j = 2, so that the numerical rank of Φ•_L is equal to 2 and we recover the exact number of common factors. We then repeat the experiment with r = 4 common factors; as shown in Figure 2, even in this case we are able to estimate the correct number of latent variables.
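The quantities s_j can be approximated on a frequency grid as follows. The toy rank-2 spectrum below stands in for Φ•_L, and "normalized" is interpreted here as division by the largest eigenvalue at each frequency (an assumption, since the displayed definition is garbled in this version):

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, n_freq = 6, 2, 256
thetas = np.linspace(-np.pi, np.pi, n_freq, endpoint=False)

# Toy low-rank spectrum Phi_L(theta) = H(theta) H(theta)^*, with an m x 2 MA filter H.
C = rng.normal(size=(n + 1, m, 2))
s = np.zeros(m)
for th in thetas:
    H = sum(C[k] * np.exp(-1j * k * th) for k in range(n + 1))
    eig = np.linalg.eigvalsh(H @ H.conj().T)[::-1]   # eigenvalues, descending
    s += eig / eig.max() / n_freq                    # normalized, averaged over grid
# s[j] approximates the integral of the (j+1)-th largest normalized eigenvalue;
# the sharp drop after the second entry reveals the numerical rank 2 of Phi_L.
```

Plotting s against j and locating the knee is exactly the rank-selection heuristic used in Figures 1-3.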
Finally, we obtained similar results with different samples and by changing the "true" factor model from which we generated the data.

B. Synthetic Example -ARMA factor models
To provide empirical evidence of the estimation performance of the algorithm of Section VI, we perform a Monte Carlo simulation study composed of 50 experiments. We randomly build an ARMA factor model (42) with m = 40, r = 2, n = 2 and p = 2; without loss of generality we fix a_0 = 1. Then, for each Monte Carlo experiment, a data sequence of length N = 5000 is randomly generated from the model and the ARMA factor model identification procedure is performed. The boxplots of the quantities s_j for the estimated Φ•_L's are shown in Figure 3, revealing that the proposed identification procedure is able to successfully recover the number of latent factors.

C. Smart Building Dataset
The SMLsystem is a house built in Valencia at the Universidad CEU Cardenal Herrera (CEU-UCH). It is a modular house integrating a whole range of different technologies to improve energy efficiency, with the objective of constructing a near zero-energy house. A complex monitoring system has been deployed in the SMLsystem: it has indoor sensors for temperature, humidity and carbon dioxide; outdoor sensors are also available for lighting measurements, wind speed, rain, sun irradiance and temperature. We refer the reader to [28] for a detailed description of the building and its monitoring system. Two datasets from the SMLsystem are available for download at the UCI Machine Learning repository http://archive.ics.uci.edu/ml. We take into account m = 17 sensor signals extracted from these datasets; here ŷ_j(t|t−1) denotes the one-step-ahead prediction at time t computed with zero initial conditions. Figure 5 shows that the ARMA factor model matches the measurement data y^{N_2} quite well, reaching fit values similar to those of the PEM estimate. This is a remarkable result: the performances of the two approaches are essentially the same, but the factor model is parameterized by 257 coefficients, far fewer than the 869 coefficients of the PEM estimate.

VIII. CONCLUSION
A procedure to estimate the number of factors and to learn ARMA factor models has been proposed. The method is based on the solution of an optimization problem, whose solution has been proven to exist via a dual analysis. The simulation results obtained by applying the procedure both to synthetic and to real data provide evidence of good performance.

APPENDIX

Define Φ_n := Φ + (1/n)I with n ∈ N, and let W_n := ∆A_n⊤ be a spectral factor of Φ_n with A_n ∈ R^{m×m(n+1)}. Clearly, lim_{n→+∞} Φ_n = Φ; accordingly, lim_{n→+∞} W_n = W and lim_{n→+∞} A_n = A. Since Φ_n ≻ 0 for all ϑ, we can exploit (10). To conclude the proof, it remains to show that on the left-hand side of the previous equation it is possible to interchange the limit and the integral operators. To this aim, we introduce the sequence {f_n}_{n=1}^{+∞}.

Fig. 1: Estimated MA factor model with n = 2, m = 40, and r = 2. Integral over the unit circle of the first 20 normalized singular values of Φ•_L with N = 5000.

Fig. 2: Estimated MA factor model with n = 2, m = 40, and r = 4. Integral over the unit circle of the first 30 normalized singular values of Φ•_L with N = 5000.

Fig. 3: Estimated ARMA factor model with m = 40, r = 2, n = 2 and p = 2. Box-plot of the integral over the unit circle of the first 15 normalized singular values of Φ•_L with N = 5000.

The m = 17 sensor signals extracted from these datasets are: the indoor temperature (in °C) of the dining room and of the room, the weather forecast temperature (in °C), the carbon dioxide (in ppm) in the dining room and in the room, the relative humidity (in %) in the dining room and in the room, the lighting (in lx) in the dining room and in the room, the sun dusk, the wind speed (in cm/sec), the sun light (in klx) on the west, east and south facades, the sun irradiance (in dW), the outdoor temperature (in °C) and, finally, the outdoor relative humidity (in %). The data are sampled with a period of T = 15 min, and each sample is the mean over the last quarter of an hour, reducing in this way the signal noise. The first dataset y^{N_1} = {y(1), …, y(N_1)} was captured during March 2011 and has N_1 = 2764 points (≈ 28 days), while the second dataset y^{N_2} = {y(N_1+1), …, y(N_1+N_2)} has N_2 = 1373 points (≈ 14 days) collected in June 2011. It is reasonable to expect that the variability of the considered signals may be successfully explained by a smaller number of factors. Motivated by this, we apply the ARMA factor model identification procedure with parameters n = 2 and p = 2 using the realization y^{N_1}. As shown in Figure 4, we obtain an estimate of r = 4 latent factors. For the sake of comparison, we also use the Matlab function armax() of the System Identification Toolbox to compute the prediction-error method (PEM) estimate of an ARMA model with polynomials A(z) and B(z) of order 2 and A(z) diagonal from the realization y^{N_1}. Finally, the second dataset y^{N_2} is used in the validation step to test the prediction capability of the two estimated ARMA models. The results are summarized in Figure 5, which displays, for each output channel j = 1, …, m, the fit (percentage)

J_FIT,j := 100 ( 1 − [ Σ_{t=N_1+1}^{N_1+N_2} (y_j(t) − ŷ_j(t|t−1))² / Σ_{t=N_1+1}^{N_1+N_2} (y_j(t) − ȳ_j)² ]^{1/2} ),

where ȳ_j := (1/N_2) Σ_{t=N_1+1}^{N_1+N_2} y_j(t).

Fig. 4: Application of the ARMA factor model identification procedure using the measurements y^{N_1} from the SMLsystem as training data. The figure shows the integral over the unit circle of the normalized singular values of Φ•_L.