We consider the estimation of the number of hidden states (the order) of a discrete-time finite-alphabet hidden Markov model (HMM). The estimators we investigate are related to code-based order estimators: penalized maximum-likelihood (ML) estimators and penalized versions of the mixture estimator introduced by Liu and Narayan (1994). We prove strong consistency of those estimators without assuming any a priori upper bound on the order and with smaller penalties than in previous works. We prove a version of Stein's lemma for HMM order estimation and derive an upper bound on underestimation exponents. We then prove that this upper bound can be achieved by the penalized ML estimator and by the penalized mixture estimator. The proof of the latter result gets around the elusive nature of the maximum likelihood in HMMs by resorting to large-deviation techniques for empirical processes. Finally, we prove that for any consistent HMM order estimator, for most HMMs, the overestimation exponent is null.