I. Introduction
Feedforward neural networks with randomly initialized hidden weights have been widely used in classification and regression problems. The universal approximation ability [1]–[4] of these randomization based models has led to their success. A feedforward neural network normally consists of three layers: an input layer, a middle (hidden) layer, and a classification/output layer. Traditionally, backpropagation algorithms [5], [6] have been used to train feedforward neural networks. However, these iterative methods suffer from the local minima problem, slow convergence, and sensitivity to the learning rate. To overcome these issues, randomized algorithms based on closed-form solutions [7]–[10] have been employed. These models are faster to train and have shown better performance [11]–[13].

The standard extreme learning machine (ELM) [14] and the standard random vector functional link network (RVFL) [15], [16] are among the randomization based algorithms. Both ELM and RVFL initialize the weights and biases randomly, keep the hidden layer weights fixed, and optimize only the final layer weights with a closed-form solution [14], [15], [17]. ELM has been successfully applied in tasks such as action recognition [18]; however, singularity of the hidden layer matrix limits its applicability. To overcome the singularity issue, the effective ELM [19] uses a diagonally dominant criterion for choosing the network weights and biases, while the optimization-based regularized ELM [14] showed improved performance by avoiding the full rank assumption. Minimum variance ELM (MVELM) [20] exploits the dispersion of the training data: minimizing both the norm of the output layer weights and the dispersion of the data in the projection space leads to improved performance in human action recognition. To further improve the generalization ability of ELM, the hierarchical ELM (H-ELM) [21] is a multilayer neural network consisting of multiple autoencoders, in which the weights are initialized randomly and the output weights of each network are trained independently. This architecture has been successfully used in semi-supervised [22] and unsupervised learning [23].
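To make the randomized training scheme concrete, the following sketch (an illustration of the general ELM recipe, not the cited authors' code; the class name SimpleELM, the sigmoid activation, and the hyperparameter values are our assumptions) shows how the hidden weights are drawn once at random and kept fixed, while the output weights are obtained in closed form by regularized least squares; the regularization term also sidesteps the singularity issue noted above.

```python
import numpy as np

class SimpleELM:
    """Minimal sketch of a regularized ELM: fixed random hidden
    weights, closed-form (ridge) solution for the output layer."""

    def __init__(self, n_hidden=100, reg=1e-2, seed=0):
        self.n_hidden = n_hidden      # hidden layer width (assumed value)
        self.reg = reg                # regularization strength lambda
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random projection followed by a sigmoid activation.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, Y):
        n_features = X.shape[1]
        # Hidden weights and biases are drawn once and never updated.
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Closed-form output weights: beta = (H^T H + lambda I)^{-1} H^T Y.
        # The lambda * I term keeps the system well conditioned even when
        # H^T H would otherwise be singular.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ Y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

For classification, Y would typically hold one-hot encoded labels and the predicted class is the argmax of the output; no iterative weight updates are involved, which is the source of the fast training highlighted above.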