IEEE Transactions on Neural Networks and Learning Systems - new TOC
http://ieeexplore.ieee.org
TOC Alert for Publication #5962385, 24 May 2018

Table of contents

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS publication information

Special Issue on Deep Reinforcement Learning and Adaptive Dynamic Programming
  In Nature in 2015, Google DeepMind published the paper "Human-level control through deep reinforcement learning." Furthermore, in the first issue of Nature in 2016, it published the cover paper "Mastering the game of Go with deep neural networks and tree search," which proposed the computer Go program AlphaGo. In March 2016, AlphaGo beat the world's top Go player, Lee Sedol, 4:1. This became a new milestone in the history of artificial intelligence, at the core of which is the algorithm of deep reinforcement learning (RL).

Optimal and Autonomous Control Using Reinforcement Learning: A Survey
  …The $\mathcal{H}_{2}$ and $\mathcal{H}_{\infty}$ control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online, using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.

Applications of Deep Learning and Reinforcement Learning to Biological Data
  …omics, bioimaging, medical imaging, and (brain/body)–machine interfaces. These have generated novel opportunities for the development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promises to revolutionize the future of artificial intelligence.
The growth in computational power, accompanied by faster and larger data storage and declining computing costs, has already allowed scientists in various fields to apply these techniques to data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey of the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performance of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.

Guided Policy Exploration for Markov Decision Processes Using an Uncertainty-Based Value-of-Information Criterion
  …Centipede and Crossy Road. Our results indicate that our approach yields better-performing policies in fewer episodes than stochastic-based exploration strategies. We show that the training rate of our approach can be further improved by using the policy cross-entropy to guide our criterion's hyperparameter selection.

Adaptive Constrained Optimal Control Design for Data-Based Nonlinear Discrete-Time Systems With Critic-Only Structure
  …$Q^{(0)}(x,a)\geqslant 0$. To implement the VIQL algorithm, the critic-only structure is developed, in which only one neural network is required to approximate the Q-function. The converged Q-function obtained from the critic-only VIQL method is employed to design the adaptive constrained optimal controller based on the gradient descent scheme.
Finally, the effectiveness of the developed adaptive control method is tested on three examples via computer simulation.

Optimal Guaranteed Cost Sliding Mode Control for Constrained-Input Nonlinear Systems With Matched and Unmatched Disturbances

Robust ADP Design for Continuous-Time Nonlinear Systems With Output Constraints

Leader–Follower Output Synchronization of Linear Heterogeneous Systems With Active Leader Using Reinforcement Learning

Approximate Dynamic Programming: Combining Regional and Local State Following Approximations

Suboptimal Scheduling in Switched Systems With Continuous-Time Dynamics: A Least Squares Approach

Optimal Fault-Tolerant Control for Discrete-Time Nonlinear Strict-Feedback Systems Based on Adaptive Critic Design

Distributed Economic Dispatch in Microgrids Based on Cooperative Reinforcement Learning

Reusable Reinforcement Learning via Shallow Trails
  …metapolicy over a set of training tasks drawn from an underlying distribution. By maximizing the total reward summed over all the training tasks, the metapolicy can then be reused to accomplish test tasks from the same distribution. In practice, however, we face two major obstacles to training and reusing metapolicies well: first, how to identify tasks that are unrelated or even opposed to each other, so as to avoid their mutual interference during training; second, how to characterize task features, according to which a metapolicy can be reused. In this paper, we propose the MetA-Policy LEarning (MAPLE) approach, which overcomes these two difficulties by introducing the shallow trail. It probes a task by running a roughly trained policy. Using the rewards of the shallow trail, MAPLE automatically groups similar tasks.
Moreover, when the task parameters are unknown, the rewards of the shallow trail also serve as task features. Empirical studies on several control tasks verify that MAPLE trains metapolicies well and receives high reward on test tasks.

Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning

Multisource Transfer Double DQN Based on Actor Learning

Action-Driven Visual Object Tracking With Deep Reinforcement Learning

Extreme Trust Region Policy Optimization for Active Object Recognition

Learning to Predict Consequences as a Method of Knowledge Transfer in Reinforcement Learning

A Discrete-Time Recurrent Neural Network for Solving Rank-Deficient Matrix Equations With an Application to Output Regulation of Linear Systems

Online Learning Algorithm Based on Adaptive Control Theory
  …Theorems 2–4 in this paper.

User Preference-Based Dual-Memory Neural Model With Memory Consolidation Approach

Online Hashing

GoDec+: Fast and Robust Low-Rank Matrix Decomposition Based on Maximum Correntropy

A Parallel Multiclassification Algorithm for Big Data Using an Extreme Learning Machine
  …$\mathbf{\hat{U}}$ decomposition algorithm, and the matrix $\mathbf{V}$ decomposition algorithm perform most of the computations locally. At the same time, they retain intermediate results in distributed memory and cache the diagonal matrix as a broadcast variable, instead of keeping several copies for each task, to greatly reduce costs; these actions strengthen the learning ability of the SELM. Finally, we implement our SELM algorithm to classify large data sets. Extensive experiments have been conducted to validate the effectiveness of the proposed algorithms.
As shown, our SELM achieves an $8.71\times$ speedup on a cluster with ten nodes, and reaches a $13.79\times$ speedup with 15 nodes, an $18.74\times$ speedup with 20 nodes, a $23.79\times$ speedup with 25 nodes, a $28.89\times$ speedup with 30 nodes, and a $33.81\times$ speedup with 35 nodes.

Nonlinear Decoupling Control With ANFIS-Based Unmodeled Dynamics Compensation for a Class of Complex Industrial Processes

Online Learning Algorithms Can Converge Comparably Fast as Batch Learning
  …$p$-norm hinge loss functions with $p\in[1,2]$, the learning rates are the same as those for Tikhonov regularization and can be of order $O(T^{-1/2}\log T)$, which is nearly optimal up to a logarithmic factor. Our novelty lies in a sharp estimate of the expected values of the norms of the learning sequence (or an inductive argument to uniformly bound the expected risks of the learning sequence) and a refined error decomposition for online learning algorithms.

Spiking, Bursting, and Population Dynamics in a Network of Growth Transform Neurons
  …$\Sigma\Delta$ modulators, and for designing SVMs that learn to encode information using spikes and bursts. It is demonstrated that the emergent switching, spiking, and burst dynamics produced by each neuron encode its respective margin of separation from a classification hyperplane whose parameters are encoded by the network population dynamics.
We believe that the proposed growth transform neuron model and the underlying geometric framework could serve as an important tool for connecting well-established machine learning algorithms such as SVMs to neuromorphic principles such as spiking, bursting, population encoding, and noise shaping.

Uncertain Data Clustering in Distributed Peer-to-Peer Networks

Distributed Optimal Consensus Over Resource Allocation Network and Its Application to Dynamical Economic Dispatch
  …$\theta$-logarithmic barrier. Facilitated by the graph Laplacian, a fully distributed continuous-time multiagent system is developed for solving the problem. Specifically, to avoid the high singularity of the $\theta$-logarithmic barrier at the boundary, an adaptive parameter switching strategy is introduced into this dynamical multiagent system. The convergence rate of the distributed algorithm is obtained. Moreover, a novel distributed primal–dual dynamical multiagent system is designed in a smart grid scenario to seek the saddle point of dynamical economic dispatch, which coincides with the optimal solution. The dual decomposition technique is applied to transform the optimization problem into easily solvable resource allocation subproblems with local inequality constraints. The good performance of the new dynamical systems is verified by a numerical example and by simulations based on the IEEE six-bus test system.

Distributed Adaptive Containment Control for a Class of Nonlinear Multiagent Systems With Input Quantization

Data-Driven Learning Control for Stochastic Nonlinear Systems: Multiple Communication Constraints and Limited Storage

Reversed Spectral Hashing

Structure Learning for Deep Neural Networks Based on Multiobjective Optimization

On the Dynamics of Hopfield Neural Networks on Unit Quaternions
  …et al.
Contrary to what was expected, we show that the MV-QHNN, as well as one of its variations, does not always come to rest at an equilibrium state under the usual conditions. In fact, we provide simple examples in which the network yields a periodic sequence of quaternionic state vectors. Afterward, we turn our attention to the continuous-valued quaternionic Hopfield neural network (CV-QHNN), which can be derived from the MV-QHNN by means of a limit process. The CV-QHNN can be implemented more easily than the MV-QHNN model. Furthermore, the asynchronous CV-QHNN always settles into an equilibrium state under the usual conditions. The theoretical results are all illustrated by examples in this paper.

End-to-End Feature-Aware Label Space Encoding for Multilabel Classification With Many Classes
  …(E^{2}FE), to perform LSDR. Instead of requiring an encoding function as most previous works do, E^{2}FE directly learns, in an end-to-end manner, a code matrix formed by the code vectors of the training instances. Another distinct property of E^{2}FE is its feature awareness, attributable to the fact that the code matrix is learned by jointly maximizing the recoverability of the label space and the predictability of the latent space. Based on the learned code matrix, E^{2}FE further trains predictive models to map instance features into code vectors, and also learns a linear decoding matrix for efficiently recovering the label vector of any unseen instance from its predicted code vector. Theoretical analyses show that both the code matrix and the linear decoding matrix in E^{2}FE can be learned efficiently. Moreover, like previous works, E^{2}FE can be specified to learn an encoding function, and it can also be extended with kernel tricks to handle nonlinear correlations between the feature space and the latent space.
Comprehensive experiments conducted on diverse benchmark data sets with many classes show consistent performance gains of E^{2}FE over state-of-the-art methods.

Improved Stability and Stabilization Results for Stochastic Synchronization of Continuous-Time Semi-Markovian Jump Neural Networks With Time-Varying Delay

Robust Latent Subspace Learning for Image Classification

New Splitting Criteria for Decision Trees in Stationary Data Streams
  …Type-$I$ splitting criteria guarantee, with high probability, the highest expected value of the split measure. Type-$II$ criteria ensure that the chosen attribute is, with high probability, the same as would be chosen based on the whole infinite data stream. Moreover, in this paper, two hybrid splitting criteria are proposed, which are combinations of single criteria based on the misclassification error and the Gini index.

A Sequential Learning Approach for Scaling Up Filter-Based Feature Subset Selection

Substructural Regularization With Data-Sensitive Granularity for Sequence Transfer Learning

Exponential Synchronization of Networked Chaotic Delayed Neural Network by a Hybrid Event Trigger Scheme
  …($\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$)-dissipativity performance index. Moreover, the hybrid event trigger scheme and the controller are codesigned for the network-based delayed neural network to guarantee exponential synchronization between the master and slave systems.
The effectiveness and potential of the proposed results are demonstrated through a numerical example.

Multiclass Learning With Partially Corrupted Labels

Boundary-Eliminated Pseudoinverse Linear Discriminant for Imbalanced Problems

An Information-Theoretic-Cluster Visualization for Self-Organizing Maps

Learning-Based Adaptive Optimal Tracking Control of Strict-Feedback Nonlinear Systems
  …a priori knowledge of the system dynamics. Fundamentally different from adaptive optimal stabilization problems, the solution to the Hamilton–Jacobi–Bellman (HJB) equation, which is not necessarily a positive definite function, cannot be approximated through existing iterative methods. This paper proposes a novel policy iteration technique for solving positive semidefinite HJB equations, with rigorous convergence analysis. A two-phase data-driven learning method is developed and implemented online by ADP. The efficacy of the proposed adaptive optimal tracking control methodology is demonstrated via a Van der Pol oscillator with time-varying exogenous signals.

On the Impact of Regularization Variation on Localized Multiple Kernel Learning
  …$\ell_{p}$-norm LMKL, matrix-regularized $(r,p)$-norm LMKL, and samplewise $\ell_{p}$-norm LMKL. Further comparison of these bounds helps to qualitatively reveal the performance differences produced by these regularization methods; that is, matrix-regularized LMKL achieves superior performance, followed by vector $\ell_{p}$-norm LMKL and samplewise $\ell_{p}$-norm LMKL.
Finally, a set of experimental results on ten benchmark machine learning data sets from the UCI repository is reported and shown to empirically support our theoretical analysis.

Structured Learning of Tree Potentials in CRF for Image Segmentation
  …linear combination of some predefined parametric models, and then methods such as structured support vector machines are applied to learn those linear coefficients. We instead formulate the unary and pairwise potentials as nonparametric forests (ensembles of decision trees) and learn the ensemble parameters and the trees in a unified optimization problem within the large-margin framework. In this fashion, we easily achieve nonlinear learning of potential functions on both unary and pairwise terms in CRFs. Moreover, we learn classwise decision trees for each object that appears in the image. Experimental results on several public segmentation data sets demonstrate the power of the learned nonlinear nonparametric potentials.

Adaptive Backstepping-Based Neural Tracking Control for MIMO Nonlinear Switched Systems Subject to Input Delays

Memcomputing Numerical Inversion With Self-Organizing Logic Gates
  …$n$-bit precision in the output requires extending the circuit by at most $n$ bits. This type of numerical inversion can be implemented by DMM units in hardware; it is scalable, and thus of great benefit to any real-time computing application.

Graph Regularized Restricted Boltzmann Machine

A Self-Paced Regularization Framework for Multilabel Learning

IEEE Computational Intelligence Society Information

IEEE Transactions on Neural Networks information for authors
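Several of the listed papers (the control survey, the VIQL paper, and much of the deep RL special issue) build on the Q-learning update for discrete-time systems mentioned in the survey abstract. As a refresher only, here is a minimal, generic tabular Q-learning sketch; the chain environment and all names are invented for this example and are not taken from any of the papers above.

```python
import random

def q_learning(step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: Q[s][a] += alpha * (r + gamma * max_a' Q[s'][a'] - Q[s][a])."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection, breaking ties among greedy actions at random.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([i for i in range(n_actions) if Q[s][i] == best])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Toy chain environment: states 0..4, action 1 moves right, action 0 moves left;
# reward 1 on reaching the terminal state 4.
def chain_step(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

Q = q_learning(chain_step, n_states=5, n_actions=2)
```

After training, the greedy policy at every nonterminal state should prefer action 1 (move right), with values decaying by roughly a factor of gamma per step away from the goal.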
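The SELM speedup figures quoted in the parallel-multiclassification abstract above imply per-node parallel efficiencies that can be checked in a few lines. The speedup numbers are copied from the abstract; the efficiency computation is ours, not the paper's.

```python
# Reported SELM speedups from the abstract (cluster nodes -> speedup factor).
speedups = {10: 8.71, 15: 13.79, 20: 18.74, 25: 23.79, 30: 28.89, 35: 33.81}

# Parallel efficiency = speedup / number of nodes (1.0 would be ideal linear scaling).
efficiency = {n: round(s / n, 3) for n, s in speedups.items()}
```

Interestingly, the implied efficiency rises with cluster size (from 0.871 at 10 nodes to 0.966 at 35), suggesting fixed overheads are amortized as the cluster grows.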