Abstract
The problem of determining the proper size of an artificial neural network is recognized to be crucial. One popular approach is pruning which means training a larger than necessary network and removing unnecessary weights/nodes. Though pruning is commonly used in architecture learning of neural network, there is still no theoretical framework about it. We give an information geometric explanation of pruning. In information geometric framework, most kinds of neural networks form exponential or mixture manifold which has a natural hierarchical structure. In a hierarchical set of systems, a lower order system is included in the parameter space of a large one as a submanifold. Such a parameter space has rich geometrical structures that are responsible for the dynamic behaviors of learning. The pruning problem is formulated in iterative m-projections from the current manifold to its submanifold in which the divergence between the two manifolds is minimized, and it means meaning the network performance does not worsen over the entire pruning process. The result gives a geometric understanding and an information geometric guideline of pruning, which has more authentic theoretic foundation.
Index
Terms
Available to subscribers and IEEE members.
References
Available to subscribers and IEEE members.
Citing Documents
Available to subscribers and IEEE members.