We present a view of the learning phase of statistical pattern recognition as a problem in optimum mode switching for learning systems which can operate in the supervised and nonsupervised modes. We assume the standard J-category statistical pattern recognition model, in which patterns are represented as points in Euclidean n-space and the learning problem is to estimate the unknowns in the problem probability structure. More specifically, we assume each learning sample can be processed in either mode, but the machine incurs a cost for this processing, with a larger cost for processing in the supervised mode than in the nonsupervised mode. The goal is to have the machine make the decision for each learning pattern concerning mode usage that results in minimum expected cost to learn the unknowns to a predetermined accuracy. We treat the parametric problem as a problem in stochastic control. Simple closed-form expressions partially describing system performance are derived for very general problem probability structures for the case of good learning, or, equivalently, a large number of learning samples.
Among the results obtained for identifiable probability structures for this case are i) expressions for purely supervised and purely nonsupervised learning costs; ii) a proof that supervised learning is always faster (though not necessarily less costly) than is nonsupervised learning; iii) an example showing that, depending on the relative costs of the two mode usages as well as on the problem probability structure, the learning cost of an optimum combined-mode learning system can be remarkably lower than that of a pure-mode learning system; iv) an argument to the effect that the a posteriori distribution of the unknown parameter vector is asymptotically Gaussian for a wide range of mode usage policies; v) a fairly simple functional equation that can be solved numerically for the optimum mode usage policy (for some probability structures the nature of the optimum mode usage policy can be inferred without resorting to computer calculation); vi) the conclusion that in general optimum mode usage involves mode switching, i.e., pure-mode learning is not optimum. For the most general discretized nonidentifiable probability structure, we show that dual-mode learning may be significantly less costly than is purely supervised learning. This example also illustrates the effectiveness of making use of hard constraints, imposed by prior knowledge or experimentation, in reducing learning cost.
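
As a rough illustration of results i) and ii), the following Python sketch compares pure-mode learning costs in a toy problem. Everything specific in it is an assumption introduced here and not taken from the paper: the model (two equiprobable classes with unit-variance Gaussian densities centered at +theta and -theta, theta unknown), the use of per-sample Fisher information with a Cramer-Rao style large-sample approximation for the number of samples needed, and all function names. It is meant only to show how a labeled sample can carry more information than an unlabeled one while still being the more expensive way to learn.

```python
import numpy as np

# Assumed toy model: a labeled sample (x, y), y = +1 or -1, has x ~ N(y * theta, 1),
# so its Fisher information about theta is exactly 1.
LABELED_FISHER_INFORMATION = 1.0


def unlabeled_fisher_information(theta, half_width=12.0, n_grid=200001):
    """Fisher information about theta in one unlabeled sample.

    The unlabeled density is the mixture 0.5*N(theta, 1) + 0.5*N(-theta, 1),
    whose score function is x * tanh(theta * x) - theta. The expectation of
    the squared score is approximated by a Riemann sum on a uniform grid.
    """
    x = np.linspace(-half_width, half_width, n_grid)
    dx = x[1] - x[0]
    norm = 1.0 / np.sqrt(2.0 * np.pi)
    density = 0.5 * norm * (np.exp(-0.5 * (x - theta) ** 2)
                            + np.exp(-0.5 * (x + theta) ** 2))
    score = x * np.tanh(theta * x) - theta
    return float(np.sum(score ** 2 * density) * dx)


def asymptotic_pure_mode_cost(cost_per_sample, info_per_sample, target_variance):
    """Approximate cost of reaching a target estimator variance in one mode.

    Assumes the large-sample relation variance ~ 1 / (n * info_per_sample),
    so n ~ 1 / (target_variance * info_per_sample) samples are needed.
    """
    return cost_per_sample / (target_variance * info_per_sample)


if __name__ == "__main__":
    theta = 1.0                # hypothetical true parameter
    target_variance = 1e-3     # hypothetical accuracy requirement
    supervised_cost_per_sample = 10.0    # hypothetical mode costs
    nonsupervised_cost_per_sample = 1.0

    info_u = unlabeled_fisher_information(theta)
    print("unlabeled information per sample:", info_u)
    print("pure supervised cost:",
          asymptotic_pure_mode_cost(supervised_cost_per_sample,
                                    LABELED_FISHER_INFORMATION, target_variance))
    print("pure nonsupervised cost:",
          asymptotic_pure_mode_cost(nonsupervised_cost_per_sample,
                                    info_u, target_variance))
```

In this toy model the unlabeled information per sample is always below the labeled value of 1, so supervised learning needs fewer samples; which pure mode is cheaper then depends on the ratio of the two per-sample costs. The sketch does not address the optimum mode-switching policy of results v) and vi).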