Stochastic Approximation and Nonlinear Regression

Copyright Year: 2003
Author(s): Arthur E. Albert; Leland A. Gardner
Publisher: MIT Press
Content Type: Books & eBooks
Topics: General Topics for Engineers

Abstract

This monograph addresses the problem of "real-time" curve fitting in the presence of noise, from the computational and statistical viewpoints. It examines the problem of nonlinear regression, where observations are made on a time series whose mean-value function is known except for a vector parameter. In contrast to the traditional formulation, data are imagined to arrive in temporal succession. The estimation is carried out in real time so that, at each instant, the parameter estimate fully reflects all available data.

Specifically, the monograph focuses on estimator sequences of the so-called differential-correction type. The term "differential correction" refers to the fact that the difference between the components of the updated and previous estimators is proportional to the difference between the current observation and the value that would be predicted by the regression function if the previous estimate were in fact the true value of the unknown vector parameter. The vector of proportionality factors (which is generally time varying and can depend upon previous estimates) is called the "gain" or "smoothing" vector.

The main purpose of this research is to relate the large-sample statistical behavior of such estimates (consistency, rate of convergence, large-sample distribution theory, asymptotic efficiency) to the properties of the regression function and the choice of smoothing vectors. Furthermore, consideration is given to the tradeoff that can be effected between computational simplicity and statistical efficiency through the choice of gains.

Part I deals with the special case of an unknown scalar parameter, discussing probability-one and mean-square convergence, rates of mean-square convergence, and asymptotic distribution theory of the estimators for various choices of the smoothing sequence. Part II examines the probability-one and mean-square convergence of the estimators in the vector case for various choices of smoothing vectors. Examples are liberally sprinkled throughout the book; indeed, the last chapter is devoted entirely to the discussion of examples at varying levels of generality.

If one views the stochastic approximation literature as a study in the asymptotic behavior of solutions to a certain class of nonlinear first-order difference equations with stochastic driving terms, then the results of this monograph also serve to extend and complement many of the results in that literature, which accounts for the authors' choice of title.

The book is written at the first-year graduate level, although this level of maturity is not required uniformly. Certainly the reader should understand the concept of a limit, both in the deterministic and probabilistic senses (i.e., almost-sure and quadratic-mean convergence). This much will assure a comfortable journey through the first fourth of the book. Chapters 4 and 5 require an acquaintance with a few selected central limit theorems. A familiarity with the standard techniques of large-sample theory will also prove useful but is not essential. Part II, Chapters 6 through 9, is couched in the language of matrix algebra, but none of the "classical" results used are deep. The reader who appreciates the elementary properties of eigenvalues, eigenvectors, and matrix norms will feel at home.

MIT Press Research Monograph No. 42
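
In symbols, the update just described is x(n+1) = x(n) + a(n+1) * [y(n+1) - F(n+1, x(n))], where y(n+1) is the newest observation, F(n+1, ·) is the regression function evaluated at the previous estimate, and a(n+1) is the gain. The Python sketch below illustrates the scalar recursion on an exponential-regression model of the kind treated in Part I; the function names, the model, and the harmonic gain sequence are illustrative assumptions, not the authors' formulation.

    import numpy as np

    def differential_correction(y, predict, slope, x0, gains):
        # Scalar differential-correction recursion:
        #   x[n+1] = x[n] + gains[n] * slope(n, x[n]) * (y[n] - predict(n, x[n]))
        # The correction is proportional to the residual between the new
        # observation and the value predicted under the previous estimate.
        x = float(x0)
        for n, obs in enumerate(y):
            residual = obs - predict(n, x)          # innovation at step n
            x += gains[n] * slope(n, x) * residual  # gain-weighted correction
        return x

    # Hypothetical setup: exponential regression y_n = exp(-theta * t_n) + noise,
    # estimated with a harmonic (1/n-type) gain sequence.
    rng = np.random.default_rng(0)
    theta_true = 0.5
    t = 0.02 * np.arange(1, 201)
    y = np.exp(-theta_true * t) + 0.05 * rng.standard_normal(t.size)

    predict = lambda n, x: np.exp(-x * t[n])
    slope = lambda n, x: -t[n] * np.exp(-x * t[n])  # d/dx of predict
    gains = 2.0 / (1.0 + np.arange(t.size))

    print(differential_correction(y, predict, slope, x0=1.0, gains=gains))

Here the product gains[n] * slope(n, x) plays the role of the estimate-dependent gain; Part I relates the convergence behavior of such recursions to precisely this choice.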

Table of Contents

      Frontmatter

Arthur E. Albert; Leland A. Gardner. Page(s): i - xv
This chapter contains sections titled: Half Title, Title, Copyright, Dedication, Foreword, Preface, Contents

      Introduction

Arthur E. Albert; Leland A. Gardner. Page(s): 1 - 6

      The Scalar-Parameter Case

Arthur E. Albert; Leland A. Gardner. Page(s): 7

      Probability-One and Mean-Square Convergence

Arthur E. Albert; Leland A. Gardner. Page(s): 9 - 26
This chapter contains sections titled: The Basic Assumptions (A1 Through A5‴), Theorems Concerning Probability-One and Mean-Square Convergence for General Gains, The Prototype Deterministic Gain, Reduction in the Linear Case, Gains That Use Prior Knowledge, Random Gains, Theorems Concerning Probability-One and Mean-Square Convergence for Particular Gains; Application to Polynomial Regression, Trigonometric Regression, Exponential Regression

      Moment Convergence Rates

Arthur E. Albert; Leland A. Gardner. Page(s): 27 - 37
This chapter contains sections titled: Restricted Gain Sequence, Theorems Concerning Moment Convergence Rates, Power-Law Derivative, Relevance to Stochastic Approximation, Generalization

      Asymptotic Distribution Theory

Arthur E. Albert; Leland A. Gardner. Page(s): 38 - 59
This chapter contains sections titled: Notation for and Relations Between Modes of Convergence, Theorems Concerning Asymptotic Normality for General Gains, Alternative to the Continuous Convergence Assumption, Large-Sample Variances for Particular Gains, Other Gains, Gain Comparison and Choice of Gain Constants, A General Stochastic Approximation Theorem

      Asymptotic Efficiency

Arthur E. Albert; Leland A. Gardner. Page(s): 60 - 77
This chapter contains sections titled: Asymptotic Linearity, Increased Efficiency via Transformation of the Parameter Space, Asymptotic Efficiency and Summary Theorem, Increased Efficiency, Large-Sample Confidence Intervals, Choice of Indexing Sequence, A Single-Parameter Estimation Problem

      The Vector-Parameter Case

Arthur E. Albert; Leland A. Gardner. Page(s): 79

      Mean-Square and Probability-One Convergence

Arthur E. Albert; Leland A. Gardner. Page(s): 81 - 108
This chapter contains sections titled: Theorem Concerning Divergence to Zero of Products of Elementary Matrices and Assumptions (B1 Through B5), Discussion of Assumptions and Proof, Theorems Concerning Mean-Square and Probability-One Convergence for General Gains and Assumptions (C1 Through C6′ and D1 Through D5), Truncated Vector Iterations, Conjectured Theorem and Assumptions (E1 Through E6′), Batch Processing

      Complements and Details

Arthur E. Albert; Leland A. Gardner. Page(s): 109 - 145
This chapter contains sections titled: Optimum Gains for Recursive Linear Regression, “Quick and Dirty” Recursive Linear Regression, Optimum Gains for Recursive Linear Regression: Batch Processing, “Quick and Dirty” Linear Regression: Batch Processing, Gain Sequences for Recursive Nonlinear Regression: The Method of Linearization, Sufficient Conditions for Assumptions E1 Through E6′ (E6) When the Gains (Equations 7.48) Are Used, Limitations of the Recursive Method: Ill Conditioning, Response Surfaces
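
For orientation: in the linear case, a suitable choice of gain vector makes the differential-correction recursion reproduce the least-squares estimate recursively. The sketch below is the standard recursive least-squares update (unit noise variance, diffuse start), offered as a point of reference rather than the book's exact normalization; every name in it is illustrative.

    import numpy as np

    def recursive_least_squares(H, y):
        # Recursive least squares for y_n = h_n' theta + noise (variance 1).
        # The vector k = P h / (1 + h' P h) plays the role of the gain
        # ("smoothing") vector in the linear case.
        p = H.shape[1]
        theta = np.zeros(p)
        P = 1e6 * np.eye(p)                        # diffuse prior covariance
        for h, obs in zip(H, y):
            k = P @ h / (1.0 + h @ P @ h)          # gain vector
            theta = theta + k * (obs - h @ theta)  # differential correction
            P = P - np.outer(k, h @ P)             # rank-one covariance update
        return theta

    # Hypothetical example: quadratic trend, processed one point at a time.
    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 1.0, 100)
    H = np.column_stack([np.ones_like(t), t, t**2])
    y = H @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(t.size)
    print(recursive_least_squares(H, y))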

      Applications

Arthur E. Albert; Leland A. Gardner. Page(s): 146 - 181
This chapter contains sections titled: Vector Observations and Time-Homogeneous Regression, Estimating the Initial State of a Linear System via Noisy Nonlinear Observations, Estimating Input Amplitude Through an Unknown Saturating Amplifier, Estimating the Parameters of a Time-Invariant Linear System, Elliptical Trajectory Parameter Estimation

      Open Problems

Arthur E. Albert; Leland A. Gardner. Page(s): 182 - 188
This chapter contains sections titled: Proof of the Conjectured Theorem, Extensions of Chapters 3 Through 5 to the Vector-Parameter Case, Kalman-Type Filtering Theory for Nonlinear Systems

      Lemmas 1 Through 8

Arthur E. Albert; Leland A. Gardner. Page(s): 189 - 199
This chapter contains sections titled: Lemma 1, Lemma 2, Lemma 3, Lemma 4, Lemma 5, Lemma 6, Lemma 7, Lemma 8

      References

Arthur E. Albert; Leland A. Gardner. Page(s): 200 - 201

      Index

Arthur E. Albert; Leland A. Gardner. Page(s): 203 - 204