Chapter Abstract:
This chapter examines the process of updating weights more closely and presents some strategies for selecting the updates efficiently. Stochastic gradient descent (SGD) is an alternative means of training artificial neural networks. SGD attempts to approximate the net gradient by sampling a subset of the gradients, that is, by using only a subset of the training set between weight updates. Batch training usually exhibits a monotonically decreasing loss function. In contrast, SGD losses can jump around and exhibit jittery behavior. Newton invented an iterative method for convex optimization in the seventeenth century. Newton's method for optimization approximates a solution by starting with an initial guess, then iteratively improving it. Computing the Jacobian also requires a modification to the usual training epoch: instead of computing the net gradient by summing the per-weight derivatives over the examples, the Jacobian records each derivative individually.
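To make the sampling idea concrete, the following is a minimal sketch, not taken from the chapter, of mini-batch SGD on a synthetic linear-regression problem; the data, batch size, and learning rate are illustrative assumptions.

```python
# A minimal sketch (not from the chapter) of mini-batch SGD on a
# synthetic linear-regression problem; sizes and the learning rate
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))                 # 256 examples, 4 features
true_w = np.array([1.5, -2.0, 0.5, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=256)   # noisy targets

w = np.zeros(4)     # initial weights
lr = 0.05           # learning rate
batch = 16          # examples sampled per weight update

for epoch in range(20):
    order = rng.permutation(len(X))           # reshuffle each epoch
    for start in range(0, len(X), batch):
        idx = order[start:start + batch]
        residual = X[idx] @ w - y[idx]
        # Gradient of the mean squared error on this subset only:
        # an estimate of the net gradient over the full training set.
        grad = (2.0 / len(idx)) * (X[idx].T @ residual)
        w -= lr * grad
    # The full-batch loss decreases on average but can jitter from
    # epoch to epoch, unlike the monotone decrease of batch training.
    loss = np.mean((X @ w - y) ** 2)
```

For comparison, a one-dimensional sketch of Newton's method for optimization (again illustrative, using a made-up convex function rather than anything from the chapter) shows the start-with-a-guess, iteratively-improve pattern the abstract describes.

```python
# Illustrative 1-D Newton's method for minimizing the convex function
# f(x) = (x - 3)**2 + exp(x); the function and iteration count are
# assumptions made for this sketch.
import math

def f_prime(x):          # first derivative of f
    return 2.0 * (x - 3.0) + math.exp(x)

def f_double_prime(x):   # second derivative of f
    return 2.0 + math.exp(x)

x = 0.0                  # initial guess
for _ in range(10):
    x -= f_prime(x) / f_double_prime(x)   # Newton update step
```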
Page(s): 87 - 109
Copyright Year: 2024
Edition: 1