Geometric Back-Propagation in Morphological Neural Networks

This paper provides a definition of back-propagation through geometric correspondences for morphological neural networks. In addition, dilation layers are shown to learn probe geometry by erosion of layer inputs and outputs. A proof-of-principle is provided, in which morphological networks significantly outperform convolutional networks in both prediction quality and convergence.


I. INTRODUCTION
There exists much image-like data that is produced by contact probing: examples are depth from LiDAR or time-of-flight sensors, radar images, and scanning microscopy. Since Serra's investigations [1], [2], the underlying algebraic structure of such data has been known as mathematical morphology. A coherent set of operations forms a consistent alternative framework to the convolutional way of linear diffusion probing [3], [4], [5]. The latter forms the basis of the convolutional neural network (CNN), and researchers have long considered developing analogous morphological neural networks (MNNs) to process the morphological type of data [6], [7], [8]. Convolution does not inherently respect the separation between pixels in 3D space, treating them as equidistant neighbors at all times, and cannot process occlusion naturally. Morphological operations such as dilation, on the other hand, probe data with structuring elements in space, respecting separation and occlusion.
MNNs have recently seen a variety of successes in complex vision tasks: [9], [10] show that MNNs have vastly higher parameter efficiency in tasks such as digit recognition; [11] successfully removes artefacts in images caused by rain droplets, and their method is extended in [12], introducing opening-closing networks; [13] shows that morphological operations and convolutions can complement each other and achieve state-of-the-art performance in object boundary recognition; [14] uses deep MNNs to solve classification and multi-class segmentation tasks, and its authors note that training becomes increasingly challenging as networks are deepened with more complex topology; [15] extends the work of [16] on scale-equivariant networks by the use of morphological scale spaces, though only the first module of their network is actually morphological.
Even outside the scope of neural networks, morphology is used to encode rich computational features that are useful in a variety of contact-related tasks: [17] encodes surfaces of fractured archaeological objects into a set of morphological features with the goal of automatically fitting fragments; [18] develops a morphological variant of deep networks.

The contributions of this paper are:
- A geometric definition of the back-propagation of morphological operations that does not rely on a linear approximation of morphological operations, as previous works did, but rather on matching slopes of (locally convex) functions.
- A morphological definition of probe geometry learning by error bounding, especially suited to data acquired by an essentially morphological process.
- Confirmation of the theory in practical use, on probe geometry estimation in Scanning Probe Microscopy (SPM) and on depth infilling on NYUv2.

II. METHOD
The goal of this paper is the direct application of morphological operations in neural networks, with errors back-propagated during training. As a brief recap, neural networks use the back-propagation algorithm [24] to update parameters and approximate a function for which data samples are available. Consider a network built up of $L$ layers as a composite function $f = f_{L-1} \circ \cdots \circ f_0$, and let $f(x_n) = y_n$ be the output of the network. The network is trained by minimizing a number of training objectives $E(t_n, y_n)$, where $t_n$ is the target output. The advantage of the back-propagation algorithm is that updating the parameters of a single layer can be agnostic of the network architecture, as long as the local derivative of the error $\partial E / \partial f_+(y)$ is known at the required point. For the remainder of this paper, a per-layer notation is therefore used, with input $f_-$, output $f_+$, probe $p$, and corresponding derivatives, as in Fig. 1. For MNNs, the terms $\partial f_+ / \partial f_-$ and $\partial f_+ / \partial p$ have to be redefined because they differ from those in CNNs. From the chain rule, only the terms $\partial E / \partial f_+(y)$ are required to obtain the derivatives of the error with respect to the input $f_-$ and the parameterized probe $p$ (or kernel) of $\otimes$.
In MNNs, the multiplication-addition scheme of convolution is replaced by an addition-supremum scheme of dilation [6]. Dilation for functions is defined on the semi-ring $(\mathbb{R}_{-\infty}, \vee, +)$, where $\vee$ denotes the supremum operation and $+$ is addition. This algebraic system extends the set of reals $\mathbb{R}$ with minus infinity: $\mathbb{R}_{-\infty} = \mathbb{R} \cup \{-\infty\}$.
A layer input signal $f_- : \mathbb{R}^D \to \mathbb{R}_{-\infty}$, indexed by indicator variable $x$, and a structuring element (or probe) $p : \mathbb{R}^D \to \mathbb{R}_{-\infty}$, indexed by indicator variable $z$, are combined to produce the morphological dilation as the layer output signal:

$$f_+(x) = (f_- \oplus p)(x) = \bigvee_{z} \{ f_-(x - z) + p(z) \}. \qquad (1)$$

Morphological back-propagation is derived in Section II-C through the slope correspondences explained in Section II-A. As an intermediate step, the morphological derivative is given in Section II-B. Finally, in Section II-E, elements that do not contribute to any output in the forward pass are bounded by means of morphological erosion in the backward pass of back-propagation. For brevity, a dilation layer is used in all derivations, but by morphological duality [1], [2] all arguments can be made for an erosion layer as well.
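To make the addition-supremum scheme concrete, the following minimal sketch (an illustration under our own conventions, not the authors' released code) implements (1) for a discrete 1D signal, padding with $-\infty$ outside the signal's support:

import numpy as np

def dilate_1d(f_minus, p):
    # f_plus(x) = sup_z { f_minus(x - z) + p(z) }, z over dom(p); see (1).
    r = len(p) // 2                                # probe reach (odd length assumed)
    padded = np.pad(f_minus, r, constant_values=-np.inf)
    f_plus = np.empty(len(f_minus))
    for x in range(len(f_minus)):
        window = padded[x : x + len(p)]            # f_minus over the probe support
        f_plus[x] = np.max(window[::-1] + p)       # addition-supremum semiring
    return f_plus

f_minus = np.array([0.0, 1.0, 3.0, 1.0, 0.0])
p = np.array([-1.0, 0.0, -1.0])                    # concave probe, origin at center
print(dilate_1d(f_minus, p))                       # [0. 2. 3. 2. 0.]

Note how the multiplication-addition of a correlation sum is replaced one-for-one by addition-supremum.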

A. Slope Correspondences
Geometrically speaking, it is intuitive to regard morphological dilation as probing a signal from above with a mirrored and flipped structuring element $p^T(z) \equiv -p(-z)$, lowering it until there is at least a single point of contact. As the probe moves, the reference point of the probe $p^T$ traces out the output signal $f_+$. There may be several points of contact, or even entire ranges where $f_-$ and $p^T$ touch; $p^T$ and $f_-$ are, however, never allowed to intersect. A graphical example is shown in Fig. 2.
Not all locations $x_-$ on the input signal $f_-$ lead to an output point $(x_+, f_+(x_+))$, since not all $f_-(x_-)$ can be touched by $p^T$. However, every $x_+$ can be traced back to at least one location $x_-$. For back-propagation of the error (as required for network learning), the goal is to map an error at $x_+$ back to any $x_-$ that caused it.
Theorem 1 (Originally From [3]): There is a provenance relationship between the slope at any point of the output signal $f_+(x_+)$ and the points that caused it through contact of the input signal $f_-(x_-)$ and the probe $p(z_-)$. The contact locations are related through

$$z_- = x_+ - x_-, \qquad (2)$$

and the slopes (gradients) obey

$$\nabla f_+(x_+) = \nabla f_-(x_-) = \nabla p(z_-). \qquad (3)$$

Proof: For a point $(x_+, f_+(x_+))$, there is always at least one input location $x_-$ at the input signal $f_-$ for which the supremum $\bigvee_z \{ f_-(x_+ - z) + p(z) \}$ is attained. The contact location $x_-$ implies the existence of a location $z_-$ on the probe $p$ that satisfies $z_- = x_+ - x_-$, where $z_- \in \mathrm{dom}(p)$. Therefore, (1) can be rewritten in terms of the locations on $f_+$, $f_-$, and $p$ where the supremum occurs:

$$f_+(x_+) = f_-(x_-) + p(z_-) = f_-(x_-) + p(x_+ - x_-). \qquad (4)$$

More specifically, the supremum is attained when the first derivative with respect to $z$ vanishes at $z_-$ (there is also a second-order condition that makes it a supremum rather than an infimum). As a consequence:

$$\nabla_z \left[ f_-(x_+ - z) + p(z) \right]_{z = z_-} = 0. \qquad (5)$$

The $f_-$-slope at $x_-$ now relates to the $p$-slope at $z_-$:

$$\nabla f_-(x_-) = \nabla p(z_-). \qquad (6)$$

At contact, those slopes are also related to the slope of the output signal $f_+$, obtained by differentiating (4) with respect to $x_+$:

$$\nabla f_+(x_+) = \nabla p(x_+ - x_-) = \nabla p(z_-). \qquad (7)$$

Combining (6) and (7) with the provenance (4) completes the proof.
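Theorem 1 can be checked numerically. The sketch below (signal, probe, and step size are our own choices for illustration) dilates a smooth 1D signal with a concave parabolic probe, records the provenance of each output point, and compares the three slopes of (3):

import numpy as np

dx = 0.01
x = np.arange(-2.0, 2.0, dx)
f_minus = np.sin(x)                                # smooth 1D input signal
a = 2.0
z = np.arange(-0.5, 0.5 + dx, dx)
p = -a * z**2                                      # concave parabolic probe
pad = len(z) // 2

fp = np.pad(f_minus, pad, constant_values=-np.inf)
f_plus = np.empty_like(f_minus)
prov = np.empty(len(x), dtype=int)                 # index of the x_- causing each x_+
for i in range(len(x)):
    vals = fp[i : i + len(z)][::-1] + p            # f_minus(x - z) + p(z)
    m = int(np.argmax(vals))
    f_plus[i] = vals[m]
    prov[i] = i - (m - pad)                        # x_- index = x_+ index - z_- index

i = len(x) // 3                                    # an interior output location
z_minus = (i - prov[i]) * dx                       # z_- = x_+ - x_-, cf. (2)
print(np.gradient(f_plus, dx)[i],                  # slope of f_+ at x_+
      np.gradient(f_minus, dx)[prov[i]],           # slope of f_- at x_-
      -2.0 * a * z_minus)                          # slope of p at z_-

Up to discretization error, the three printed slopes coincide, as (3) predicts.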

B. Morphological Derivatives
During back-propagation all $(x_+, f_+(x_+))$ are known, but their corresponding $x_-$ must be determined. These correspondences, henceforth called the provenance of points, can be obtained by matching slopes using the probe $p$.
Theorem 2: The morphological derivative of a single layer $f_+$ with respect to the input $f_-$ is

$$\frac{\partial f_+(x_+)}{\partial f_-(x_-)} = 1 \qquad \forall\left[ \nabla p(x_+ - x_-) = \nabla f_+(x_+) \right], \qquad (8)$$

where $\forall[\ldots]$ denotes the set of points $(x_+, x_-)$ for which the equality holds and $z_- = x_+ - x_- \in \mathrm{dom}(p)$. In any other case, the derivative is zero. Moreover, as a consequence of Theorem 1, each layer output location $x_+$ relates to a location $x_-$ on the input layer $f_-$ that caused the corresponding $f_+(x_+)$, given the current $p$, by means of

$$x_- \in \left\{ x : \nabla p(x_+ - x) = \nabla f_+(x_+) \right\}. \qquad (9)$$
There are four subtleties captured in Theorem 2 that are not immediately apparent. The first concerns undefined provenances; the latter three concern the for-all (i.e., $\forall[\ldots]$) statement.
Undefined Provenance: There may be locations $x_-$ that did not cause a single $x_+$ in the forward pass, resulting in a derivative in the backward pass that is zero for that $x_-$. At those locations $x_-$ the provenance (i.e., the correspondence between $x_-$ and $x_+$) is undefined. Therefore, (8) is called a sub-gradient [10], since it is not a local rate of change with respect to any $x_-$, but rather a zero-valued derivative resulting from an undefined provenance between $x_+$ and $x_-$.
Multiple $x_+$, Single $x_-$: Multiple $x_+$ may have been caused by a single $x_-$. Equation (8) allows the error to be back-propagated to a single $x_-$ even when it caused multiple $x_+$ in the forward pass; the derivative $\partial f_+(x_+) / \partial f_-(x_-)$ is 1 whenever there exists at least one pair for which the equality $\nabla p(x_+ - x_-) = \nabla f_+(x_+)$ holds.
Single $x_+$, Multiple $x_-$: There may be an $f_+(x_+)$ caused by multiple $x_-$; then $f_+$ at $x_+$ is not differentiable [25]. At these singular points a one-sided derivative (e.g., a left- or right-sided derivative in 1D) needs to be used to obtain valid slopes matching an $x_+$ to each $x_-$.
Invertible $\nabla p$: For a strictly convex probe, the location $x_-$ can be inferred directly by matching the slopes of the probe $p$ and the output signal $f_+$, since then $\nabla p$ is invertible. Using (3), the invertibility of $\nabla p$ due to strict convexity, and isolating $x_-$:

$$z_- = (\nabla p)^{-1}\left( \nabla f_+(x_+) \right), \qquad (10)$$

which for a convex $p$ implies

$$x_- = x_+ - (\nabla p)^{-1}\left( \nabla f_+(x_+) \right). \qquad (11)$$
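For instance, for a hypothetical parabolic probe $p(z) = -a z^2$ (our example; the paper does not prescribe this shape), $\nabla p(z) = -2az$ is invertible in closed form, and the provenance of (10)-(11) reduces to a single expression:

def provenance(x_plus, grad_f_plus, a):
    # z_- = (grad p)^{-1}(s) = -s / (2 a) for p(z) = -a z^2, per (10).
    z_minus = -grad_f_plus / (2.0 * a)
    return x_plus - z_minus                        # x_- of (11), sub-pixel accurate

# e.g., an output slope of 0.8 at x_+ = 1.0 with a = 2.0 yields x_- = 1.2:
print(provenance(1.0, 0.8, 2.0))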

C. Morphological Back-Propagation
With these subtleties noted, the goal is now to define $\partial E / \partial f_-(x_-)$. The derivative of the error with respect to the input signal $f_-$ is more complicated than the morphological derivatives, because the derivative of $E$ with respect to the output layer $f_+$ may be different at each $x_+$ caused by a single $x_-$. Since sets of distinct $\partial E / \partial f_+(x_+)$ cannot be back-propagated to a single $x_-$, they have to be aggregated. Dilation acts as an absolute effect on its input; a morphological error thus acts absolutely on the terms, not relative to their magnitudes as in convolution. Therefore, only the worst-case error should be back-propagated. To facilitate this, let $\mathrm{EV} : \mathbb{R}^D_{-\infty} \times \mathbb{R} \to \mathbb{R}_{-\infty}$ return the signed most extreme value of a function $f$ over a subset of the domain $\mathbf{x} \subseteq \mathrm{dom}(f)$:

$$\mathrm{EV}(f, \mathbf{x}) = \begin{cases} \bigvee_{x \in \mathbf{x}} f(x) & \text{if } \left| \bigvee_{x \in \mathbf{x}} f(x) \right| \geq \left| \bigwedge_{x \in \mathbf{x}} f(x) \right|, \\ \bigwedge_{x \in \mathbf{x}} f(x) & \text{otherwise}, \end{cases} \qquad (12)$$

where $\bigwedge$ denotes the infimum operation. Using the signed most extreme value function EV and combining it with Theorem 2 provides:

$$\frac{\partial E}{\partial f_-}(x_-) = \mathrm{EV}\left( \frac{\partial E}{\partial f_+},\ \left\{ x_+ : \nabla p(x_+ - x_-) = \nabla f_+(x_+) \right\} \right). \qquad (13)$$

In summary, the back-propagated error $\partial E / \partial f_-(x_-)$ is the transfer of the worst-case positive or negative $\partial E / \partial f_+(x_+)$ from the locations $x_+$ to the location(s) $x_-$ that caused them; these locations are found by matching slopes of the output signal $f_+$ and probe $p$, i.e., through their provenance. Parameters of the probe are updated using gradient descent.
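A minimal sketch of the EV aggregation of (12), under our reading of the definition, returns the element of largest magnitude while keeping its sign:

import numpy as np

def extreme_value(errors):
    # EV of (12): the supremum if it dominates in magnitude, else the infimum.
    errors = np.asarray(errors, dtype=float)
    hi, lo = errors.max(), errors.min()
    return hi if abs(hi) >= abs(lo) else lo

print(extreme_value([0.3, -1.2, 0.7]))             # -1.2: the worst case dominates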
The derivative of the error with respect to the structuring element $p$ is obtained similarly, since the dilation of (1) is symmetric in $f_-$ and $p$. Observe that the back-propagated transfer of the error is now to the provenance vector $z_- = x_+ - x_-$. Therefore the derivative of the error with respect to $p(z)$ can be written as

$$\frac{\partial E}{\partial p}(z_-) = \mathrm{EV}\left( \frac{\partial E}{\partial f_+},\ \left\{ x_+ : \nabla p(z_-) = \nabla f_+(x_+) \right\} \right).$$

In practice, for discrete structures such as images, the (relative) provenance $z_-$ has to be recorded for each $x_+$. This bookkeeping can be memory-intensive, since an arbitrary number of $x_-$ may have caused any one $x_+$ due to the multi-valued nature of dilation, but it prevents issues in approximating slopes from sampled data. Conversely, for data that can reasonably be expected to be locally convex (like SPM), slope matching can theoretically yield a much higher quality gradient with sub-pixel accuracy.
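In a discrete setting, the bookkeeping described above might look as follows; the bucket structure and names are our own illustration, with the EV aggregation applied per provenance offset:

import numpy as np

def grad_probe(dE_dfplus, prov_z, probe_len):
    # prov_z[i] holds the recorded offset index z_- that produced output i.
    buckets = [[] for _ in range(probe_len)]
    for i, zi in enumerate(prov_z):
        buckets[zi].append(dE_dfplus[i])           # group errors by provenance offset
    g = np.zeros(probe_len)
    for zi, errs in enumerate(buckets):
        if errs:                                   # undefined provenance stays zero
            hi, lo = max(errs), min(errs)
            g[zi] = hi if abs(hi) >= abs(lo) else lo   # EV per offset z_-
    return g

print(grad_probe([0.5, -0.9, 0.2], prov_z=[1, 1, 0], probe_len=3))  # [ 0.2 -0.9  0. ]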

D. Mean Back-Propagation
It is worth noting that auto-differentiation tools such as PyTorch [26] use another, linear aggregation for back-propagation over multi-valued dilations (and similarly for max-pooling):

$$\frac{\partial E}{\partial f_-}(x_-) = \sum_{x_+} \frac{1}{|\mathbf{x}_-|} \frac{\partial E}{\partial f_+}(x_+), \qquad (14)$$

where $|\mathbf{x}_-|$ is the number of locations $x_-$ for which the maximum occurred.
The error term is averaged over the multiple $x_-$ that caused a single $x_+$, to deal with the multi-valued nature of dilation. Then the contributions of all $x_+$ caused by a given $x_-$ are summed to yield a single scalar. In the present context, averaging over the provenance, denoted by $1/|\mathbf{x}_-|$ in (14), is considered a non-morphological operation and is therefore avoided.
It can be surmised, however, that this aggregation has practical advantages, especially when networks are composed mainly of (linear) convolutions.
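The averaging of (14) can be observed directly in PyTorch: torch.amax distributes the incoming gradient evenly over tied maxima, which is exactly the $1/|\mathbf{x}_-|$ factor. A two-line demonstration:

import torch

f_minus = torch.tensor([1.0, 3.0, 3.0, 0.0], requires_grad=True)
f_plus = torch.amax(f_minus)                       # dilation by a flat probe, one output
f_plus.backward()
print(f_minus.grad)                                # tensor([0.0000, 0.5000, 0.5000, 0.0000])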

E. Probe Learning by Error Bounding
The main shortcoming of sub-gradient-based morphological back-propagation is that it fails to propagate information to the elements that did not cause the error, yielding undefined provenances at those locations. In morphological back-propagation, an undefined provenance results in a zero-valued derivative, but there may be better approximations of the error obtained by bounding: since there are points that did not cause the supremum in the forward pass, the error term propagated to those points in (1) can be upper bounded by the points that did cause the supremum.
To see how, consider a dilation with input $f_-$, output $f_+$, and structuring element $p$ as before. The objective of a single layer in the network is to output an $f_+^*$ such that the composite function ultimately minimizes a difference function $Q(x) = f_L(x) - t(x)$, where $t(x)$ is the target function at $x$. Note that where the morphological back-propagation from Section II-C was agnostic to the form of the error function $E$, the function $Q$ is purely a difference function used to infer $f_+^*$ from $t$. The input $f_-$ and the ideal output $f_+^*$ for the layer considered are related through an unknown optimal probe:

$$f_+^*(x) = \bigvee_z \{ f_-(x - z) + p^*(z) \}, \qquad (15)$$

where $p^*$ is the (optimal) probe to achieve $f_+^*$ from $f_-$. This equation is used to bound $f_+^*$ from below by $\bigvee_z \{ f_-(x - z) + p^*(z) \}$, even when no probe $p^*$ exists to construct $f_+^*$ from $f_-$ or when data is sparse. For a particular $x \in \mathbf{x}$, the output $f_+^*$ is bounded from below by:

$$f_+^*(x) \geq f_-(x - z) + p^*(z) \quad \forall z \in \mathrm{dom}(p^*), \qquad (16)$$

and this in turn implies a bound on $p^*$:

$$p^*(z) \leq f_+^*(x) - f_-(x - z). \qquad (17)$$

Letting $x$ take all possible values, (17) can be written as an erosion:

$$p^*(z) \leq \bigwedge_x \{ f_+^*(x) - f_-(x - z) \} = (f_+^* \ominus f_-)(z). \qquad (18)$$

Therefore, the optimal probe $p^*(z)$ is bounded by the erosion of the desired output $f_+^*$ with the input $f_-$. Similarly to the update rule of gradient descent in convolutional networks, the structuring element $p$ is updated over iterations $i$. Let $p_i$ denote the structuring element at a particular iteration. The update rule for the parameters of $p$ can be given as

$$p_{i+1}(z) = p_i(z) + \lambda \left[ (f_+^* \ominus f_-)(z) - p_i(z) \right], \qquad (19)$$

where $\lambda$ is a gain parameter. This method of learning by bounding will be referred to as Probe Learning.
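The sketch below implements Probe Learning in 1D under our reading of (18)-(19); the exact update-rule form and the toy signals are assumptions for illustration:

import numpy as np

def erosion_bound(f_star_plus, f_minus, probe_len):
    # p*(z) <= inf_x { f*_+(x) - f_-(x - z) }: the erosion bound of (18).
    r = probe_len // 2
    bound = np.full(probe_len, np.inf)
    for z in range(-r, r + 1):
        for xi in range(len(f_star_plus)):
            if 0 <= xi - z < len(f_minus):
                bound[z + r] = min(bound[z + r], f_star_plus[xi] - f_minus[xi - z])
    return bound

def probe_update(p_i, f_star_plus, f_minus, lam=0.5):
    # Pull p_i toward the erosion bound with gain lam, cf. (19).
    return p_i + lam * (erosion_bound(f_star_plus, f_minus, len(p_i)) - p_i)

p = np.zeros(3)                                    # probes initialized at zero (III-B)
f_minus = np.array([0.0, 2.0, 0.0])
f_star = np.array([1.0, 2.0, 1.0])                 # desired layer output f*_+
for _ in range(10):
    p = probe_update(p, f_star, f_minus)
print(p)                                           # approaches [-1.  0. -1.]

In this toy example, dilating f_minus with the converged probe reproduces f_star exactly, so the bound is attained.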

III. PROOF-OF-PRINCIPLE
As a proof-of-principle, this paper shows that when the input data to the MNN suits morphological operations, networks trained by the proposed Morphological Back-propagation and Probe Learning outperform convolutional networks by a large margin. The modality of choice is data resulting from the imaging process of scanning probe microscopy (SPM). The working principle of SPM is positioning a probing element above a surface sample while maintaining constant force (in the subclass atomic force microscopy, AFM) or constant current (in the subclass scanning tunneling microscopy, STM). These surface-probe interactions are naturally expressed through mathematical morphology [27], [28], [29], [30], since that is the mathematics of touch probing [4] rather than of kernel-based diffusion.
Consider an atomic surface function $S : \mathbb{R}^2 \to \mathbb{R}$ and a manufactured probe function $p : \mathbb{R}^2 \to \mathbb{R}$. The image function $I$ obtained from SPM is the morphological dilation of the surface $S$ with the probe $p$:

$$I(x) = (S \oplus p)(x) = \bigvee_z \{ S(x - z) + p(z) \}, \qquad (20)$$

where $x, z$ index spatial locations on the sampling plane for the surface and probe respectively, and $\bigvee$ is the supremum operation. If the geometry of any two of $I$, $S$, or $p$ is known, the third is related by morphological dilation or erosion. Even when only the scanned image $I$ is known, blind reconstruction techniques [30], [31], [32] may be used to recover an upper bound on the probe $p$ and surface $S$. For a graphical overview, see Fig. 3. In view of (20), data from SPM should have excellent characteristics for testing morphological networks.
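A toy 1D simulation of the imaging model (20) and of the best reconstruction R (the erosion of I with the known probe) might look as follows; the surface and probe values are invented for illustration:

import numpy as np

def dilate(f, p):
    # I(x) = sup_z { S(x - z) + p(z) }: the SPM imaging model of (20).
    r = len(p) // 2
    fp = np.pad(f, r, constant_values=-np.inf)
    return np.array([np.max(fp[i : i + len(p)][::-1] + p) for i in range(len(f))])

def erode(f, p):
    # (f erosion p)(x) = inf_z { f(x + z) - p(z) }: the adjoint of dilate above.
    r = len(p) // 2
    fp = np.pad(f, r, constant_values=np.inf)
    return np.array([np.min(fp[i : i + len(p)] - p) for i in range(len(f))])

surface = np.array([0.0, 0.0, 3.0, 0.0, 0.0, 2.0, 0.0, 0.0])  # toy atomic surface S
probe = np.array([-1.0, 0.0, -1.0])                            # blunt probe tip p
image = dilate(surface, probe)                                 # scanned image I
recon = erode(image, probe)                                    # least upper bound R
print(image)                                                   # [0. 2. 3. 2. 1. 2. 1. 0.]
print(recon)                                                   # [0. 1. 3. 2. 1. 2. 1. 0.]

Note that recon matches surface at the peaks but not in the crevice next to the tall peak, where R exceeds S strictly: a blunt probe cannot reach in, exactly the effect marked (B) in Fig. 3.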
In this section, Morphological Back-propagation and Probe Learning are evaluated on synthetic SPM data. The purpose is twofold: to validate the theoretical insights, and to show that for appropriate data it is indeed advantageous to incorporate morphological operations in neural networks.

A. Background
Deep learning is commonly applied in the automated analysis of SPM data [33], [34]. Even though it was established decades ago that mathematical morphology can model the probe-surface interactions leading to the scanned images (see, e.g., [28], [30], [31]), the default operation in automated analysis is still the convolution operator. One argument for using CNNs is that the SPM imaging process is not fully described by morphology: additional noise may be introduced through variance in the tunneling gap in STM or slight cantilever oscillation in AFM [27]. This can partially be modeled by additive Gaussian noise [31], which is inherently difficult for morphology to process.
Specifically for AFM, contact between probe and surface can bring about wear of the material during data collection. As an example, double apex forming [35] (i.e., loss of probe convexity) may happen at any time. Consequently, the quality of the probe geometry has to be monitored constantly. Estimating probe and surface abrasion is ill-defined because it may happen to either or both structures [36]: the image signal results from both the probe and the surface. Estimating abrasion effects using MLPs and CNNs has previously been studied in [35], [36], [37], [38], [39], [40]. While results are promising, directly using the morphological nature of the problem in its solution is likely to be beneficial.
For the proof-of-principle of the proposed method, SPM image-surface pairs are required. These are not trivially generated, and no public dataset exists that provides them. For example, the authors of [41] provide tools to generate data, but make use of an idealized spherical probe implicit in their atomic representation. [40] provides a binary classification task with negative samples arising from a variety of causes: sample drift, no probe contact, scanning problems, etc. These artefacts are not described by mathematical morphology.

B. Implementation
All proposed methods, notably Morphological Back-propagation and Probe Learning, are implemented using PyTorch [26]. To create the dataset, three primitives are chosen to replicate three distinct probe shapes: a parabolic probe, a pyramid probe, and a parabolic probe with a double apex. The third primitive is especially relevant for practical applications, since non-convex probes may negatively affect the quality and validity of AFM measurements. For each primitive, 8 datasets (N_train=1000, N_test=1000) of synthetic 2D data are generated with randomized probe geometry. Each sample consists of a scanned image I, an artificial surface S, and a best reconstruction R, which is the erosion of the image I with the known probe p. This theoretically best reconstruction R is the least upper bound surface that can be recovered from I, taking into account the non-invertibility of the measurement. As a result, all evaluations are done against the best reconstruction R, since a network cannot reasonably be expected to predict a surface better than R; if it did, it would be hallucinating details of the true atomic surface S. Training is performed on both I → R and I → S. See Fig. 4 for an example of the 2.5D data, along with a cross-section along an arbitrary scanline. The structuring elements of the MNNs are initialized at zero. Initial experimentation shows no impact on performance for random initialization, although convergence time may be slightly affected. For further details, see the publicly available code at github.com/rickgroen/probe-learning. A sketch of the probe primitives is given below.
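A sketch of the three probe primitives (with our own parameterization; the paper does not specify exact shapes or scales) could be:

import numpy as np

def make_probe(kind, size=17, a=0.05, sep=4):
    r = size // 2
    xx, zz = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1))
    if kind == "parabolic":
        return -a * (xx**2 + zz**2)
    if kind == "pyramid":
        return -0.3 * (np.abs(xx) + np.abs(zz))
    if kind == "double_apex":                      # non-convex: union of two tips
        left = -a * ((xx + sep)**2 + zz**2)
        right = -a * ((xx - sep)**2 + zz**2) - 0.2
        return np.maximum(left, right)
    raise ValueError(kind)

p = make_probe("double_apex")
print(p.shape)                                     # (17, 17): 289 parameters, as in Table I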

TABLE I
PERFORMANCE OF LINEAR AND MORPHOLOGICAL NETWORKS ON SYNTHETIC 2D SPM DATASETS

In the second column, the number of parameters is shown; e.g., ResNet-50 [42] has roughly 44M parameters, whereas the simple networks have only 289. In the third column, the average order of magnitude of iterations until the model converged is shown. In the fourth through ninth columns, the RMSE of predictions measured against the best reconstruction R is shown. The networks were independently trained and tested 8 times on different datasets for each configuration. In all cases the morphological networks outperform the linear networks. Probe Learning performs best (boldface).

C. Learning Probe Geometry
Morphological Back-propagation and Probe Learning by error bounding are evaluated against linear methods and mean back-propagation in Table I. Qualitative examples are shown in Fig. 5. There are four aspects to take note of. First, the single-layer morphological networks outperform the linear methods by a large margin. Probe Learning performs best in terms of prediction quality (Fig. 5(a)). For the linear methods, high-frequency noise is hallucinated around object edges. This effect is more pronounced for U-Net and ResNet (Fig. 5(d)) than for a single-layer CNN (Fig. 5(c)). Second, convolution can be made to perform better by using more parameters and introducing additional tricks such as residual connections. Even so, increasing the number of parameters in the network to 44 million (shown in the second column of Table I) does not guarantee learning the data; by contrast, the single-layer morphological networks can learn the task with just 289 parameters from a 17×17 probe. Third, the third column of Table I shows the average order of magnitude of iterations until learning converges. Early stopping is applied after RMSE convergence over 100 iterations on the training set. Probe Learning appears to always converge within 100 iterations of training. Fourth, morphological back-propagation fits the data perfectly when trained on I → R, but produces wrong vertical scaling when trained on I → S, though the overall shape is predicted correctly (Fig. 5(b)). A second morphological layer or a bias term could compensate for this vertical offset; alternatively, the error aggregation of (13) could be changed to averages (Section II-D), making the method less sensitive to extremes.

D. Double Apex Detection
Probe or material abrasion is a challenging issue in obtaining high-quality scans in AFM [35], [36], [37], [40], [44]. The sixth and ninth columns of Table I show that Probe Learning by bounding recovers the best reconstruction R. Besides surface predictions, the geometric properties of the probe are learned by the morphological layer; see Fig. 6. The proposed method of Probe Learning recovers the upper bound of the shape of the probe, within some margin of uncertainty between the two peaks. Numerical analysis of the probe could be integrated with measuring software to determine double-apex forming without the need for complex CNN architectures: a 300-parameter MNN suffices.

IV. OBSERVATIONS ON GENERALIZATION
Back-propagation naturally generalizes to networks with an arbitrary number of layers. The simulated SPM experiments, however, only address the method on idealized, noise-free haptic data, and noise impacts network learning. To demonstrate the proposed method more broadly, it is applied to depth infilling, i.e., completing incomplete depth data. Though specialized algorithms exist [45], infilling is fundamentally a morphological task, since it is one of separating shapes and overcoming occlusion.
The depth infilling experiment is performed on NYUv2 (N_train=795, N_test=654) [46]. All networks consist of six layers: the first three layers down-sample to 1/8th resolution, the latter three layers up-sample back to full resolution. MNNs use dilation and erosion layers alternately; CNNs use convolutions without non-linearities. Networks are trained for 40 epochs using an SGD optimizer and an L2 loss objective. Quantitative results are shown in Table II; visual results are shown in Fig. 7. Since the data are gathered from real-world indoor scenes, it can be expected that sensor noise complicates learning. While morphological networks may be suited to dealing with missing data, noisy data is challenging: morphological operations deal with additive noise by estimating an envelope. For a sequence of morphological layers without aggregation, the envelope is slightly vertically displaced with respect to the signal. To compensate for this non-morphological type of noise, the morphological layers are extended with a vertical bias term; convolutional layers use an identical term to deal with vertical offset. A sketch of such a layer is given below.
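One plausible building block for such an MNN is a learnable 2D dilation layer with a vertical bias, sketched below. This is not the authors' released code, and its amax call relies on standard autograd, i.e., the mean back-propagation of Section II-D rather than the proposed EV aggregation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Dilation2d(nn.Module):
    # Learnable grayscale dilation of a single-channel map by a k x k probe.
    def __init__(self, kernel_size=5):
        super().__init__()
        self.k = kernel_size
        self.p = nn.Parameter(torch.zeros(kernel_size * kernel_size))  # flat init
        self.bias = nn.Parameter(torch.zeros(1))   # vertical offset term

    def forward(self, x):                          # x: (B, 1, H, W)
        pad = self.k // 2
        # A very negative pad value emulates -inf outside the support.
        xp = F.pad(x, [pad] * 4, value=-1e9)
        patches = F.unfold(xp, self.k)             # (B, k*k, H*W)
        out = (patches + self.p.view(1, -1, 1)).amax(dim=1)  # sup_z { f + p }
        return out.view_as(x) + self.bias

layer = Dilation2d()
depth = torch.rand(1, 1, 32, 32)
print(layer(depth).shape)                          # torch.Size([1, 1, 32, 32])

An erosion layer follows by duality (negate, dilate with the reflected probe, negate); down- and up-sampling can be combined with such layers via strides or interpolation.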

V. CONCLUSION
In this paper, a geometric definition of Morphological Back-propagation is proposed that does not rely on a linear approximation of morphological operations, but rather on the geometric provenance of slope correspondences. Second, Morphological Probe Learning is proposed, based on the natural bounding properties of morphology. Two experiments (SPM surface reconstruction and NYUv2 depth infilling) confirm that problems of a morphological nature can be solved accurately with much smaller MNNs than CNNs, competing with dedicated solutions. In both experiments, CNNs were not able to approximate the ground truth even when many more parameters were introduced. In the case of SPM data, MNNs also converged orders of magnitude faster than their linear counterparts.
As of now, only relatively simple single-channel networks were examined. In future research, the proposed update rules (morphological back-propagation and probe learning), or combinations of both, could feature more prominently in larger morphological networks. They should then take into account the composition of the morphological operations in subsequent layers. Moreover, the morphological update rules should be adapted to be less sensitive to noise. In conclusion, when data can reasonably be modelled as resulting from probing touch (e.g., haptic data from SPM or LiDAR), morphological operations are strongly recommended in the construction of network architectures.

Fig. 1. Schematic overview of a single layer in a neural network. It shows forward propagation of $f_-$ and $p$ by some operation $\otimes$, and backward propagation of the error $\partial E / \partial f_+(x)$. From the chain rule, only the terms $\partial E / \partial f_+(y)$ are required to obtain the derivatives of the error with respect to the input $f_-$ and the parameterized probe $p$ (or kernel) of $\otimes$.

Fig. 2. Depiction of the forward propagation of a 1D signal $f_-$ using morphological dilation as a single layer of a larger network. The dilation of the signal with (convex) probe $p$ is shown, resulting in $f_+$. Here $\nabla f_+(x_+) = \nabla f_-(x_-) = \nabla p(z_-)$, as proven in the main text.

Fig. 3. Graphical depiction of SPM data in 1D. The solid line depicts an artificial atomic surface $S$; the dotted line is the image $I$ after dilating the true surface with the probing element $p$. The dilation is the natural mathematical model of the equipotential (in STM) movement of the probe across the surface. The dashed line depicts the least upper bound on the recoverable surface $R$. Notice especially that in region (A) the image $I$ and surface $S$ have the same shape at the maxima, disregarding some offset in $y$, and that in region (B) a blunt probe cannot fully recover surfaces within crevices between atoms.

Fig. 4. Data example. (left) 2D sample used for training and testing, generated by a parabolic probe; vertical offset added for visualization. (right) Cross-section along a scanline showing image $I$, surface $S$, and reconstruction $R$. Either $S$ or $R$ may be used as ground truth.

Fig. 5. (a) Predictions from a single-layer MNN that learned probe geometry by means of the method in Section II-E. The blue prediction lines up with the orange target $S$; predictions are draped around the true peaks since the network estimates at most within the theoretical upper bound of reconstructability. (b) Morphological back-propagation from Section II-C. (c) Single-layer CNN. (d) ResNet-50. The two CNNs hallucinate erroneous high-frequency details around peaks, more pronounced for (d).

Fig. 6. Qualitative example of double-apex forming in AFM. (left) Atomic surface $S$ in orange, resulting image $I$ in green. Artefacts arise as false double peaks around the structures in $I$; measuring software could mistake these for surface features, whereas they actually arise from a low-quality probe. (right) The probe used to create the synthetic data in orange, the predicted reconstruction of its shape in blue. The reconstructed probe is an upper bound on the true geometry of the probe; the network thus learns to predict (upper-bound) probe geometry.

Fig. 7. (left-upper) Input raw depth; (right-upper) ground truth infilled depth; (left-lower) CNN (ii) prediction; (right-lower) MNN (ii) prediction. MNNs can deal with sparse data. The size of the structuring element determines how much missing data is filled by each layer.

TABLE II
DEPTH INFILLING PERFORMANCE OF CNNS AND MNNS ON NYUv2