I. Introduction
Autonomous sonar target recognition is important for maintaining real-time, sensor-based marine situational awareness, and thus for creating a sustainable marine environment free of anthropogenic debris such as macroplastics and other sonar-detectable human waste. It is also important for monitoring large underwater targets such as oil rigs used in offshore drilling. Accurate real-time target recognition requires reliable feature representation and segmentation, so that the sonar features of a target of interest are autonomously identifiable and readily interpretable by a domain expert. Generally, most target features manifest as multi-dimensional shapes in some spectral domain derived from the time series of the target response. A typical example is the so-called acoustic color image, which localizes target features of interest in the time-frequency (or a similar two-dimensional) domain, e.g. sonar ping responses plotted as a function of tracks and frequency. Artificial intelligence (AI) architectures, especially supervised machine learning (ML) techniques such as deep neural networks (DNNs) and their popular implementations using convolutional neural networks (CNNs), have been applied extensively over the last decade to extract, segment and classify multi-dimensional features, especially using training images. However, they suffer from the typical challenges of model uncertainty and training bias when learning morphing features for diverse sonar targets. This is particularly important in the context of structured and dynamic background interference from the oceanic environment, e.g. in coastal areas with significant acoustic interference from the seabed, marine wildlife, shipping noise, macroplastics and other anthropogenic debris, and natural structures, all of which can introduce significant uncertainties into feature extraction, segmentation and interpretation.
One of the well-known challenges in sonar target recognition is the difficulty of segmenting and disentangling salient target features embedded in multi-dimensional spectrographic datasets. Of specific interest are features that cluster together yet intertwine in non-linear ways and change dynamically in their spectral morphology. Current state-of-the-art machine learning cannot be applied effectively under these circumstances because of the dearth of reliable ground truths and robust large-scale training datasets.