Detection of Dolphin Whistle-Like Biomimicking Signals by Phase Analysis

The application of low probability of detection (LPD) in underwater acoustic communication is challenging because the limited bandwidth and frequency band available allow simple interception using energy detection. To counter this, recent LPD schemes disguise the communication signal as a vocalization of a marine mammal. This way, the signals can be transmitted at high power while the interceptor believes they are biological sounds. In this paper, we propose a first interceptor tailored to distinguish between anthropogenic and biological sounds. Our main assumption is that, due to limitations on the dump-off factor of the acoustic projector, the phase of a real whistle is much more diverse than that of a disguised whistle-like signal. We therefore propose the randomness of the signal's phase as a classification measure. The phase is calculated by a phase-locked loop (PLL), and its randomness is measured by entropy. Our results show that the approximate and sample entropies, which both uncover regularities in a signal, are good classification metrics. Analysis of data obtained from two sea experiments and from a large database of tagged dolphin whistles shows that our interception scheme distinguishes well between real and biomimicked signals.


I. INTRODUCTION
The extensive use of underwater acoustic communications (UWAC) in security-related applications has led to the development of low probability of detection (LPD) schemes. LPD communication is relevant to a wide range of security-based applications, including search and survey by submarines [1], report-back communication by scuba divers [2], and communication with deployed autonomous vehicles [3]. Such applications often require the communication to be concealed such that transmissions are not detected by an interceptor. Common LPD communication approaches achieve covertness by hiding the emitted signal below the ambient noise floor or by disguising the signals as noises that are characteristic of the environment. Then, by sharing a secret methodology between the transmitter and receiver to detect the LPD signal, a processing gain is achieved such that the minimum signal-to-noise ratio (SNR) still managed by the receiver is below that of the interceptor.
The associate editor coordinating the review of this manuscript and approving it for publication was Maged Abdullah Esmail.
A widely used approach for LPD UWAC is direct-sequence spread-spectrum (DSSS) [4]. In DSSS, the modulation signal is multiplied by a spreading sequence such that it spreads across the frequency domain. The signal's bandwidth is expanded according to the length of the spreading sequence, and the signal is received below the noise floor. The spreading sequence is chosen such that it is hard to 1) identify the signal through, e.g., cyclostationary analysis, and 2) reproduce it for interception through, e.g., a bank of matched filters. The first attribute calls for a narrow autocorrelation response of the spreading sequence, while the second requires a non-linear operation during the generation of the spreading sequence [5]. While DSSS is widely used for LPD communications (e.g., in [6] and [7]), its spectral efficiency is low and so is its data rate [8]. Other spreading approaches are orthogonal frequency-division multiplexing (OFDM) [9] and the use of chirp signals whose frequency changes over time [10]. In both cases, the transmitted signals have a narrow autocorrelation and high processing gain, and can thus tolerate extensive multipath [11]. To overcome the data rate limitation, [12] uses a chaotic sequence modulation that, even at a high power level, makes the received signal appear similar to ambient noise. Other approaches hide the communication signal by means of frequency hopping [13], or by spatial focusing using acoustic arrays or time interval techniques [14]. However, these approaches are sensitive to noise, and their decoding performance is poor at low SNR [15]. Further, using noise-like signals for underwater LPD communications faces a main limitation: the source of the transmission can be traced by an interceptor to a single location, e.g., by means of beamforming, whereas the ambient noise is assumed to be isotropic [16]. Here, in the context of underwater acoustic LPD, the opportunity lies in bio-mimicking.
In bio-mimicking LPD communication, the transmitted signal is structured to resemble a biological sound, such that a possible interceptor cannot distinguish between it and a real bio-vocalization. The modulation signal is similar in structure to (or recorded from) the vocalization of a marine mammal. These vocalizations can be dolphin clicks or whistles, whale calls, or seal sounds, all with the common feature of having a complex and diverse structure. In [17], dolphin clicks are used as modulation signals for pulse-position modulation (PPM), where the information is encoded in the position of the high-frequency clicks. Similarly, [18] uses sea lion clicks as an information carrier and utilizes the intra-pulse rate for modulation. For higher transmission rates, complex signals such as dolphin whistles are used. A simple method of mapping the information bits to whistles is described in [19], where an alphabet of whistles is used for symbol modulation. Another mapping method generates the modulation signal as a synthetic chirp with a non-linear curve to resemble a dolphin's whistle [20]. An on-off-keying (OOK) modulation based on the codas of Humpback whales is used in [21]. Other methods use the biological signal as a strong signal to hide another signal. In [22], a DSSS signal is embedded on top of a dolphin's whistle, and [23] uses Humpback whale calls to hide active sonar signaling. The main advantage of bio-mimicking UWAC signals is that they can be transmitted at high SNR. This way, assuming the interceptor detects the signal but regards it as a biological sound, the communication can be of high data rate but still be LPD. The spectrograms in Figs. 1 and 2 show examples of a real and a biomimicked signal, respectively, and illustrate how similar the two are.
While there exist interceptors for UWAC LPD signals, these focus on detecting a signal under the noise floor and not necessarily on discriminating one signal from another to intercept biomimicking communications. In particular, as the biomimicked signals are deliberately transmitted at high SNR, the problem is not one of detecting a weak signal. For example, multiband energy detectors can detect signals spread in frequency by some pseudo-random sequence or via frequency hopping [24], but would not suffice for differentiating between biological signals and artificial ones. Other interception approaches that use feature analysis to, e.g., discover fluctuations in time [25] or detect a signal's pattern [26] may also fail because the signal is made to be very similar to a real bio-vocalization. Further, since both the biomimicked signal and the real biological signal stem from a single source, interception by tracking or localization to separate a single source from isotropic noise will not assist in the signal's discrimination. Instead, the challenge of designing an interceptor for biomimicking communication is in its classification. That is, once a biological-like signal is detected, the goal is to classify it as originating from either a real biological source or from an artificial one.
VOLUME 10, 2022
Given the high potential of biomimicking as an LPD approach and its use for UWAC, in this work we explore the question of how to intercept biomimicking UWAC signals. This should not only serve as a countermeasure for such an elusive communication technique, but can also serve as a benchmark to prove the efficiency of these LPD approaches. The main challenge we see is, regardless of the communication protocol, to find a metric that can separate real signals from mimicked ones. We focus on the interception of biomimicking techniques that use dolphin whistles for modulation. Still, our method supports the detection of a wide range of biomimicking techniques, ranging from synthetic mammal-like vocalizations to the playback of such signals.
Our approach builds on the intuition that the diversity in the signal's phase is much greater for real biological signals than for biomimicked signals. Due to the limited dump-off factor of the acoustic hardware [27], we argue that this assumption also holds for the playback of biological signals. As such, we rely on the statistics of the signal's phase as a clustering metric. Our approach uses a prior detection procedure such as PAMGuard [28], which produces a time window of samples containing a detected whistle-like signal. Assuming the signal is transmitted at high power, we ignore the error probability of the detector and further assume that the detected whistle occupies the entire time window. Then, using a phase-locked loop (PLL), we estimate the phase of the signal and calculate its approximate entropy [29] as a metric to quantify the phase's randomness. To the best of our knowledge, ours is the first interception technique that can separate biomimicked UWAC signals from real biological ones. Our contribution is three-fold: 1) a first attempt to distinguish between a real dolphin whistle and a bio-mimicked whistle-like signal; 2) a low complexity interception test that can be operated in real time; 3) a method to evaluate the randomness of an acoustic signal by measuring the entropy of the signal's phase.
We evaluate the performance of our method by analyzing a database of real whistles, and by emitting biomimicked signals in two sea experiments. The experiments included transmissions of synthetic whistle-like signals and playbacks of real recorded dolphin whistles. The false alarm rate, in terms of detecting a biomimicked signal, is evaluated by running the interceptor over the real dolphin whistles. The results show that our method can distinguish well between the real and biomimicked whistles. For reproducibility, we share both our interception code and our database of real dolphin whistles.
The rest of this paper is organized as follows. In Section II, we present our system model, the main assumptions, and preliminary discussion about detection of dolphin whistles. Section III describes our proposed interception method in detail. Performance evaluation is provided in Section IV, and conclusions are drawn in Section V.

II. SYSTEM MODEL
A. SYSTEM SETUP
Our system comprises a single omni-directional acoustic modem transmitting a sequence of signals that, as a test case, resemble the whistle sounds emitted by dolphins. The signals are detected by a receiver that is aware that the whistle-like signals are biomimicked UWAC symbols. A modulation scheme agreed upon by the transmitter and receiver but unknown to the interceptor allows the receiver to decode the symbols. Since the LPD feature hides the symbols as dolphin whistles, the signals are transmitted at high power so that the communication is expected to be well received.
Stationed within the range of detection, the interceptor overhears and detects the transmitted whistle-like signals. The interceptor is not aware of the transmitter's or receiver's existence and is thus not certain if the detected signals are real dolphin whistles or biomimicked ones. The interceptor then makes a binary decision whether the detected signals are biomimicked or real whistles. Our model is of a simple interceptor, e.g., an omni-directional single hydrophone, and thus solutions in the form of localization or source tracking (cf. [30]) to classify the source by e.g., its motion pattern, are not considered here.

B. MAIN ASSUMPTIONS
We assume that either a long whistle or more than one whistle is transmitted. This assumption enables us to obtain enough statistics about the entropy of the received signal's phase at the interceptor. However, we do not limit the communication scheme and allow any protocol that uses dolphin signals as a modulation signal. For example, the symbols can be drawn from a fixed alphabet of real dolphin whistles that are played back in the water [19]. Alternatively, the symbols can be modulated on top of a synthetic whistle-like signal such as the one we show in Fig. 2. Other options are phase modulation [20] or time-difference modulation [17]. Knowing the structure of the communication, the receiver can detect the signals by a matched filter. The interceptor, in contrast, performs detection by looking for whistle patterns in the raw acoustic data. In the following preliminaries, we provide some options for such detection. We assume the signals are received at an SNR high enough for successful detection. The outcome of this detection at the interceptor is a time-synchronized sequence of either dolphin whistles or whistle-like communication symbols.
Our key assumption is that a real dolphin's whistle has stochastic characteristics and should thus be more random than a played-back or a synthetic whistle. Specifically, we focus on the continuity of the whistle's phase. While a marine mammal like a dolphin has a remarkable capability to emit signals with fast temporal changes [31], acoustic projectors and power amplifiers have built-in limitations in the form of a low dump-off factor [27]. As a result, signals produced by acoustic instruments require a low peak-to-average power ratio (PAPR), and their phase is expected to be much more continuous than that of a real dolphin's whistle. We thus use the phase of the received signal as an indicator of whether the source of the signal is biological or biomimicked. Note that this conclusion holds for both synthetic and playback whistles.
To explore the differences we expect in the phase of real and biomimicked signals, we show in Figs. 3 and 4 an example of the phase of a real whistle and of a synthetic whistle-like signal, respectively. The phase was measured by a PLL as described below. We observe a significant difference between the phase of the two signals. Specifically, while the phase of both signals concentrates around 0.5 (normalized phase), the phase of the real whistle experiences transients whose gradient is far greater than that of the synthetic whistle-like signal. An extensive analysis of the phase's randomness is shown in our results below for a large database of manually tagged real whistles.

C. PRELIMINARIES: DETECTION OF DOLPHIN'S WHISTLES
To calculate the signal's phase, we rely on a preliminary whistle detection algorithm. The most common whistle detector is the PAMGuard toolbox [28]. PAMGuard is an open source passive acoustic monitoring software. Its whistle detection includes three steps: 1) energy summation, 2) spectrogram analysis, and 3) matched filtering. The first step applies an energy detector to identify a signal within the assumed bandwidth of the whistle (commonly, 5 kHz-20 kHz). The second step involves estimating the contour of the detected whistle by analyzing the spectrogram matrix for spectral positions of high intensity. The third step validates the detected signal as a whistle by correlating the evaluated contour line with templates of whistles. Other approaches for whistle detection follow the contour of the signal to evaluate the likelihood of it being a whistle [32], or use an entropy detector followed by a time correlator [33]. In [34], whistles are characterized by having maxima in the signal's contour, and in [35] a filter is used to track time-varying dominant frequencies in the contour curve. Other pattern recognition approaches are used to obtain a posterior estimate over the contour trace [32].
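As a rough illustration of the first step described above (energy detection within the assumed whistle band), the following Python sketch flags a frame when most of its energy lies between 5 kHz and 20 kHz. It is not PAMGuard's implementation: the function names, the naive DFT, the frame length, the sampling rate, and the 0.5 decision threshold are all assumptions chosen for illustration.

```python
import cmath
import math


def band_energy_ratio(frame, fs, f_lo=5_000.0, f_hi=20_000.0):
    """Fraction of the frame's energy inside [f_lo, f_hi] Hz (illustrative)."""
    n = len(frame)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2):  # positive-frequency bins, DC skipped
        # Naive DFT bin; adequate for a short demonstration frame.
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(x_k) ** 2
        total += power
        if f_lo <= k * fs / n <= f_hi:
            in_band += power
    return in_band / total if total > 0.0 else 0.0


def is_whistle_candidate(frame, fs, threshold=0.5):
    """Assumed decision rule: most of the energy lies in the whistle band."""
    return band_energy_ratio(frame, fs) > threshold


fs = 96_000  # hypothetical sampling rate [Hz]
tone_in_band = [math.sin(2 * math.pi * 12_000 * t / fs) for t in range(256)]
tone_out_of_band = [math.sin(2 * math.pi * 1_500 * t / fs) for t in range(256)]
print(is_whistle_candidate(tone_in_band, fs))      # 12 kHz tone, inside the band
print(is_whistle_candidate(tone_out_of_band, fs))  # 1.5 kHz tone, outside the band
```

A practical detector would use an FFT over sliding windows and would be followed by the contour estimation and template matching steps, which this sketch does not attempt.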

A. KEY IDEA
Our scheme includes two steps: 1) phase measurement and 2) signal classification. The first step begins after a whistle-like signal has been detected by a preliminary detector. Phase measurement is performed by analyzing the error term of the voltage-controlled oscillator (VCO), which is part of a PLL (see formalization below). Since the whistle-like signal is frequency and time varying, the signal's energy and phase are time-frequency dependent. In these conditions, to effectively track the phase of the signal, the PLL must operate at a center frequency that is high relative to the bandwidth of the signal. This can be guaranteed through passband modulation. Filtering the signal using a time-frequency mask obtained through contour tracking, e.g., [33], [36], may also help, but may be unnecessary when the SNR is high.
Once the time-domain set of the measured phase is obtained, the second step of signal classification is executed. Here, soft-decision classification is based on quantifying the randomness of the normalized phase components. Finally, since both real dolphins' whistles and biomimicked ones are expected to arrive in a sequence, we combine the randomness metrics from each whistle-like symbol and threshold it to make a binary decision: a real or a biomimicked whistle. The above scheme is illustrated in the block diagram in Fig. 5, and the steps of the algorithm are described in detail in the next section.

B. MEASURING THE SIGNAL'S PHASE
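Once a whistle-like signal is detected, we calculate a time series of the signal's phase. While the signal is expected to change in the time-frequency domain, it is still expected to be continuous with a smooth transition between time-frequency bins. We therefore evaluate the signal's phase using a PLL. A survey of PLL techniques can be found in [37]. Here, we give our implementation of the process. A PLL is a closed-loop module that tracks the instantaneous frequency of a given signal, $v_{\mathrm{in}}(n)$, $n = 0, \ldots, N$, where $N$ is the number of samples of the detected signal. As illustrated in Fig. 6, this is an iterative process that involves the modulation of the input signal with an output sinewave, $v_{\mathrm{out}}(n)$, produced by a VCO module. The frequency, $F_{\mathrm{out}}(n)$, of this sinewave is set by
$$F_{\mathrm{out}}(n) = F_c + K_v \cdot e_2(n), \tag{1}$$
where $F_c$ is the center frequency of the VCO, $K_v$ [Hz/V] is a sensitivity measure, and $e_2(n)$ is the VCO input. Hence,
$$v_{\mathrm{out}}(n) = A_v \cdot \cos\left(2\pi F_c n + \phi_{\mathrm{out}}(n)\right), \tag{2}$$
where $A_v$ is the VCO amplitude gain, and $\phi_{\mathrm{out}}(n)$ is the estimated phase. The phase $\phi_{\mathrm{out}}(n)$ is determined by the phase detector (PD), which generates
$$e_1(n) = v_{\mathrm{in}}(n) \cdot v_{\mathrm{out}}(n). \tag{3}$$
Denoting the difference between the true frequency of $v_{\mathrm{in}}(n)$ and $F_c$ as $\Delta F$, we obtain
$$v_{\mathrm{in}}(n) = A_c \cdot \sin\left(2\pi F_c n + 2\pi \Delta F n\right), \tag{4}$$
where $A_c$ is the input gain and, for simplification, $A_c = 1$. Plugging (4) into (3), with $\phi_{\mathrm{in}}(n) = 2\pi \Delta F n$, we have
$$e_1(n) = A_v \cdot \sin\left(2\pi F_c n + \phi_{\mathrm{in}}(n)\right) \cdot \cos\left(2\pi F_c n + \phi_{\mathrm{out}}(n)\right). \tag{5}$$
Using trigonometric identities and after the low pass filter (LPF), which removes the component around $2F_c$, we are left with
$$e_2(n) = \frac{A_v}{2} \sin\left(\phi_{\mathrm{in}}(n) - \phi_{\mathrm{out}}(n)\right). \tag{6}$$
When the frequency of $v_{\mathrm{in}}(n)$, $F_{\mathrm{in}}$, is close to $F_{\mathrm{out}}$,
$$e_2(n) \approx \frac{A_v}{2} \left(\phi_{\mathrm{in}}(n) - \phi_{\mathrm{out}}(n)\right). \tag{7}$$
During its iterative procedure, the frequency of $v_{\mathrm{out}}(n)$ becomes closer to the true frequency of $v_{\mathrm{in}}(n)$, and
$$v_{\mathrm{out}}(n) = A_v \cdot \cos\left(2\pi F_c n + 2\pi \Delta F n + \phi_{\mathrm{err}}(n)\right), \tag{8}$$
where $\phi_{\mathrm{err}}(n)$ is an error phase shift. We then have
$$e_2(n) = -\frac{A_v}{2} \sin\left(\phi_{\mathrm{err}}(n)\right) \tag{9}$$
and
$$F_{\mathrm{out}}(n) = F_c - \frac{K_v A_v}{2} \sin\left(\phi_{\mathrm{err}}(n)\right). \tag{10}$$
When $F_{\mathrm{in}} = F_{\mathrm{out}}$,
$$\Delta F = -\frac{K_v A_v}{2} \sin\left(\phi_{\mathrm{err}}(n)\right). \tag{11}$$
Here,
$$\phi_{\mathrm{err}}(n) = -\arcsin\left(\frac{2 \Delta F}{K_v A_v}\right). \tag{12}$$
Since $\Delta F$ is unknown and, after locking onto the frequency, $\phi_{\mathrm{out}}$ is constant, for the purpose of randomness evaluation we settle with the output of the LPF, $e_2$ from (9), as a signal corresponding to $\phi_{\mathrm{in}}$.
Since the absolute value of the argument within the arcsin function must not exceed 1, we get from (12) the maximum frequency deviation that the PLL can lock onto:
$$|\Delta F| \le \frac{K_v A_v}{2}. \tag{13}$$
Thus, to allow a $|\Delta F|$ of up to 15 kHz, we perform a passband modulation by multiplying the signal with a cosine signal to yield $F_c = 60$ kHz. Additionally, since $e_2$ in (9) should contain only low frequency content, we down-sample the output of the PLL processing, and perform the analysis in baseband for efficiency.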
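To make the loop described above concrete, here is a minimal software PLL sketch in Python: a multiplier phase detector, a one-pole low pass filter, and a VCO whose frequency is the center frequency plus the sensitivity times the filter output. It is a sketch under stated assumptions rather than the authors' implementation; the function name `pll_track`, the sampling rate, the filter cutoff, and the gain values are hypothetical. Only the 60 kHz center frequency is taken from the text.

```python
import math


def pll_track(v_in, fs, f_c=60_000.0, k_v=40_000.0, a_v=1.0, f_lpf=5_000.0):
    """Return the loop-filter output e2(n) of a simple software PLL.

    Gains, cutoff, and sampling rate are illustrative assumptions.
    """
    beta = 1.0 - math.exp(-2.0 * math.pi * f_lpf / fs)  # one-pole LPF coefficient
    theta = 0.0  # accumulated VCO phase [rad]
    e2 = 0.0     # LPF state, which is also the VCO input
    trace = []
    for x in v_in:
        v_out = a_v * math.cos(theta)   # VCO output
        e1 = x * v_out                  # phase detector: input times VCO output
        e2 += beta * (e1 - e2)          # LPF attenuates the component near 2 * f_c
        f_out = f_c + k_v * e2          # VCO frequency for the next sample
        theta = (theta + 2.0 * math.pi * f_out / fs) % (2.0 * math.pi)
        trace.append(e2)
    return trace


fs = 480_000.0     # hypothetical sampling rate [Hz]
delta_f = 2_000.0  # offset of the input tone from f_c [Hz]
v_in = [math.sin(2.0 * math.pi * (60_000.0 + delta_f) * n / fs) for n in range(9_600)]
e2 = pll_track(v_in, fs)
tail = e2[len(e2) // 2:]  # discard the initial transient
mean_e2 = sum(tail) / len(tail)
print(f"mean e2 after lock: {mean_e2:.3f}  (delta_f / k_v = {delta_f / 40_000.0:.3f})")
```

With these assumed gains the hold-in range is `k_v * a_v / 2`, i.e., 20 kHz, and once the loop is locked the average of `e2` settles near `delta_f / k_v`. For a real whistle, the fluctuations of `e2` around that average are what the randomness analysis would operate on.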

1) METRICS FOR EVALUATING RANDOMNESS
Once $e_2(n)$, $n = 0, \ldots, N$, is determined, we continue to evaluate its randomness. Metrics to evaluate the randomness of a signal include the Kolmogorov-Smirnov (KS) test [38], conditional heteroscedastic models [39], and entropy evaluation [40]. The first is a nonparametric test to associate a sample with a probability distribution function (PDF), the second is a measure of the uncertainties within the signal, and the third is a quantification of the amount of information encapsulated within the signal. In this work, we focus on the entropy metric since we are interested in measuring the randomness of the signal rather than evaluating its distribution. Among the many forms of entropy metrics (cf. [40]), we focus on the approximate entropy (ApEn) and its modified version, the sample entropy (SampEn), both fit for examining the information within short time series. We also explore the use of the Tsallis entropy (TsEn), which describes the complexity of the time series. In all three cases, a normalization of the input signal, $e_2(n)$, is required. The calculation is performed for each of the $k = 1, \ldots, K$ detected whistle-like signals to yield a measure $R_k$. These measures are then summed up by
$$S = \sum_{k=1}^{K} w_k R_k, \tag{14}$$
where $w_k$ are weights set to reflect the reliability of the phase estimation of the $k$th whistle-like signal. Specifically, we give lower weight to complex signals, which are harder for the PLL to lock onto, and higher weight to signals of longer duration, for which more samples are available.
Here, the complexity of the signal is attributed to its shape, which we measure by the number, $N_k$, of inflection points within the time-frequency contour of the $k$th signal. Let $L_k$ be the duration of the $k$th whistle-like signal. Then, the weight $w_k$ is set to increase with $L_k$ and to decrease with $N_k$, with $\alpha \le 1$ a user-defined control parameter. Finally, the measure $S$ in (14) is compared to a threshold that can be determined by a constant false alarm test that evaluates the distribution of real dolphin whistles from databases such as the one we share in [41].
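The classification step can be sketched in Python as follows: approximate entropy of a normalized series, a weighted combination over the detected whistles, and a threshold taken from the empirical distribution of real-whistle scores. Several parts are assumptions and not the paper's definitions. In particular, the weight formula in `whistle_weights` is purely hypothetical (it only respects the stated trend of growing with duration and shrinking with the number of inflection points), and the embedding dimension `m = 2`, tolerance `r = 0.2`, and false alarm rate are illustrative defaults.

```python
import math
import random


def approximate_entropy(x, m=2, r=0.2):
    """Approximate entropy of x after zero-mean, unit-variance normalization."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0.0:
        return 0.0  # a constant series carries no irregularity
    z = [(v - mean) / std for v in x]

    def phi(order):
        count = n - order + 1
        templates = [z[i:i + order] for i in range(count)]
        total = 0.0
        for a in templates:
            # Self-matches are counted, so the logarithm is always defined.
            hits = sum(
                1 for b in templates
                if max(abs(u - v) for u, v in zip(a, b)) <= r
            )
            total += math.log(hits / count)
        return total / count

    return phi(m) - phi(m + 1)


def whistle_weights(durations, inflections, alpha=0.5):
    """HYPOTHETICAL weights: grow with duration, shrink with inflection count."""
    raw = [dur / (1.0 + infl) ** alpha for dur, infl in zip(durations, inflections)]
    total = sum(raw)
    return [w / total for w in raw]


def combined_score(entropies, durations, inflections, alpha=0.5):
    """Weighted sum of the per-whistle randomness measures."""
    weights = whistle_weights(durations, inflections, alpha)
    return sum(w * r_k for w, r_k in zip(weights, entropies))


def cfa_threshold(real_scores, p_fa=0.05):
    """Score below which roughly a fraction p_fa of real-whistle scores fall."""
    ordered = sorted(real_scores)
    return ordered[int(p_fa * len(ordered))]


rng = random.Random(0)
regular = [math.sin(2.0 * math.pi * i / 25.0) for i in range(300)]  # smooth series
irregular = [rng.gauss(0.0, 1.0) for _ in range(300)]               # noisy series
print(approximate_entropy(regular), approximate_entropy(irregular))
```

Under this sketch, a sequence would be flagged as biomimicked when its combined score falls below the threshold, since real whistles are expected to show the higher phase entropy.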

2) DISCUSSION
Our approach relies on the assumed randomness in the phase of a real whistle. Clearly, this assumption depends on the type of emissions a dolphin makes, and thus we are exposed to false positives when a smooth real whistle is classified as a biomimicked signal. Similarly, results are expected to deteriorate when the SNR is too low and the PLL fails to track the phase of the signal. We therefore have to limit our results below to the dataset of real whistles we analyzed and shared.

IV. PERFORMANCE ANALYSIS
A. SETUP OF DATABASE
We explore the performance of our interceptor based on three sets: 1) recordings of synthetic whistle-like signals, 2) recordings of real dolphin whistles, and 3) recordings of a playback of real dolphin whistles. Without loss of generality, the transmissions of the first two signal types followed the biomimicking scheme in [20]. The first type of signal comprises a sequence of whistle-like synthetic symbols, whose carrier frequency and duration vary in the range of 5 kHz to 24 kHz and 200 ms to 2 s, respectively. The synthetic signal is made to resemble a real dolphin whistle by structuring its time-frequency (TF) contour based on a non-repetitive set of TF masks extracted from real whistles. Guard intervals are placed between the symbols to reduce inter-symbol interference. The duration of each guard interval is drawn uniformly at random between 2 ms and 200 ms to break any structure in the signal.
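The structure of such a transmission can be sketched in Python: phase-continuous symbols whose instantaneous frequency follows a non-linear contour, separated by guard intervals of random length between 2 ms and 200 ms. The power-law contour used here is only a stand-in for the TF masks extracted from real whistles, and the function names, sampling rate, and contour parameters are assumptions for illustration.

```python
import math
import random


def synth_whistle(duration, fs, f_start, f_end, bend=2.0):
    """Phase-continuous tone whose frequency follows a power-law contour."""
    n = round(duration * fs)
    phase = 0.0
    samples = []
    for i in range(n):
        u = i / n
        f_inst = f_start + (f_end - f_start) * u ** bend  # stand-in TF contour
        phase += 2.0 * math.pi * f_inst / fs              # integrate the frequency
        samples.append(math.sin(phase))
    return samples


def guard_interval(rng, fs, lo=0.002, hi=0.2):
    """Silence of uniformly random duration between lo and hi seconds."""
    return [0.0] * round(rng.uniform(lo, hi) * fs)


def build_sequence(symbols, rng, fs):
    """Concatenate symbols, each followed by a random guard interval."""
    out = []
    for symbol in symbols:
        out.extend(symbol)
        out.extend(guard_interval(rng, fs))
    return out


fs = 96_000  # hypothetical sampling rate [Hz]
rng = random.Random(1)
symbols = [
    synth_whistle(0.2, fs, 8_000.0, 16_000.0),             # upsweep, 200 ms
    synth_whistle(0.3, fs, 20_000.0, 6_000.0, bend=0.5),   # downsweep, 300 ms
]
sequence = build_sequence(symbols, rng, fs)
print(len(sequence) / fs, "seconds of signal")
```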
Our dataset of real dolphin whistles is obtained from the 8th DCLDE Workshop [42]. This dataset includes a few terabytes of non-tagged whistles. As part of a citizen science project, we received help from high school students from the "Open School" in Haifa, Israel, to obtain a few hundred tagged whistles. Each tagged whistle includes a time window that marks the position of the whistle. Below we show results for 243 tagged whistles.
To obtain a realistic dataset of biomimicked signals, we performed two sea experiments in which we transmitted the above-defined synthetic whistles and a playback of real whistles. The transmissions took place in May 2020 and in May 2021, roughly 5 km off the shore of Haifa, Israel, at a water depth of 20 m. The sea state was roughly 2, and the sound speed in water was measured to be 1530 m/s and was roughly constant along the water column. The transmitting vessel was roughly 1000 m from the receiving vessel, and deployed at a depth of 10 m an EvoLogics LF software-defined modem from which the signals were emitted at a calibrated source level of 175 dB re 1 μPa @ 1 m. In turn, the receiving vessel deployed at a depth of 10 m a self-made acoustic recorder with a flat frequency response and a linear phase along the corresponding frequency range. The receiver recorded continuously throughout the experiment. The result was 121 whistle-like signals and 104 playback whistles, each of which was received at an SNR exceeding 20 dB. Each transmission was preceded by a linear chirp signal in the frequency band of 7-17 kHz. This chirp signal was used to time-synchronize the whistle-like and the whistle playback signals. We also used the chirp signal to evaluate the channel's impulse response and found that it included multiple significant taps with a delay spread ranging from 2 ms to 10 ms. Since the underwater acoustic channels for the real recorded dolphins were different from that in the sea experiment, we explore performance by calculating the histogram over all results obtained.
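The use of the chirp preamble for synchronization and channel probing can be illustrated with a matched filter. The Python sketch below correlates a received signal with a 7-17 kHz linear chirp and reads a delay spread off the significant correlation taps. The two-path channel, the chirp duration, the sampling rate, and the 0.3 relative threshold are made-up values for the demonstration and are not measurements from the experiments.

```python
import math


def linear_chirp(f0, f1, duration, fs):
    """Real linear chirp sweeping from f0 to f1 Hz."""
    n = round(duration * fs)
    rate = (f1 - f0) / duration
    return [
        math.sin(2.0 * math.pi * (f0 * t + 0.5 * rate * t * t))
        for t in (i / fs for i in range(n))
    ]


def matched_filter(rx, template, max_lag):
    """Magnitude of the correlation of rx with the template for lags 0..max_lag."""
    m = len(template)
    return [
        abs(sum(rx[lag + i] * template[i] for i in range(m)))
        for lag in range(max_lag + 1)
    ]


def delay_spread(mf_out, fs, rel_threshold=0.3):
    """Time between the first and last taps above a relative threshold."""
    peak = max(mf_out)
    taps = [lag for lag, v in enumerate(mf_out) if v >= rel_threshold * peak]
    return (taps[-1] - taps[0]) / fs


fs = 96_000  # hypothetical sampling rate [Hz]
chirp = linear_chirp(7_000.0, 17_000.0, 0.01, fs)
max_lag = 400
rx = [0.0] * (len(chirp) + max_lag)
for i, c in enumerate(chirp):
    rx[i] += c              # direct path
    rx[i + 288] += 0.5 * c  # made-up second path, 3 ms later at half amplitude
mf = matched_filter(rx, chirp, max_lag)
spread = delay_spread(mf, fs)
print(f"strongest tap at lag {mf.index(max(mf))}, delay spread {spread * 1e3:.2f} ms")
```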

B. INTERCEPTION RESULTS
We start by demonstrating why, for interception, we rely on the phase of the signal rather than on its raw acoustic samples. In Fig. 7, we show the histogram of the approximate entropy [29] of the raw acoustic samples for the three types of signals. We observe that, while the entropy of the synthetic whistle-like signals is more diverse, good separation by thresholding is not possible. Since the reception was at high SNR, we argue that this is due to the effect of the channel, which induces randomness in the received signals. Next, we analyze the distribution of the entropy measures for the phase of the same signals. Fig. 8 shows the histogram of the approximate entropy for the phase of the inspected signals. Here, we observe that the three signals can be well separated. As expected, the real dolphin whistles show high entropy that is concentrated around a value close to 1.6. The playback whistles also show high entropy, but it is much lower than that of the real whistles and more diverse. Since the transmitted playback whistles were the same as the real whistles, and since the real signals, like the playback whistles, were received by a human-made hydrophone, we argue that the observed difference between the real and playback signals is due to the projector's hardware. Lastly, although made to be confused with real whistles, the synthetic whistle-like signals can be well separated from the real whistles, as Fig. 8 shows. Designed by a structured modulation scheme and emitted by a man-made projector, these signals exhibit low entropy that is roughly concentrated around 0.5.
Given the different distributions observed in Fig. 8, setting thresholds to segment the three signals would yield a perfect precision-recall curve. Thus, to quantitatively compare the histograms of the three signals, we turn to the Kullback-Leibler divergence (KLD) [43] as a metric to compare two histograms, $P$ and $Q$,
$$D(P\|Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)},$$
where $D(P\|Q)$ is a non-negative scalar whose value increases as $P$ and $Q$ become more distinct. Another attribute of the KLD is that the effect of bias is smoothed such that similar distributions with different bias would still get a low KLD value. Results in Table 1 show the KLD matrix for separating among the three histograms. We observe high KLD values that reflect the differences between the approximate entropy measures of the phase of the three signal sources. Further, while the KLD is not a symmetric measure, we observe that the results are almost symmetrical. This implies that identification is possible either of real whistles from biomimicked ones or vice versa. Finally, to comment on the process that drives the differences between the phase of real and biomimicked signals, we test the interception performance based on three entropy measures. Results in Fig. 9 show the histograms of the Approximate, Sample, and Tsallis entropy measures for the estimated phase of the real, playback, and synthetic whistle-like signals. We observe that, along with the already explored Approximate entropy, the Sample entropy also separates the three signals well. However, using the Tsallis entropy measure, interception performance is low. Since the Tsallis entropy gives more weight to higher probabilities while the Approximate and Sample entropies uncover regularities in the signal, we conclude that the phase of the real whistles does not include significant patterns in the time series.
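For reference, the KLD between two histograms can be computed as in the Python sketch below. The binning helper and the small constant added to every bin (to avoid division by an empty bin) are assumptions of this sketch, and the probability vectors are toy values rather than the entries of Table 1.

```python
import math


def normalized_histogram(values, lo, hi, bins=20, eps=1e-6):
    """Histogram over [lo, hi] normalized to sum to one, with eps smoothing."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    smoothed = [c + eps for c in counts]  # keeps every bin strictly positive
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kld(p, q):
    """Kullback-Leibler divergence D(P||Q) with the natural logarithm."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)


p = [0.5, 0.5]  # toy distributions, not measured data
q = [0.9, 0.1]
print(kld(p, q), kld(q, p))  # the two directions generally differ

low = normalized_histogram([0.1, 0.2, 0.2, 0.3], 0.0, 1.0, bins=5)
high = normalized_histogram([0.7, 0.8, 0.8, 0.9], 0.0, 1.0, bins=5)
print(kld(low, high))  # large value for well-separated histograms
```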

V. CONCLUSION
In this work, we explored a new interception scheme to identify whether a received whistle-like signal is a real dolphin whistle or a biomimicked one. By exploiting differences between the phase of the signals, our scheme classifies three signal types: real whistles, playback whistles, and synthetic whistle-like signals. In particular, assuming dolphins can produce signals whose phase is more random than that of signals produced by man-made acoustic projectors, we separate the signals by exploring the entropy of the signal's phase. The phase is obtained by a PLL, while the entropy is calculated based on the phase of multiple received signals. Exploring the performance of our interceptor over 243 real whistles, 104 playback whistles, and 121 synthetic whistle-like signals, we conclude that the signal's phase is a good classification measure and that our interceptor can separate the signal sources well. Analyzing performance for different entropy measures shows that the Approximate entropy and the Sample entropy serve better for classification than the Tsallis entropy, and thus we conclude that the phase of real dolphin whistles does not include significant patterns.
In 2009, he received the Israel Excellent Worker First Place Award from the Israeli Presidential Institute. In 2010, he received the NSERC Vanier Canada Graduate Scholarship. He has received three best paper awards and serves as an Associate Editor for the IEEE JOURNAL OF OCEANIC ENGINEERING.