I. Introduction
Magnitude spectra, and their time sequences in the form of spectrograms, are widely used as time-frequency representations of audio signals such as speech and music. The magnitude spectrum of a discrete-time signal $x(n)$ is typically obtained from the short-time Fourier transform (STFT), which is defined as
$$X(mS,\varpi)=\sum_{n=-\infty}^{\infty}x(n)w(n-mS)e^{-j\varpi n}\eqno{\hbox{(1)}}$$
where $w(n)$ is the analysis window, $S$ is the analysis step size, and $m$ is the index of the frames of the STFT. The complex-valued STFT is a complete and reversible time-frequency representation: the time-domain signal is uniquely determined by its STFT representation and vice versa. Using the STFT, the short-time Fourier transform magnitude (STFTM) spectrum of $x(n)$ is
$$\left\vert X(mS,\varpi)\right\vert=\left\vert\sum_{n=-\infty}^{\infty}x(n)w(n-mS)e^{-j\varpi n}\right\vert.\eqno{\hbox{(2)}}$$
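As an illustration of (1) and (2), the following minimal NumPy sketch evaluates the sums directly. It rests on several assumptions that are not part of the definitions above: the window is taken to be finite and supported on $n=0,\ldots,N-1$, the frequency $\varpi$ is sampled on the uniform grid $2\pi k/K$, and only frames lying entirely inside the signal are computed. The function names `stft` and `stftm` and the parameter `K` are hypothetical and chosen only for this example.

```python
import numpy as np


def stft(x, w, S, K):
    """Evaluate Eq. (1) at the frequencies 2*pi*k/K, k = 0..K-1.

    Assumption: w is supported on n = 0..len(w)-1, so w(n - mS) selects
    the samples mS .. mS+len(w)-1. Only frames fully inside x are kept,
    which requires len(x) >= len(w).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    N = len(w)
    num_frames = 1 + (len(x) - N) // S
    omega = 2.0 * np.pi * np.arange(K) / K      # sampled frequency grid
    X = np.empty((num_frames, K), dtype=complex)
    for m in range(num_frames):
        n = np.arange(m * S, m * S + N)         # absolute time indices
        frame = x[n] * w                        # x(n) w(n - mS)
        # Sum over n of x(n) w(n - mS) exp(-j * omega * n)
        X[m] = np.exp(-1j * np.outer(omega, n)) @ frame
    return X


def stftm(x, w, S, K):
    """Evaluate Eq. (2): the magnitude of the STFT."""
    return np.abs(stft(x, w, S, K))


# Example: a sinusoid analysed with a 64-sample Hann window and step 16.
x = np.cos(2.0 * np.pi * 8 * np.arange(256) / 64)
M = stftm(x, np.hanning(64), S=16, K=64)
print(M.shape)
```

Because the exponent in (1) uses the absolute time index $n$, each frame computed this way differs from the FFT of the windowed frame by the phase factor $e^{-j\varpi mS}$. That factor has unit modulus, so the STFTM in (2) is the same under either convention.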