Fraction-of-Time Probability: Advancing Beyond the Need for Stationarity and Ergodicity Assumptions

Time series arising from measurements in many fields of physics, engineering, chemistry, biology, and econometrics, are commonly modeled as sample paths from an ensemble which, together with a probability measure, is called a stochastic process. Stationarity and ergodicity assumptions about this model are generally made for analytical convenience and mathematical tractability of the model. In this article, it is shown that a dichotomy, which can be very misleading in practice, exists between the properties of a stochastic process and those of its individual sample paths. This dichotomy can be eliminated by adopting the fraction-of-time (FOT) probability approach reviewed in this article for which a probabilistic model is constructed from a single time series without introducing the abstraction of the stochastic process. Two FOT-probability models are reviewed. The first considers probabilistic functions that do not depend on time and employs the relative measure on the real line as a probability measure and the time average as an expectation operator. Such time series are called stationary signals. The second considers periodic, poly-periodic, and almost periodic probabilistic functions and employs the operator that extracts the finite-strength additive sine-wave components of its argument as an expectation operator. This latter model is appropriate for describing time series originating from phenomena involving a combination of periodic and random phenomena. Such time series are called cyclostationary, poly-cyclostationary, and almost cyclostationary signals. The FOT-probability alternative provides a means for circumventing two standard but undesirable practices: (1) Adopting the Kolmogorov stochastic process model by using its Axiom VI without being able to verify its validity for the specific application and (2) Assuming Birkhoff’s ergodicity condition holds without being able to verify its validity for the specific application.

The periodogram is a well-known statistic consisting of the squared magnitude of the Fourier transform of a finite segment of data, normalized by the segment length, T. One of several equivalent standard definitions of the power spectral density function of a stationary stochastic process is that it is the limit, as T approaches infinity, of the expected value of the periodogram. A convergence issue arising in this definition, of which many statistical signal processing engineers are at least aware, is that one cannot interchange the order of expectation and limit. The reason is that, before expectation, the limit of the periodogram does not exist. In addition, the variance of the periodogram does not converge to zero as T approaches infinity.
These fairly well-known mathematical facts are responsible for the practical guideline that one cannot obtain a reliable power spectrum estimate from the periodogram, no matter how long the segment is, unless that periodogram is either 1) frequency smoothed (averaged over frequency throughout every frequency sub-band of width much larger than the reciprocal of the length of the data segment), or 2) time smoothed (averaging of a sliding periodogram over an interval much longer than the segment length).
VOLUME 4, 2016
So, what may appear to some to be an esoteric mathematical curiosity (the invalidity of interchanging the order of the abstract expectation operation and the abstract limit operation) is, in fact, the reason why the practical guidelines for measurement of reliable power spectra are what they are, as summarized above. This important fact of statistical spectral analysis, which extends and generalizes to cross-spectrum analysis and spectral-correlation analysis of cyclostationary processes, as I have shown in [42], becomes transparent only when studied within the FOT-probability framework, as first explained in considerable detail in my advanced-level textbook [37]. By comparison, attempted explanations in other textbooks, in terms of stochastic processes, cannot be said to be transparent. The reason for this is that properties of ensemble averages of sample paths are typically different from properties of individual sample paths, which is a theme of this article. This difference does not vanish, as is commonly thought, if the stochastic process is ergodic.
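The guideline on periodogram smoothing can be checked numerically. The following is a minimal sketch, assuming unit-variance white Gaussian noise as the data, a single normalized frequency of 0.25, and averaging over non-overlapping segments as a simplified stand-in for time smoothing; the function names are illustrative and do not come from the cited works.

```python
import cmath
import math
import random


def periodogram_at(x, f):
    """Periodogram |X_T(f)|^2 / T of the samples x at normalized frequency f."""
    T = len(x)
    X = sum(v * cmath.exp(-2j * math.pi * f * n) for n, v in enumerate(x))
    return abs(X) ** 2 / T


def relative_spread(T, n_avg=1, trials=400, f=0.25, seed=0):
    """Std/mean, over independent trials, of a periodogram averaged over n_avg segments.

    Assumption of this sketch: the data are white Gaussian noise, and the
    segments are non-overlapping (a simplified form of time smoothing).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        acc = 0.0
        for _ in range(n_avg):
            segment = [rng.gauss(0.0, 1.0) for _ in range(T)]
            acc += periodogram_at(segment, f)
        estimates.append(acc / n_avg)
    mean = sum(estimates) / trials
    var = sum((e - mean) ** 2 for e in estimates) / (trials - 1)
    return math.sqrt(var) / mean
```

Under these assumptions, `relative_spread(32)` and `relative_spread(512)` both come out close to 1, so lengthening the segment does not reduce the scatter of the raw periodogram, whereas `relative_spread(32, n_avg=16)` is markedly smaller because averaging is applied.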
Engineers often deal mathematically with expected values and deal in practice with single sample paths or their time averages, and they often tend to freely interchange these quantities by replacing expected values with finite-time averages and, in some cases, simply deleting the expectation operation in deriving algorithms. Sometimes this is theoretically justifiable, and sometimes it is not. What percentage of practicing statistical signal processing engineers know when and why this interchange is not justifiable? Whatever small percentage this might be, it is likely far larger than the percentage of those who are aware of the fact that this frequently used interchange is not only sometimes unjustifiable but is also avoidable by adopting a little-known analytical tool that replaces the unnecessarily abstract stochastic process tool with a less abstract alternative. The issue here is not resolved by simply using stochastic process models that are ergodic.
The purpose of this tutorial is to teach engineers about this alternative tool and provide a deep understanding of how it relates to the stochastic process.
Many alternative titles for this article were considered; a number of alternatives to the choice made had the advantage of being attention getters and, for that reason, I mention a few of them here: "Why Ergodic Stochastic Processes are Inappropriate Models in Many Empirical Studies", "The Untold Truth About Stochastic Processes: They Tell Us Nothing About Sample Path Behavior", and "How to Avoid Making Standard Unverifiable Assumptions in the Use of Stochastic Process Models".
-William A. Gardner

I. INTRODUCTION AND HISTORICAL PERSPECTIVE
In many fields of physics, engineering, chemistry, biology, and econometrics, randomness in time-series measurements and observations on phenomena being studied has typically been modeled by resorting to the abstract concept of a stochastic process. That is, an empirical time series is modeled as a representative (sample path or realization) of an ensemble of time series with "similar characteristics", together with a probability measure defined on the set of ensemble members, namely, a stochastic process. A desirable property of the model is that the probabilistic functions comprising the stochastic process be estimable by measurements made on the single empirical time series.
That is, the "similar characteristics" of the sample paths should be such that the properties and, in particular, the probabilistic functions of the whole ensemble such as mean, covariance, and amplitude distribution, can be inferred by measurements made on any one of the sample paths, with the exception of a set of sample paths that occurs with zero probability. This desirable property is called ergodicity in classical stochastic process theory. This provides the all-important tie for signal processing applications between the model and that which can be measured. However, it is shown in this review article that this tie is not as strong as one would like for empirical signal processing purposes.
The ergodicity concept was first treated mathematically in 1931 by Birkhoff [14] and in 1932 by von Neumann [105] with reference to dynamical systems. They established conditions under which averaging, at a single time instant, a function of the variables of what was called the phase space across an ensemble of different copies of the same system is equivalent to averaging over time for a single system. See [87] for a discussion on the slightly different points of view of Birkhoff and von Neumann.
Subsequently, ergodicity has played a key theoretical role in multiple fields [3], [18], and has been the subject of much thought and discussion. As examples, see the recent treatments in the fields of economics [26], [19], [93], [94], atomic physics [65], and condensed matter systems [83]. Theoretical research on ergodicity has continued for nearly a century; see, for example, the more recent work by Boyles and Gardner [16], Shields [98], Katznelson and Weiss [63], and Gray [47], and the latest work by the Authors reported in this review. For a comprehensive bibliography on ergodicity, see [69].
The adoption of the stochastic process model requires a substantial abstraction, the hypothesized mathematical existence of the ensemble, which, in various circumstances, creates serious conceptual problems [45, page 3.4]. In fact, for many applications in the various fields of science and engineering, there is only one record of real data; there is no ensemble of statistically independent random samples of data records. For stochastic processes, mixing assumptions (a type of asymptotic independence of time-samples with increasing separation in time) are typically used as sufficient conditions for ergodicity. But such conditions often cannot be verified for the adopted mathematical models; generally, they are simply assumed to hold on the basis of little more than faith. It is explained in this review that properties of probabilistic functions defined in terms of ensemble averages in most cases do not correspond to similar properties of analogous functions defined in terms of time averages on single sample paths, even if the stochastic process is ergodic. Consequently, there is a dichotomy between the properties of a stochastic process (the model) and the properties of its individual sample paths (the physical signals). The idea that averaging over an ensemble does not necessarily correspond to averaging over time (the lack of ergodicity of some stochastic process models) is receiving increasing attention in various fields of study [28], [83], [84]. Furthermore, it is explained in the following sections that the critical nature of some assumptions made in the classical probability theory of stochastic processes was already pointed out by Kolmogorov in his seminal work 90 years ago [66, p. 15]. An attempt to describe randomness without introducing an abstract sample space and probability was made by Kolmogorov himself by introducing the concept of complexity for a single random sequence [67]. Such an alternative approach, however, did not enjoy the same success that probability theory did. See [21], [80], and [104] for developments and discussions.
In this paper, more concrete models based on properties of time averages of individual signals, instead of averages over hypothetical ensembles of signals, are reviewed and new results on the concrete benefits of this alternative are reported. This approach is based on what is called fraction-of-time (FOT) probability, which was first introduced in earnest in 1987, with a comprehensive book [37] devoted exclusively to this theme by Gardner, and the duality between these two alternative models was a theme of the earlier book [34]. But, important advances in the theoretical underpinnings of the FOT-probability theory of signals have been made quite recently.
In this approach, starting from a single function of time, a valid distribution function and all other familiar probabilistic parameters such as means, moments, and cumulants are constructed. That is, formulas for calculating these functions/parameters directly from time series are developed. The approach can be put in a rigorous measure-theoretic framework built on the conceptual foundation of relative measure introduced by Kac and Steinhaus [60], [61], [62]. Developments were presented in [1], [30], [100], [101], [102]. More recently, the concept was revisited by Leśkow and Napolitano in [72]. Very recent (unpublished) work by Gardner on generalizations of ergodicity to cycloergodicity, which ameliorate some of the drawbacks of the standard assumptions of stationarity and ergodicity, is included in this review.
The FOT-probability approach is methodological; it does not necessarily provide new signal processing recipes. However, in the paper it is shown that the adoption of this approach allows the experimenter to avoid dichotomies between properties of the stochastic process model and those of its sample paths. The dichotomies illustrated in the paper regard issues frequently encountered in signal processing applications: the expectation of a digitally modulated signal; the linear transformation of a Gaussian signal; and the concept of statistical independence. In addition, in the paper it is shown that some properties of the probability measure and the expectation operation render these entities more amenable to some mathematical operations than do the temporal counterparts of these properties for the sample paths. In contrast, the properties of the relative measure and the infinite time average are exactly the properties of the functions at the hand of the experimenter, the only abstraction being the concept of an infinite observation interval. (Infinite-time averages of single records of data are somewhat loosely said to be "at hand"; the justification for this is the fact that finite-time averages of single time records, the items that are truly "at hand", can closely approximate infinite-time averages as explained in Sec. V-B; this close relationship between an empirical quantity and a corresponding theoretical quantity is to be contrasted with the major distinction made in this paper between the theoretical expected value (an ensemble property) and both finite-time and infinite-time averages (sample-path properties).) A precursor of the spectral analysis of a single time series is due to Einstein in his work reported in 1914 [29] (see [36] for Gardner's introduction to and technical commentary on this work and the comments of Yaglom in [113]).
Roughly speaking, in the above-mentioned papers, an underlying stationary model is assumed. In fact, the FOT expectation operator is the infinite time average and the probabilistic functions derived therefrom are time invariant.
The extension of the FOT approach and generalized harmonic analysis (GHA) to periodic phenomena was made by Gardner in [37, Part II], [43]. Periodic phenomena are generated by the interaction of periodic mechanisms and random phenomena. The results are processes that are not periodic but whose probabilistic functions vary periodically with time. These signals have been referred to as cyclostationary or poly-cyclostationary if, respectively, only one periodicity or a finite number of incommensurate periodicities are present in the probabilistic functions. If the probabilistic functions are almost-periodic functions of time (which can generally be represented by infinite numbers of incommensurate periods), the signals have been named almost-cyclostationary [37, Part II] and provide an alternative to almost-cyclostationary stochastic processes first introduced in [33].

The extension of the FOT approach and GHA presented in [37, Part II], [43] is of great importance in practice due to the ubiquity of science data generated by periodic phenomena. In communications, radar, sonar, and telemetry, periodicities in the statistical functions arise from the modulation by random data of sinusoidal carriers or periodic pulse trains [37, Chap. 12], [90, Chap. 7]. In vibro-acoustic signals collected from mechanical machinery, periodicities in the statistics are due to rotations of gears, belts, and bearings [90, Sec. 10.6]. In econometrics, the weekly opening and closing of markets and the seasonal supply and demand of products give rise to periodicities in the statistical functions of prices and exchange rates [90, Sec. 10.7]. In radio astronomy, periodicities are due to the revolution and rotation of planets and the pulsation of stars; in human biological signals, they are due to the heart pulsation and other biological rhythms. Periodicities are present in genome sequences, diffusion processes of molecular dynamics, and signals encountered in neuroscience (see [90, Chap. 10] and references therein, and also the results of an internet search producing 135,000 published research papers on cyclostationarity across essentially all fields of science and engineering [46]).
The extension of FOT and GHA to periodic phenomena made in [37, Part II], [43] is not obvious since Gardner's periodically or almost-periodically time-variant distribution is constructed from a single function of time by recognizing the non-obvious fact that the operator that extracts all the finite-strength additive sine-wave components of its argument is a valid expectation operator.
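The idea of extracting a finite-strength additive sine-wave component by time averaging can be illustrated as follows. This is a minimal sketch, assuming uniformly sampled data, a candidate frequency `alpha` that is known in advance, and a synthetic signal made of one sinusoid plus white Gaussian noise; the names `sine_wave_coefficient` and `make_signal` are hypothetical. The full operator would sum such components over all frequencies at which the coefficient is nonzero.

```python
import cmath
import math
import random


def sine_wave_coefficient(x, alpha, dt):
    """Finite-time estimate of <x(t) exp(-j 2 pi alpha t)>_t from samples x[k] = x(k dt)."""
    n = len(x)
    total = sum(v * cmath.exp(-2j * math.pi * alpha * k * dt) for k, v in enumerate(x))
    return total / n


def make_signal(n, dt, f0=5.0, amp=2.0, phase=0.3, noise_std=1.0, seed=1):
    """Synthetic test signal (an assumption of this sketch): sinusoid plus white noise."""
    rng = random.Random(seed)
    return [
        amp * math.cos(2 * math.pi * f0 * k * dt + phase) + rng.gauss(0.0, noise_std)
        for k in range(n)
    ]


def extracted_component(coeff, alpha, t):
    """Real sine wave at frequency alpha implied by the coefficients at +alpha and -alpha."""
    return 2.0 * (coeff * cmath.exp(2j * math.pi * alpha * t)).real
```

With `x = make_signal(20000, 0.01)`, the coefficient at `alpha = 5.0` is close to `(amp / 2) * exp(j * phase)`, while the coefficient at a frequency where no sine wave is present, such as 7.3, is close to zero and shrinks as the averaging time grows.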
The reader is informed that the very brief use in the paper of a very few mathematical terms, like σ-field and Borel subsets, can be ignored without loss of comprehension. And the term Lebesgue measure can be thought of as nothing more than a formalization of standard measures of length, area, and volume in Euclidean space. Finally, the differences between the more familiar Riemann integral and the Lebesgue and Riemann-Stieltjes integrals can be ignored, or standard definitions of these integrals can be looked up on the Web, but adequate comprehension for the non-mathematician does not require this.
The paper is organized as follows. In Section II, motivation for introducing FOT probability is presented. In Section III, the relative measure, the analog of the classic probability measure, and its properties are reviewed. In Section IV, the extension/generalization of the FOT-probability theory to almost-periodic phenomena is treated. In Section V, new insight into the mostly unstudied topic of cycloergodicity is provided and the problem of statistical function estimation in the FOT approach is addressed. Examples of the dichotomy existing between the properties of a stochastic process and those of its sample paths are presented in Section VI. Conclusions are drawn in Section VII.

II. MOTIVATION
The relative measure $\mu_R$ and the infinite-time average are the fraction-of-time (FOT) counterparts of the probability measure $P$ and the ensemble average, respectively [62], [72].
Due to the differences between the relative measure, $\mu_R$, on the relatively measurable sets (which are a subset of the σ-field of Borel subsets of the real line) and the probability measure, $P$, on the σ-field of Borel subsets of a probability sample space, mathematical properties holding for stochastic processes do not necessarily have counterparts that hold for functions of time representing sample paths of these stochastic processes.
The key differences include:
• The class of the $P$-measurable sets is closed under union and intersection; the class of the relatively measurable sets is not.
• $P$ is a σ-additive (additivity of countably infinite numbers of terms) measure; $\mu_R$ is not.
• Expectation is σ-linear (linearity of an operator applied to a linear combination of a countably infinite number of terms); infinite-time average is not.
• Joint $P$-measurability of sample spaces is typically assumed but cannot be verified; joint relative measurability is a property of functions that can be verified.
• The assumed σ-additivity property of the probability measure is typically unverifiable and restricts the admissible sample spaces of time functions in ways that the relative measure does not.
• The relative measure is applied to the single time function at hand, and functions of this time function, with no restrictions other than its assumed existence. The fact that the relative measure cannot be guaranteed to be σ-additive is a reflection of the reality of the time function at hand, not a deficiency.
These differences clearly show that the mathematical properties of the relative measure render it less amenable to mathematical study than do those of the probability measure $P$. This, however, does not constitute an obstacle to using the FOT approach for signal analysis but, rather, as explained in this paper, provides motivation for using this approach instead of the classical stochastic-process approach based on $P$. In fact, the σ-additivity of the probability measure and σ-linearity of the expectation provide mathematically desirable tractability. But, as explained below, they give rise to a dichotomy between the stochastic process properties and the properties of concrete individual sample paths of the stochastic process, which are the entities of primary interest to practitioners in many applications. In contrast, such dichotomies do not arise in the FOT probability approach. In addition, the adoption of the FOT approach overcomes all problems arising from the need to check sufficient conditions for validating assumptions for ergodicity, problems that occur frequently in time-series analysis applications.
The proposal to adopt the FOT-probability alternative to the Kolmogorov formulation of a stochastic process is by no means as outrageous as some may think. In fact, there is a long history of discontent with Kolmogorov's model, as discussed at length in [74].

A. RELATIVE MEASURE
Let us consider the set $A \in \mathcal{B}_{\mathbb{R}}$, where $\mathcal{B}_{\mathbb{R}}$ is the σ-field of the Borel subsets, and let $\mu$ be the Lebesgue measure on the real line $\mathbb{R}$. The relative measure of $A$ is defined by [62]

$$\mu_R(A) \triangleq \lim_{T \to \infty} \frac{1}{T}\, \mu\big(A \cap [t_0 - T/2,\; t_0 + T/2]\big) \tag{1}$$

provided that the limit exists. In such a case, the limit does not depend on $t_0$ and the set $A$ is said to be relatively measurable (RM). For example, the set (2), where $\mathbb{Z}$ is the set of positive and negative integers, is RM and $\mu_R(A) = 1/3$. The set (3) is not RM since $\frac{1}{T}\,\mu(A \cap [-T/2, T/2])$ oscillates between 0 and 1 in a continuous piecewise linear manner as $T \to \infty$.
The relative measure is the Lebesgue measure normalized so that the relative measure of the real line is equal to 1, that is, $\mu_R(\mathbb{R}) = 1$. Note that such a normalization is obtained by a limit operation (as $T \to \infty$) since the Lebesgue measure of the real line is infinite. Therefore, only Lebesgue-measurable sets with infinite Lebesgue measure can have a nonzero relative measure. For a set of points on the real line to have nonzero relative measure, it must contain subintervals of nonzero Lebesgue measure that cannot be contained within any finite interval. As defined above, the relative measure of such a set is the limit of the ratio of 1) the total Lebesgue measure of those points in the set contained in a contiguous window of width $T$ to 2) the window width, as $T$ approaches infinity. In other words, it is the fraction of the real line that the set occupies.
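The window ratio whose limit defines the relative measure can be evaluated directly for simple sets. The following is a minimal sketch under stated assumptions: the two sets used here, a period-3 union of unit intervals and a union of blocks whose lengths grow by a factor of 4, are illustrative choices made for this sketch and are not necessarily the sets used in the examples above.

```python
import math


def measure_in_window(intervals, t0, T):
    """Lebesgue measure of (union of disjoint intervals) intersected with [t0 - T/2, t0 + T/2]."""
    lo, hi = t0 - T / 2, t0 + T / 2
    return sum(max(0.0, min(b, hi) - max(a, lo)) for a, b in intervals)


def periodic_third_ratio(T, t0=0.0):
    """Window ratio for the assumed set A = union over integers k of [3k, 3k + 1)."""
    k_lo = math.floor((t0 - T / 2) / 3) - 1
    k_hi = math.ceil((t0 + T / 2) / 3) + 1
    intervals = [(3 * k, 3 * k + 1) for k in range(k_lo, k_hi + 1)]
    return measure_in_window(intervals, t0, T) / T


def growing_blocks_ratio(T, t0=0.0, n_max=40):
    """Window ratio for the assumed set B = union over n >= 0 of [4^n, 2*4^n) and its mirror image."""
    intervals = []
    for n in range(n_max):
        a, b = 4.0 ** n, 2.0 * 4.0 ** n
        intervals += [(a, b), (-b, -a)]
    return measure_in_window(intervals, t0, T) / T
```

For the periodic set the ratio settles at 1/3 regardless of `t0`, so the set is RM. For the growing-block set the ratio keeps swinging between values near 2/3 (at `T = 4 * 4**n`) and near 1/3 (at `T = 8 * 4**n`), so no limit exists and the set is not RM.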
The normalization $\mu_R(\mathbb{R}) = 1$ makes the relative measure of subsets of the real line a counterpart of the probability measure $P$ defined on sets belonging to the σ-field $\mathcal{F}$ of the sample space $\Omega$. For the probability measure, however, the normalization $P(\Omega) = 1$ is obtained without a limit operation. That is, the sample space, before the measure normalization, is assumed to have finite measure. Thus, the normalized probability measure of a set is obtained by the ratio of the original un-normalized measure and the assumed-to-be-finite measure of the whole sample space.
Such a subtle property of the sample space of the classical probability measure has been surfaced by only a few authors (see e.g., Halmos [48] and [49, p. 31]) even if the criticality of considering an infinite sample space (although with finite measure) was already surfaced by Kolmogorov in his fundamental work on the theory of probability where he addressed the necessity to introduce Axiom VI (Axiom of Continuity) which cannot be derived from Axioms I-V.
Citing [66, p. 15]: "For infinite fields, on the other hand, the Axiom of Continuity, VI, [is] proved to be independent of Axioms I-V. Since the new axiom is essential for infinite fields of probability only, it is almost impossible to elucidate its empirical meaning, as has been done, for example, in the case of Axioms I-V in § 2 of the first chapter. For, in describing any observable random process we can obtain only finite fields of probability. Infinite fields of probability occur only as idealized models of real random processes. We limit ourselves, arbitrarily, to only those models that satisfy Axiom VI. This limitation has been found expedient in researches of the most different sort."

The normalization of the relative measure obtained by a limit operation results in $\mu_R$ having mathematical properties that render it less amenable to mathematical analysis than do those of the probability measure $P$, as explained subsequently.
The class $\mathcal{C}$ of the RM sets is not closed under union and intersection. That is, there exist $A, B \in \mathcal{C}$ such that $A \cap B \notin \mathcal{C}$ [72,

B. ABSENCE OF σ-ADDITIVITY OF THE RELATIVE MEASURE
Since the Lebesgue measure is additive, the relative measure is also additive. That is, if $A, B \in \mathcal{C}$ and $A \cap B = \emptyset$, then

$$\mu_R(A \cup B) = \mu_R(A) + \mu_R(B)$$

[72, Fact 2.4]. However, the relative measure is not σ-additive, as shown by the Kac example [60, p. 46].

The absence of the σ-additivity property makes $\mu_R$ less attractive from a mathematical point of view than the probability measure $P$, whose σ-additivity is a consequence of Axiom VI [66, p. 15]. The criticality of such an Axiom (and its consequences) has already been discussed in previous paragraphs. The importance of σ-additivity for ergodicity, and in particular in Poincaré's proof of the Recurrence Theorem, is highlighted in [69, p. 17], [82, p. 80].
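The failure of σ-additivity can be made concrete with a simple countable partition of the real line. The following is a minimal sketch, assuming the partition into unit cells [n, n+1) for integer n; this partition is chosen here for illustration, in the spirit of the cited example, and is not claimed to reproduce it.

```python
def unit_cell_ratio(n, T):
    """(1/T) * Lebesgue measure of [n, n+1) intersected with [-T/2, T/2]."""
    lo, hi = -T / 2, T / 2
    return max(0.0, min(n + 1, hi) - max(n, lo)) / T


def union_ratio(N, T):
    """(1/T) * Lebesgue measure of the union of the cells with |n| <= N inside [-T/2, T/2]."""
    return sum(unit_cell_ratio(n, T) for n in range(-N, N + 1))
```

Each single cell has a window ratio of at most 1/T, so its relative measure is zero, and any sum of these zeros is zero. The union of all the cells is the whole real line, whose relative measure is one. Numerically, `unit_cell_ratio(0, 1e6)` is tiny, while `union_ratio(1000, 100.0)` equals one because the cells already cover the window; the outcome depends on whether the number of cells or the window width is sent to infinity first.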
From the above considerations, it is clear that a probabilistic model built for a persistent single function of time starting from the relative measure will have mathematical properties less amenable to mathematical analysis than will the classical probabilistic model of a stochastic process, which is built starting from the probability measure $P$. As explained below, this fact, which could initially appear to be a weakness of $\mu_R$ in comparison to $P$, constitutes, instead, a motivation to adopt the FOT approach for signal analysis instead of the classical stochastic-process approach.

C. RELATIVELY MEASURABLE FUNCTIONS
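Let $x(t)$ be a Lebesgue measurable function. The function $x(t)$ is said to be relatively measurable if the set $\{t \in \mathbb{R} : x(t) \le \xi\}$ is RM for every $\xi$, except possibly for an at most countable set of points. Each RM function $x(t)$ generates the function

$$F_x(\xi) \triangleq \mu_R\big(\{t \in \mathbb{R} : x(t) \le \xi\}\big) \tag{4a}$$

$$\phantom{F_x(\xi)} = \lim_{T \to \infty} \frac{1}{T}\, \mu\big(\{t \in [t_0 - T/2,\; t_0 + T/2] : x(t) \le \xi\}\big) \tag{4b}$$

$$\phantom{F_x(\xi)} = \lim_{T \to \infty} \frac{1}{T} \int_{t_0 - T/2}^{t_0 + T/2} u\big(\xi - x(t)\big)\, dt \tag{4c}$$

in all points $\xi$ where the limit exists. In (4c), $u(\xi)$ denotes the unit step function, that is, $u(\xi) = 1$ for $\xi \ge 0$ and $u(\xi) = 0$ for $\xi < 0$. The function $F_x(\xi)$ is nondecreasing with values in $[0, 1]$. Thus, it has all the properties of a valid distribution function, except for right-continuity at the discontinuity points. It represents the fraction of time (FOT) that the function $x(t)$ is below the threshold $\xi$ (Fig. 1) [34], [37], [55]. For this reason, $F_x(\xi)$ is referred to as the FOT distribution of the function $x(t)$.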
Since the relative measure of sets of finite Lebesgue measure is zero, every finite-energy or transient function $x(t)$ has the trivial distribution function $F_x(\xi) = u(\xi)$. Only finite-average-power or persistent functions can have a non-trivial FOT distribution. The FOT distribution of almost-periodic functions is studied in [64], [109], [111].
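The FOT distribution can be estimated from a single function of time by measuring the fraction of the observation window during which the function lies at or below a threshold. The following is a minimal sketch, assuming a Riemann-sum approximation on a uniform grid and two illustrative test functions (a sinusoid as a persistent function and a decaying exponential as a transient one); the function names are hypothetical.

```python
import math


def fot_distribution(x, xi, T, dt=0.01, t0=0.0):
    """Riemann-sum estimate of (1/T) * integral of u(xi - x(t)) over [t0 - T/2, t0 + T/2]."""
    n = int(round(T / dt))
    below = sum(1 for k in range(n) if x(t0 - T / 2 + (k + 0.5) * dt) <= xi)
    return below / n


def sinusoid(t):
    """Persistent test function (assumption of this sketch), with period sqrt(2)."""
    return math.cos(2 * math.pi * t / math.sqrt(2))


def transient(t):
    """Finite-energy test function (assumption of this sketch)."""
    return math.exp(-abs(t))


def sinusoid_fot_closed_form(xi):
    """Fraction of a period during which a unit-amplitude cosine is at or below xi."""
    if xi <= -1.0:
        return 0.0
    if xi >= 1.0:
        return 1.0
    return 1.0 - math.acos(xi) / math.pi
```

For the sinusoid, the estimate at a given threshold approaches the closed-form value as T grows, giving a non-trivial distribution. For the transient, the estimate tends to 1 for any positive threshold and is 0 for any negative threshold, which is the trivial unit-step distribution described above.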
Sets that are not Lebesgue measurable are difficult to visualize and are very rarely, if ever, encountered in applications. In contrast, non-RM sets are not rare or exotic (see (3)) and non-RM functions also can easily be constructed. In particular, the indicator function of a set $A$ is RM if and only if $A$ is an RM set. This fact has been exploited in [73] to design modulation formats for which statistical functions cannot be measured by time averages, for the purpose of obtaining secure communications.

Let $x(t)$ be a relatively measurable, not necessarily bounded function and let $g(\cdot)$ be continuous, bounded, and such that, for any $\eta \in \mathbb{R}$, the equation $g(\xi) = \eta$ admits at most a finite number of solutions for $\xi$ belonging to any finite interval. The following Fundamental Theorem of Expectation holds:

$$\lim_{T \to \infty} \frac{1}{T} \int_{t_0 - T/2}^{t_0 + T/2} g\big(x(t)\big)\, dt = \int_{\mathbb{R}} g(\xi)\, dF_x(\xi) \tag{6}$$

where the first integral is in the Lebesgue sense and does not depend on $t_0$ and the second one is in the Riemann-Stieltjes sense.
From (6) it follows that the infinite-time average is the expectation operator for the FOT distribution $F_x(\xi)$ and for every bounded $x(t)$ we have

$$\langle x(t) \rangle_t \triangleq \lim_{T \to \infty} \frac{1}{T} \int_{t_0 - T/2}^{t_0 + T/2} x(t)\, dt = \int_{\mathbb{R}} \xi\, dF_x(\xi). \tag{7}$$

The analogy of the FOT approach with the classical stochastic-process approach [34, Sec. 8.6], [40], is evident. On the nexus between the two approaches see also [31] and [81]. For a 1st-order strict-sense stationary process $X(t)$ with distribution $F_X(\xi) \triangleq P[X(t) \le \xi]$, the stochastic counterpart of (4c) is

$$F_X(\xi) = E\{u(\xi - X(t))\} \tag{8}$$

where $E\{\cdot\}$ is the ensemble average, and the stochastic counterparts of (6) and (7) are

$$E\{g(X(t))\} = \int_{\mathbb{R}} g(\xi)\, dF_X(\xi) \tag{9}$$

and

$$E\{X(t)\} = \int_{\mathbb{R}} \xi\, dF_X(\xi) \tag{10}$$

respectively.

A necessary and sufficient condition for the relative measurability of a function is not known. However, if $x(t)$ is a bounded function, the existence of the time average $\langle x^p(t) \rangle_t$ for every positive integer $p$ is a necessary condition for the relative measurability of $x(t)$. In addition, accounting for the Fundamental Theorem of Expectation, if $x(t)$ is continuous and bounded and the left-hand side of

$$\langle x^p(t) \rangle_t = \int_{\mathbb{R}} \xi^p\, dF_x(\xi) \tag{11}$$

exists for any positive integer $p$, then $x(t)$ is relatively measurable, and equality (11) holds [110]. Finally, note that the absence of right-continuity of the FOT distribution is not important in applications where integrals in $dF_x(\xi)$ are of interest (see (6) and (7)). In the stochastic approach, the right-continuity of the distribution is a consequence of the σ-additivity of the probability measure $P$.
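The equality between a time average and an integral with respect to the FOT distribution can be checked numerically for a simple case. The following is a minimal sketch, assuming a unit-amplitude sinusoid for x(t), the bounded continuous function g(ξ) = 1/(1 + ξ²), and the closed-form FOT distribution of a sinusoid; all names and parameter values are choices made for this illustration.

```python
import math


def g(xi):
    """Bounded, continuous test function (assumption of this sketch)."""
    return 1.0 / (1.0 + xi * xi)


def time_average(func, T, dt=0.01, t0=0.0):
    """Riemann-sum estimate of (1/T) * integral of func(t) over [t0 - T/2, t0 + T/2]."""
    n = int(round(T / dt))
    return sum(func(t0 - T / 2 + (k + 0.5) * dt) for k in range(n)) / n


def fot_cos(xi):
    """FOT distribution of a unit-amplitude sinusoid: 1 - arccos(xi)/pi on [-1, 1]."""
    if xi <= -1.0:
        return 0.0
    if xi >= 1.0:
        return 1.0
    return 1.0 - math.acos(xi) / math.pi


def stieltjes_integral(func, F, lo, hi, n=20000):
    """Riemann-Stieltjes sum of func with respect to F on a uniform partition of [lo, hi]."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        a = lo + i * step
        b = a + step
        total += func(0.5 * (a + b)) * (F(b) - F(a))
    return total


def x(t):
    """Persistent test signal (assumption of this sketch), with period sqrt(2)."""
    return math.cos(2 * math.pi * t / math.sqrt(2))
```

Here `time_average(lambda t: g(x(t)), 2000.0)` and `stieltjes_integral(g, fot_cos, -1.0, 1.0)` agree closely, and both are near 1/√2, the exact average of 1/(1 + cos²θ) over one period.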

D. JOINTLY RELATIVELY MEASURABLE FUNCTIONS
By building counterexamples, it can easily be shown that the class of the RM functions is not closed under addition and multiplication [72, Theorem 3.5]. Thus, the class of the RM functions is not a function space. Such a result is in accordance with the fact that the class of finite-average-power functions is not a linear vector space [4], [76], [79].
Note that, in contrast, the linear combination of two stochastic processes is still a stochastic process, provided that the two sample spaces are assumed to be jointly measurable. This, however, is not an innocuous assumption. Consequently, even if there exists an analogy of results between the FOT and stochastic approaches (compare (7) and (10)), the stochastic process model for a single realization at hand should be used carefully, since properties of the stochastic process do not necessarily correspond to analogous properties of the function of time at hand. Such a deep difference between properties of stochastic processes and properties of functions constitutes a strong motivation for adopting the FOT approach for signal analysis. In fact, the experimenter is interested in effective properties of the time series at hand rather than better properties of an abstract model.
The uniformly almost-periodic functions constitute a linear vector space [22,Chap. 1]. However, such a class of function, albeit broad and useful for approximating every function on a finite observation interval, does not provide a suitable model for most signals encountered in practice. An example is the class of communications signals. Several attempts have been made to consider sub-classes of the class of the finite-average-power functions in order to obtain linear vector spaces [5], [76], [103], [115], [116], [117]. None of these approaches involve FOT distributions or allow one to construct a (non-stochastic) probabilistic model.
Alternatively, in the FOT approach, the joint characterization of two (or more) functions is made by introducing the concept of joint relative measurability of functions [72]. In particular, it can be shown that the sum and product of jointly RM functions is in turn a RM function. Thus, for such a function, an FOT probabilistic model can be constructed. The joint relative measurability is an analytical property of functions and is therefore easier to verify than is the analogous property in the stochastic process framework, that is, the joint measurability of sample spaces. The latter property, in fact, cannot easily be verified in practice since, generally, the sample spaces are not specified.
Two Lebesgue measurable functions $x(t)$ and $y(t)$ are said to be jointly RM [72, Definition 4.1] if the limit

$$F_{xy}(\xi_1, \xi_2) \triangleq \lim_{T \to \infty} \frac{1}{T}\, \mu\big(\{t \in [t_0 - T/2,\; t_0 + T/2] : x(t) \le \xi_1,\; y(t) \le \xi_2\}\big)$$

exists for all $(\xi_1, \xi_2)$, except possibly on an at most countable set of lines. The function $F_{xy}$ has all the properties of a bivariate joint distribution function with the exception of right continuity at the discontinuity points. Such a definition extends naturally to $n > 2$ functions. Let $x(t)$ and $y(t)$ be jointly RM functions. Then each function is RM [72, Theorem 4.1]. In addition, the sum $x(t) + y(t)$ and the product $x(t)\, y(t)$ are RM, provided that at least one of the functions is bounded [72, Theorem 4.2].
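An extension to the multivariate case of the fundamental theorem of expectation (see (6)) can be derived [72, Theorem 4.5]. In particular, if x(t) and y(t) are bounded functions and x(t + τ) and y(t) are jointly RM for every τ, then the temporal cross-correlation function [106] of x and y is given by
$$R_{xy}(\tau) \triangleq \langle x(t+\tau)\,y(t)\rangle_t = \int_{\mathbb{R}^2} \xi_1\,\xi_2\,\mathrm{d}F_{x_\tau y}(\xi_1,\xi_2)$$
where $F_{x_\tau y}$ denotes the joint FOT distribution of x(t + τ) and y(t).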

E. ABSENCE OF σ-LINEARITY OF THE INFINITE-TIME AVERAGE
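As a consequence of the absence of σ-additivity of the relative measure $\mu_R$, although the corresponding expectation operator is linear, it is not σ-linear. For example, accounting for the identity
$$\sum_{k=-\infty}^{+\infty} \mathrm{rect}(t-k) = 1 \quad \text{a.e.}$$
we have
$$\Big\langle \sum_{k=-\infty}^{+\infty} \mathrm{rect}(t-k) \Big\rangle_t = 1 \ne \sum_{k=-\infty}^{+\infty} \big\langle \mathrm{rect}(t-k) \big\rangle_t = 0.$$
This result is different from the corresponding one in the stochastic approach where the expectation operator is σ-linear, provided that the underlying infinite series of random variables is absolutely convergent [66].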
As with the absence of σ-additivity of the relative measure $\mu_R$, the absence of σ-linearity of the corresponding expectation operator, the infinite-time average, could initially appear to be a weakness of $\mu_R$ in comparison with P; however, it actually constitutes a motivation to adopt the FOT approach for signal analysis instead of the classical stochastic-process approach. In fact, the realizations of a stochastic process do not necessarily exhibit properties analogous to those of the stochastic process (Fig. 2). Along ω ∈ Ω, the probability measure P(·) is σ-additive and the expectation operator, i.e., the ensemble average E{·}, is σ-linear. Along t ∈ R, the relative measure $\mu_R(\cdot)$ is not σ-additive and the expectation operator, i.e., the infinite-time average $\langle\cdot\rangle_t$, is not σ-linear.
The absence of σ-linearity of the expectation operator in the FOT approach is illustrated by a suitable example also with reference to the most general case, the almost-periodic component extraction operator, in Section VI-A.
In Appendix A, with reference to the signal considered in Section VI-A, it is shown that the absence of σ-linearity does not necessarily prevent one from performing calculations of infinite time averages; it simply prevents one from using the short cut of interchanging infinite time average and infinite summation operations.

F. CONDITIONAL RELATIVE MEASURABILITY AND INDEPENDENCE
Let A and B be Lebesgue measurable sets and $\{B_n\}$ be an arbitrary increasing sequence of Lebesgue measurable subsets of B with $0 < \lim_{n\to\infty} \mu(B_n)/n < \infty$ and $\cup_{n\in\mathbb{N}} B_n = B$. The conditional relative measure of the set A given B is defined as [72, Sec. 5]
$$\mu_R(A|B) \triangleq \lim_{n\to\infty} \frac{\mu(A\cap B_n)}{\mu(B_n)} \qquad (17)$$
provided that the limit exists. Let x(t) and y(t) be jointly RM. In [72, Theorem 5.2] it is proved that the functions x(t) and y(t) are independent if and only if, $\forall(\xi_1,\xi_2)\in\mathbb{R}^2$ except at most a countable set of straight lines, we have the equality
$$F_{xy}(\xi_1,\xi_2) = F_x(\xi_1)\,F_y(\xi_2). \qquad (18)$$
As an example, two sine waves with incommensurate periods are independent [60], [62].
If x(t) and y(t) are independent, then, for every ξ 1 and ξ 2 , That is, the normalization of µ(A(ξ 1 )) to obtain a relative measure can be made by considering either subsets B n (ξ 2 ) of the set B(ξ 2 ) built from y(t) or subsets B n = [−n/2, n/2] (not depending on y(t)). Moreover, the subsets B n (ξ 2 ), can be arbitrary [72, Theorem 5.1], provided that they satisfy the conditions preceding (17). In other words, the function y(t) from which the normalizing sets B n (ξ 2 ) are constructed, has no influence on the relative measure µ R (A(ξ 1 )|B(ξ 2 )) and such a relative measure equals µ R (A(ξ 1 )). This result is in agreement with the intuitive concept of independence of two functions or signals in the sense that they have no link with each other.
The intuitive interpretation of the definition of independence in the FOT probability framework has no counterpart in the stochastic process framework where independence of processes is defined as the factorization of the joint distribution function into the product of the marginal ones [13], [27], [66]. In contrast, in the FOT probability framework such a factorization is proved to be true as a consequence of the intuitive definition in terms of conditional relative measurability [72,Theorem 5.2].
Note that, in classical probability theory, several authors define independence of events A and B by the condition P(A|B) = P(A), which is equivalent to the condition P(A, B) = P(A) P(B) due to the definition P(A|B) ≜ P(A, B)/P(B) of conditional probability. However, such a definition of independence, unlike the definition in the FOT approach, does not involve the sequence of subsets $B_n$ arbitrarily constructed starting from B (provided that they satisfy the conditions preceding (17)). Hence, the definition of independence in classical probability theory does not lead to an intuitive interpretation of independence as does that in the FOT approach.
The concept of independent functions has been considered in [60], [62], [100], and [101], where equation (18) is taken to be the definition of independence and, consequently, no link with an intuitive concept of independence is established. A similar point of view is adopted in [77], [78]. In [51], a condition equivalent to the joint relative measurability is considered before defining independence as the factorization of the joint distribution into the product of the marginals.

G. CENTRAL LIMIT THEOREM
If a sequence of independent zero-mean time-series {ϕ k (t)} k∈N satisfies some mild regularity assumptions, then the FOT distributions of the functions in this sequence approach a zero-mean normal distribution as n → ∞ [24, Theorem 3.5]. That is, we have the FOT Central Limit Theorem (CLT) with σ 2 equal to the average-over-k value of the FOT variances of the functions ϕ k (t).
The proof of this FOT CLT is based on the Taylor series expansion of the characteristic functions of x n (t) similarly to the proof of the CLT in the classical stochastic approach. The proof in the FOT approach, however, is more challenging due to the presence of the limit operation in the definition of the relative measure µ R , which limit is not present in the definition of probability measure P .
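The tendency described by the FOT CLT can be checked numerically on a single record. The following is a minimal sketch, not the construction used in [24]: it assumes the independent zero-mean time series are unit-amplitude cosines whose frequencies are square roots of distinct primes (so that the periods are incommensurate), and it assumes a finite record is long enough to approximate the infinite-time averages. The function names `normalized_sum_of_sines` and `fot_moments` are hypothetical.

```python
import numpy as np

def normalized_sum_of_sines(n_terms, t):
    """x_n(t) = n^(-1/2) * sum_k cos(2*pi*f_k*t), with f_k = sqrt(k-th prime)."""
    primes = []
    candidate = 2
    while len(primes) < n_terms:
        # trial division by all previously found primes
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    x = np.zeros_like(t)
    for p in primes:
        x += np.cos(2.0 * np.pi * np.sqrt(p) * t)
    return x / np.sqrt(n_terms)

def fot_moments(x):
    """Finite-time estimates of the FOT variance and kurtosis (time averages)."""
    m = x.mean()
    var = np.mean((x - m) ** 2)
    kurt = np.mean((x - m) ** 4) / var**2
    return var, kurt

# Assumed record: 5000 time units sampled every 0.01 units.
t = np.arange(0.0, 5000.0, 0.01)
var_1, kurt_1 = fot_moments(normalized_sum_of_sines(1, t))
var_20, kurt_20 = fot_moments(normalized_sum_of_sines(20, t))
print("1 term  :", var_1, kurt_1)
print("20 terms:", var_20, kurt_20)
```

A single cosine has FOT kurtosis 1.5, whereas a normal distribution has kurtosis 3; the normalized sum of 20 terms should already be close to the latter while its FOT variance stays at the average of the individual variances (1/2 here).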
The exploitation of the FOT CLT allows one to overcome some difficulties that arise in the classical stochastic-process approach in the derivation of a widely adopted model for a communication channel [24, Section VI]. In the stochastic approach, a multipath Doppler channel is shown to introduce normally distributed gains under mild assumptions on the input signal and the channel characteristics [32, Chap. 9.2-9.5, pp. 334-359], [96, Chap. 14-1, pp. 759-762]. The justification, however, is only heuristic. In fact, in all the justifications, the input/output relationship of the channel is described in terms of deterministic signals and systems. Then, a stochastic model, whose statistical behavior should reproduce the time behavior of the deterministic model, is heuristically constructed.
In contrast, in the FOT approach in [24, Section VI], it is shown that the multipath Doppler channel introduces a time-varying gain which is RM with normal FOT distribution when the length of the observation interval approaches infinity and the number of paths approaches infinity. The order of these two limit operations cannot be interchanged. Moreover, even for a moderate number of paths, the distribution is almost normal, provided that the observation interval is sufficiently large.

H. WOLD'S ISOMORPHISM
In [112], Wold constructs a discrete-time stochastic process whose sample paths are time-shifted versions of a single time series, and he thereby establishes an isomorphism whose continuous-time counterpart is
$$X(t,\omega) = x(t+\omega), \qquad \omega\in\Omega.$$
In such a case Ω = R. Thus, in order to have P(Ω) = 1 we must have $P(\cdot) \equiv \mu_R(\cdot)$. This is an example of a probability space for which measure normalization involves a limit operation. This fact, however, is not in agreement with the fundamental assumption that the probability space has finite measure (even before normalization), which assumption is made in the classical construction of a probability space [48, Sec. 5]. Therefore, Wold's isomorphism (for discrete time or extended to continuous time) is not compatible with the classical definition of a stochastic process.
Kolmogorov, in [66, p. 15], says that Axiom VI is necessary when the field is infinite. He does not point out the difference, at least in [66], between the case of infinite measure before normalization (e.g., the sample space is the real line) and the case of finite measure before normalization (e.g., the sample space is any finite interval). If the sample space has infinite measure before normalization, then the normalization P(Ω) = 1 is obtained by a limit operation as for the relative measure. In such a case, the probability measure is not σ-additive. Adding Axiom VI adds the σ-additivity property, and this is generally allowable when the sample space is not specified and also allowable when it is specified and does not contradict this axiom. But, of course, Axiom VI is not allowable when the sample space is specified and contradicts the axiom. For example, the sample space specified in Wold's isomorphism requires that the probability measure be normalized through a limiting process, and this normalization does not allow the measure to be σ-additive.

VOLUME 4, 2016
Wold's isomorphism is extended in [37], [43] to cyclostationary time series. In such a case, a stochastic process is constructed such that its sample paths are time-shifted versions of a single time series with time shifts that are integer multiples of the period of cyclostationarity $T_0$. That is, the sample paths are $x(t + nT_0)$, $n \in \mathbb{Z}$. In [56], the details of Wold's isomorphism between cyclostationary stochastic sequences and cyclostationary numerical sequences are presented. It is shown how Hilbert-space representations of cyclostationary stochastic sequences are interpreted in the case of numerical cyclostationary sequences.

IV. ALMOST-PERIODIC PHENOMENA
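For every time-series x(t) for which the sinusoidally weighted time average
$$x_\eta \triangleq \big\langle x(t)\,e^{-j2\pi\eta t}\big\rangle_t = \lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2} x(t)\,e^{-j2\pi\eta t}\,\mathrm{d}t$$
exists ∀η ∈ R, there exists the decomposition
$$x(t) = \sum_{\eta\in E_1} x_\eta\,e^{j2\pi\eta t} + x_r(t) \triangleq x_{\mathrm{ap}}(t) + x_r(t) \qquad (25)$$
where $E_1$ is the countable set of frequencies η such that $x_\eta \ne 0$ and the residual term $x_r(t)$ does not contain any finite-strength additive sine-wave component:
$$\big\langle x_r(t)\,e^{-j2\pi\eta t}\big\rangle_t = 0 \qquad \forall\eta\in\mathbb{R}.$$
The periodic component with period $T_0$ contained in the time series x(t) can be extracted by the synchronized averaging theorem [34], [37]
$$x_{T_0}(t) = \lim_{N\to\infty}\frac{1}{2N+1}\sum_{n=-N}^{N} x(t+nT_0) \qquad (27a)$$
$$\phantom{x_{T_0}(t)} = \sum_{k=-\infty}^{+\infty} x_{k/T_0}\,e^{j2\pi(k/T_0)t} \qquad (27b)$$
provided that the Fourier series is absolutely convergent. The periodic component extraction operator is generalized by the almost-periodic (AP) component extraction operator $E^{\{\alpha\}}\{\cdot\}$, that is, the operator that extracts all the finite-strength additive sine-wave components of the function in its argument. If the AP component is poly-periodic with incommensurate periods $T_1,\dots,T_P$, then
$$E^{\{\alpha\}}\{x(t)\} = x_0 + \sum_{p=1}^{P}\big[x_{T_p}(t) - x_0\big]$$
where each periodic component $x_{T_p}(t)$ is given by (27b) with $T_0$ replaced by $T_p$. From decomposition (25) we have
$$E^{\{\alpha\}}\{x(t)\} = x_{\mathrm{ap}}(t).$$
In [43] it is shown that, under mild assumptions on the time series x(t), for every fixed t the function of ξ
$$F^{\{\alpha\}}_{x(t)}(\xi) \triangleq E^{\{\alpha\}}\{u(\xi - x(t))\} \qquad (30)$$
is a valid cumulative distribution function except for the right-continuity property (at the discontinuity points) (Fig. 3). For a further proof see also [90, Chap. 2]. The cumulative distribution function is almost-periodic by construction. It can be expressed by its Fourier series
$$F^{\{\alpha\}}_{x(t)}(\xi) = \sum_{\gamma\in\Gamma_1} F^{\gamma}_x(\xi)\,e^{j2\pi\gamma t}$$
where $\Gamma_1$ is a countable set and the complex-valued Fourier coefficients are (Fig. 4)
$$F^{\gamma}_x(\xi) \triangleq \big\langle u(\xi - x(t))\,e^{-j2\pi\gamma t}\big\rangle_t. \qquad (32)$$
For γ = 0 the coefficient $F^{0}_x(\xi)$ is coincident with the FOT distribution (4c). In addition, $E_1 \subseteq \Gamma_1$. Moreover, for a well-behaved function g(·), the following fundamental theorem of expectation can be proved [43]:
$$E^{\{\alpha\}}\{g(x(t))\} = \int_{\mathbb{R}} g(\xi)\,\mathrm{d}F^{\{\alpha\}}_{x(t)}(\xi). \qquad (34)$$
Equation (34) is the FOT counterpart of the fundamental theorem of expectation in the stochastic approach (9) with $F_X(\xi)$ therein replaced by a time-variant $F_X(\xi; t)$.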
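The synchronized averaging used above to extract a periodic component can be sketched in discrete time. This is only an illustration under stated assumptions: the period is an integer number of samples, the record holds finitely many periods (so the limit is replaced by a finite average), and the test signal (a raised cosine plus white Gaussian noise) and the function name `synchronized_average` are hypothetical choices, not taken from [34] or [37].

```python
import numpy as np

def synchronized_average(x, period):
    """Average x[n + k*period] over k; returns one period of the estimate."""
    n_periods = len(x) // period
    # each row of the reshaped array is one period of the record
    return x[: n_periods * period].reshape(n_periods, period).mean(axis=0)

rng = np.random.default_rng(0)
period = 16                                   # assumed period, in samples
n = np.arange(period * 4096)
periodic = 1.0 + 0.5 * np.cos(2 * np.pi * n / period)
x = periodic + rng.standard_normal(n.size)    # periodic component + residual

estimate = synchronized_average(x, period)
max_error = np.max(np.abs(estimate - periodic[:period]))
print(max_error)
```

The residual term averages out across periods while the periodic component adds coherently, so the error shrinks as more periods are included in the average.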
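From (30) and (34) it follows that $E^{\{\alpha\}}\{\cdot\}$ is the expectation operator corresponding to the almost-periodically time-variant distribution $F^{\{\alpha\}}_{x(t)}(\xi)$. In addition, the almost-periodic probability density function (pdf) of a time series x(t) can be formally written as
$$f^{\{\alpha\}}_{x(t)}(\xi) \triangleq \frac{\mathrm{d}}{\mathrm{d}\xi}F^{\{\alpha\}}_{x(t)}(\xi) = E^{\{\alpha\}}\{\delta(\xi - x(t))\}$$
where δ(·) denotes Dirac's delta. It can be expressed by its Fourier series
$$f^{\{\alpha\}}_{x(t)}(\xi) = \sum_{\gamma\in\Gamma_1} f^{\gamma}_x(\xi)\,e^{j2\pi\gamma t}$$
whose convergence is in the sense of generalized functions [114] and the complex-valued Fourier coefficients are given by (Fig. 5)
$$f^{\gamma}_x(\xi) = \big\langle \delta(\xi - x(t))\,e^{-j2\pi\gamma t}\big\rangle_t.$$
The formal expressions (35c)-(37) allow one to prove (34) by formal manipulation of Dirac's delta [43], [90, Sec. 2.3].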
In [37, Part II], [43], [44], an extension of the FOT approach from stationary time series to time series that exhibit cyclostationarity is constructed; this class of time series is comprised of all those that are cyclostationary, or polycyclostationary, or almost cyclostationary. Such time series, by definition, have time-variant FOT distribution functions that are periodic, polyperiodic, or almost periodic.
The result that (30) is a valid distribution function is not obvious and opens a new and wider perspective on the FOT approach. Specifically, periodically, poly-periodically, or almost-periodically time-variant probabilistic functions can be constructed starting from the unique observed time series at the hand of the experimenter. This completely circumvents the typically unjustifiable assumption that there exists a stochastic process model that is stationary and ergodic and for which the time series at hand is a sample path.
Furthermore, periodic phenomena, that is, those obtained from combinations of periodic or almost-periodic mechanisms and random phenomena, can now be modeled without incurring the dichotomy between abstract properties of stochastic processes and concrete properties of individual sample paths [41]. Examples of this dichotomy are given in Section VI.
For an AP function $x_{\mathrm{ap}}(t)$, we have
$$E^{\{\alpha\}}\{x_{\mathrm{ap}}(t)\} = x_{\mathrm{ap}}(t). \qquad (38)$$
That is, the AP component extraction operator, applied to an AP function, produces the AP function itself. In addition, according to (35c), for an AP function, the time-varying probability density function above the (ξ, t) plane is an impulse fence
$$f^{\{\alpha\}}_{x_{\mathrm{ap}}(t)}(\xi) = \delta(\xi - x_{\mathrm{ap}}(t)). \qquad (39)$$
Since the AP component extraction operator is the expectation operator in the AP FOT probability framework, it follows from (38) that the AP functions are the deterministic signals in the AP FOT probability framework. All other signals are the random signals. Note that the term "random" here is not intended to be synonymous with "stochastic". In fact, the adjective stochastic is adopted, as usual, when an ensemble of realizations or sample paths exists, whereas the adjective random is used in reference to a single function of time.
In other words, decomposition (25) can be interpreted as the decomposition of a generic random signal x(t) into its deterministic (that is, AP) component $x_{\mathrm{ap}}(t)$ and a residual component $x_r(t)$:
$$x(t) = x_{\mathrm{ap}}(t) + x_r(t). \qquad (40)$$

A detailed decomposition for a class of functions that suitably models communication signals is presented in the Appendix of [9]. Decomposition (40) is the FOT counterpart of the classical decomposition of a stochastic process into the sum of its mean value and a zero-mean (centered) stochastic process.
Starting from decomposition (40) an analogous decomposition can be determined for the impulse-response function of a linear time-variant system [58].
It is worth illustrating with a concrete example that a single time series can admit more than one FOT probability model, depending on which cycle frequencies, if any, are to be recognized. Here we consider a signal that admits two distinct FOT probabilistic descriptions. The time-series is random in the stationary FOT framework and has FOT density ( Fig. 6) whereas it is deterministic in the almost-periodic FOT framework and, according to (39), has FOT density ( Fig. 7) In Section VI-B, a presented example shows how a proper choice of the FOT model enables one to avoid stochastic-process discrepancies existing between properties of a stochastic process model and those of its sample paths. By adopting the almost-periodic component extraction operator as an expectation operator, a valid second-order almost-periodically time-variant joint distribution and a valid almost-periodically time-variant autocorrelation function can be defined [37], [43], where A is a countable set of (possibly incommensurate) cycle frequencies α and the Fourier coefficients (46) are the cyclic autocorrelation functions. The finite-time Fourier transform at a particular frequency value f is the spectral component of the time series x(t) at frequency f with approximate finitebandwidth ∆f . Thus, the cyclic spectrum represents the temporal correlation of two spectral components separated by the quantity α when the bandwidth ∆f becomes infinitesimal. For an almost-cyclostationary signal, S α x (f ) is non zero for only α ∈ A. That is, only spectral components that are separated by quantities equal to one of the cycle frequencies are correlated. In addition, the following Gardner relation [90, pp. 10, 20, 56, 57, 139], originally introduced in [35], [37], holds.
For α = 0 it reduces to the Wiener relation between the power spectrum and the time-average autocorrelation function. For this reason, Gardner originally referred to (49) as Cyclic Wiener Relation.
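The cyclic autocorrelation functions mentioned above can be estimated from a single record by a frequency-shifted time average of a lag product. The sketch below is illustrative only and rests on assumptions not made in the text: discrete time, a real-valued signal, an asymmetric-lag convention, and an amplitude-modulated white Gaussian test signal. The function name `cyclic_autocorrelation` is hypothetical.

```python
import numpy as np

def cyclic_autocorrelation(x, alpha, lag):
    """Finite-time estimate of <x[n+lag] x[n] exp(-j 2 pi alpha n)>_n, real x, lag >= 0."""
    n = np.arange(len(x) - lag)
    lag_product = x[lag:] * x[: len(x) - lag]
    return np.mean(lag_product * np.exp(-2j * np.pi * alpha * n))

rng = np.random.default_rng(1)
f0 = 1.0 / 16.0                               # assumed carrier, cycles per sample
n = np.arange(2**16)
x = rng.standard_normal(n.size) * np.cos(2 * np.pi * f0 * n)

r_zero = cyclic_autocorrelation(x, 0.0, 0)       # time-average power
r_cycle = cyclic_autocorrelation(x, 2 * f0, 0)   # at the cycle frequency 2*f0
r_other = cyclic_autocorrelation(x, 0.1234, 0)   # not a cycle frequency
print(abs(r_zero), abs(r_cycle), abs(r_other))
```

For this test signal the lag product contains a finite-strength sine wave only at frequencies 0 and ±2 f0, so the estimate is appreciably nonzero at those values of α and close to zero elsewhere, in line with the statement that only spectral components separated by a cycle frequency are correlated.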
The extension of Wiener's GHA to time series that exhibit cyclostationarity can also be achieved by defining cyclostationarity as the property of time-series that enables the regeneration of additive finite-strength sine wave components from hidden periodicities in the time series by using homogeneous quadratic time-invariant transformations [37], [35], [39].
Further approaches for signal analysis based on single functions of time are possible by considering as deterministic signals the polynomial phase signals [31], [85].

V. STATISTICAL FUNCTION ESTIMATION

A. CYCLOERGODICITY
Cycloergodic theory, which extends and generalizes existing ergodic theory, is developed in [16], where it is shown that sinusoidal and periodic components of time-varying probabilistic parameters can be consistently estimated from time averages on one sample path. It is also established that a strict-sense theory of cycloergodicity inclusive enough to cover all applications of practical interest had, at that time, not yet been shown to exist. Moreover, it is shown that such a theory cannot presuppose the existence of a dominating stationary measure, as does the theory presented therein. Nevertheless, it appears that one can argue that, because a continuous-time cyclostationary process can be characterized as a discrete-time vector-valued (or function-valued) stationary process, Birkhoff's Ergodic Theorem [14] for scalar-valued discrete-time stationary processes, if generalized to vector-valued processes, leads to a completely analogous cycloergodic theorem for continuous-time cyclostationary processes. The vector (or function), at any discrete time equal to an integer multiple of the period of cyclostationarity, consists of the infinite set of process values over the period between that discrete time and the previous discrete time.
Furthermore, it is shown in [47, Chap. 7, and refs. therein] that Birkhoff's ergodic theorem has been extended from stationary to asymptotically-mean stationary (AMS) discrete-time processes. This extension guarantees the existence of consistent estimators for the discrete-time averages of time-varying probabilistic parameters, such as probability density functions. Because almost-cyclostationary (ACS) discrete-time processes are AMS, this extended theorem applies to discrete-time ACS processes (and the same might well be true for continuous-time ACS processes after discrete-time sampling) but it does not apply directly to estimation of the sinusoidal and periodic components of almost-periodically time-varying probabilistic parameters.
Nevertheless, [47, Chap. 7] does discuss ergodicity of N-stationary discrete-time processes, which are N-dimensional vector-valued representations for discrete-time cyclostationary processes with period N. Furthermore, the discrete-time infinite-dimensional vector-valued process described above that represents a continuous-time scalar-valued process is AMS if that continuous-time process is ACS (which includes, as special cases, poly-cyclostationary, cyclostationary, and stationary processes).
Consequently, for any selected period of a continuous-time ACS process, one can form a discrete-time vector-valued AMS process as explained above. Then the time average of a probabilistic parameter of this vector-valued process will equal the periodic component of the corresponding probabilistic parameter of the original ACS process. In this way any periodic component for any real-valued period T of the almost-periodically time-varying probabilistic parameters of the original scalar-valued continuous-time ACS process can be guaranteed to be consistently estimable by applying the proposed ergodic theorem to the infinite-dimensional vector-valued discrete-time AMS process.
It follows that the discrete-time AMS version of the Birkhoff ergodic theorem can be extended/generalized to accommodate cycloergodicity for continuous-time ACS processes by requiring that the ergodicity condition for discrete-time AMS processes be satisfied by the vector-valued representation for each and every period T. In addition, there is a partially cycloergodic version of this proposed theorem that satisfies the ergodicity condition for some but not all periods.
This leaves one class of ACS processes for which there is so far no known cycloergodic theorem, and this is the class of discrete-time processes having measures that possess nonzero sinusoidal components with sine-wave frequencies that are incommensurate with the time-sampling rate. Some such processes do indeed allow for consistent estimation of such sinusoidal components, but others do not. A necessary and sufficient condition for consistent estimation is not presently known.

B. FOT INTERPRETATION OF THE PROBLEM OF STATISTICAL FUNCTION ESTIMATION
In the stochastic approach, the estimator of the expected value of a wide-sense stationary stochastic process is the time average of one sample path of the process over the finite time interval [t − T/2, t + T/2] with center t and width T. Similarly, in the FOT approach, the estimator of the infinite-time average of a time series is the finite-time average of this time series over [t − T/2, t + T/2].
In the stochastic approach, t is fixed (and typically assumed to be 0 or T/2). The estimate is a random variable, that is, it depends on the sample point ω ∈ Ω which determines the sample path or realization of the process. The variability of the estimate reflects its dependence on the sample path used for the estimation. Under appropriate mixing and stationarity assumptions, the estimate converges in some probabilistic sense as T → ∞ to the expected value of the process. In the FOT approach, the variability of the estimate reflects its dependence on t, the central point of the observation interval, when t ranges over a wider temporal interval, say [−Z/2, Z/2], with Z ≫ T [37, Chap. 15], [88, Sec. 6.3.5].
In the FOT approach, asymptotic characterization of the convergence of the estimator is expressed in terms of a double limit as Z → ∞ and T → ∞, provided that Z/T → ∞. The limit in Z produces the average-over-t behavior of the estimate, such as the average error (bias) and the average squared deviation of the estimate about its average value (the variance), where t is the central point of the observation interval. The limit in T produces an estimate that uses the whole time series for estimation.
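The FOT view of estimator variability, in which the window center t slides over a much longer record, can be illustrated numerically. The sketch below assumes a discrete-time record of Z samples consisting of a constant plus white Gaussian noise and uses the average over the whole record as a stand-in for the infinite-time average; the function name `sliding_time_average` and the values of Z and T are hypothetical.

```python
import numpy as np

def sliding_time_average(x, width):
    """Finite-time averages of x over every window of `width` consecutive samples."""
    c = np.concatenate(([0.0], np.cumsum(x)))
    return (c[width:] - c[:-width]) / width

rng = np.random.default_rng(2)
Z, T = 200_000, 100                      # assumed record and window lengths, Z >> T
x = 1.0 + rng.standard_normal(Z)

m_T = sliding_time_average(x, T)         # the estimate as a function of window position
fot_bias = m_T.mean() - x.mean()         # average-over-t error
fot_variance = np.mean((m_T - m_T.mean()) ** 2)
print(fot_bias, fot_variance)
```

Both quantities are time averages computed on the one available record; no ensemble is involved. For this test signal the FOT variance should be close to 1/T, the familiar value for the average of T uncorrelated unit-variance samples.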
Let z(t) be a RM time series obtained as a frequencyshifted second-or higher-order lag product of another time series x(t). That is, Second-and higher-order cyclic moments of x(t) are infinitetime averages of z(t) [44], [99]. The finite-time average is the estimator of the infinite-time average For T sufficiently large (much greater than the longest period of cyclostationarity of z(t)) the function t → m is approximately (asymptotically exactly) wide-sense stationary in the FOT sense. That is, m (T ) z (t) and its homogeneous nonlinear transformations do not contain any additive finite-strength sine-wave component with nonzero frequency. Thus, the FOT expectation operator of interest for asymptotic properties of the estimator is the infinite-time average. The performance of the estimator can be expressed in terms of FOT bias and variance [37,Chap. 15] bias m (T ) where the two approximations become exact equalities in the limit as T → ∞. Equation (53) shows that in the FOT approach the estimator m Assuming summability of second-and fourth-order temporal cumulants of z(t), the estimator is mean-square consistent in the FOT sense, that is, In addition, under further temporal cumulant summability assumptions, the function has a normal FOT distribution as T → ∞ [25]. A cyclic spectrum estimator is the time-smoothed cyclic periodogram, which is the right-hand side of (48) with finite T and ∆f . The estimate is accurate (small FOT bias) provided that ∆f is sufficiently small, and it is reliable (small variance) provided that the smoothing product T ∆f is sufficiently large [37,Chap. 13]. Moreover, this estimator is asymptotically equivalent to the frequency-smoothed cyclic periodogram [42] which, for α = 0, reduces to the frequencysmoothed periodogram for power-spectrum estimation.

VI. EXAMPLES OF THE DICHOTOMY BETWEEN PROPERTIES OF A STOCHASTIC PROCESS AND THOSE OF ITS SAMPLE PATHS
In this section, three examples of the dichotomy between properties of a stochastic process and those of its sample paths are illustrated. Specifically, properties that are valid for stochastic processes are shown to be not valid for their sample paths. Such drawbacks of the stochastic-process based models constitute a strong motivation for the adoption of the FOT approach.

A. EXPECTATION OF A PAM TIME SERIES
Let us consider a pulse-amplitude modulated (PAM) stochastic process where { ξ n } n∈Z is a sequence of random variables with equiprobable values in a finite alphabet and the pulse q(t) ∈ L 1 (R). Under the assumption the σ-linearity of the statistical expectation operator E{·}, validates the interchange of the infinite summation and expectation operations, and the expected value of ξ(t) is given by Such an interchange of expectation operator and infinite summation is not allowed on the single sample paths (after having replaced the ensemble mean by the infinite time average). Thus, it is not allowed in the FOT approach. Let us consider the PAM time series with x n a discrete-time time series. Even if for the time series x(t) in general we have , it is shown that by properly executing the almost-periodic component extraction operation, the FOT expectation of x(t), which is its almost-periodic component, can be shown to be given by where E { α} {·} denotes the discrete-time almost-periodic component extraction operator.
Comparison of the procedures used to obtain the expected value (59) of a PAM process and its FOT counterpart (64) shows the required difference in ways of executing the expectation operator in the stochastic and FOT approaches. In fact, even if a summability condition for the discrete-time sequence is satisfied, the FOT expectation operation cannot be freely interchanged with the infinite-summation operation. Thus, the result (59) valid for the stochastic process cannot be immediately extended to its sample paths by replacing the ensemble mean with the infinite time average. In contrast, the result derived in the FOT approach is valid by default for the unique time series at the hand of the experimenter. As an example, the calculation of the infinite time average in the left-hand side of (62) is reported in Appendix A.
Note that every digitally modulated signal, and in particular the PAM process (57), can be modeled as a stochastic process whose sample space (before normalization) has finite measure. Therefore, the existence of a finite-measure sample space is not sufficient to avoid the above-mentioned dichotomy between a stochastic process and its sample paths.
The fact that digitally modulated signals can be modeled as stochastic processes with finite sample space follows from the subsequent considerations. The sequences of symbols belonging to a finite alphabet of size D can easily be modeled as sample paths of a discrete-time stochastic process whose sample space has finite measure and is isomorphic to the interval [0, 1]. In fact, every finite or infinite sequence can be seen as the expansion in base D of a number ω taken in the unit interval that plays the role of the sample space Ω. The probability measure is the Lebesgue measure. Sequences corresponding to the rational numbers have zero Lebesgue measure (they occur with probability zero). Typical sequences correspond to Borel's normal numbers, that is, those satisfying the weak law of large numbers: Each letter of the alphabet occurs in the sequence with relative frequency 1/D and typical sequences occur with probability one.
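The correspondence between a point ω of the unit interval and a symbol sequence can be made concrete with a few lines of code. This is a small sketch under the assumption that ω is supplied as an exact rational number (so that no floating-point rounding enters); the function name `base_d_digits` is hypothetical.

```python
from fractions import Fraction

def base_d_digits(omega, base, count):
    """First `count` digits of the base-`base` expansion of omega in [0, 1)."""
    digits = []
    for _ in range(count):
        omega *= base
        digit = int(omega)   # integer part is the next symbol of the sequence
        digits.append(digit)
        omega -= digit       # keep the fractional part for the next step
    return digits

binary_third = base_d_digits(Fraction(1, 3), 2, 8)
quaternary = base_d_digits(Fraction(5, 8), 4, 4)
print(binary_third)
print(quaternary)
```

Rational values of ω, such as those used here, produce eventually periodic sequences, which is why they form the probability-zero exceptions mentioned above rather than typical symbol sequences.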
Therefore, every digitally modulated signal can be modeled as a stochastic process whose sample space (before normalization) has finite measure since its randomness is due to the randomness of the modulating sequence. This is true even if other parameters are modeled as random variables.

B. LINEAR TRANSFORMATION OF A GAUSSIAN SIGNAL
Let us consider the stochastic process
$$X(t) = G(t)\cos(2\pi f_0 t + \phi_0) \qquad (65)$$
where G(t) is a zero-mean strict-sense stationary Gaussian stochastic process and $f_0$ and $\phi_0$ are deterministic parameters. The process X(t) is a periodically time-variant linear transformation of a Gaussian process. Thus, it is in turn Gaussian. Due to the periodic modulation, X(t) is strict-sense cyclostationary with period of cyclostationarity $T_0 = 1/(2f_0)$. In particular, its variance is a periodic function of t with period $T_0$. Let
$$x(t) = g(t)\cos(2\pi f_0 t + \phi_0) \qquad (66)$$
be a single realization of the stochastic process X(t), where g(t) is a single realization of the stochastic process G(t).
Although the stochastic process X(t) is Gaussian, its realizations do not have an empirical (stationary) Gaussian distribution. In fact, the time-series g(t) and $c(t) \triangleq \cos(2\pi f_0 t + \phi_0)$ have stationary FOT densities $f_g(\xi)$, the zero-mean Gaussian density (67), and $f_c(\xi)$ given in (42), respectively. Since the functions g(t) and c(t) are independent in the FOT sense [72, Sec. 5], the FOT density of the product waveform x(t) = g(t) c(t) is given by the classical formula [2, Chap. 2, Problem 14]
$$f_x(\xi) = \int_{\mathbb{R}} \frac{1}{|\eta|}\, f_g\!\left(\frac{\xi}{\eta}\right) f_c(\eta)\,\mathrm{d}\eta \qquad (68)$$
which is a non-Gaussian density. It is the stationary FOT density of x(t), which is coincident with the (stationary) empirical density. In Fig. 8, results are reported for a band-limited zero-mean Gaussian time-series g(t) with bandwidth $1/(64T_s)$, where $T_s$ is the sampling period, which modulates a cosine with $f_0 = 1/(16T_s)$ and $\phi_0 = 0$. The thin line represents the theoretical non-Gaussian density obtained by numerically evaluating the integral in (68) as predicted by the FOT model; the dotted line is the density estimated from the data by a kernel-based estimator (with Gaussian kernel) [97, Sec. 2.1.8]; the dashed line represents a zero-mean Gaussian density with variance equal to the sample variance estimated from x(t). The data-record length is $T = KT_0$ with $K = 2^{10}$.
In the above result, the stationary FOT model of Section III-C is considered. A more appropriate statistical description of the time series x(t) can be obtained by adopting the almost-periodic FOT model of Section IV (which includes periodic models as a special case).
In Fig. 9 the periodically time-variant distribution estimated by the synchronized average is reported for t = nT s , n = 0, 1, . . . , 15. Moreover, asterisks represent the theoretical zero-mean Gaussian distribution numerically evaluated in correspondence with the periodically time-variant variance estimated by the synchronized average The estimated periodically time-variant distribution (69) exactly fits the Gaussian distribution (70) with estimated periodically time-variant variance (71). For t = 4T s and t = 12T s , cos(2πf 0 t) = 0 and one has the degenerate distribution u(ξ) obtained from (70) in the limit as σ → 0.
No one would call the stochastic process (65) non-Gaussian. In particular, for every set of fixed values t 1 , . . . , t n , the random variables X(t 1 ), . . . , X(t n ) are jointly Gaussian. However, the (stationary) empirical density of X(t) approaches the non-Gaussian density (68) as the data-record length approaches infinity. The reason for this is that the appropriate stochastic stationary distribution for a cyclostationary process is the time average of the stochastic cyclostationary distribution, which is a mixture of nonidentical Gaussians and is therefore non-Gaussian; it is the limit of what the stationary empirical distribution produces. The mismatch between the Gaussianity of the random variables X(t) and the non-Gaussianity of the stationary distribution can be overcome by modeling φ 0 as a random variable uniformly distributed in [0, T 0 ). In such a case, the process X(t) is non-Gaussian, strict-sense stationary, with density given by (68), which is also the limit of the empirical density. The price paid for assuming the stationary model is that the stationary process is not cycloergodic. In fact, every realization is a cyclostationary time series. In particular, the Fourier  coefficient (32) with γ = k/T 0 , k integer, and the cyclic autocorrelation functions (46) with α = k/T 0 , k ∈ {0, ±1}, are not identically zero for almost all realizations.
In summary, in the stochastic process framework, we have the following. If the stochastic process (65) is modeled as Gaussian cyclostationary (φ 0 is modeled as a deterministic parameter), then its stationary empirical distribution approaches a non-Gaussian distribution. If the stochastic process (65) is modeled as a stationary process (φ 0 is modeled as a random variable uniformly distributed in [0, T 0 )), then it is non-Gaussian, its stationary density is given by (68), but it is not cycloergodic.
Such a discrepancy is not present in the FOT approach. The time series x(t) is cyclostationary. That is, its periodically time-variant distribution (30) does not degenerate into the stationary distribution (4a)-(4c). If one decides to exploit the stationary model, then x(t) in (66) is non-Gaussian and the empirical distribution approaches the non-Gaussian distribution whose density is given by (68) with (67) and (42) substituted in. If one decides to exploit the cyclostationary model, then x(t) in (66) is Gaussian and the finite-time FOT distribution (69) approaches the periodically time-variant distribution given by (70) with σ replaced by a periodically time-variant standard deviation, which is the square root of the limit in (71) as K → ∞. The non-Gaussian density (68) is the zero-frequency coefficient of the Fourier series expansion of the Gaussian periodically time-variant density function.
A linear periodically time-variant transformation of a Gaussian stochastic process is in turn a Gaussian process, but with realizations whose stationary empirical distribution is non-Gaussian. Such a dichotomy is not present in the FOT framework. Both stationary and almost-cyclostationary models give rise to probabilistic functions that are fitted by their estimates obtained from the data.
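The contrast between the two models can be checked numerically. The following Python sketch is illustrative only: it assumes a discrete-time surrogate of the time series in which a white Gaussian sequence is multiplied by a cosine with 16 samples per period (matching T 0 = 16T s in Fig. 9), and it uses the sample kurtosis (equal to 3 for a Gaussian) as a simple indicator of Gaussianity. The names P, K, and kurtosis are hypothetical and do not appear in the article.

```python
import numpy as np

def kurtosis(v):
    """Sample kurtosis m4 / m2**2, which equals 3 for Gaussian data."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2

rng = np.random.default_rng(0)
P = 16        # samples per period (assumption: T0 = 16 Ts)
K = 20_000    # number of observed periods
n = np.arange(P * K)
carrier = np.cos(2 * np.pi * n / P)

# Assumed surrogate time series: Gaussian sequence times a periodic carrier.
x = rng.standard_normal(P * K) * carrier

# Stationary model: all samples are pooled into one empirical distribution.
k_stationary = kurtosis(x)

# Cyclostationary model: synchronized averaging groups samples by phase.
xs = x.reshape(K, P)                      # column m holds times m, m+P, m+2P, ...
var_sync = np.mean(xs**2, axis=0)         # periodically time-variant variance
active = np.abs(carrier[:P]) > 1e-9       # skip the degenerate phases (variance 0)
k_sync = np.array([kurtosis(xs[:, m]) for m in range(P) if active[m]])

print("pooled kurtosis:", k_stationary)
print("per-phase kurtosis:", k_sync.round(2))
```

Under these assumptions the pooled kurtosis settles near 4.5, the value for a mixture of zero-mean Gaussians with variances cos²(2πn/16), while the per-phase kurtosis stays near 3 and the synchronized variance tracks cos²(2πn/16).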

C. STATISTICAL INDEPENDENCE
Let us consider the two stochastic processes X 1 (t) = B(t) cos(2πt/T 1 + Θ 1 ) and X 2 (t) = B(t) cos(2πt/T 2 + Θ 2 ), where B(t) is the binary PAM stochastic process for which {B n } n∈Z is a sequence of independent and identically distributed binary random variables assuming values ±1, {Θ i } are random variables uniformly distributed in [0, T i ), i = 0, 1, 2, and p(t) = rect(t/T 0 ). All the random variables are statistically independent. It is well known that the random variables Θ i make the random processes X 1 (t) and X 2 (t) stationary but not cycloergodic [33]. That is, sample paths of X 1 (t) and X 2 (t) are cyclostationary time series. In the following, it is shown that the stationary model gives rise to a dichotomy between the statistical dependence property of the stochastic processes X 1 (t) and X 2 (t) and that of their sample paths.
The stochastic processes X 1 (t) and X 2 (t) are statistically dependent by virtue of the common factor B(t). Specifically, for every fixed t, the two random variables X 1 (t) and X 2 (t) are obtained as the product of the same random variable B(t) with the two random variables cos(2πt/T 1 + Θ 1 ) and cos(2πt/T 2 + Θ 2 ), respectively. In general, the joint distribution of X 1 (t) and X 2 (t) does not factor into the product of their marginal distributions. That is, the necessary and sufficient condition for independence is not satisfied. Let x 1 (t) = b(t) cos(2πt/T 1 + θ 1 ) and x 2 (t) = b(t) cos(2πt/T 2 + θ 2 ) be sample paths of X 1 (t) and X 2 (t), respectively, where b(t) is a sample path of B(t) and θ 1 and θ 2 are realizations of Θ 1 and Θ 2 . Almost every sample path b(t) is a pseudo-random Bass function [4], [5], [72,Sec. 6.3]. Therefore, it can be shown that x 1 (t) and x 2 (t) are FOT independent in the sense that their joint distribution factors into the product of the two marginals (see (18)), provided that the periods T i , i = 0, 1, 2, are incommensurate, that is, T i /T j ∉ Q for i ≠ j, where Q is the set of rational numbers. That is, the factorization condition (81) is satisfied. This is an example of a dichotomy between the statistical dependence property of the stationary stochastic processes X 1 (t) and X 2 (t) and the statistical independence of their sample paths x 1 (t) and x 2 (t).
However, if the AP FOT model is adopted, it can be shown that the condition (81) does not hold; that is, these time series are statistically dependent by virtue of their common factor b(t).
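A numerical illustration of both statements can be sketched as follows. The code is not taken from the article: it assumes unit-spaced sampling, sets the symbol timing phase to zero, picks arbitrary incommensurate values for T 0 , T 1 , T 2 and for the phases, and uses a pseudo-random generator in place of the sequence {b n }. The first part compares the empirical (stationary FOT) joint distribution of x 1 (t) and x 2 (t) with the product of the marginals on a small grid. The second part shows that, once the periodic factors are accounted for at each time instant, as in the AP model, both series reveal the same b(t).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 400_000
t = np.arange(N, dtype=float)            # sampling instants (assumed unit spacing)

# Hypothetical, mutually incommensurate periods and arbitrary phases.
T0, T1, T2 = np.pi, 5 * np.sqrt(2), 5 * np.sqrt(3)
theta1, theta2 = 0.7, 2.1

# Binary PAM sample path with rectangular pulse of width T0.
symbols = rng.choice([-1.0, 1.0], size=int(N / T0) + 2)
b = symbols[np.floor(t / T0).astype(int)]

c1 = np.cos(2 * np.pi * t / T1 + theta1)
c2 = np.cos(2 * np.pi * t / T2 + theta2)
x1, x2 = b * c1, b * c2

def joint_minus_product(a1, a2):
    """Empirical joint distribution minus product of empirical marginals."""
    joint = np.mean((x1 <= a1) & (x2 <= a2))
    return joint - np.mean(x1 <= a1) * np.mean(x2 <= a2)

grid = [-0.5, 0.0, 0.5]
max_gap = max(abs(joint_minus_product(a1, a2)) for a1 in grid for a2 in grid)
print("largest |joint - product| on the grid:", max_gap)

# AP viewpoint: with the sine-wave phases known at each t, either series
# determines b(t), so the two series share a common factor.
b_from_x1 = np.sign(x1) * np.sign(c1)
b_from_x2 = np.sign(x2) * np.sign(c2)
agreement = np.mean(b_from_x1 == b_from_x2)
print("fraction of instants where both recover the same b(t):", agreement)
```

With these choices the gap between the joint distribution and the product of the marginals is small and shrinks as N grows, whereas the recovered values of b(t) coincide at essentially every instant.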

VII. CONCLUDING REMARKS
The fraction-of-time (FOT) probability approach for signal analysis is an alternative methodological approach to the classical probability approach that models signals as sample paths of an ensemble which, together with a probability measure, is called a stochastic process. In the FOT approach, the unique function of time available to the experimenter is not modeled as a representative of an ensemble. All the familiar probabilistic functions and parameters in this approach are constructed from this single function of time by adopting the relative measure of subsets of the real line (representing time) as a probability measure.

VOLUME 4, 2016
The relative measure is not σ-additive (additivity over a countably infinite number of terms). Moreover, when it is adopted to construct the distribution function of the values assumed over time by a persistent function of time, the corresponding expectation operator, the infinite-time average, is not σ-linear (linearity of an operator applied to a linear combination of a countably infinite number of terms). These facts make the relative measure less amenable to mathematical manipulation involving infinite linear combinations than is the classical probability measure. However, when we define a stochastic process X(t, ω) as a function of two variables t ∈ R and ω ∈ Ω, for which ω is the ensemble index, the results of operations (e.g., calculations of values of time-distributions or time-averages) made over the time line indexed by t for some fixed ω (which represent empirical measurements) exhibit, by definition, the same properties as those of the FOT probabilistic functions and, therefore, do not have the σ-additivity and σ-linearity properties of the corresponding results of operations made over ω. Hence, if only one function of time is available to the experimenter, the abstraction of introducing a hypothetical ensemble indexed by ω creates an unnecessary dichotomy between theory and measurements. Directly considering probabilistic functions constructed over t avoids this dichotomy between the properties of the stochastic process and those of its individual sample paths, as illustrated in the examples presented in Section VI. This is the primary motivation for adopting the FOT probability approach for signal analysis.
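The lack of σ-additivity can be seen from a standard counterexample, sketched below. The symmetric-window definition of the relative measure written here is an assumption made for the illustration (μ denotes the Lebesgue measure); the conclusion does not depend on how the window is positioned.

```latex
% Assumed definition of the relative measure of a measurable set A \subseteq \mathbb{R}
\mu_R(A) \triangleq \lim_{T\to\infty} \frac{1}{T}\,\mu\bigl(A \cap [-T/2,\,T/2]\bigr)

% Disjoint unit intervals A_n = [n, n+1), n \in \mathbb{Z}: each has zero relative measure
\mu_R(A_n) = \lim_{T\to\infty} \frac{1}{T}\,\mu\bigl([n,n+1) \cap [-T/2,\,T/2]\bigr) = 0
\quad\Longrightarrow\quad \sum_{n\in\mathbb{Z}} \mu_R(A_n) = 0

% Their countable union is the whole real line, which has relative measure one
\mu_R\Bigl(\bigcup_{n\in\mathbb{Z}} A_n\Bigr) = \mu_R(\mathbb{R}) = 1 \neq \sum_{n\in\mathbb{Z}} \mu_R(A_n)
```

The same sets show that the infinite-time average is not σ-linear: the indicator of each A n has zero time average, while the sum of all the indicators is identically 1 and has time average 1.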
The deep difference between the relative measure and the probability measure is a result of dictating that the measure of the hypothetical infinite sample space Ω be finite, which enables the introduction of Axiom VI of probability. These assumed properties are at odds with the empirical quantities to be modeled. They artificially give to the probability measure properties that are not shared with the relative measure, thereby rendering it more amenable to mathematical manipulations involving infinite linear combinations. Furthermore, although signals modulated with digital data have sample spaces with finite measure prior to normalization, as explained in connection with the PAM example in (57), many other communications signals do not have finite measure before normalization.
The conceptual usefulness of the FOT approach is evident in statistical spectral analysis of time series. This is the point of view of the book [37] by Gardner and the early work on generalized harmonic analysis [106]. In most problems of statistical spectral analysis, in fact, there is no need to model the available time series as a representative of an ensemble and, hence, there is no need to willfully incur a dichotomy between theory and measurements.
In Information Theory, compression of individual sequences has been shown to be a practicable and, in some cases, a convenient approach [118]. The same is true of channel coding. For example, the construction of block codes and convolutional codes and the corresponding decoding procedure can be made without involving any stochastic concept. However, in contrast to the practice of source and channel coding, in order to prove the channel coding theorem, one must adopt the concept of typical sequences whose probabilistic characterization is formulated in terms of the classical stochastic process. This is so, even though all the basic quantities used in Information Theory can be defined in terms of FOT probability.
Even if typical sequences can be characterized without introducing the stochastic process [60,Chap. 2], the proof of the channel-coding theorem is based on the characterization of typical sequences using the classical stochastic approach and the concept of the ensemble of all possible codes for a given channel [23,Chap. 8]. In addition, the channel is modeled as random. No proof of the channel coding theorem based on FOT probability is on the horizon.
A field where the FOT approach appears methodologically more appropriate than the classical stochastic approach is that of Monte Carlo simulations. They are FOT simulations, not stochastic simulations [25,Sec. 4.4]. In fact, a computer program for random number generation produces a unique periodic sequence with a very long period. Calling such a routine several times (with different seeds) is equivalent to picking different time segments of the unique sequence. So, the sample space is time indexed, not ensemble indexed. (If the period is sufficiently long, the sequence can be considered aperiodic for practical purposes.)
The extension of the FOT approach to periodic phenomena, that is, those which produce time series through the interaction of periodic mechanisms and random phenomena, is based on the nonobvious result that the almost-periodic component extraction operator is a valid expectation operator. When it is applied to the indicator of the set of values of t where a time series is below a threshold ξ, one obtains, for every value of t, a valid distribution function in ξ. The obtained function is almost-periodic in t by construction. Therefore, it is suitable for an FOT probabilistic characterization of cyclostationary, poly-cyclostationary, and almost-cyclostationary time series. In particular, a complete temporal probabilistic theory for these time series can be constructed [35], [37], [43], including a theory of higher-order moments and cumulants [44], [99].
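For the purely periodic case, the action of the component extraction operator on the indicator function can be sketched numerically. The example below is a finite-record illustration under stated assumptions, not the article's procedure: the time series is a hypothetical Gaussian sequence with periodic amplitude modulation, the period P (in samples) is assumed known, and the record contains an integer number K of periods. The function periodic_component sums the estimated sine-wave components at the frequencies k/P, and the result is compared with the synchronized average of the indicator.

```python
import numpy as np

rng = np.random.default_rng(2)
P, K = 8, 5_000                      # hypothetical period (samples) and number of periods
n = np.arange(P * K)

# Hypothetical cyclostationary time series: Gaussian sequence with periodic amplitude.
x = rng.standard_normal(P * K) * (1.0 + 0.8 * np.cos(2 * np.pi * n / P))

def periodic_component(z, P):
    """Sum of the sine-wave components of z at frequencies k/P, k = 0, ..., P-1,
    evaluated over one period (finite-record estimate)."""
    idx = np.arange(z.size)
    m = np.arange(P)
    out = np.zeros(P, dtype=complex)
    for k in range(P):
        coeff = np.mean(z * np.exp(-2j * np.pi * k * idx / P))   # Fourier coefficient
        out += coeff * np.exp(2j * np.pi * k * m / P)            # re-synthesized sine wave
    return out.real   # imaginary parts cancel because z is real

thresholds = np.linspace(-3.0, 3.0, 13)

# Periodic component of the indicator 1{x(t) <= xi}: rows index xi, columns index phase.
F = np.array([periodic_component((x <= xi).astype(float), P) for xi in thresholds])

# Synchronized average of the same indicator, for comparison.
F_sync = np.array([(x <= xi).reshape(K, P).mean(axis=0) for xi in thresholds])

print("max |F - F_sync|:", np.abs(F - F_sync).max())
print("F at phase 0:", F[:, 0].round(3))
```

For every phase, the extracted function takes values in [0, 1] and is nondecreasing in ξ, as a distribution function must be, and it coincides with the synchronized average up to rounding error because the record spans an integer number of periods.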
The FOT approach for almost-cyclostationary signals can be extended to the class of the generalized almost-cyclostationary (GACS) signals [88,Chap. 2], [90,Chap. 12]. GACS signals have multivariate statistical functions almost periodic with respect to time with (generalized) Fourier series expansions for which not only the coefficients depend on the lag parameters of the time-shifted versions of the time series, but also the frequencies depend on the lag parameters. For these signals, the almost-periodic component extraction operator can be adopted as the expectation operator and a complete FOT higher-order theory is developed in [57].
Other generalizations of the class of almost-cyclostationary processes are the spectrally correlated (SC) processes [88,Chap. 4], [90,Chap. 13] and the oscillatory almost-cyclostationary (OACS) processes [89], [90,Chap. 14]. For these classes of nonstationary stochastic processes, there apparently do not exist FOT-probability counterparts despite the relationships of these processes to almost-cyclostationary processes because the nonstationarity of these generalizations is not almost periodic or otherwise of known functional form [38]. However, time-warped almost-cyclostationary signals have been treated in [46] with an approach that mostly avoids stochastic processes and ergodicity.

APPENDIX A INFINITE TIME AVERAGE OF A PAM TIME SERIES
In this Appendix, the infinite time average of a PAM time series is calculated without resorting to the σ-linearity of the infinite time average that would lead to the wrong result in the right-hand side of (62). Let us consider the PAM time series (60) and, for the sake of simplicity, let us assume that the pulse q(t) has finite duration and the sequence x n is bounded, that is, |x n | ≤ K, ∀n ∈ Z.
The finite time average of the PAM time series is m (T ) Since q(t) has finite support, only a finite number (not depending on N ) of terms of the sum in the right-hand side of (A5) is nonzero. Consequently, in the limit as T → ∞ (and, hence, N → ∞), we have Finally note that (A9) can be proved even after relaxing the assumption that q(t) has finite duration provided that it has a rate of decay to zero sufficiently fast as |t| → ∞.
McGraw-Hill, 1989. He is the editor of Cyclostationarity in Communications and Signal Processing, IEEE Press, 1994, and he is the author or co-author of chapters in four books. He holds various patents, is the author of over 100 peer-reviewed research papers, and has given about 50 invited lectures at university and industrial research laboratories. He received the international Best Paper of the Year award from the European Association for Signal Processing in 1986 for the paper, "The spectral correlation theory of cyclostationary time-series," the 1987 Distinguished Engineering Alumnus Award from the University of Massachusetts, and the international Stephen O. Rice Prize Paper Award in the Field of Communication Theory from the IEEE Communications Society in 1988 for the paper entitled "Signal interception: A unifying theoretical framework for feature detection". The present focus of his work is the educational website https://cyclostationarity.com/.