Roadway Traffic Sound Measured up on a High-Rise Building—The Sound-Level’s Statistical Normality

Governments around the world have mandated percentile-value ceilings/thresholds on the roadway traffic sound-level. Such percentile values, by definition, change with the sound-level's underlying probability distribution, i.e., the same percentile can imply different percentile values for different probability distributions. Whether the roadway traffic sound-level's underlying probability distribution is Gaussian remains disputed: contrary reports populate the open literature, but such reports are typically weak in statistical rigor. This paper will comprehensively survey this decades-long but ongoing debate, for the first time in the open literature. Then, this paper will present two new datasets measured on two separate evenings at exactly the same location up in a high-rise building, and will employ the Jarque-Bera hypothesis test to rigorously show that neither dataset is Gaussian.

Such "percentile values" depend on the noise-level's governing probability density. That is, the same "percentile" could mean drastically different "percentile values" for different probability densities. For example, compare a Gaussian-distributed percentile value and a log-normal-distributed (thus non-Gaussian) percentile value:

1) A Gaussian (i.e., normal) random variable's cumulative distribution function $F_G(\cdot)$ may be expressed explicitly in terms of its statistical mean $\mu_G$ and its standard deviation $\sigma_G$:

$$F_G(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu_G}{\sigma_G\sqrt{2}}\right)\right], \qquad (1)$$

where $\operatorname{erf}(\cdot)$ denotes the "error function". Therefore, the Gaussian distribution's $m$th-percentile value equals

$$F_G^{-1}\left(\frac{m}{100}\right) = \mu_G + \sigma_G\sqrt{2}\,\operatorname{erf}^{-1}\left(\frac{2m}{100} - 1\right), \quad \forall m \in [0, 100], \qquad (2)$$

where the superscript $-1$ refers to the inverse of the function superscripted.

2) A "log-normal" random variable's cumulative distribution function $F_{LN}(\cdot)$ may likewise be expressed explicitly in terms of its statistical mean $\mu_{LN}$ and its standard deviation $\sigma_{LN}$. Hence, the log-normal distribution's $m$th-percentile value equals $F_{LN}^{-1}(m/100)$, $\forall m \in [0, 100]$.

These two probability distributions' percentile values' difference is plotted in Fig. 1 at the 10th, the 50th, and the 90th percentiles. Clearly, the two distributions' corresponding "percentile values" are disparate even while both are preset at the same "percentile", whether the 10th, the 50th, or the 90th. That is, the probability distribution would principally influence the roadway traffic sound-level percentile-value in relation to various governments' policy requirements. Consequently, it is of practical interest to test the Gaussian assumption that has often (implicitly) been made of the roadway traffic noise-level.

VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
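The dependence of a percentile value on the assumed distribution can be illustrated with a minimal sketch. It assumes (this is not stated in the source) that the Gaussian and the log-normal models are matched to the same mean and the same standard deviation, and that the log-normal model is parametrized by moment matching. The numerical values of `MEAN_DB` and `STD_DB` are hypothetical and are not taken from the datasets of this paper.

```python
import math
from statistics import NormalDist


def gaussian_percentile(m, mean, std):
    """m-th percentile value of a Gaussian with the given mean and std."""
    return NormalDist(mean, std).inv_cdf(m / 100.0)


def lognormal_percentile(m, mean, std):
    """m-th percentile value of a log-normal with the given mean and std.

    Assumption: moment matching, i.e. ln X ~ N(mu_log, sigma_log^2) with
    mu_log and sigma_log chosen so that X itself has this mean and std.
    """
    sigma_log_sq = math.log(1.0 + (std / mean) ** 2)
    mu_log = math.log(mean) - 0.5 * sigma_log_sq
    z = NormalDist().inv_cdf(m / 100.0)  # standard-normal quantile
    return math.exp(mu_log + math.sqrt(sigma_log_sq) * z)


# Hypothetical mean and standard deviation, for illustration only.
MEAN_DB, STD_DB = 70.0, 5.0
for m in (10, 50, 90):
    g = gaussian_percentile(m, MEAN_DB, STD_DB)
    ln = lognormal_percentile(m, MEAN_DB, STD_DB)
    print(f"{m}th percentile: Gaussian {g:.2f}, log-normal {ln:.2f}")
```

Even with identical first and second moments, the two models return different values at every percentile, which is the point made in the comparison above.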

B. ROADWAY SOUND-LEVEL DISTRIBUTION'S EMPIRICAL NORMALITY - A LITERATURE REVIEW
Is traffic sound level normally (Gaussian) distributed? Different responses have emerged over the decades, often without much (or any) rigorous statistical analysis. E.g.,
#1 Gaussian - In the 1960s, [27] presumed, but did not empirically test for, statistical normality of the noise level in its empirical data analysis.
#2 Gaussian - Still in the 1960s, [28] used subjective assessment (without any rigorous statistical test statistic) to conclude that "noise levels in the range between 10% and 90% appears sufficiently near normal" for free-flowing traffic in a rural setting.
#3 Inconclusive - In the early 1970s, [29] used non-rigorous statistical testing to examine if the pressure level data is Gaussian, but came to no definite conclusion. There was no description of where or when the traffic data was taken.
#4 Non-Gaussian - In the late 1970s, [30] found the sound power itself to be Gaussian, thereby implying that the sound pressure level (a logarithmic function of the sound power) is not Gaussian.
#5 Non-Gaussian - In the 1980s, [31] found its inner-city "banked" traffic sound to have numerical values of mean, standard deviation, skewness, and kurtosis that were incompatible with any Gaussian distribution. This "disproof by contradiction" is statistically less rigorous than direct statistical hypothesis testing.
#6 Non-Gaussian mostly - In the 1990s, [32] analyzed data measured at heights between 34 and 80 meters up on high-rise buildings, on both sides of the facade. This reference presented ad hoc quantitative arguments (but not statistically rigorous hypothesis testing) to conclude that the Gaussian distribution could apply for some datasets, and only for free-flowing traffic. (These ad hoc arguments could preclude statistical normality but cannot statistically affirm normality.)
#7 Non-Gaussian - In the early 2000s, [18] presented noise level datasets that are close to log-tanh distributed (thus implicitly not Gaussian), according to only subjective inspection of data graphs but without any rigorous statistical testing.
#8 Gaussian occasionally - Also in the early 2000s, [33] presented noise-level empirical datasets measured near the facade of buildings of unspecified heights, with a conclusion that some of these datasets are possibly Gaussian, based simply on visual inspection of (i) scatter plots of those datasets' sample noise-climate and sample variance, by checking if the scattered points (on the "noise-climate / variance" plane) lie in the subregions allowed by Gaussian distributions; (ii) scatter plots of the empirical datasets' sample kurtosis and sample skewness, by checking if the scattered points (on the "kurtosis-skewness" plane) lie in the subregions allowed by Gaussian distributions. These are, again, ad hoc quantitative arguments that could merely preclude statistical normality but cannot statistically affirm normality.
#9 Non-Gaussian - In the mid-2000s, [34] presented noise-level empirical datasets measured near the facade of buildings of unspecified heights, assessing them as non-Gaussian, based on only visual inspection of scatter plots of the empirical datasets' sample kurtosis and sample skewness, to check if the scattered points (on the "kurtosis-skewness" plane) lie in the subregions allowed by Gaussian distributions. Like those in [32], [33], these are only ad hoc arguments.
#10 Non-Gaussian - In the late 2000s, [35] mentioned a heavy right tail in its sound data taken in "shielded areas" of a neighborhood with "traffic events", but made no direct reference to the statistical normality issue.
#11 Gaussian - In the early 2010s, [36], [37] analyzed data measured on the exterior of the facade of high-rise buildings, at heights that varied between 1.5 meters and 90 meters. The Kolmogorov-Smirnov test indicated that "65% of daytime noise datasets and 70% of nighttime noise datasets were normally distributed." This reference appears to be the only one that subjects the data to statistically rigorous hypothesis testing; however, the present work will improve on the statistical rigor and will arrive at a rather different conclusion.
This paper will add to the above empirical research literature regarding the statistical normality of the traffic sound level, by using conceptually rigorous statistical tests on new datasets, which were measured during two distinct evenings but at precisely the same location up at a high-rise building overlooking a highway.

C. ROADWAY SOUND-LEVEL DISTRIBUTION'S NORMALITY - MEASURED UP AT A HIGH-RISE BUILDING
High-rise buildings (commercial or residential) are ubiquitous in the world's metropolises, especially in East Asia. There, roads are urban canyons, boxed in on both sides by cement cliffs formed by the facades of high-rise buildings. As roadway sounds reverberate up these cement cliffs, the sound levels change.
Some studies empirically recorded roadway sound data up at a high-rise building; the aforementioned [32], [33], [34], [37] were, in fact, empirical measurements up high-rise buildings near their facades. Besides these four references, all other studies (on traffic noise propagating up a high-rise building) did not investigate the sound level's probability distribution. Those other studies include:
(a) [7] investigated how children's auditory discrimination and reading ability were affected differently according to the floor level in the building.
(b) [38], [39] investigated how the noise level varied along the height up the exterior of the building.
(c) [40] reported that the sound energy increased from ground level up to the 9th floor and then monotonically decreased as the floor level increased further.
(d) [41] statistically related the noise exterior to the building facade with the noise interior to the facade.
(e) [42] related the noise level data with the daily motor traffic volume and the neighborhood's human population density.
(f) [16] measured roadway sound-level data at 24 independent high-rise buildings in Hong Kong, each for more than 24 hours. Among other insights, [16] found that an arbitrarily chosen 30-minute period sufficed to characterize the noise climate in the "evening" time (19:00 - 21:00) within ±3 dB, for 85% of the cases.

D. THIS PAPER'S OBJECTIVE
This work will follow up the pioneering studies in [32], [33], [34], and [37] concerning traffic sound-level measurements up on high-rise buildings, by presenting and analyzing two previously unpublished empirical datasets, respectively collected on two separate evenings but at the exact same location up on a high-rise building. The statistical analysis here will be conceptually rigorous using the Jarque-Bera test, and will arrive at very different conclusions from [37] (which is the open literature's only other reference on traffic sound as measured up a high-rise building) about the sound level's statistical normality.
The rest of this paper is organized as follows: Section II will describe the to-be-analyzed measurements' urban setting, apparatus, metric, and data cleansing. Section III will present the exploratory data analysis which will hint that the datasets might be non-Gaussian. Section IV will present the confirmatory data analysis, statistically proving (via the Jarque-Bera statistical test) these datasets to be not Gaussian. Section V will conclude this investigation and will suggest follow-up studies.

II. THE ROADWAY TRAFFIC SOUND MEASUREMENTS' ENVIRONMENT, APPARATUS, METRIC, AND DATA CLEANSING
A. MEASUREMENT ENVIRONMENT
The datasets were measured at exactly the same location in a particular neighborhood [43] in Hong Kong. There, a highway (the "West Kowloon Corridor") runs left/right as shown in Fig. 2 and extends for 4.2 km. The roadway sound of this West Kowloon Corridor has been found in [44] to be the major acoustic noise source to the Wing Cheong Estate. This highway is four-lane, bi-directionally divided, elevated above surface streets, bordered on one side by high-rise residential buildings, but open on the other side.
No traffic signal exists for over 1 km from the Wing Cheong Estate, so no red light could arise to back up the otherwise free-flowing traffic. Therefore, vehicular traffic flows freely on this highway with no temporally cyclic non-stationarity in the traffic sound time series.
Within this Wing Cheong Estate, on the 37th floor of a 40-floor high-rise building [45], a microphone was hung about a meter outside the building facade, exactly as photographed in Fig. 3. This high-rise building stood 30 meters from the highway, as measured along the ground. The microphone was 106 meters above the elevated highway.
The sound pressure level is $L := 10 \log_{10}\left(p^2 / p_0^2\right)$ dB, where $p$ symbolizes the instantaneous sound pressure measured by the microphone, and $p_0$ signifies the reference sound pressure set to 20 µPa. The A-weights [48] are applied by the sound level meter on the 1/3-octave bands of $L$'s spectrum to give $L_{Aeq}^{(T)}$, which is the time-averaged value within a duration of $T$.
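The time averaging behind an equivalent continuous level can be sketched as follows. This is a minimal illustration, not the sound level meter's actual processing: it assumes that the pressure samples are uniformly spaced in time and have already been A-weighted, so no frequency weighting is applied in the code. The function name `equivalent_level_db` and the synthetic test signal are hypothetical.

```python
import math

P0 = 20e-6  # reference sound pressure in pascals (20 µPa)


def equivalent_level_db(pressures_pa):
    """Equivalent continuous sound level (dB) over the given samples.

    Assumption: samples are uniformly spaced and already A-weighted.
    The level is 10*log10 of the mean squared pressure over p0 squared.
    """
    mean_square = sum(p * p for p in pressures_pa) / len(pressures_pa)
    return 10.0 * math.log10(mean_square / (P0 * P0))


# Synthetic sinusoid with 1 Pa RMS (10 full periods), roughly 94 dB.
samples = [math.sqrt(2.0) * math.sin(2 * math.pi * k / 100) for k in range(1000)]
print(round(equivalent_level_db(samples), 2))
```

Averaging is performed on the squared pressure (i.e., on an energy-like quantity) before taking the logarithm, so a few loud seconds raise the result more than the arithmetic mean of the decibel values would suggest.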
The use of $L_{Aeq}^{(T)}$ for roadway sound-level measurements has been suggested by [11]. Moreover, earlier empirical studies [16], [39], [49], [50] of the Hong Kong roadway sound-level have also used $L_{Aeq}$.

Footnote 1: Human ears do not have a flat spectral response in the audible range (20 Hz to 20 kHz). [46] experimentally derived the "equal-loudness contours" for the first time by applying weights to the frequency spectrum in the audible range, in order to quantify the subjective human response to different acoustic-noise events. There exist various types of weightings: A-, B-, C-, and D-weighting. Among these, the A-weighting is the commonest, as it is specified by the international standard IEC 61672 for use in sound-level meters [47]. In contrast, B-weighting adds a larger offset to the low-frequency components (below 1 kHz) of loud sounds than the A-weighting, and has been shown to better suit music; however, B-weighting is obsolete and is not included in the newer IEC 61672:2003 [47]. C-weighting adds an even larger offset than B-weighting to the low-frequency components of loud sounds. D-weighting is designed for aircraft noise but is now obsolete.

During the measurements, there was no rain, the wind speed was under 5 meters/second (according to the local weather report), and the microphone was covered with a wind shield.
The measurement time windows of the measured data are reported in Table 1. Both days were workdays, not public holidays [52], [53]. The measurements were taken during the evening rush hours.

D. DATA CLEANSING
Loud non-vehicular events are manually identified and excised from the datasets. Such non-vehicular loud events include ambulance sirens, horns, the technician adjusting the microphone, hammering, and other construction noises. Not excised are the vehicular sounds of loud trucks, cars, or motorcycles.
Each loud event (vehicular or non-vehicular) is not only audible from the audio file, with its $L_{Aeq}^{(1\,\mathrm{sec})}$ value shown in a time-series chart in the aforementioned Brüel & Kjaer "Evaluator 782" software, but is also temporally synchronized to a video of the roadway to identify the loud event's likely source. Hence, non-vehicular loud events may be cleansed out manually, whereas vehicular loud events are not excised. Table 2 compares the datasets' sample statistics before and after the cleansing of their loud non-vehicular events. Evident therein, this data cleansing has only minor effects (≤0.1%) on the sound-levels at the 10th, 50th, and 90th percentiles.
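The before/after comparison of percentile values can be sketched as below. This is only an illustration under stated assumptions: the helper names `percentile` and `excise`, the flagged index intervals, and the ten toy sound levels are all hypothetical, and the percentile is computed by linear interpolation between order statistics (the source does not say which percentile estimator was used). Because the toy record is so short, the excision changes its upper percentiles far more than the ≤0.1% reported for the real datasets.

```python
def percentile(values, m):
    """m-th percentile by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * m / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def excise(levels, flagged_intervals):
    """Drop samples whose index lies in any flagged [start, stop) interval."""
    flagged = set()
    for start, stop in flagged_intervals:
        flagged.update(range(start, stop))
    return [x for i, x in enumerate(levels) if i not in flagged]


# Hypothetical 1-second levels in dB(A); indices 3 and 4 mimic a siren.
levels = [68.0, 69.5, 70.1, 88.0, 90.2, 69.8, 71.0, 70.4, 69.1, 70.7]
cleansed = excise(levels, [(3, 5)])
for m in (10, 50, 90):
    before, after = percentile(levels, m), percentile(cleansed, m)
    change = 100.0 * (after - before) / before
    print(f"{m}th percentile: {before:.2f} -> {after:.2f} dB ({change:+.2f}%)")
```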

III. EXPLORATORY DATA ANALYSIS
Both datasets' normalized histograms, after excising the non-vehicular loud events, are shown in Fig. 4. Each normalized histogram is visually unimodal, but its right tail is heavier than its left tail. The right tail is due to the occasional very loud vehicular sounds, which produce outliers in the histograms. Moreover, dataset (a) has a heavier right tail and a sharper/taller peak than dataset (b).
The sample statistics of the two datasets after excising the non-vehicular loud events are reported in Table 3. This table's every sample metric is mathematically defined and explained below, by conceptualizing $L_{Aeq}^{(1\,\mathrm{sec})}$ as a continuous-valued random scalar. Recall that the probability density of any random scalar $X$ ($X = L_{Aeq}^{(1\,\mathrm{sec})}$ here) is characterized by (a) a "location parameter" that fixes the probability density's "location" (or the shift) on the abscissa, (b) a "scale parameter" that governs the probability density's spread on the abscissa, and (c) "shape parameters" (a.k.a. "form parameters") that affect the probability density's shape other than by shifting, shrinking, or stretching the density. Each above metric can be estimated from a dataset $\{x_n\}_{n=1}^{N}$.

* "Skewness" is commonly defined as the density's "third standardized moment", $\gamma_1 := E\left[(X-\mu)^3\right] / \sigma^3$, where $\mu$ and $\sigma$ denote the mean and the standard deviation of $X$. It serves as a "shape parameter" to characterize the density's asymmetry about its mean. If a unimodal density has a positive (negative) "skewness", the density's right (left) tail would be longer/fatter than the other tail. A zero "skewness" does not guarantee symmetry: the density could yet be asymmetric in shape. The Gaussian distribution, being symmetric about its mean, has a zero $\gamma_1$. One "sample skewness" is the "Fisher-Pearson coefficient of skewness", defined as (page 6 of [55])

$$\hat{\gamma}_1 := \frac{\frac{1}{N}\sum_{n=1}^{N}(x_n - \hat{\mu})^3}{\left[\frac{1}{N}\sum_{n=1}^{N}(x_n - \hat{\mu})^2\right]^{3/2}},$$

where $\hat{\mu} := \frac{1}{N}\sum_{n=1}^{N} x_n$ denotes the sample mean. This simply equals the "sampled third central moment" (see footnote 3) normalized by the 3/2-power of the "sampled second central moment": $\hat{\gamma}_1 = \hat{\mu}_3 / \hat{\mu}_2^{3/2}$. The "skewness risk" arises if a symmetric density model (e.g., the Gaussian density) is mis-applied to skewed data.

* "Kurtosis" is commonly defined as the density's "fourth standardized moment":

$$\gamma_2 := \frac{E\left[(X-\mu)^4\right]}{\sigma^4}.$$

"Kurtosis" serves as another "shape parameter" to characterize the density's "tailedness". A larger kurtosis implies more frequent outliers and more extreme outliers. "Kurtosis" does not characterize the density's shape near the mean. The Gaussian distribution's kurtosis equals 3. "Excess kurtosis" is defined as $\gamma_2 - 3$. A leptokurtic distribution, by definition, has $\gamma_2 > 3$; its tails are heavier/fatter (i.e., decay slower along the abscissa) than the Gaussian distribution's. The "sample kurtosis" may be computed from the sampled data as the "sampled fourth central moment" power-normalized by the "sampled second central moment":

$$\hat{\gamma}_2 := \frac{\hat{\mu}_4}{\hat{\mu}_2^{2}} = \frac{\frac{1}{N}\sum_{n=1}^{N}(x_n - \hat{\mu})^4}{\left[\frac{1}{N}\sum_{n=1}^{N}(x_n - \hat{\mu})^2\right]^{2}}.$$

The "kurtosis risk" (a.k.a. the "fat-tail risk") arises when the Gaussian density is mis-applied to data with a positive "excess kurtosis".

* The "peakedness" provides still another "shape parameter", characterizing the density's shape at/near [...] (Eq. (1) in each of [56], [57], [58]), where $a$ can be any real value but is often set to equal the distribution's mode. The Gaussian density's "peakedness" can be expressed as [...]. For a larger "peakedness", the density's peak would be taller and sharper than the Gaussian peak. The "sample peakedness" can be computed as [...], where $\hat{F}_X(x) := \frac{1}{N}\sum_{n=1}^{N} I(x_n \le x)$, and $I(\cdot)$ symbolizes the indicator function.

* The "fourth cumulant", defined as $K_4 := \mu_4 - 3\mu_2^2$, represents yet another "shape parameter". "Cumulants" can be mathematically more convenient than the "central moments", because the $j$th-order cumulant of the sum of two statistically independent random variables equals the sum of the two random variables' individual $j$th-order cumulants. The sample "fourth cumulant" may be computed as $\hat{K}_4 := \hat{\mu}_4 - 3\hat{\mu}_2^2$.

Footnote 3: The "third central moment" is defined as $\mu_3 := E(X - \mu)^3$. The Gaussian distribution has $\mu_3 = 0$. The "sample third central moment" may be computed as $\hat{\mu}_3 := \frac{1}{N}\sum_{n=1}^{N}(x_n - \hat{\mu})^3$.

Compare the sample metric values in Table 3 against the traffic conditions in Table 1: Table 1 reveals that dataset (b) has 16.2% more vehicular traffic than dataset (a), and 2.2% more of its vehicular traffic being heavy, relative to dataset (a). Dataset (b)'s higher traffic flow rate and larger percentage of heavy vehicles expectedly give dataset (b) a higher noise level overall (i.e., a larger mean). These, however, need not imply more skewness or a larger kurtosis, because the heavier/louder traffic could produce a noise level that is steadier over time. Moreover, the sample skewness and the sample kurtosis both have the sample variance in their denominators, and the sample variance is larger for dataset (b), resulting in a smaller sample skewness and a smaller sample excess kurtosis for dataset (b).
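The moment-based sample metrics discussed in this section (sample skewness, sample kurtosis, and sample fourth cumulant) can be sketched as below, using the 1/N-normalized sample central moments. The function names and the five-point toy dataset are hypothetical; the sketch does not reproduce the values in Table 3.

```python
def central_moment(xs, k):
    """k-th sample central moment with 1/N normalization."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def sample_skewness(xs):
    """Fisher-Pearson coefficient: third moment over (second moment)^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def sample_kurtosis(xs):
    """Fourth central moment over the squared second central moment."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


def sample_fourth_cumulant(xs):
    """Fourth central moment minus three times the squared second moment."""
    return central_moment(xs, 4) - 3.0 * central_moment(xs, 2) ** 2


# Hypothetical toy data with one large value that stretches the right tail.
toy = [1.0, 2.0, 3.0, 4.0, 10.0]
print(f"skewness        = {sample_skewness(toy):.4f}")
print(f"kurtosis        = {sample_kurtosis(toy):.4f}")
print(f"excess kurtosis = {sample_kurtosis(toy) - 3.0:.4f}")
print(f"fourth cumulant = {sample_fourth_cumulant(toy):.4f}")
```

Both the skewness and the kurtosis divide by a power of the second central moment, which is why a dataset with a larger sample variance can show smaller values of both even when its raw third and fourth central moments are larger.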
The above-discovered patterns raise doubts whether either dataset can be considered Gaussian. Comparing either dataset's sample metrics to a Gaussian density's, clear disagreements appear for the skewness, the excess kurtosis, the third central moment, and the fourth cumulant, all of which equal zero for a Gaussian distribution. Instead, both datasets have a skewness >0 (meaning both empirical distributions are skewed to the right) and an "excess kurtosis" >0 (meaning both empirical distributions are heavier-tailed than the Gaussian distribution). These apparent discrepancies from the Gaussian model raise reasonable doubts whether either dataset can be considered as statistically realized from a Gaussian probability space. However, these doubts do not constitute any rigorous inference. For example, any nonzero sample skewness $\hat{\gamma}_1$ needs to be interpreted in relation to the estimation variance of $\hat{\gamma}_1$. Similar judgment is needed with regard to the nonzero excess kurtosis, etc. Furthermore, the above analysis is not an inference rigorously preset to any statistical "confidence level". Indeed, this is why the present exploratory data analysis is followed by the upcoming Section IV, which will use the Jarque-Bera test for confirmatory data analysis.

IV. CONFIRMATORY DATA ANALYSIS - THE JARQUE-BERA TEST
The Jarque-Bera test can rigorously decide if a dataset follows the Gaussian distribution, by jointly comparing the dataset's sample skewness and sample kurtosis against the Gaussian distribution's skewness and kurtosis. The Jarque-Bera test statistic is defined as

$$\mathrm{JB} := \frac{\hat{\gamma}_1^2}{6/N} + \frac{(\hat{\gamma}_2 - 3)^2}{24/N}.$$

The Jarque-Bera test is statistically more "powerful" than the Kolmogorov-Smirnov test employed earlier in [37]. Here, "power" represents a formal statistical term referring to a binary hypothesis test's probability of correctly deciding a non-Gaussian dataset as non-Gaussian. This advantage of the Jarque-Bera test has been widely recognized in the statistics literature, e.g., [60] (pp. 182-183), [61] (p. 88), [62], [63] (p. 343), [64], and [65] (Table 6.10). The Jarque-Bera test is especially "powerful" if the dataset is only slightly skewed, i.e., with a skewness magnitude under unity or at least not exceeding 1.7 [65] (Table 6.10), as is the case for the two datasets being tested; please see Table 4.
The statistical theory underlying the Jarque-Bera test is explained below. To comprehend this JB metric intuitively: Recall that $\hat{\gamma}_1$ symbolizes the sample skewness whereas $\hat{\gamma}_2$ represents the sample kurtosis; the JB metric above simultaneously considers both of these sample metrics in deciding whether the dataset is Gaussian. Furthermore, under the Gaussian hypothesis, $6/N$ equals the asymptotic estimation variance of $\hat{\gamma}_1$ and $24/N$ is that of $\hat{\gamma}_2$, so the JB metric interprets the sample skewness and the sample kurtosis in light of their respective uncertainty.
This test statistic (JB) is asymptotically chi-squared distributed with two degrees-of-freedom, under the assumption that the data realizes an underlying Gaussian random variable. Define $\chi^2_{\alpha}(2)$ to denote the $100 \times (1 - \alpha)$ percentile of a chi-squared distribution with two degrees-of-freedom, for a specified significance level of $\alpha$. If $\mathrm{JB} > \chi^2_{\alpha}(2)$, the dataset is statistically decided as non-Gaussian, with a significance level $\alpha$. The subsequent statistical analysis will set $\alpha$ to 0.05 and 0.01, respectively giving 95% and 99% confidence levels. Here, $\chi^2_{0.05}(2) = 5.99$; $\chi^2_{0.01}(2) = 9.21$.

Even more revealing is the "p-value", which is defined as

$$p\text{-value} := 1 - F_{\chi^2(2)}(\mathrm{JB}),$$

where $F_{\chi^2(2)}(\cdot)$ denotes the cumulative distribution function of a chi-squared distribution with two degrees-of-freedom. If p-value $< \alpha$, the data is decided, at the statistical significance of $\alpha$, as not a Gaussian realization. Setting $\alpha$ smaller (i.e., setting a larger confidence level) means a lower likelihood to reject the data as a realization of any Gaussian random variable.

Table 4 reports the Jarque-Bera test results for both datasets, along with their sample skewness and their sample kurtosis. Therein, the p-values of less than $2.2 \times 10^{-16}$ reject the Gaussian hypothesis for both datasets at a confidence level above 99.99%. The large positive JB values and the extremely small p-values imply, with very high levels of confidence, that the datasets are non-Gaussian realizations. Hence, Table 4 has confirmed the findings of [68] that its roadway $L_{Aeq}$ data is non-Gaussian, leptokurtic, and skewed.
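The decision rule of this section can be sketched as below. The sketch relies on the fact that a chi-squared distribution with two degrees-of-freedom has the cumulative distribution function $1 - e^{-x/2}$, so the p-value reduces to $e^{-\mathrm{JB}/2}$ and the threshold at significance level $\alpha$ reduces to $-2\ln\alpha$. The function names and the synthetic right-skewed samples are hypothetical; they are not the datasets of Table 4, and the chi-squared approximation is only asymptotic in the sample size.

```python
import math
import random


def jarque_bera(xs):
    """Return (JB, p_value) computed from 1/N-normalized sample moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = skewness ** 2 / (6.0 / n) + (kurtosis - 3.0) ** 2 / (24.0 / n)
    # Chi-squared(2) has CDF 1 - exp(-x/2), hence this closed-form p-value.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


def critical_value(alpha):
    """Chi-squared(2) threshold exceeded with probability alpha."""
    return -2.0 * math.log(alpha)


# Synthetic right-skewed levels (hypothetical): 70 dB plus an exponential.
rng = random.Random(0)
skewed = [70.0 + rng.expovariate(1.0 / 3.0) for _ in range(5000)]
jb, p = jarque_bera(skewed)
for alpha in (0.05, 0.01):
    threshold = critical_value(alpha)
    verdict = "non-Gaussian" if jb > threshold else "not rejected"
    print(f"alpha={alpha}: JB={jb:.1f}, threshold={threshold:.2f}, {verdict}")
```

Comparing JB against the threshold and comparing the p-value against $\alpha$ are equivalent decisions; the p-value merely reports how far beyond the threshold the statistic lies.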

V. CONCLUSION
The present work and [37] constitute the open literature's only two statistically rigorous hypothesis tests of roadway traffic sound-level data for statistical normality. This paper has presented new datasets of the roadway sound-level, measured on two different weekday evenings from exactly the same position up in the same high-rise building using the same equipment. The statistically rigorous analysis here has conclusively confirmed that they are non-Gaussian. The next investigative step is to identify appropriate non-Gaussian probability densities to model such roadway sound levels measured up at a high-rise building.