Phase Identification in Power Distribution Systems via Feature Engineering

Phase identification is the problem of determining the phase connection of loads in a power distribution system. Utility operators generally accomplish this using smart meter data, which requires some form of feature engineering before data-driven methods become practical. Feature engineering is essential for voltage magnitude data containing noise, seasonality, and trend. We present the crucial components of a feature engineering pipeline: linear denoising with the Singular Value Decomposition, filtering of the denoised data to remove seasonality and trend, and fusion of multiple meter channels. We use the results of the feature engineering to perform phase label correction, a subproblem of phase identification. To evaluate these techniques, we generate a synthetic dataset from the meshed IEEE 342-Node test feeder circuit with the 2021 Electric Reliability Council of Texas load profiles. Our results show that denoising is effective for improving phase identification accuracy in the presence of measurement noise. We present new insight into filtering voltage measurement data that improves accuracy and eliminates the need to determine salient frequencies. We also present the application of a data channel fusion technique that is novel to the phase identification literature and enhances phase identification when both wye and delta-connected loads are present.


I. INTRODUCTION
Voltage magnitude measurements are the most commonly used for phase identification because loads connected to the same phase are known to have highly correlated voltage magnitude measurements [1]. However, raw voltage magnitude measurements can contain some combination of trend or seasonality that prevents data-driven methods from being effectively utilized for phase identification. Raw meter measurements are also inherently noisy, which can degrade or even prevent effective phase identification.
In 2018, the Energy Information Administration (EIA) [2], [3] categorized 86.8 million of the 154.1 million installed meters in the United States as part of the Advanced Metering Infrastructure (AMI). As AMI meter coverage increases, more data streams will become available for use in data-driven methods [4]. Smart meter manufacturers assign different classes [5] to each meter that specify their measurement error percentage as a deviation from some expected measurement. Unfortunately, a meter's performance will degrade with age, resulting in more significant measurement errors. Maintenance personnel may be required to replace or recalibrate meters every few years [6]. This measurement error can drastically affect the performance of data-driven methods that rely on measurement data [7]. Data transformations (i.e., feature engineering) are vital in making time-series data usable by data-driven algorithms. We consider three types of transformations to achieve this: denoising, filtering, and channel fusion (for multichannel measurements).
There have been many data-driven approaches to phase identification with varying degrees of feature engineering. These approaches can involve clustering techniques ranging from k-means clustering [8] to hierarchical clustering ensembles [9]. In many cases, researchers have also incorporated dimensionality reduction as a preprocessing step before clustering. These techniques can range from Principal Component Analysis (PCA) [10] to Non-Negative Matrix Factorization (NMF) [11]. These works have shown that loads connected to the same phase tend to form groups in lower dimensional spaces. The literature has shown that this idea of data grouping via unsupervised methods extends to the supervised learning domain. Recently, Padullaparti et al. [12] used a random forest classifier to identify phases. These applications generally rely on using noisy data within the time domain. However, Chiu et al. [13] recently introduced using Fourier compression for phase identification in the frequency domain. As both the noise and desired information are present in the high-frequency content of the data [14], it is worth noting that a frequency-domain approach may still be affected by high-frequency noise. Though much of the literature focuses on radial networks with single-phase wye-connected loads, Wang et al. [15], [16] have tackled the problem of a dataset containing both wye and delta-connected loads. They showed that a weakened separation of phases is present when evaluating wye and delta-connected loads, especially when using clustering methods.
Surprisingly, few works have directly addressed the denoising of voltage magnitude measurements as a preprocessing step. Jayadev et al. [17] used Principal Component Analysis (PCA) with its graph theoretic interpretation to perform phase identification. They utilized the Gaussian noise model and showed that PCA is effective in dealing with this noise model. Modarresi et al. [7] tackled the problem of Gaussian noise in measurement data for identifying the phase connections of loads using a feed-forward neural network. Their approach involved supervised training of the neural network to denoise the measurements from a small subset of the loads in the network. Overington et al. [10] have also successfully used PCA as a dimensionality reduction preprocessing step for performing phase identification with noisy data. The limitations of the current approaches are that they can require training multiple models to account for drift in the data or are primarily effective for single-phase radial networks.
One essential stage of preprocessing is to filter out the low-frequency trend and seasonality from time series data to make the data stationary. Previous researchers have used several different linear filtering techniques to accomplish this. Xu et al. [18] used an ideal filter to remove the trend. They worked with each time series in the frequency domain via the Discrete Fourier Transform (DFT) and zeroed out the desired low-frequency components. Hosseini et al. [14] used a 50th-order windowed Finite Impulse Response (FIR) filter to high-pass filter the data. Zhou et al. [19] used the first-order time-difference transformation, which is the discrete implementation of a first-order difference filter. The time-difference transformation is a fast and effective FIR filter when the underlying trend is linear. Tipton [20] used the Butterworth filter on time series in the frequency domain via the DFT. The current methods are quite effective and commonly used in applications other than power systems. Unfortunately, they may not provide sufficient filtering, or they may require manual cutoff frequencies that can change over time. Not all datasets will require filtering, and data science experts in the industry emphasize that exploratory data analysis (EDA) is necessary to determine whether filtering is needed [21].
Our contributions to the phase identification literature are as follows:
• We present the novel application of a data channel fusion approach that improves the effectiveness of data-driven phase identification when working with both wye and delta-connected loads. We achieve this with a simple cross-product technique to make the loads more easily separable in a higher dimensional space. This is the first instance of this technique in the phase identification literature.
• We present new insight into voltage magnitude data for phase identification. This insight enables a previously unused filtering approach in the phase identification literature that utilizes only the phase angle terms of a time series Discrete Fourier Transform (DFT).
• We apply Singular Value Decomposition (SVD) to the problem of phase identification, as it is an effective method for denoising voltage measurements.
Appropriate feature engineering can allow faster and simpler data-driven models to achieve comparable or better performance than slower complex models [22]. We want to emphasize that researchers can use the feature engineering techniques presented in this paper to enhance current and future data-driven techniques within the phase identification literature.
The rest of the paper is organized as follows. Section II describes the synthetic dataset. Section III discusses SVD for denoising. Section IV discusses stationarity and approaches to achieving it. Section V presents approaches to channel fusion. Finally, Section VI validates and discusses the results of the methods presented in this paper.

II. SYNTHETIC DATASET
We use the Electric Power Research Institute's (EPRI) OpenDSS implementation of the IEEE 342-Node test network [23], the Low-Voltage North American (LVNA) network. This network is composed of 624 wye or delta-connected loads. We use the Electric Reliability Council of Texas (ERCOT) 2021 15-minute historical backcasted load profiles [24] to emulate customer power usage throughout the year in the south-central Texas weather zone. The ERCOT 2021 load profiles have an hour of missing data. We imputed these missing values using an ensemble of forecasting and backcasting Auto Regressive Integrated Moving Average (ARIMA) [25] models for all load profiles. This lvna dataset is part of the arima-ercot-2021 data suite and is publicly available on Kaggle [26]. The source code for recreating the data suite from scratch is available as open source on GitHub [27], [28].
The following Gaussian noise model [29] is used for simulating measurement noise:

x̃_n[t] = x_n[t] + ε_n[t],  ε_n[t] ~ N(0, (0.002 · x_n[t])²)

where x_n[t] is the value of meter n at time t. The 0.002 coefficient is used to simulate a class-0.2 meter [5].
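The noise injection can be sketched as follows. This is a minimal sketch: the per-sample scaling of the standard deviation reflects our reading of the model above, and the voltage values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_meter_noise(x, coeff=0.002):
    """Additive zero-mean Gaussian noise with a standard deviation
    proportional to each measurement; coeff=0.002 emulates a
    class-0.2 meter (0.2% measurement error)."""
    return x + rng.normal(0.0, coeff * np.abs(x))

# One day of synthetic 15-minute voltage magnitudes around 120 V.
clean = 120.0 + rng.normal(0.0, 0.5, size=96)
noisy = add_meter_noise(clean)
```

The injected noise is small relative to the signal (on the order of 0.24 V for a 120 V measurement), yet, as shown later, it is enough to degrade phase identification without denoising.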

III. DENOISING

A. SINGULAR VALUE DECOMPOSITION (SVD)
Since we are looking to exploit the linear correlations between the noisy voltage measurements of each load, it naturally makes sense to use the Singular Value Decomposition (SVD) [30] to project the voltage magnitude measurement data onto its principal components. Since voltage measurements are linearly correlated, finding the axes that maximize the variance in the data makes sense. For some M × N matrix X, its SVD is:

X = UΣVᵀ

The N × N matrix V contains the right singular vectors, representing the axes of a vector space, where each axis maximizes some characteristic of the original data.
We choose this maximized characteristic to be the variance in the data, which most modern implementations of SVD provide through the data's principal components. The M × M matrix U contains the left singular vectors, which are the projection vectors of the data onto its principal component axes, normalized by their respective singular values. The N × N diagonal matrix Σ contains the singular values that unnormalize the left singular vectors. The first singular value σ₁ corresponds to the axis that exhibits the most of the target characteristic (in our case, the first axis exhibits the most variance in the data).
The data can be denoised by reconstructing it using a number of singular values (and their corresponding left/right singular vectors) that is less than the original data's column rank. The Singular Value Hard Threshold (SVHT) τ [31] provides a simple and effective way to determine the number of singular values to keep in a data-driven manner:

τ = ψ(β) · σ_median

where σ_median is the median of the singular values. ψ(β) is the optimal hard threshold coefficient and is quickly approximated as:

ψ(β) ≈ 0.56β³ − 0.95β² + 1.82β + 1.43

The choice of the coefficients and order of the polynomial comes from the recommendation of [31] for the general case.
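A minimal NumPy sketch of this denoising step follows. The data matrix here is synthetic, and the polynomial coefficients follow the general-case recommendation of [31]:

```python
import numpy as np

def svht_denoise(X):
    """Denoise X by truncating its SVD at the Singular Value Hard Threshold."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    M, N = X.shape
    beta = min(M, N) / max(M, N)                  # matrix aspect ratio
    psi = 0.56 * beta**3 - 0.95 * beta**2 + 1.82 * beta + 1.43
    tau = psi * np.median(s)                      # hard threshold
    k = int(np.sum(s >= tau))                     # singular values to keep
    return (U[:, :k] * s[:k]) @ Vt[:k, :], k

rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50))  # rank-3 "clean" data
noisy = signal + 0.01 * rng.normal(size=signal.shape)
denoised, k = svht_denoise(noisy)
```

On this synthetic rank-3 example, the threshold keeps only the few dominant singular values, and the reconstruction is closer to the clean signal than the noisy input is.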
β is calculated as the matrix aspect ratio:

β = min(M, N) / max(M, N)

Once τ is calculated, we only need to keep the k singular values that are ≥ τ for reconstructing the low-rank equivalent of the original data:

X_k = Σᵢ₌₁ᵏ σᵢ u⃗ᵢ v⃗ᵢᵀ

IV. STATIONARITY

Before using any data-driven techniques (i.e., estimating summary statistics, training a machine learning model, etc.), we must ensure that the time series data is at least weakly stationary [32]. Stationarity is a requirement because the estimation of summary statistics [25] and the training of machine learning models require the parameters of the data's distribution to be constant over time [33] (i.e., no concept drift/shift). Weak stationarity requires that the mean and variance of the voltage magnitude measurements be constant over time [25]. Time-dependent variance is a phenomenon known as volatility clustering (i.e., conditional heteroskedasticity). To the authors' knowledge, volatility clustering is not an issue in the current phase identification literature. However, a time-dependent mean is commonly present in time series data from power systems (e.g., trend and seasonality). Fig. 1a shows a 15-day window of the nonstationary voltage profile for a load in the lvna dataset.
It is clear from the figure that the mean of the data experiences several shifts over time. Fig. 1b shows the stationary version of the same voltage profile after appropriate feature engineering (discussed in the next section). The stationary profile has a constant mean and looks like white noise. The most common summary statistic estimated and used in phase identification is the covariance [34]:

cov(X, Y) = E[(X − E[X])(Y − E[Y])]    (7)

We can see from (7) that the covariance requires the expected value of both random variables to be independent of time.
Since voltage magnitude measurements of loads connected to the same phase have been found to be highly correlated [1], we need to ensure that the measurements are stationary. A standard tool for analyzing a time series in the time domain is the Autocorrelation Function (ACF). For simplicity, we utilize the ACF for visualizing whether a time series is stationary. Fig. 2a shows the ACF for the non-stationary voltage profile from Fig. 1a, while Fig. 2e shows the ACF for the stationary voltage profile from Fig. 1b. Other than the correlation at lag 0 (i.e., the correlation of the time series with itself), the correlation at the remaining time lags should be within the dashed error bands (indicating statistically insignificant serial correlation). This condition indicates that the time series is stationary and looks like white noise. Stationarity is generally achieved by modeling the serial correlation in the time series or, more commonly, by filtering.
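As a sketch, the sample ACF and its white-noise error bands can be computed directly. The series and the 95% band are illustrative, not taken from the lvna dataset:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - h], x[h:]) / denom
                     for h in range(max_lag + 1)])

rng = np.random.default_rng(0)
white = rng.normal(size=1440)            # ~15 days of 15-minute samples
r = sample_acf(white, 40)
band = 1.96 / np.sqrt(white.size)        # 95% error band for white noise
frac_inside = np.mean(np.abs(r[1:]) < band)
# For a stationary, white-noise-like series, nearly all lags fall inside the band.
```

A series whose lag-1 and beyond autocorrelations mostly fall inside the ±1.96/√N band behaves like white noise, which is the visual criterion used in Fig. 2.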

A. FILTERING
The voltage magnitude measurements in the lvna dataset are nonstationary. However, this is only the case for some datasets. Nonstationarity generally manifests as an underlying combination of trend and seasonality in smart meter data. In the phase identification literature, a high-pass filter is a standard tool for removing the trend and seasonality from a time series because these compose the low-frequency content of a time series [14]. We compare a few filters most commonly used in the literature and contribute a new alternative approach.

1) DIFFERENCE FILTER
The discrete equivalent of the nth-order difference filter is the most common high-pass filter used [32] (i.e., the nth-order time-difference transformation). If we look at the Fourier transform of the nth derivative (8) [35],

F{dⁿx(t)/dtⁿ} = (jω)ⁿ X(ω)    (8)

we see that the transfer function (9) of the difference filter is the (jω)ⁿ component:

H_{D,n}(ω) = (jω)ⁿ    (9)
In this work, we consider using the difference filter up to the second order, as a second-order difference filter is usually the highest order needed [32]. Fig. 3a and Fig. 3b show the periodograms for H_{D,1} and H_{D,2}, respectively. The periodograms show that a higher order corresponds to more attenuation of lower frequencies.
The difference filter is implemented as the nth-order time-difference transformation [25], as shown for the first and second orders, respectively:

∇x[t] = x[t] − x[t−1]
∇²x[t] = x[t] − 2x[t−1] + x[t−2]

The nth-order time-difference transformation implements the respective nth-order discrete derivative [36]. In the literature, the first-order time difference is the most common filter and has taken many different forms, such as the trend vector [37], voltage difference [34], [38], etc. Fig. 2b and Fig. 2c show the ACF of the voltage profile from Fig. 1a after applying a first and second-order difference filter, respectively. We can see that the filters have removed most of the serial correlation, but some remains, as seen at the first and second time lags. Though the difference filter is usually good enough, there is still room for improvement.
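The time-difference transformations above are a one-liner with NumPy. A minimal sketch on a synthetic profile with a linear trend and daily seasonality:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(960)                                  # 10 days of 15-minute samples
x = (120 + 0.01 * t                                 # linear trend
     + 0.5 * np.sin(2 * np.pi * t / 96)             # daily seasonality
     + rng.normal(0.0, 0.05, size=t.size))          # measurement noise

d1 = np.diff(x, n=1)   # first-order difference removes the linear trend
d2 = np.diff(x, n=2)   # second-order difference attenuates low frequencies further
```

After first-order differencing, the residual series fluctuates around the constant trend slope with no remaining linear drift, which is why the transformation is effective when the underlying trend is linear.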

2) IDEAL FILTER
The second filter we consider is the ideal filter [18]:

H_I(ω) = 0 for |ω| < ω_c, and 1 otherwise

Fig. 3c shows the periodogram of H_I(ω) for ω_c = 0.05 cycles/sample. The periodogram shows that the ideal filter can perfectly separate the content on either side of ω_c while maintaining a flat passband. A downside of the ideal filter is that it introduces ringing into the filtered data [36]. This ringing occurs because the inverse Fourier transform of the ideal filter is the sinc function, which is convolved with the original data. We can see this materialize in the ACF of the filtered voltage profile in Fig. 2d. The periodic movement of the serial correlation indicates that a periodic process still exists within the filtered profile.
To determine a reasonable cutoff frequency ω_c, we take the spectrogram of all loads using a 15-day window across the entire year of data and average them to obtain the average frequency content. Fig. 4 shows this average spectrogram. The majority of the low-frequency content is contained within |ω| = 0.05 cycles/sample. We use ω_c = 0.05 cycles/sample for the ideal filter. An important observation within the spectrogram is that the low-frequency content shifts over time and would ideally require an adaptive cutoff frequency.
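A sketch of the ideal high-pass filter implemented through the DFT. The test series is synthetic; the cutoff ω_c = 0.05 cycles/sample matches the choice above:

```python
import numpy as np

def ideal_highpass(x, wc=0.05):
    """Zero out all DFT bins with |frequency| < wc (in cycles/sample)."""
    X = np.fft.fft(x)
    freqs = np.fft.fftfreq(x.size)       # bin frequencies in cycles/sample
    X[np.abs(freqs) < wc] = 0.0
    return np.fft.ifft(X).real

rng = np.random.default_rng(0)
t = np.arange(1440)
slow = np.sin(2 * np.pi * 14 * t / 1440)   # low-frequency component (~0.01 cyc/sample)
noise = 0.3 * rng.normal(size=t.size)
y = ideal_highpass(slow + noise)           # slow component removed, noise passes
```

Because the slow sinusoid lies entirely in the zeroed bins, the filtered output is uncorrelated with it; the broadband high-frequency content passes through untouched.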

3) UNIT-MAGNITUDE FILTER
The final filter we consider is the unit-magnitude filter H_U(ω). To the authors' knowledge, this is the first occurrence of such a filter in the phase identification literature. The use of the unit-magnitude filter also reveals new insight into the frequency-domain relationship between loads connected to the same phase.
The DFT of a time series results in a series of complex numbers, where the magnitude represents the strength of the presence of a particular frequency, and the phase angle represents the offset of that frequency within the original time series. The unit-magnitude filter sets the magnitude of each DFT term to unity. Then, the inverse FFT is used to reconstruct the time-domain signal with only the information contained within the DFT terms' phase angles. An interpretation of the phase angles is that they represent the underlying scaffolding or structure of the data (e.g., the frame of a house). In contrast, the magnitude represents the additional material placed around the structure (e.g., the house's walls, windows, etc.). Fig. 2e shows the resulting ACF. We can see no statistically significant serial correlation in the filtered profile. Since each frequency's contribution is equal, the unit-magnitude filter makes the voltage profile look like white noise, which is stationary by definition [25]. There is also no need to manually select a cutoff frequency.
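A minimal sketch of the unit-magnitude filter (the input profile is synthetic; the zero-magnitude guard is a practical addition for bins with no content):

```python
import numpy as np

def unit_magnitude_filter(x):
    """Set every DFT magnitude to one, keeping only the phase angles."""
    X = np.fft.fft(x)
    mag = np.abs(X)
    mag[mag == 0] = 1.0                  # guard against division by zero
    return np.fft.ifft(X / mag).real

rng = np.random.default_rng(0)
t = np.arange(1440)
x = 120 + 2 * np.sin(2 * np.pi * t / 96) + 0.1 * rng.normal(size=t.size)
u = unit_magnitude_filter(x)
# Every frequency now contributes equally, so u looks like white noise.
```

Because the DFT of a real series is conjugate-symmetric and dividing by the magnitude preserves that symmetry, the reconstructed signal is real, and every one of its DFT bins has unit magnitude by construction.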
Fig. 5 shows the output of t-Distributed Stochastic Neighbor Embedding (t-SNE) when applied to the magnitude and angle terms of the DFT of a 15-day data window. We can see that the phase angle terms encode a fair amount of the load phase structure of the distribution system. The magnitudes also appear to contain some meaningful information on the load phase structure. This observation may be worth further exploration by future researchers to determine if some optimal mixture of the magnitude and angle terms exists.
We implement the ideal and unit-magnitude filters in the frequency domain, which requires the Fast Fourier Transform (FFT) algorithm to perform the Discrete Fourier Transform (DFT). We implement the difference filters in the time domain via the time-difference transformations. The difference filters have a time advantage because they can be implemented in linear time, whereas the frequency-domain filters require the FFT, which runs in linearithmic (O(N log N)) time [39]. Ultimately, the choice of filter and the accompanying trade-offs will depend on the problem at hand.

V. CHANNEL FUSION
The final step is to fuse the measurement channels of a multichannel meter into a single equivalent measurement channel. This fusion step makes it easier to work with polyphase voltage measurements for cases where there may be line-to-line connected loads, 3-phase loads, etc. The dataset used in this work contains loads with single-channel (wye-connected line-to-neutral) and two-channel (delta-connected line-to-line) meters.
We can represent a window of width W of voltage magnitude measurement data for meter n as a W × C channel matrix V_n, where W is the number of timesteps in the window and C ∈ {1, 2, 3} is the number of meter channels. If we assume that the column vectors of V_n are independent (i.e., each meter channel measures a different phase [40]), then the dimension of the column space C(V_n) is equal to C. The row vectors of V_n will span C(V_n) (i.e., each row vector will lie within the hyperplane represented by C(V_n)).
We also introduce the notion of an expected channel.

Definition 1: The expected channel E[v⃗] is the time-wise expectation of all meter channels and is estimated from a W × C_T window of data V as:

E[v⃗] = (1/C_T) V 1⃗

where W is the number of time steps in the window, and C_T is the total number of meter channels in the entire dataset. 1⃗ is a C_T × 1 column vector of ones. The expected channel can be used as additional information when performing channel fusion.
Next, we present two simple approaches to fusion. Fig. 7 shows a correlation heatmap between loads and a scatter plot of the t-SNE output for the results of the different fusion approaches. The figures utilize a 15-day data window at the start of the year. The loads are sorted and grouped by phase in the correlation heatmaps to more easily observe a distinct structure.

A. MEAN
We use the time-wise mean of multiple meter channels as a baseline channel fusion method (Pappu et al. [41] use the sum of the channels rather than the mean):

v⃗_n = (1/C) V_n 1⃗

where 1⃗ is a C × 1 column vector of ones. Fig. 7a and Fig. 7b show the correlation and t-SNE output with the unit-magnitude filter, respectively. As seen before, the loads form a distinct structure by phase. However, when we evaluate all loads together, the heatmap shows a significant mutual correlation between phases. This mutual correlation empirically shows a weakened voltage correlation structure in a meshed network [42].
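As a sketch, mean fusion of a W × C channel matrix is a one-liner; the window below is synthetic:

```python
import numpy as np

def mean_fuse(Vn):
    """Fuse a W x C channel matrix into one channel via the time-wise mean."""
    C = Vn.shape[1]
    return Vn @ np.ones(C) / C           # equivalent to Vn.mean(axis=1)

rng = np.random.default_rng(0)
Vn = 120 + rng.normal(0.0, 1.0, size=(96, 2))   # two-channel (delta) meter window
fused = mean_fuse(Vn)
```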

B. CROSS-PRODUCT: TIME
With the mean fusion approach, distinguishing the two-channel delta loads within the 2-dimensional C(V_n) is possible but difficult without a substantial amount of data.
In this section, we present an approach to fusing a smart meter's multiple voltage magnitude channels that is novel in the field of phase identification. We can enhance the separability of two-channel delta loads by mapping the measurement row vectors that span C(V_n) to vectors that do not lie in C(V_n). This approach can also eliminate the explicit filtering step, as the resulting data will not contain the original low-frequency content. We achieve this with the cross-product between two time-adjacent row vectors:

n⃗_d = v⃗_d × v⃗_{d+1}    (16)

Fig. 6 shows the cross-product between two time-adjacent row vectors within C(V_n). For the single-channel meters, the voltage vector can be column-stacked with the expected channel E[v⃗] to form a W × 2 matrix whose row vectors can be used with (16). Because taking time-adjacent row vectors results in D = W − 1 normal vectors, N_n will be a D × 3 matrix. Since two vectors define a plane, their cross-product is a vector that is not collinear with either of the original vectors and is orthogonal to the plane defined by C(V_n). If we assume that C(V_n) in Fig. 6 is the x-y plane, then the z-component of all the row vectors is 0, and the x and y-components of the normal vectors are all zero. The final fused data vector for the nth load comprises the z-components of the normal vectors.

Fig. 7c and Fig. 7d show the correlation and t-SNE output with no explicit filtering, respectively. The correlation matrix shows that delta loads have a more distinct correlation structure using the new data representation. We can also see that the wye loads' correlation structure is stronger. Unfortunately, the mutual correlation between the wye and delta loads has also become stronger, which is easily visualized in the t-SNE output. When observed alone, the wye and delta loads are nicely grouped and separated, but when observed together, they overlap in the scatter plot. We can solve this issue with a simple time reversal of the wye or delta voltage profiles. Fig. 7e and Fig. 7f show the result of the time reversal. The time reversal step eliminates the mutual correlation between the wye and delta loads while maintaining their distinct correlation structures. The t-SNE plot shows that the loads are nicely separated into tight groupings. The time reversal step works because two perfectly correlated time series look exactly the same, and two highly correlated time series look highly similar (mutual correlation). Reversing the order of one of the time series results in two series that look highly dissimilar. However, if a particular task doesn't require using the wye and delta loads together, then the time-reversal step is not required. If the implicit high-pass filtering is not desired, an alternative approach would be to apply a component-wise function (such as the Gaussian Radial Basis Function (RBF)) to each row vector. The component-wise function should result in a new row vector that is not collinear with the original vectors. Though we don't address this alternative approach in this work, we include an implementation on GitHub for the interested reader.
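A sketch of cross-product fusion follows. The two-channel window is synthetic; a single-channel meter would first be column-stacked with the expected channel to form the W × 2 matrix:

```python
import numpy as np

def cross_product_fuse(Vn):
    """Fuse a W x 2 channel matrix via cross-products of time-adjacent rows.

    Each row is embedded in R^3 with z = 0, so the normal vectors have
    nonzero z-components only; those z-components form the fused series.
    """
    rows = np.column_stack([Vn, np.zeros(len(Vn))])  # embed rows in the x-y plane
    normals = np.cross(rows[:-1], rows[1:])          # D = W - 1 normal vectors
    return normals[:, 2]                             # z-components

rng = np.random.default_rng(0)
Vn = 120 + rng.normal(0.0, 1.0, size=(96, 2))
fused = cross_product_fuse(Vn)                       # length W - 1
```

For rows (a₁, a₂, 0) and (b₁, b₂, 0), the z-component of the cross-product reduces to a₁b₂ − a₂b₁, so the fused series depends only on how adjacent measurement vectors rotate within C(V_n), not on their slowly varying magnitudes, which is the source of the implicit high-pass behavior.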
For the two-channel meters, the cross-product approach requires that the column vectors of V_n are independent. Independence ensures that the dimension of C(V_n) is two, as shown in Fig. 6. If the column vectors were not independent, then the dimension of C(V_n) would be one, where all row vectors of V_n would lie on the same line and be collinear. The cross-product of two collinear vectors is the zero vector.

VI. VALIDATION AND RESULTS

A. VALIDATION SETUP
We use the lvna dataset, with injected noise, to evaluate the methods discussed in this paper. The year of data is split into C_w = 365 − w + 1 windows of width w (in days) using a 1-day stride. The 1-day stride is due to meter data not generally being available in real-time, as it is stored locally in the meter and only sent to the utility company at the day's end [3].
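The windowing can be sketched as follows; 96 samples per day for 15-minute data is an assumption consistent with the dataset description:

```python
import numpy as np

def sliding_windows(year_data, w, samples_per_day=96):
    """Split a year of 15-minute data into C_w = 365 - w + 1
    windows of w days using a 1-day stride."""
    n_days = year_data.shape[0] // samples_per_day
    width = w * samples_per_day
    return [year_data[d * samples_per_day : d * samples_per_day + width]
            for d in range(n_days - w + 1)]

rng = np.random.default_rng(0)
year_data = rng.normal(size=(365 * 96, 5))   # one year of data for 5 meters
windows = sliding_windows(year_data, w=15)   # C_w = 365 - 15 + 1 = 351 windows
```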
For each data window, we perform the task of phase label correction [1] and evaluate the effect of the methods on the accuracy. We set a random 40% of the phase labels to be incorrect to simulate label errors. Since there are C_w windows of width w, we use statistical summaries of the accuracies to compare the results. This approach also allows for C_w Monte Carlo iterations for validation of each window width w.
B. DISCUSSION: DENOISING AND FILTERING

Fig. 8 shows the mean and standard deviation of the phase label correction accuracy using the different filters, as well as before/after denoising with SVD. Each data window was denoised with SVD (and left noisy for comparison), filtered, reduced to 2 dimensions via Principal Component Analysis (PCA), and clustered with k-means clustering to perform phase label correction.
It is clear from the plots that both filtering and denoising are essential preprocessing steps for this particular dataset. When working with noisy data, we see a trend of increasing accuracy with increased window width for the difference and ideal filters. The figure shows that the unit-magnitude filter doesn't perform much better than no filtering. At a 20-day window, the second-order difference filter surpasses the first-order difference and ideal filters. From Fig. 3, the first-order difference and ideal filters apply most of their attenuation at much lower frequencies. As we increase the amount of data in a window, we also increase the amount of a time series seen in the window. An increase in window width can introduce additional low-frequency content that falls outside the effective attenuation range of the first-order difference and ideal filters. The periodogram for the second-order difference filter shows significant attenuation over a more extensive range of low-frequency content, which likely explains the phenomenon at the 20-day window. However, as we'll show in later results, some sequences of approaches are more resilient to this phenomenon.
It is important to emphasize that we used PCA for dimensionality reduction. PCA alone has been shown in prior works to effectively deal with the Gaussian noise model [10], [17] when working with single phases. The denoised results in Fig. 8 show that denoising each data window with SVD improves accuracy over using PCA for dimensionality reduction alone for the lvna dataset. When denoising with SVD, we consider the linear interactions between the non-stationary time series of each load. This means that the principal components are learned from the entirety of the data window. However, we lose information when using PCA due to using the covariance matrix instead of the time series data itself. In this case, the principal components are learned from a summary statistic.
The unit-magnitude filter gives the best results when working with the denoised data for smaller window widths. However, the filter choice matters less as the window width increases for the denoised data. The difference and ideal filters have comparable performance in terms of mean accuracy across the different window widths.
We utilize the normalized Area Under the Curve (AUC) of each curve in Fig. 8 to quantify the quality of each filtering approach. Table 1 shows the AUC for the difference filters, and the values agree with our previous observations. In the table, our novel contribution is placed in bold, and the comparison filters have their appropriate references. The no-filtering AUC is the baseline result (null model), whereas we include the second-order difference filter as an additional comparison (due to its higher frequency cutoff). Though the unit-magnitude filter idea is not new, this is its first use in the phase identification literature.
C. DISCUSSION: CHANNEL FUSION

Fig. 9 shows the results when using the different channel fusion methods. Table 2 shows the AUC for Fig. 9a. The mean fusion approach uses the unit-magnitude filter, as it gives the best performance over the other filters with the denoised data. In the mean fusion approach, two principal components are sufficient for representing the data. We evaluate the cross-product fusion approach with and without explicit filtering. Three principal components are needed in the cross-product fusion approach for a sufficient representation.
Using the cross-product approach without filtering shows improved performance up to a 19-day window before trending downwards. This is similar to the phenomenon observed with the filters on noisy data. Using the second-order difference filter, we can reverse this trend by adding explicit filtering to account for a more extensive range of low-frequency content in the data. The main advantage of using cross-product fusion is better performance at smaller window widths. This benefit diminishes at a 21-day window, where the cross-product only provides a 1% improvement in average accuracy. Fig. 10 shows the results comparing a few of the dimensionality reduction techniques used in the literature: PCA [10], Laplacian Eigenmaps (LE) (i.e., spectral embedding) [43], and t-SNE [15]. Table 3 shows the AUC for Fig. 10a. The learned representations of t-SNE are generally considered too unreliable to be used as part of a processing step outside of visualization. However, we include it as a dimensionality reduction step to show that it can still be effective for phase label correction.
PCA and LE are comparable in performance with mean channel fusion, whereas t-SNE performs the worst. We used the unit-magnitude filter for the PCA and LE mean fusion results but the second-order difference filter for the t-SNE mean fusion results. Though not shown in the figures, t-SNE does not generalize well over data windows when using the unit-magnitude filter (some of the learned representations will break our assumption of only 6 clusters for each phase).
With cross-product channel fusion, t-SNE gives the best results, with LE second and PCA the worst. The second-order difference filter is used for the PCA and LE cross-product fusion results (to account for the larger-window phenomenon), but no filtering is used for the t-SNE cross-product results. We can also see that at a 7-day window and beyond, in Fig. 10b, the t-SNE results become the least volatile. These results show we can better leverage non-linear dimensionality reduction with cross-product channel fusion. As with the previous results, the choice of dimensionality reduction technique becomes almost meaningless with larger window widths.

VII. CONCLUSION
In this work, we presented a few feature engineering techniques that improve the representation of voltage magnitude measurements in datasets that contain both wye and delta-connected loads. We showed that Singular Value Decomposition (SVD) is effective and necessary for denoising the voltage magnitude measurements in the dataset. Our results also showed that a unit-magnitude filter can result in a better representation of the data compared to other commonly used filtering techniques, such as the difference filter. However, the filter choice mattered less for larger amounts of data. When working with single-phase delta-connected loads (i.e., line-to-line), the meters contain two channels of voltage data for each phase. A couple of channel fusion approaches were presented and evaluated. We presented an approach that utilizes the cross-product between two time-adjacent row vectors in the channel data, allowing for almost perfect separation of the phases for wye and delta-connected loads. These techniques were compared and applied to the problem of phase label correction, and our results showed an improvement in accuracy with smaller amounts of data. The Python code and results used in this paper are available as open source on GitHub [44]. The dataset is available for download on Kaggle [26], and the source code used to generate the dataset is available on GitHub [27], [28].