A Multistage Hybrid Deep Learning Model for Enhanced Solar Tracking

Solar tracking helps maximize the efficiency of solar applications, such as photovoltaic (PV) solar panels. In the recent past, machine learning (ML) techniques have been extensively used to implement automatic solar tracking. However, applying predictive models in solar trackers is a non-trivial task due to the randomness and non-linearity of meteorological data, which limit the models' ability to clearly represent the underlying data patterns. Most existing predictive models take a monolithic approach to addressing limitations related to meteorological data, thereby limiting their performance. Therefore, this paper proposes a deep hybrid learning (DHL) model to enhance solar tracking performance. Furthermore, the proposed model improves feature representation in the data by using combined normalization methods and conversion of numerical data to images. In a nutshell, the model integrates sine and cosine transformations (SCT) to reveal cyclical patterns in the data, sigmoid and minimum-maximum data transformations to scale the data toward a Gaussian distribution, and Gramian Angular Fields (GAF) to convert tabular data into a 2D image representation to take advantage of the feature extraction capability of a convolutional neural network (CNN). The model also utilizes long short-term memory (LSTM) and gated recurrent units (GRU) for both spatial and temporal feature extraction. The results show that the aggregation of the above-mentioned methods significantly enhances solar tracking. The proposed hybrid model outperforms existing methods on a publicly available dataset, achieving MAE, MAPE, and RMSE scores of 0.0073, 1.4635, and 0.0097, respectively.


I. INTRODUCTION
In recent years, the world's energy requirements have surged, leading to increased costs in power generation [1]. To combat the negative economic and environmental impact of conventional energy sources, the focus has shifted to renewable energy. Solar energy has gained popularity due to its abundance and environmental friendliness. It is estimated that the solar energy incident on Earth's surface annually is nearly 10,000 times the world's current energy consumption [2]. At present, PV solar panels, dating back to the mid-20th century, remain the most effective technology for harnessing the sun's energy [3]. Research has shown that the amount of sunlight captured by PV solar panels is maximized when the sun's rays are perpendicular to the surface of the panels [4]. One way in which this can be achieved is by using a solar tracker, which tracks the position of the sun and maintains an optimum angle between the sun's rays and the surface of the PV panels [5]. However, factors such as unpredictable weather patterns and the non-linear movement of the sun across the sky have a significant impact on the performance of a solar tracker in tracing the position of the sun [1], [6]. Therefore, reducing solar tracking errors has become a prominent topic in renewable energy [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Roberto C. Ambrosio.
Astronomical-based models use mathematical equations based on astronomical data. These models offer a significant level of simplicity, reliability, and fast computation [14]. However, these models are limited in their ability to capture the underlying relationships between meteorological features, especially during days of cloud cover, thereby affecting their prediction accuracy [8], [15], [16]. In contrast, conventional ML models can capture non-linear relationships and offer fast computation and high accuracy at a low cost [17]. However, conventional ML-based models are limited in their ability to extract deeper features from data and require extensive manual data preprocessing for optimal performance [18], [19]. Fortunately, deep learning (DL) models have emerged as a superior approach, overcoming the limitations of shallow learning models [20]. DL models use a distributed and hierarchical feature representation method, which allows them to automatically extract deep hidden features and relationships in the data. This excellent feature extraction capability has led to the adoption of DL models in various fields, such as solar energy forecasting [21], wind speed prediction [22], and electrical load prediction [23], making them a promising choice for solar tracking systems.
DL-based models such as convolutional neural networks (CNN) and recurrent neural networks (RNN), including long short-term memory (LSTM) and gated recurrent units (GRU), have been widely used to enhance solar tracking systems [24]. However, each of these models has its own constraints. For instance, CNN models perform exceptionally well when learning features of image-based data. However, meteorological data is often represented in one dimension (1D). This means that when leveraging the CNN's feature extraction capabilities, the kernels can only move in a single direction, which constrains the CNN's ability to learn deep spatial relationships in the data [20]. Similarly, LSTM and GRU models excel at capturing temporal dependencies but may not be sufficient for capturing spatial features in the data [25]. Various studies have noted that the integration of the CNN, LSTM, and GRU models has the potential to improve performance compared with the corresponding individual models [26], [27], [28]. However, most of these studies rely on 1D CNN models for spatial feature extraction of meteorological data.
Moreover, other studies have also shown that the choice of data normalization methods has a significant effect on the performance of the DL models [24], [29], [30]. Data normalization mitigates the influence of dominant features on the learning algorithm and reshapes the data distribution to facilitate the extraction of underlying relationships [29], [31]. Presently, existing DL-based methods aim to enhance predictive performance by employing single-method data normalization approaches [32], [33]. One commonly used data normalization method is minimum-maximum normalization (MMN), which scales dataset values to a range from 0 to 1 while preserving the original relationships [29]. However, MMN is sensitive to extreme values and lacks the adaptability to varying minimum and maximum feature values over time [34], [35]. Alternatively, the sigmoid normalization (SN) method, which utilizes a sigmoid function, normalizes features independently of the data distribution, making it suitable for unknown data distributions during model training [31]. Nevertheless, recent studies have shown that aggregating normalization methods has a positive impact on the performance of DL models [36], [37]. However, studies combining aggregate data normalization and spatial-temporal feature extraction to enhance solar tracking are limited.
In this study, we propose a model that combines aggregate data normalization, tabular-to-2D image conversion, and spatial-temporal feature extraction to enhance solar tracking. The model uses an integration of the SN and MMN methods. Furthermore, the model leverages the spatial feature extraction capabilities of CNNs by converting the tabular representation of meteorological data into a two-dimensional (2D) data representation. This allows the CNN kernels to move in two different directions, providing the capability to extract fine-grained feature patterns [20]. Additionally, the LSTM and GRU models are combined to improve the extraction of long- and short-term temporal dependencies in the data for the final prediction of the sun's position.
To achieve this, the proposed model integrates four different parts to perform accurate solar tracking. In the initial part, the sine and cosine transformation (SCT) is used to capture the oscillatory pattern in the data. Thereafter, the SN and MMN methods are combined to adjust the underlying distribution of the data toward the normal distribution. Then, the Gramian Angular Fields (GAF) method is used to convert the output from the aggregation of normalization methods into 2D images. In the subsequent step, a CNN module learns the underlying features of the images. The extracted features are then fed into the stacked LSTM and GRU layers, which capture the time-bound spatial-temporal features of the data using the gating units and memory cells. The final output provides a prediction of the sun's trajectory. The proposed model is trained using a solar positioning dataset and evaluated against other standalone base models. Furthermore, the model's performance is also evaluated on both 1D and 2D variations of the dataset. Additionally, the model is compared with other published works. The experiments reveal that a 2D conversion of meteorological data along with the combination of aggregate data normalization and spatial-temporal feature extraction provides the best results. This shows that a 2D representation of the meteorological data reveals more hidden patterns.
Thus, our contributions in this paper are presented as follows:
• A novel aggregate data normalization approach comprising two base normalization methods, namely, SN and MMN, in which the original input data is first rescaled using the former base normalization method and its output is transformed using the latter base normalization method.
• A novel model that combines the aforementioned aggregate normalization, SCT, GAF, CNN, LSTM, and GRU modules to enhance solar tracking.
The remainder of this paper is structured as follows. The second section provides a summary of pertinent literature. The proposed approach is explained in the third section. The findings from the study are covered in Section Four. Section Five offers a discussion of the study's findings. The conclusion and recommendations for future research are presented in Section Six.
129450 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

II. LITERATURE REVIEW
Recent advancements in artificial intelligence have seen a rise in the application of DL methods to optimize the performance of solar tracking systems. Data normalization techniques have been extensively applied in improving the predictive performance of DL methods using renewable energy-based datasets [29]. For instance, Al-Muswe et al. [38] used the z-score (ZS) normalization method to scale their data. The authors reported a noticeable improvement in their model's predictive performance. However, while ZS normalization can adequately handle outliers, its inability to transform feature values into a common numerical range may limit the performance of a predictive model. Similarly, Oviedo et al. [39] used the MMN method to transform their data to a range between 0 and 1 before feeding it into a DL model. The DL model was able to accurately learn the underlying patterns of the data partly due to the effect of MMN. However, the normalization method sometimes fails to efficiently handle outliers. Additionally, Tasdemir et al. [40] used the SN method, which is suitable for handling the effects of outliers. Nevertheless, the SN method's approach to handling outliers may lead to information loss and poor performance of the learning algorithm. Furthermore, the literature has shown that no single normalization technique is superior to the others across all datasets, as datasets have distinct underlying characteristics, which restricts the generalizability of normalization methods [36].
Given the time-series-based nature of solar tracking and its tabular data representation, most researchers use RNN and LSTM models to track the sun's position as they are well suited for sequential data. For example, Haris et al. [41] used tabular data to train an RNN model for optimized solar tracking. The study's results showed that the RNN model could accurately predict the sun's position. However, while the model was able to capture the temporal dependencies in the data, the authors did not account for the cyclical behavior in the data, which may have limited the model's performance. Likewise, Kaul et al. [42] used an LSTM model to develop an adaptive solar tracking system. Although the LSTM was able to improve the performance of the model due to its ability to capture long-term dependencies in the data, the authors did not address the issue of spatial features. In contrast, other researchers have sought to leverage the power of CNNs using image-based datasets. For example, Pierce et al. [43] proposed a multi-input CNN that uses sky image data to predict the trajectory of the sun. Though the model was able to effectively track the sun, the authors did not take into consideration the effect of seasonal patterns on the predictive performance of the model.
Due to the drawbacks of single-model DL approaches, there has been a growing interest in deep hybrid learning (DHL) models among researchers. DHL models can automatically identify and combine various latent features to enhance predictive performance [44]. For instance, Syahram & Effendy [45] used an RNN-LSTM hybrid model, which accounted for both the short-term and long-term dependencies in the data, to forecast the sun's trajectory. However, the authors did not address the impact of spatial features, which may have limited the performance of their model. In another study, Frizzo Stefenon et al. [46] integrated the wavelet energy coefficient (WEC) and LSTM. The WEC was used for signal pattern extraction, while the LSTM performed time-series-based forecasting. While this approach significantly improved the predictive performance of the model, the authors did not address the importance of spatial patterns in the data.
Furthermore, Lee et al. [47] used a CNN-LSTM model to forecast 24-hour-ahead solar power. The model showed remarkable performance as the CNN was able to extract short-term local features while the LSTM captured long-term features. However, unlike image data, the tabular data used in the study may have limited the CNN module's ability to extract deeper features. Likewise, J. Wang et al. [17] proposed a DHL model for thermal power forecasting. In their study, the CNN module was used to extract spatial-temporal features, the LSTM was used to learn the temporal dependencies, and the multi-layer perceptron (MLP) was used for forecasting. While the model was able to accurately predict thermal power, the authors did not apply any data normalization, which may have negatively affected the performance of the DHL model.
Therefore, considering the gaps identified in the reviewed works, we implement an algorithm that integrates four techniques, namely, two aggregated data normalization methods, cyclic transformations, image conversion, and feature extraction (short-term and long-term feature extraction as well as spatial feature extraction) to enhance the predictive performance of the DL model. This study employed data normalization techniques that scaled the original features using the SN and MMN methods. Combining the individual strengths of the SN and MMN methods allows for an enhanced form of shifting the data's underlying distribution toward the Gaussian distribution [36]. Furthermore, the cyclic transformations used are based on the SCT, which helps in revealing the oscillatory patterns in the data. Additionally, the spatial feature extraction used is based on a CNN, which is commonly employed in image processing tasks. Finally, the LSTM and GRU are used to extract long-term and short-term dependencies, respectively.

III. METHOD
This section provides a description of how the aggregate data normalization and SCT-GAF-CNN-LSTM-GRU modules are combined to formulate an enhanced DL-based solar tracking model. The section begins by detailing the dataset used in this study, followed by an outline of the underpinning concepts of the SCT, SN, MMN, GAF, CNN, LSTM, and GRU modules. All the experiments were carried out using the Python 3.9-based Keras framework with TensorFlow [48]. The program was implemented using a graphics processing unit (GPU) and 12 GB of RAM on Google Colaboratory [49]. Furthermore, we also leveraged the NVIDIA CUDA Deep Neural Network (cuDNN) toolkit, a library that provides GPU functionality for DL models to optimize model performance [50].

A. DATASET DESCRIPTION
This study used a real-world dataset from the Girasol sky imaging and global solar irradiance repository [51], [52]. The features included in the experiment were UNIX time, temperature (°C), atmospheric pressure (mmHg), relative humidity (%), solar radiation (W/m²), elevation angle, dew point (°C), wind direction (radians), and wind velocity (mile/s). These features were selected based on seasonal, weather, and environmental conditions. The elevation angle of the sun was considered as the output variable of the model. Furthermore, the features contained in the dataset were collected over 272 days of the solar cycle from 2017 to 2019. Observations of the sun's position were collected four to six times per second, whereas the meteorological data was recorded at 10-minute intervals. However, to ensure that all the features were observed at the same interval, the meteorological data was interpolated to match the time resolution of the sun position data. This was done using the averaging method, which readjusted the data resolution to one-minute intervals.

B. SINE AND COSINE TRANSFORMATIONS
Renewable energy-based datasets, particularly those based on solar and wind energy, often have non-linear fluctuations due to weather patterns [53]. These patterns often have daily, seasonal, and yearly variations, making it difficult to predict future trends based on past patterns alone. To address this challenge, researchers in signal processing and engineering fields commonly use SCT for time series and circular analysis [54], [55]. Using SCT plays a significant role in analyzing and modeling complex renewable energy-related data. Conceptually, the SCT module is expressed by (1) and (2).

$$\sin\left(\frac{2\pi\,\phi_m}{T}\right) \tag{1}$$
$$\cos\left(\frac{2\pi\,\phi_m}{T}\right) \tag{2}$$
where m is the outcome of interest and T represents the unit of analysis. For instance, if the unit of analysis is monthly periodicity, then T = 12. Likewise, if the unit of analysis is an annual cycle, then T = 365.25. Therefore, in this study, the SCT module was used to extract daily, weekly, monthly, and yearly periodicity from the UNIX time (seconds) feature. Similarly, the elevation angle and wind direction features were transformed into a 2-dimensional feature space by replacing each cyclic feature X with two features, cosine(X) and sine(X).
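The transformation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and expressing the period T in seconds (so that it matches the UNIX time feature) is an assumption made here for illustration.

```python
import math

# Assumed period lengths, expressed in seconds to match UNIX time.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def sine_cosine_transform(value, period):
    """Map a cyclic value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encode_unix_time(unix_seconds):
    """Return daily and yearly (sin, cos) features for a UNIX timestamp."""
    day_sin, day_cos = sine_cosine_transform(unix_seconds, SECONDS_PER_DAY)
    year_sin, year_cos = sine_cosine_transform(unix_seconds, SECONDS_PER_YEAR)
    return {"day_sin": day_sin, "day_cos": day_cos,
            "year_sin": year_sin, "year_cos": year_cos}


def encode_angle(radians):
    """Replace an angular feature X (in radians) with (sine(X), cosine(X))."""
    return math.sin(radians), math.cos(radians)
```

Because both components are kept, two timestamps exactly one period apart map to the same point on the unit circle, whereas the raw timestamp would place them far apart.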

C. ADOPTED DATA NORMALIZATION METHODS
The performance of most learning algorithms is highly influenced by the choice of normalization method [56]. In this study, two normalization methods were adopted. Firstly, the SN method was adopted due to its ability to handle outliers. The SN method can compress commonly occurring values to essentially the same range without compromising its ability to rescale extreme values [57]. Secondly, MMN was used given its ability to preserve feature relationships in the data [58]. The associated formulae and descriptions of the normalization methods are detailed as follows.

1) SIGMOID NORMALIZATION
This method normalizes the data using a sigmoid function, which rescales the values into a range between 0 and 1 or −1 and 1. The SN is computed by (3), where the data value f of the feature F is normalized to f′.

2) MINIMUM AND MAXIMUM NORMALIZATION
MMN is one of the most common approaches to data normalization, which scales the features into a range between 0 and 1 or −1 and 1 [57]. The bounded range in the MMN suppresses the effects of outliers, resulting in a smaller standard deviation. The Min-Max scaler for the range between 0 and 1 is calculated as follows.
$$f' = \frac{f - f_{\min}}{f_{\max} - f_{\min}} \tag{4}$$
where f_min and f_max refer to the minimum and maximum values of the feature, respectively.
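The two-stage normalization (SN first, then MMN applied to the SN output) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the plain logistic function is assumed for SN, the 0-to-1 range is assumed for MMN, and all function names are hypothetical.

```python
import math


def logistic(value):
    """Numerically stable logistic function with outputs in (0, 1)."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    exp_value = math.exp(value)
    return exp_value / (1.0 + exp_value)


def sigmoid_normalize(values):
    """SN (assumed form): squash every value of a feature into (0, 1)."""
    return [logistic(v) for v in values]


def min_max_normalize(values):
    """MMN: rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def sn_mmn_normalize(values):
    """Aggregate normalization: SN first, then MMN on the SN output."""
    return min_max_normalize(sigmoid_normalize(values))
```

Note that a plain logistic function saturates for raw features with large magnitudes (for example, pressure in mmHg), so a practical pipeline may need to center or standardize a feature before SN; whether the original model does so is not stated in the text.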

D. GRAMIAN ANGULAR FIELD
This study uses the GAF to encode time series into image data. The GAF represents time series in a polar coordinate system where each element is the cosine of the summation of angles [59]. Given a time series T = {t_1, t_2, ..., t_k} of k observations, T is normalized using (5) or (6). Thereafter, the normalized series is represented in polar coordinates by using a cosine function to encode the data values and the time stamp as the radius (r) with (7), where λ represents the polar coordinate value, m_i is the time stamp, and V is the constant factor used to regularize the span of the polar coordinate system.
After encoding the time series data into the polar coordinate system, a symmetric matrix called the Gramian matrix is formed. This is done by using either the Gramian angular summation field (GASF) or the Gramian angular difference field (GADF), which take a trigonometric summation or difference of angles, respectively [59]. In this study, unlike the standard GAF method, the data is normalized using a combination of the normalization methods described in the preceding section. After that, the cosine function is applied to encode the data. We used the GASF to transform the encoded data into a symmetric matrix. The GASF uses the encoded data to exploit the angular perspective by considering the trigonometric summation between each pair of inverse cosine values to identify the temporal correlation within the varying time intervals [60]. The GASF is defined by (8).
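The GASF construction can be sketched in a few lines of Python. The sketch below assumes the standard formulation, in which each normalized value is encoded as an angle through the inverse cosine and each matrix entry is the cosine of the sum of two angles; the function name is hypothetical, and the input is assumed to be already scaled into [−1, 1] (the 0-to-1 output of the combined normalization satisfies this).

```python
import math


def gasf(series):
    """Gramian Angular Summation Field of a series scaled to [-1, 1]."""
    # Clipping guards against tiny floating-point excursions outside [-1, 1],
    # which would make acos raise a ValueError.
    clipped = [max(-1.0, min(1.0, x)) for x in series]
    # Polar-angle encoding of every observation.
    angles = [math.acos(x) for x in clipped]
    # Entry (i, j) is the cosine of the summed angles of observations i and j.
    return [[math.cos(a_i + a_j) for a_j in angles] for a_i in angles]
```

A series of k observations yields a symmetric k × k matrix that can be treated as a single-channel image. Its main diagonal equals 2x² − 1 for each normalized value x, so the original values remain recoverable from the image.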

E. CONVOLUTIONAL NEURAL NETWORK
A CNN is an artificial neural network characterized by a multi-layer feedforward architecture that has widely proven its superiority in extracting underlying spatial features [61]. CNNs have received wide coverage in many areas of research due to their ability to solve image recognition, object detection, speech recognition, and classification tasks [62]. Unlike other DL models, such as the deep belief network (DBN) [63], CNNs have the characteristics of weight sharing and sparse connectivity [64]. These two features of a CNN significantly reduce the parameters it needs to learn and extract different patterns. This study uses a CNN to extract the underlying spatial relationships in the meteorological data to reduce the prediction error of the sun's position. The architecture of a CNN is depicted in Fig. 1 and comprises three parts: the convolutional layer, pooling layer, and fully connected layer. Further, the computation of the CNN is defined by (9).
where r and c represent the row and column indexes, respectively, and m and n denote the convolution filter's row and column indexes, respectively. m′ and n′ represent the number of rows and columns of the convolution filter, respectively. u is the feature map's index in the (h−1)th layer. In this paper, two convolutional layers are used as initial layers to process the time-series-converted image data. Each convolutional layer has 32 filters with a kernel size of 3 × 3 and applies a rectified linear unit (ReLU) activation function. Additionally, max pooling is applied using a pool size of 2 × 2 to downsample the spatial dimensions of the feature maps received from the convolutional layers. Further, dropout is applied at a rate of 0.25 to regularize the network and prevent overfitting. The output from the pooling layer is then flattened and reshaped to fit the input requirements of the subsequent LSTM and GRU layers.
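To make the effect of these layer settings concrete, the following Python sketch traces tensor shapes and convolutional parameter counts through the stack. Several details are assumptions rather than statements from the text: unpadded ("valid") convolutions with stride 1, the ordering convolution, convolution, pooling, and a hypothetical 32 × 32 single-channel input image. Dropout is omitted because it changes neither shapes nor parameter counts.

```python
def conv2d_output(shape, filters, kernel=3):
    """Shape after a stride-1, unpadded ('valid') 2D convolution."""
    height, width, _ = shape
    return (height - kernel + 1, width - kernel + 1, filters)


def conv2d_params(in_channels, filters, kernel=3):
    """Trainable kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def max_pool_output(shape, pool=2):
    """Shape after non-overlapping max pooling (floor division)."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)


def trace_cnn(input_shape):
    """Follow an image through conv -> conv -> pool -> flatten."""
    conv1 = conv2d_output(input_shape, filters=32)
    conv2 = conv2d_output(conv1, filters=32)
    pooled = max_pool_output(conv2)
    flattened = pooled[0] * pooled[1] * pooled[2]
    params = conv2d_params(input_shape[2], 32) + conv2d_params(32, 32)
    return {"conv1": conv1, "conv2": conv2, "pooled": pooled,
            "flattened": flattened, "conv_params": params}


# Hypothetical 32 x 32 single-channel GAF image.
summary = trace_cnn((32, 32, 1))
```

Under these assumptions, weight sharing keeps the two convolutional layers at 9,568 trainable parameters regardless of the image size, while the flattened feature vector passed on to the recurrent layers grows with it.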

F. LONG SHORT-TERM MEMORY
LSTM is a specific variant of the RNN module used to mitigate the vanishing and exploding gradient challenge the RNN faces [65]. LSTM comprises three main gates: the input gate, the forget gate, and the output gate [28]. The primary function of the three gates is to control the flow of information in and out of the LSTM cell. More specifically, the input gate determines how much data enters the memory cell. The forget gate handles the retention of information and regulates which information is eliminated from the cell. The output gate determines which part of the information is passed on to the subsequent unit. Furthermore, the LSTM cell mainly uses the hyperbolic tangent and sigmoid activation functions. The following equations outline the recursive computations of the LSTM cell.
where tanh and σ denote the hyperbolic tangent and sigmoid activation functions, respectively. f, i, o, g, z, and h represent the forget gate, input gate, output gate, temporary memory, new memory cell, and memory block at time step s, respectively. ⊙ represents an element-wise multiplication between two matrices. s represents the time step, and W denotes the magnitude of the window. l denotes the layer weights representing input x, and b represents the bias term. The cell structure of the LSTM module is shown in Fig. 2. This study uses two LSTM layers with 64 units in the first layer and 32 units in the second layer. A dropout layer is applied after each layer with a rate of 0.2.
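The gate interactions can be illustrated with a single-unit LSTM step in plain Python. The sketch assumes the standard LSTM recursion and reuses the names f, i, o, g, z, and h from the description above; the function names and the weight layout (a dictionary of scalar triples) are hypothetical choices made for readability, not the configuration used in the experiments.

```python
import math


def sigmoid(value):
    """Numerically stable logistic function."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    exp_value = math.exp(value)
    return exp_value / (1.0 + exp_value)


def lstm_step(x, h_prev, z_prev, weights):
    """One step of a single-unit LSTM.

    weights maps a gate name to (input weight, recurrent weight, bias).
    """
    def pre_activation(gate):
        w_x, w_h, bias = weights[gate]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre_activation("forget"))        # share of old memory kept
    i = sigmoid(pre_activation("input"))         # share of new memory admitted
    o = sigmoid(pre_activation("output"))        # share of memory exposed
    g = math.tanh(pre_activation("candidate"))   # temporary memory
    z = f * z_prev + i * g                       # new memory cell
    h = o * math.tanh(z)                         # memory block (hidden state)
    return h, z
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the temporary memory to 0, so the step halves the previous memory cell; this makes a convenient sanity check for the recursion.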

G. GATED RECURRENT UNIT
The GRU is another variant of the RNN model; however, unlike the LSTM model, the GRU has fewer gates [66]. In the GRU, the hidden state (hs_t) and cell state (cs_t) are controlled and merged into one [67]. Generally, the GRU model comprises two gates, the update gate (ug_t) and the reset gate (rg_t). The update gate (ug_t) controls the extent to which the state information hs_{t−1} (cs_{t−1}) at the previous time step t−1 is retained in the current state t. The reset gate (rg_t) determines how the information from the previous state will be integrated into the current candidate activation (ca_t). The structure of the GRU cell is shown in Fig. 3, and the update equations are defined as follows.
where W_{x(rg)}, W_{(hs)(rg)}, W_{x(ug)}, W_{(hs)(ug)}, W_{x(cs)}, and W_{(hs)(cs)} are the network weight matrices. b_rg, b_ug, and b_cs denote the bias vectors. rg_t and ug_t represent the vectors of the activation values of the reset gate and update gate, respectively. This paper uses two GRU layers, with 64 units in the first layer and 32 in the second layer. Dropout at a rate of 0.2 is applied after each GRU layer.
The output from the SN method is then passed to the MMN method, where the resulting output is passed to the subsequent module. The normalized data is converted to images using a GAF technique in the image conversion module. In the following module, the CNN module extracts spatial features in the image data. Thereafter, the forecasting module is used to predict the sun's position using both the LSTM and GRU models, which are ideal for identifying temporal dependencies in data sequences. The processing flow of the combined data normalization-based DL model is presented in Fig. 4.

I. PERFORMANCE METRICS
There are several performance metrics that various researchers use to evaluate the predictive ability of their models [68]. However, there is no uniform criterion for selecting the appropriate evaluation metrics [69]. Thus, in this study, we used the mean square error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) to verify the performance of the combined data normalization-based DL model. The MSE was used as the loss function to calculate the average squared difference between the actual values k_i and the predicted values k̂_i as shown in (21).
Similarly, MAE was used to calculate the average absolute variation between the actual and predicted values as defined by (22). A low MAE value is desirable for a model.
MAPE was used to calculate the percentage difference between the actual and predicted values as explained by (23). A model is expected to have a low MAPE value.
RMSE was used to calculate the square root of the mean squared difference between the actual and predicted values as computed by (24). Smaller RMSE values indicate higher model performance.
The suitability of the selected performance metrics lay in the experiment's overall purpose, which was to predict the sun's position. The MSE provides an essential model training, validation, and verification benchmark. It provides an ideal performance measure for models that predict continuous variables because of its concept of cross-entropy [70]. Likewise, MAE is the most natural measure of average error size, and, in contrast to RMSE, it provides an unambiguous measure of the average error magnitude [71]. MAPE, on the other hand, offers reliability, ease of interpretation, clarity of presentation, and utilization of all the error-related information, which make it significantly effective [72]. RMSE is a desirable error metric for several mathematical calculations because it avoids using absolute values [73]. Therefore, considering these metrics would allow for a more informed conclusion about the model's overall performance.
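The four metrics used in this subsection can be computed as in the following minimal Python sketch. It follows the standard textbook definitions with the actual values k_i and the predicted values k̂_i passed as plain lists; the function names are illustrative, and expressing MAPE as a percentage is an assumption.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is 0."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, in the same unit as the target."""
    return math.sqrt(mse(actual, predicted))


actual = [1.0, 2.0, 4.0]
predicted = [1.5, 2.0, 3.0]
scores = {"MSE": mse(actual, predicted), "MAE": mae(actual, predicted),
          "MAPE": mape(actual, predicted), "RMSE": rmse(actual, predicted)}
```

For these toy values, MAE is 0.5 and MAPE is 25.0. Because MAPE divides by the actual value, targets at or near zero inflate it or make it undefined, which is worth keeping in mind when interpreting MAPE alongside MAE and RMSE.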

IV. RESULTS AND DISCUSSION
This section analyzes the performance of the SCT-SN-MMN-CNN-LSTM-GRU model by comparing the impact of the SN-MMN and other existing data normalization methods on the model. We also demonstrate the benefit of integrating the CNN, LSTM, and GRU modules by comparing the DHL model to other DL models. Furthermore, we compare the model's performance on both the image-transformed dataset and the original tabular dataset. To appraise and demonstrate the merits of the SCT-SN-MMN-CNN-LSTM-GRU model, we performed 30 runs of model testing for each of the individual comparisons. Additionally, significance testing was performed to ensure that the differences observed in model performance were statistically significant.

A. COMPARISON BETWEEN DATA NORMALIZATION METHODS
To illustrate the benefit of applying the SN-MMN approach to the DHL model, we compare the model's performance when other existing single-method data normalization approaches are applied. Furthermore, we also analyzed the effect of non-normalized (NN) data on the performance of the DHL model. This study used the MMN, ZS, SN, and median scaling (MS) methods. Table 1 presents the average performance results of the model when each normalization method is applied. Based on Table 1, it is evident that the performance of the DHL model fluctuates based on the different normalization methods. Among the normalization methods, the SN-MMN method stood out with the best MAE, MAPE, and RMSE of 0.0073, 1.4635, and 0.0097, respectively, indicating exceptional predictive model performance. MMN followed with slightly worse but commendable results, recording an MAE of 0.0138, MAPE of 2.7470, and RMSE of 0.0193. SN followed, with notably higher MAE, MAPE, and RMSE scores of about 0.0198, 3.9567, and 0.0324, respectively, signaling increased prediction errors. Subsequently, ZS normalization exhibited a further deterioration in model performance, resulting in an MAE of 0.0277, MAPE of 5.6470, and RMSE of 0.0540, suggesting a relatively larger predictive error compared to the previous methods. Under the MS method, the DHL model exhibited an MAE of 0.0383, a MAPE of 7.5295, and an RMSE of 0.0552. Finally, the NN approach resulted in the worst model performance, with an MAE of 0.0395, MAPE of 7.7230, and RMSE of 0.0561, underscoring the pivotal role of normalization in enhancing predictive accuracy.
Additionally, Fig. 5 presents the scatter plots of the proposed DHL model when trained using the different normalization methods. Each normalization method is presented with two subplots, one representing the sine of the sun's elevation angle and the other representing the cosine of the angle. The vertical axes of the subplots are the predicted cosine and sine elevation angles. Similarly, the horizontal axes of the subplots are the actual cosine and sine elevation angles. Furthermore, the baselines are represented by the red diagonal line in the plots. For the SN-MMN method, the scatter plot converges well to the baseline, which demonstrates that the predicted results match the actual results. For the MMN method, there is evidence of a slight deviation between the blue points and the baseline. Also, from Fig. 5, it is evident that there is a significant deviation from the baseline for the ZS, MS, SN, and NN methods. This further supports the strong performance achieved when the SN-MMN method is used. This may have been due to the SN-MMN method's ability to effectively scale the data toward the Gaussian distribution.

B. COMPARISON BETWEEN DIFFERENT DL MODELS
In this section, we compare the performance of different DL models to the CNN-LSTM-GRU model. Specifically, we compare the proposed model to CNN, LSTM, GRU, and CNN-LSTM. This is done to analyze whether the integration of spatial feature extraction along with short-term and long-term temporal dependence extraction impacts the model's performance. Table 2 presents the average performance results of the different DL models. According to Table 2, the CNN-LSTM-GRU model achieved the best results with an MAE of 0.0073, MAPE of 1.4635, and RMSE of 0.0097. The CNN model also delivered a commendable performance, scoring an MAE of 0.0136, MAPE of 2.7501, and RMSE of 0.0194. The GRU model was comparable, with an MAE of 0.0133, a MAPE of 2.6137, and an RMSE of 0.0158. The CNN-LSTM model also showed competitive performance with an MAE of 0.0146, a MAPE of 2.9230, and an RMSE of 0.0190. Lastly, the LSTM model exhibited the lowest performance, with an MAE of 0.0214, a MAPE of 4.3419, and an RMSE of 0.0252. Furthermore, Figs. 6 and 7 present the residual plots and histograms of the DL models. Analyzing these residual plots and histograms is important for further assessing the performance and reliability of the DL models.
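For reference, the three error metrics reported in Table 2 can be computed as in the sketch below. It uses the standard definitions and assumes that MAPE is expressed in percent and that no actual value is zero; the sample values are made up and are not taken from the dataset.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [0.5, 0.8, 1.0]     # hypothetical true values
predicted = [0.4, 0.8, 1.1]  # hypothetical model outputs
print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, which is why a model can rank differently on the two metrics.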
In Fig. 6, the residual subplots show index values on the x-axis and residuals on the y-axis. These graphs illustrate the variation between the true target values and the predicted values of the sine and cosine of the sun's elevation angles. As expected, the CNN-LSTM-GRU model displays a consistent spread of residuals across the entire range of predicted values. This indicates that the model maintains dependable performance regardless of whether predictions are high or low. Moreover, the absence of discernible patterns or trends in the differences between the model's predictions and actual values suggests that it avoids consistently overestimating or underestimating in specific situations. Similarly, the CNN-LSTM model exhibits a relatively consistent spread of residuals and maintains a fairly random pattern. However, the absence of the GRU module may have limited its performance. Conversely, the CNN, LSTM, and GRU models do not display random patterns, as most of the points lie consistently above or below the baseline. This suggests that the majority of the sun's elevation angle cosine values are systematically overpredicted by these three models. The plots also suggest that the sine values are consistently underpredicted by the models, with this pattern being most evident for the GRU and LSTM models.
In Fig. 7, the x-axis represents the range of residual values, and the y-axis represents the frequency of residuals within each range. The residuals from the CNN-LSTM-GRU model form a symmetric bell-shaped curve centered near zero, implying that these residuals follow a Gaussian distribution. This observation suggests that the CNN-LSTM-GRU model effectively captures the hidden relationships within the data. In contrast, the other models exhibit more skewed distributions, indicating that they may struggle to capture the underlying patterns in the data accurately. Furthermore, the scarcity of unusual or extreme values in the residuals of the CNN-LSTM-GRU model indicates its capability to make reliable predictions across a diverse range of naturally occurring situations. Together, the residual plots and histograms highlight the superiority of the CNN-LSTM-GRU model, which is consistent with the performance observed in the MAE, MAPE, and RMSE results. The integration of the CNN's spatial feature extraction capabilities with the LSTM-GRU's short-term and long-term temporal dependence extraction contributes significantly to the model's performance.
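The residual diagnostics described above can be summarized numerically. The sketch below is an illustration, not the procedure used in this study: it defines a residual as actual minus predicted (an assumed convention), so a negative mean indicates systematic overprediction, and it uses the population skewness as a rough check of symmetry around zero.

```python
import math


def residual_summary(actual, predicted):
    """Return the mean, standard deviation, and skewness of the residuals."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mean = sum(residuals) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in residuals) / n)
    # Skewness is undefined for constant residuals; report 0.0 in that case.
    skew = 0.0 if std == 0 else sum(((r - mean) / std) ** 3 for r in residuals) / n
    return {"mean": mean, "std": std, "skewness": skew}


# Hypothetical values: every prediction is 0.5 too high (systematic overprediction).
biased = residual_summary([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
# Hypothetical values: errors are symmetric around zero.
balanced = residual_summary([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
print(biased, balanced)
```

A mean close to zero together with a skewness close to zero is consistent with the symmetric, bell-shaped histogram described for the CNN-LSTM-GRU model.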

C. COMPARISON OF MODEL PERFORMANCE BASED ON IMAGE AND TABULAR DATA
In this section, we compare the performance of the model on image data and tabular data. This is done to illustrate the significance of the image conversion module that is incorporated into our model. Table 3 shows the average performance results of the DHL model on image and tabular data. Based on Table 3, the DHL model demonstrates high accuracy when trained using image data, achieving MAE, MAPE, and RMSE results of 0.0073, 1.4635, and 0.0097, respectively. In contrast, the DHL model exhibits significantly poorer performance when trained on tabular data, obtaining MAE, MAPE, and RMSE scores of 0.0380, 7.4467, and 0.0430, respectively. These results indicate that the image conversion module significantly enhances the model's effectiveness by revealing fine-grained relationships in the data through its conversion of tabular data into images. To further illustrate the contribution of the DHL model's image transformation module, Fig. 8 presents an analysis of the DHL model's convergence when trained with tabular data and image data.
According to Fig. 8, the loss curve of the DHL model exhibits a more rapid initial drop when trained with image data than when trained with tabular data. Furthermore, the DHL model trained with image data maintains a consistently lower level of loss and remains stable until the end, in contrast to the DHL model trained on tabular data. This demonstrates that the use of image data significantly influences the model's training, enabling it to maintain a low training loss and achieve better convergence. This result can be attributed to the DHL model's use of the 2D-CNN module, which allows it to learn deeper spatial relationships. In contrast, when the model is trained on tabular data, it is limited to using the 1D-CNN module, which restricts the number of features it can learn. Additionally, unlike the conventional GAF image transformation approach, which employs a single data normalization operation, the DHL model incorporates the SN and MMN techniques. Moreover, it applies an initial SCT before image transformation and employs a cosine transformation during image transformation. These combined approaches likely contribute to the DHL model's high accuracy by leveraging fine-grained image-transformed data.
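To make the tabular-to-image step more concrete, the following sketch shows a sine/cosine encoding of a cyclic variable and a Gramian Angular Field built with a cosine of summed angles. It is an illustration under stated assumptions, not the exact pipeline of this study: the summation variant of GAF, plain min-max scaling to [0, 1] before the arccos step, and the names `cyclic_encode` and `gramian_angular_field` are all assumptions.

```python
import math


def cyclic_encode(value, period):
    """Sine/cosine transformation (SCT) of a cyclic quantity such as hour of day."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def min_max_scale(series):
    """Rescale a series linearly to [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]


def gramian_angular_field(series):
    """Return an n x n image whose entry (i, j) is cos(phi_i + phi_j)."""
    scaled = min_max_scale(series)
    # Clamp to [0, 1] to guard against round-off before taking arccos.
    angles = [math.acos(min(1.0, max(0.0, v))) for v in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]


sin_h, cos_h = cyclic_encode(6, 24)             # 06:00 on a 24-hour cycle
image = gramian_angular_field([1.0, 2.0, 3.0])  # hypothetical 3-step series
for row in image:
    print([round(v, 3) for v in row])
```

The resulting matrix is symmetric, and each entry encodes a pairwise relationship between two time steps, which is what allows a 2D-CNN to treat temporal correlations as spatial structure.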

D. SIGNIFICANCE TESTING
Statistical significance testing was employed to verify that the differences in performance across the different models, normalization methods, and data types were statistically significant. Each comparison was evaluated based on the MAE, MAPE, and RMSE scores. First, we assessed whether the MAE, MAPE, and RMSE results for each comparison followed a normal distribution. Given our small sample size, which was derived from 30 model testing runs across the different comparisons, we employed the Shapiro-Wilk test to assess normality [74]. At a significance level of 0.05, a p-value above 0.05 indicates that the data distribution is not significantly different from a normal distribution. Conversely, a p-value below 0.05 indicates a significant departure from normality. Table 4 presents the Shapiro-Wilk normality test outcomes for the comparisons across the different models, normalization methods, and data types based on their performance results.
Based on Table 4, the Shapiro-Wilk test showed that although all the samples contained 30 elements, some of the methods' performance results departed significantly from normality, with p-values < 0.05 and very high statistic (W) values. Thus, because some samples deviated from the normal distribution, the Kruskal-Wallis (H) non-parametric test was used to verify that the results from the different methods differ significantly [75]. The null hypothesis of the Kruskal-Wallis (H) test is that the data samples do not differ. If the p-value falls below the specified significance level (α = 0.05), we reject this null hypothesis and conclude that a difference exists among the groups. The results of comparing the performance samples of the different methods are shown in Table 5.
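The two-step procedure described here (a normality check followed by a non-parametric omnibus test) can be sketched with SciPy's `scipy.stats.shapiro` and `scipy.stats.kruskal`. The scores below are synthetic placeholders generated from a seeded random generator; they are not the results of this study, and the group names are hypothetical.

```python
import random

from scipy import stats

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def simulate_runs(mean, spread, n=30):
    """Draw n synthetic error scores (placeholder for 30 model testing runs)."""
    return [rng.gauss(mean, spread) for _ in range(n)]


samples = {
    "method_a": simulate_runs(0.01, 0.001),
    "method_b": simulate_runs(0.02, 0.001),
    "method_c": simulate_runs(0.04, 0.001),
}

alpha = 0.05
normality = {}
for name, scores in samples.items():
    w_stat, p_value = stats.shapiro(scores)
    normality[name] = (w_stat, p_value)
    verdict = "not rejected" if p_value > alpha else "rejected"
    print(f"{name}: W={w_stat:.4f}, p={p_value:.4f}, normality {verdict}")

# Kruskal-Wallis H test: null hypothesis is that the groups do not differ.
h_stat, kw_p = stats.kruskal(*samples.values())
reject_null = kw_p < alpha
print(f"H={h_stat:.2f}, p={kw_p:.3g}, reject null: {reject_null}")
```

Because the Kruskal-Wallis test operates on ranks, it remains applicable when some groups fail the normality check.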
Based on Table 5, all the recorded p-values are considerably less than 0.05, indicating strong evidence against the null hypothesis of equal medians in performance scores for the comparisons across the different models, normalization methods, and data types. Therefore, we reject the null hypothesis, which suggests that at least one of the individual methods has a significantly different median performance score compared to the others. To further investigate the pairwise differences between the comparison types, post-hoc testing was done using the Bonferroni-Dunn test. A two-tailed null hypothesis at the 0.01 and 0.05 levels of significance was employed. Table 6 presents the results of the significance testing for the DHL model with the SN-MMN method against the other single normalization methods.
As seen in Table 6, the group-wise p-value comparison between the SN-MMN method and the other normalization methods, except MMN, is statistically significant at the 0.001 level. Nevertheless, the difference observed between the SN-MMN and MMN methods is statistically significant at the 0.05 significance level. Therefore, it can be concluded that the outcomes produced by the methods are significantly different. As seen in Table 7, the difference between the CNN-LSTM-GRU and the other DL models was statistically significant at the 0.001 significance level. Furthermore, Table 8 shows the results of the significance testing of the performance of the DHL model when trained with image data against tabular data. Based on Table 8, each of the comparisons between the image data and the tabular data is statistically significant, with p < 0.001. Thus, we can conclude that the performance results of the DHL model differ significantly when trained on image and tabular data.
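For readers unfamiliar with the post-hoc step, the sketch below implements Dunn's pairwise rank test with a Bonferroni adjustment using only the standard library. It is an illustrative implementation of the textbook formulas, not the code used in this study; the group names and scores are made up.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def dunn_bonferroni(groups):
    """groups: dict of name -> scores. Returns {(a, b): Bonferroni-adjusted p}."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    ranks, tie_sizes = average_ranks(pooled)
    n_total = len(pooled)
    mean_rank, start = {}, 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size]) / size
        start += size
    # Variance of rank differences, with the usual correction for ties.
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (12.0 * (n_total - 1))
    base_var = n_total * (n_total + 1) / 12.0 - tie_term
    pairs = list(combinations(names, 2))
    adjusted = {}
    for a, b in pairs:
        se = math.sqrt(base_var * (1.0 / len(groups[a]) + 1.0 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
        adjusted[(a, b)] = min(1.0, p * len(pairs))  # Bonferroni adjustment
    return adjusted


scores = {  # hypothetical error scores for three methods
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [6.0, 7.0, 8.0, 9.0, 10.0],
    "C": [1.5, 2.5, 3.5, 4.5, 5.5],
}
adjusted_p = dunn_bonferroni(scores)
for pair, p in adjusted_p.items():
    print(pair, round(p, 4))
```

In this toy example, groups A and C interleave and are not separated, while B differs from both at the 0.05 level after adjustment.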

E. COMPARISON OF THE PROPOSED MODEL WITH OTHER RECENTLY PUBLISHED MODELS
As discussed in the literature review, several recent studies have proposed various DL-based solar tracking models. These models represent some of the most recent intelligent models designed to predict the sun's position efficiently. However, their robustness and superiority can vary based on factors such as the variables used for training, preprocessing techniques, and model complexity. Therefore, this section compares the most recently developed DL models for solar tracking with the DHL model proposed in this study. We consider four recent articles in this comparison to evaluate and assess the superiority and contribution of the proposed DHL model. It is worth noting that each of the models proposed in the previous studies was trained and tested on non-publicly available datasets. To compare their performance against that of the proposed model, all the models were trained and tested on the publicly available dataset used in this study. Table 9 provides a comparison between the proposed model and the other recently published models. As depicted in Table 9, the proposed model consistently outperforms all the models in previous studies. This highlights several key factors. Firstly, data types may impact the performance of models. Specifically, the previous studies' models used tabular data, and this study also utilized tabular data. However, relying solely on tabular data often limits the utilization of spatial features inherent in the data. Thus, this study incorporates a tabular-to-image conversion, which enables the extraction of spatial features. Secondly, the recent studies employed both RNN and LSTM models. The use of individual RNN and LSTM models in those studies allowed them to capture short-term and long-term dependencies in the data. In contrast, this study integrates the advantages of CNN, LSTM, and GRU models, enabling the simultaneous capture of spatial and long-term relationships in the data. Finally, a noteworthy observation in the previous studies is the use of methods based on single data normalization. Conversely, this study incorporates two different data normalization methods, effectively scaling the features by leveraging the strengths of each technique. By harnessing these capabilities, the proposed model achieves superior performance compared to previous studies.

F. CONCLUSION
This study provides empirical evidence that using the SN-MMN method with cyclic transformations, image conversion, spatial feature extraction, and short-term and long-term feature extraction significantly enhances solar tracking. Our experiments demonstrate three essential aspects for achieving success: effective feature scaling while preserving original data relationships, leveraging an image representation of the dataset, and modeling spatial and temporal dynamics using CNN, LSTM, and GRU modules. Furthermore, the proposed hybrid model outperforms existing methods on a publicly available dataset, achieving MAE, MAPE, and RMSE scores of 0.0073, 1.4635, and 0.0097, respectively. However, this study has limitations. Since the proposed model was only trained on data from a single geographical location, the model may require reconfiguration according to the location of its implementation. Additionally, the model was trained on data collected over 272 days across three years, representing approximately 25% of the period. This may limit the generalizability of the model. Thus, future work should consider training the model on a larger dataset, as well as investigating the integration of image and tabular data for improved solar tracking. It would also be interesting to combine the two data types with ensemble and other state-of-the-art methods.
$b_{hj}$ represents the bias term of the $j$th feature map in the $h$th layer. $\omega^{mn}_{hju}$ denotes the value at position $(m, n)$ in the convolution filter that links the $u$th feature map in the $(h-1)$th layer. The value at position $(r, c)$ in the $j$th new feature map in the $h$th layer is represented by $\mathrm{map}^{rc}_{hj}$. Likewise, $\mathrm{map}^{(r+m)(c+n)}_{(h-1)u}$ depicts the value at position $(r + m, c + n)$ in the $u$th feature map in the $(h-1)$th layer. The activation function used in each layer of the CNN is represented by $g$.
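The symbols defined above match the standard 2D convolution expression for a feature map, which does not appear in this excerpt. A reconstruction consistent with those definitions is sketched below; the filter height $M_h$ and width $N_h$ are assumed names introduced here for the summation limits, and the exact form used by the authors may differ.

```latex
\mathrm{map}^{rc}_{hj}
  = g\!\left( b_{hj}
  + \sum_{u} \sum_{m=0}^{M_h - 1} \sum_{n=0}^{N_h - 1}
    \omega^{mn}_{hju}\, \mathrm{map}^{(r+m)(c+n)}_{(h-1)u} \right)
```

Here the outer sum runs over the feature maps $u$ of layer $h-1$ that are connected to the $j$th feature map of layer $h$.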

FIGURE 6. Residual graph of predicted results from DL models.

FIGURE 8. Loss curve of the DHL model with image and tabular data.

TABLE 4. Shapiro-Wilk normality test results for different comparison types.

TABLE 1. Average performance of the DHL model based on the different normalization methods.

TABLE 2. Average performance of the DL models.

TABLE 3. Average performance of the DHL model using different data types.

TABLE 5. Kruskal-Wallis test results based on comparison types.

TABLE 6. Post-hoc test results of the SN-MMN and other normalization methods.

TABLE 7. Post-hoc test results of the DHL model and other DL models.

TABLE 8. Post-hoc test results of the DHL model on image data against tabular data.

TABLE 9. Comparison of the proposed model with other recently published models.