A Novel Interval Forecast for K-Nearest Neighbor Time Series: A Case Study of Durian Export in Thailand

The k-nearest neighbor (K-NN) time series model is widely favored for its simplicity and ease of understanding. However, it lacks a forecast interval, an essential feature for capturing the uncertainty inherent in point forecasts. This study introduces a novel interval forecasting approach that integrates the K-NN time series model with bootstrapping. A key step involves determining the optimal distribution of K-NN forecasted values, derived from a range of values of $k$, the number of nearest neighbors. The candidate distributions are Gaussian, gamma, logistic, Weibull, log-normal, Cauchy, inverse-gamma, log-logistic, inverse Weibull, and log-gamma. Forecast values from both recursive and multi-input multi-output (MIMO) K-NN time series techniques are used as inputs in the bootstrapping framework. The proposed forecast intervals are compared with those obtained from the seasonal autoregressive integrated moving average (SARIMA) model, a standard benchmark in statistics. Performance is evaluated using several criteria, including root mean squared error (RMSE) and average interval width. In a case study of durian exports in Thailand, the results show that the intervals from both recursive- and MIMO-based K-NN forecasts are narrower than those from SARIMA, suggesting increased forecasting confidence. The proposed interval is also applicable to other datasets with trend and/or seasonal components.


I. INTRODUCTION
K-NN is a widely used machine learning algorithm known for its simplicity and effectiveness in various applications, including agriculture [1]. It has been successfully applied to other crops for forecasting purposes, demonstrating its potential to provide accurate and reliable predictions [2]. Furthermore, K-NN has been successfully applied to various research areas, such as financial modeling [3], image interpolation [4], visual recognition [5], and environmental sciences [6].
K-NN has demonstrated its utility in time series forecasting, with various studies implementing the algorithm for diverse applications. An integrated K-NN regression model combined with principal component analysis was
proposed by Tang et al. [7] for predicting financial time series. Zhang et al. [3] employed a multidimensional K-NN model in conjunction with ensemble empirical mode decomposition to forecast stock prices in a separate investigation. Martínez et al. [8] devised a comprehensive methodology using the K-NN approach to precisely predict the N3 competition dataset. The objective was to create an automated tool for selecting modeling strategies and preprocessing approaches. Alvarez et al. [9] presented a pattern sequence-based forecasting algorithm that utilizes the K-NN method to predict electricity prices and demand. Lora et al. [10] proposed a straightforward method for predicting future electricity market prices using a weighted K-NN approach. They provided an explanation of the process of obtaining the relevant parameters that define the adopted model. These parameters are related to the duration of the time series window and the number of selected neighbors for prediction.
Yan [11] devised an automated scheme for modeling artificial neural networks (ANN) using the generalized regression neural network (GRNN), which is a specific variant of neural networks. Fernandez-Rodriguez et al. [12] extended nearest-neighbor predictors to include simultaneous time series and applied them to nine EMS currencies. The results show marginally better forecasting performance compared to random walk and autoregressive integrated moving average (ARIMA) predictors, producing directional forecasts.
In a recent work, Martínez et al. [13] established a methodology that employs K-NN regression in the context of time series forecasting, demonstrating its capability to generate accurate forecasts for a wide array of time series. This approach was thoroughly examined and evaluated using the dataset from the N3 competition. In a subsequent study, Martínez et al. [14] introduced the tsfknn package in R, specifically designed for time series forecasting using K-NN regression. The package offers a selection of different strategies for multi-step predictions and various functions for aggregating the targets of the nearest neighbors. Easton et al. [15] applied a K-NN model for forecasting earnings, quantifying the dispersion of the nearest neighbors' lead earnings around the point forecast by using the median of the absolute differences between each nearest neighbor's lead earnings and the median of the nearest neighbors' lead earnings.
According to the literature, K-NN time series techniques are only applicable to point forecasts. This paper aims to propose a novel method for finding interval forecasts based on simulated distributions of point forecasting values obtained from recursive and multi-input multi-output (MIMO) strategies. For K-NN, identifying the optimal value of the parameter k has traditionally been a crucial step, involving trials to determine the most effective k value for forecasting. However, it is important to highlight that the information derived from all point forecasts across different k values is valuable. In our methodology, these point forecasts are not disregarded; instead, they are harnessed as integral inputs for the proposed approach.
The remainder of this paper is organized as follows: Section II presents a review of the K-NN time series techniques; Section III describes the seasonal autoregressive integrated moving average (SARIMA) model; Section IV provides an overview of the bootstrap method; Section V describes the durian data and its preprocessing; Section VI outlines the proposed method for determining interval forecasts; Section VII describes the performance comparison; Section VIII presents the results; Section IX discusses the findings; and Section X offers the conclusion.

II. K-NN TIME SERIES FORECASTING STRATEGIES
K-NN is a well-known nonparametric method used for both classification and regression tasks. It is often referred to as a ''lazy learner'' because its training process simply involves storing instances in their original form, yet it remains easily understandable. This section describes time series forecasting using K-NN regression, followed by an explanation of the recursive or iterative technique and the multi-input multi-output (MIMO) strategy.

A. K-NN TIME SERIES REGRESSION
K-NN regression acts as a repository for training samples.
Each training example, referred to as the i-th example, consists of a vector of m features, denoted as x_i = (x_1, x_2, …, x_m)_i, which describes the example, and a corresponding target value, denoted as y_i, for a univariate time series. In time series analysis, given a target value, lags 1 to m serve as input variables. When a new example with known m lags x′ = (x_1, x_2, …, x_m) but an unknown target value is introduced, the features of this new example are used to identify the k training examples most similar to it. This similarity is assessed on the feature vectors using a measure such as Euclidean distance, with the distance between the new instance and each training example computed by the Euclidean formula.
The k training examples nearest to x′ are deemed its k most analogous examples, or k nearest neighbors. The principle underpinning K-NN is learning through comparison: for a new example, it is assumed that the targets of its closest neighbors are likely comparable to its unknown target. In this manner, the targets of the nearest neighbors are consolidated to forecast the target of the new example. Suppose the targets of the k nearest neighbors are y_(1), y_(2), …, y_(k); these can be combined to estimate the target of the new example as Σ_{i=1}^{k} w_i y_(i), where w_i is the weight of y_(i). If w_i = 1/k, then Σ_{i=1}^{k} w_i y_(i) = (1/k) Σ_{i=1}^{k} y_(i), which is the arithmetic mean. Furthermore, the estimated target can be a median, a trimmed mean, or some other function of the y_(i) [6].
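As a concrete illustration of this neighbor-averaging scheme, the following sketch (a minimal pure-Python illustration, not the paper's implementation; the function name and signature are ours) builds lagged windows from a univariate series, finds the k windows nearest to the most recent m observations by Euclidean distance, and averages their targets:

```python
import math

def knn_forecast(series, m, k, weights=None):
    """One-step K-NN time series forecast.

    Builds (m-lag window, target) training pairs from the series, finds
    the k windows closest (Euclidean) to the last m observations, and
    combines their targets with weights w_i (default: w_i = 1/k).
    """
    # Build (feature window, target) pairs from the series.
    examples = [(series[i:i + m], series[i + m])
                for i in range(len(series) - m)]
    query = series[-m:]  # the m most recent observations
    # Sort training targets by Euclidean distance of their window to the query.
    dists = sorted((math.dist(x, query), y) for x, y in examples)
    neighbors = [y for _, y in dists[:k]]
    if weights is None:                      # w_i = 1/k -> arithmetic mean
        return sum(neighbors) / k
    return sum(w * y for w, y in zip(weights, neighbors))
```

With a purely periodic series, the nearest windows match the current pattern exactly, so the forecast reproduces the next value of the cycle.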

B. RECURSIVE STRATEGY
For the prediction of multiple periods, techniques like ARIMA or exponential smoothing frequently use the recursive, or iterative, method. This is the oldest and most intuitive forecasting strategy [16], [17], [18], [19], [20]. In this approach, a model specifically designed for one-step forecasting is repetitively applied to project all subsequent periods into the future [14]. The model can be described as y_{t+1} = f(y_t, …, y_{t−d+1}) + e_{t+1}, (1) where d represents the size of the window of previous observations used for forecasting, and e_{t+1} denotes the noise component at time t + 1. The term t belongs to {d, …, N − 1}, where N represents the total number of observations in the time series [19]. The forecasting process for h future steps commences with the first forecast, generated by applying the model f to the last d observations. The output of this initial forecast then serves as an input for subsequent forecasts, with this procedure repeating until all h steps are predicted [21]. For example, the first three forecasts are: ŷ_{t+1} = f(y_t, …, y_{t−d+1}), (2) ŷ_{t+2} = f(ŷ_{t+1}, y_t, …, y_{t−d+2}), (3) and ŷ_{t+3} = f(ŷ_{t+2}, ŷ_{t+1}, y_t, …, y_{t−d+3}), (4) where ŷ_{t+1}, ŷ_{t+2}, and ŷ_{t+3} are the forecast values at times t + 1, t + 2, and t + 3, respectively. These values are predicted through the recursive application of the forecasting function f. In time series analysis, noise and the length of the forecast horizon can affect the performance of forecasting methods, especially for predictions extending several steps into the future. This issue becomes significant when the forecast horizon is longer than the number of lagged observations used in the model; in such cases, the model ends up using predicted values as inputs instead of actual past data. Despite these challenges, the recursive approach is widely recognized and effective in various practical time series applications, including recurrent neural networks [22] and K-NN methods [23]. The R package tsfknn [14] also supports this forecasting strategy.
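The feedback loop of Eqs. (2)-(4) can be sketched as follows, assuming a generic one-step model f supplied as a function (a hypothetical helper, not the paper's implementation):

```python
def recursive_forecast(series, d, h, model):
    """Recursive multi-step forecasting: a one-step model f is applied
    repeatedly, feeding each forecast back in as an input (Eqs. (2)-(4))."""
    history = list(series)
    forecasts = []
    for _ in range(h):
        window = history[-d:]          # (y_t, ..., y_{t-d+1})
        y_hat = model(window)          # one-step forecast
        forecasts.append(y_hat)
        history.append(y_hat)          # feed the forecast back in
    return forecasts
```

Note how, once h exceeds d, the window consists entirely of previously forecasted values rather than actual observations, which is the error-accumulation issue discussed above.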

C. MIMO STRATEGY
The multi-input multi-output (MIMO) strategy is a method employed in time series forecasting that enables the prediction model to handle multiple input and output variables. This approach stands in contrast to single-input single-output (SISO) models, which manage a single input and output [24]. The model can be described as [y_{t+h}, …, y_{t+1}] = F(y_t, …, y_{t−d+1}) + e, (5) where t ∈ {d, …, N − 1}, F : R^d → R^h is a vector-valued function, and e ∈ R^h is a noise vector with an optionally diagonal covariance [25]. The underlying principle of the MIMO approach is to retain the inherent stochastic interdependence among forecasted values, a key feature of time series. Unlike the direct approach, the MIMO method does not rely on a presumption of conditional independence. Additionally, it sidesteps the error accumulation often encountered in the recursive strategy [26].
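A MIMO realization of the K-NN forecaster can be sketched as follows (our illustrative sketch, not the paper's code): each training target is the full h-step vector, and the k nearest d-lag windows vote jointly, so the dependence among the h forecasted values is preserved as in Eq. (5).

```python
import math

def knn_mimo_forecast(series, d, h, k):
    """MIMO K-NN forecast: each training target is the h-vector
    (y_{t+1}, ..., y_{t+h}); the k nearest d-lag windows are averaged
    component-wise, forecasting all h steps in one shot (Eq. (5))."""
    examples = [(series[i:i + d], series[i + d:i + d + h])
                for i in range(len(series) - d - h + 1)]
    query = series[-d:]
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    # Average the h-dimensional target vectors component-wise.
    return [sum(t[j] for _, t in nearest) / k for j in range(h)]
```

Unlike the recursive sketch, no forecast is ever fed back as an input, so forecast errors do not accumulate across the horizon.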

III. SARIMA MODEL
The seasonal autoregressive integrated moving average (SARIMA) model, an enhancement of the autoregressive integrated moving average (ARIMA) model, is ideally suited for analyzing time series data with seasonal patterns. Its widespread use in statistical analyses is due to its interpretable parameters and the capability for hypothesis testing on these parameters. SARIMA, supporting both point and interval forecasting, serves as an excellent benchmark for comparing interval forecasts with the proposed K-NN time series methodology.
The SARIMA model is represented as SARIMA(p, d, q)(P, D, Q)_s. Here, p denotes the number of autoregressive terms, d the number of differencing steps for stationarity, and q the number of moving average terms. The seasonal elements P, D, and Q correspond to the seasonal autoregressive, differencing, and moving average terms, respectively, with s indicating the seasonality period. The SARIMA model's formula is φ_p(L) Φ_P(L^s) (1 − L)^d (1 − L^s)^D y_t = θ_q(L) Θ_Q(L^s) ε_t. In this formula, φ and Φ represent the autoregressive coefficients for the non-seasonal and seasonal parts, respectively, while θ and Θ are the moving average coefficients for the non-seasonal and seasonal parts, respectively. L denotes the lag operator [6].
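To make the differencing factors concrete, the sketch below (a minimal illustration, not a full SARIMA fit) applies the seasonal factor (1 − L^s)^D and then the regular factor (1 − L)^d to a series, which is the transformation SARIMA performs before ARMA modeling:

```python
def seasonal_difference(y, d=1, D=1, s=12):
    """Apply the (1 - L)^d (1 - L^s)^D differencing used by SARIMA to
    remove trend (d regular differences) and seasonality (D seasonal
    differences at period s)."""
    z = list(y)
    for _ in range(D):                       # seasonal differencing (1 - L^s)
        z = [z[i] - z[i - s] for i in range(s, len(z))]
    for _ in range(d):                       # regular differencing (1 - L)
        z = [z[i] - z[i - 1] for i in range(1, len(z))]
    return z
```

For a series that is exactly a linear trend plus a period-12 seasonal pattern, one seasonal and one regular difference reduce it to zeros, illustrating why d = D = 1 often suffices for trended seasonal data.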

IV. BOOTSTRAP METHOD
The bootstrap method, originally proposed by Efron [27], is a statistical technique that falls within the broader class of resampling methods.This methodology provides a simple and general tool to quantify the uncertainty associated with a given estimator or statistical learning method, allowing for robust estimation of the sampling distribution of a statistic.This approach has gained popularity due to its ease of implementation and efficacy in estimating the distribution of a statistic without making strong assumptions about the form of the population from which the sample was drawn.
Many researchers have applied bootstrapping to various applications. As an illustration, Ji et al. [28] introduced an innovative approach using deep belief networks (DBNs) and bootstrapping to construct prediction intervals (PIs). Rodrigues and Silva [29] calculated the confidence intervals for reliability data of power distribution equipment using existing historical indices. The determination of these intervals is achieved by combining the bootstrap technique with calibration models. Panichkitkosolkul and Srisuradetchai [30] utilized the bootstrap method to estimate confidence intervals for the parameter of the zero-truncated Poisson-Ishita distribution, which is a probability distribution commonly used for count data. Zhang et al. [31] aggregated individual point forecasts to create either a probabilistic or an interval forecast. Random forest and gradient boosting regression tree models were initially trained using bootstrap-sampled training datasets and various forecasting models. Subsequently, the bootstrapping process was employed again to generate multiple point forecasts.

The bootstrap method works by resampling with replacement from the observed dataset to generate numerous bootstrap samples. By calculating the statistic of interest for each of these samples, a sampling distribution of parameters can be built up over many iterations, providing both point and interval estimates [32]. This is also extremely valuable in situations where the data sizes are too small to invoke asymptotic results [33].
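As a minimal illustration of the percentile bootstrap (not tied to the durian data; names and defaults are ours), the sketch below resamples with replacement, recomputes the statistic for each resample, and reads off the α/2 and 1 − α/2 quantiles:

```python
import random
import statistics

def bootstrap_interval(sample, stat=statistics.mean,
                       n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap: resample with replacement, compute the
    statistic on each resample, and take the alpha/2 and 1 - alpha/2
    quantiles of the resulting sampling distribution."""
    rng = random.Random(seed)
    stats = sorted(
        stat(rng.choices(sample, k=len(sample))) for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Because no distributional form is assumed, this works even for small samples where asymptotic normal intervals would be suspect.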

V. DATASET OVERVIEW AND PREPARATION
Durian, a highly valued fruit in Thailand, has seen significant growth in its industry due to the country's favorable climate, soil conditions, and increasing demand from both domestic and international markets. As a major producer and exporter of durian, Thailand has experienced substantial growth in its agricultural economy. The primary regions for durian cultivation are the eastern and southern regions, including provinces such as Rayong, Chanthaburi, Chumphon, and Surat Thani [34]. Driven by its profitability and the support of government and private sector policies, durian farming has expanded significantly over the years [35]. Thailand heavily exports agriculture-related products, with China being its main trading partner. Since 2014, durian has been Thailand's most valuable fruit export, with sales reaching a new high of USD 934.9 million in May 2021. China accounts for 70% of all durian exports [36].
In 2023, Thai buyers are expected to pay more for durian as Chinese investors purchase and ship it to China. Durian is Thailand's most valuable export to China, worth over 100 billion baht and holding a market share of over 90%. However, Thailand's durian market share in China is likely to decrease due to increased competition from other countries and China's ability to grow its own durian [37].
Given the importance of durian in Thailand's agricultural economy, accurate forecasting of its production and export is essential for stakeholders such as farmers, exporters, and policymakers [38]. This study aims to optimize durian export forecasts in Thailand using the K-NN technique on time series data of monthly durian exports. However, point forecasting alone is not sufficient, so a prediction interval based on K-NN is proposed in this paper.
The monthly durian export data, measured in million Thai Baht (THB), was sourced from the website https://www.oae.go.th/view/1/Home/EN-US. The dataset, which is complete with no missing values, spans from January 2015 to May 2023, totaling 101 months, and is depicted in Fig. 1. Fig. 1 indicates that a multiplicative model could be appropriate for this dataset, represented by the formula y_t = T_t × S_t × ε_t. In this model, T_t and S_t represent the trend and seasonal components, respectively. Often, the irregular component is difficult to estimate and is therefore included in the error term, ε_t. Moreover, Fig. 2 displays the trend and seasonal components of the series. The forecasted point in K-NN regression is calculated as the average value of the k nearest neighbors. When the data display a pronounced global trend, this average might not accurately reflect future values, as target values tend to be lower than actual future values. Therefore, detrending the data is essential. The trend line is determined by fitting a regression line, and a quadratic equation proves to be the most suitable, as indicated by the statistical significance of the coefficient of t². The estimated trend line takes the form T_t = β̂_0 + β̂_1 t + β̂_2 t², (6) where T_t denotes the estimated trend at time t, and t ranges from 1 to 89 (up to the final month of the training data).
To detrend, the actual export values are divided by their corresponding trend values. The resultant detrended data, which is dimensionless, is depicted in Fig. 3. This detrended data serves as the input for the proposed forecast interval methodology discussed in the subsequent section.
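This quadratic detrending can be sketched as follows (a minimal least-squares implementation via the 3×3 normal equations; the coefficient names are generic placeholders, not the paper's estimates):

```python
def fit_quadratic_trend(y):
    """Least-squares fit of T_t = b0 + b1*t + b2*t^2 (t = 1..n) by
    solving the 3x3 normal equations with Gaussian elimination."""
    n = len(y)
    ts = range(1, n + 1)
    # Normal equations A b = c for the quadratic basis [1, t, t^2].
    A = [[float(sum(t ** (i + j) for t in ts)) for j in range(3)]
         for i in range(3)]
    c = [sum(yt * t ** i for t, yt in zip(ts, y)) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for j in range(col, 3):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, 3))) / A[i][i]
    return b  # (b0, b1, b2)

def detrend(y):
    """Divide each observation by its fitted trend value, as in the
    multiplicative model y_t = T_t * S_t * eps_t."""
    b0, b1, b2 = fit_quadratic_trend(y)
    return [yt / (b0 + b1 * t + b2 * t * t)
            for t, yt in enumerate(y, start=1)]
```

For data generated exactly by a quadratic trend, the detrended series is identically one, confirming the fit.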

VI. PROPOSED INTERVAL FORECAST
In K-NN regression, the forecasting outcome is heavily influenced by two main factors: k, the number of closest neighbors, and m, the number of lags or features. Given the pivotal role of these factors in prediction outcomes, several strategies have been proposed to determine the optimal k. Two strategies are predominantly adopted. The first employs a heuristic, suggesting that k be set to the square root of the total number of training samples. The second approach emphasizes optimization [19].
In this strategy, the primary training data is partitioned into a training subset and a validation subset. The objective is to pinpoint a k that yields the most accurate forecast on the validation set, using the training data as a basis. However, this optimization method can be time-intensive. Recognizing this challenge, Martínez et al. [13] introduced an approach that amalgamates the forecasts from different K-NN models, each with a distinct k value. By averaging the forecasts from these models, they presented a solution that is not only expedient but also not tethered to a single heuristic k value. Nevertheless, this approach primarily provides point predictions.
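The validation-based selection of k described above can be sketched as follows (a simplified illustration; the held-out size n_val and the inner one-step K-NN helper are our assumptions, not the authors' setup):

```python
import math

def select_k(series, m, k_grid, n_val=6):
    """Pick the k in k_grid minimizing one-step RMSE over the last
    n_val observations, each forecast using only earlier data."""
    def knn_one_step(train, m, k):
        # Simple K-NN one-step forecast over m lags (arithmetic mean).
        examples = [(train[i:i + m], train[i + m])
                    for i in range(len(train) - m)]
        query = train[-m:]
        near = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
        return sum(y for _, y in near) / k

    def rmse_for(k):
        errs = []
        for j in range(n_val, 0, -1):       # walk forward over the tail
            train, actual = series[:-j], series[-j]
            errs.append((knn_one_step(train, m, k) - actual) ** 2)
        return math.sqrt(sum(errs) / len(errs))

    return min(k_grid, key=rmse_for)
```

This walk-forward loop is what makes the optimization approach time-intensive: every candidate k requires n_val separate neighbor searches.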
The proposed method utilizes the distribution of a bootstrap sample. Given that the number of forecasts corresponds to the number of different k values (the number of nearest neighbors), these forecasts are treated as observed forecasts. The next step is to identify the most suitable distribution for these observed forecasts. Once the best-fitted distribution is determined, a bootstrap sample is generated from it, and the forecast values are then bounded by the α-th and (1 − α)-th quantiles. The detailed process for constructing this forecast interval is methodically outlined in the following steps: 1. Data Analysis: Begin by examining the inherent patterns within the time series data. Focus on its trend and seasonal components to determine whether an additive or multiplicative model better suits the dataset. 2. Detrending: If the data exhibit a global trend, fit a trend line to the training data and detrend the series (here, by dividing the actual export values by the fitted quadratic trend). 3. Selection of the Largest Number of Neighbors (k*): Forecasts produced with k = 1, 2, …, k* serve as the observed sample, so k* must be large enough to yield an adequate sample; according to Frost [39], the minimum effective sample size is 10.
Testing k* values up to 12 showed that forecasts do not change significantly beyond k* = 12; however, a k* of 9 already yields accurate forecasts for the durian data. For other datasets, the optimal largest k needs to be determined and depends on the nature of the data. 4. Selection of Lagged Observations: Choose the lagged observations from the time series that will serve as input variables for the K-NN regression model. Given datasets with a monthly frequency and a noticeable seasonal component, lags typically range from 1 to 12 months. For the recursive technique, the model can be formulated as y_{t+1} = f(y_t, …, y_{t−11}) + e_{t+1}, (7) and for the MIMO technique, the model will be [y_{t+12}, …, y_{t+1}] = F(y_t, …, y_{t−11}) + e, (8) where t ∈ {12, …, 88} and e ∈ R^h is a noise vector. 5. Determination of the Best-fitting Distribution: For a forecast extending h steps into the future, and with a series of k values ranging from 1 to k*, the corresponding forecasted point values are represented as ŷ¹_{t+h}, ŷ²_{t+h}, …, ŷ^{k*}_{t+h}. These values are regarded as the observed sample forecasts within the bootstrapping methodology. In this step, the main goal is to ascertain a probability distribution that most closely aligns with these values. The probability distributions examined in this research encompass: (1) Gaussian distribution: X ∼ N(μ, σ²); (2) gamma distribution: X ∼ G(α, β); (3) logistic distribution; (4) Weibull distribution; (5) log-normal distribution;
(6) Cauchy distribution: X ∼ C(m, θ), for which the mean and variance do not exist; (7) inverse-gamma distribution: X ∼ IG(α, β); (8) log-logistic distribution; (9) inverse Weibull distribution [40]: X ∼ IW(α, β); and (10) log-gamma distribution [41], whose mean and variance involve the digamma and trigamma functions, ψ(α) and ψ′(α), respectively. Among the ten candidate probability distributions, the one yielding the lowest Akaike's information criterion (AIC) is designated as the ''best'' model. The parameters for this selected model are determined using the maximum likelihood (ML) method. This chosen distribution is then employed for forecasting values. Tables 1 and 2 present the five distributions with the lowest AIC for each month's durian export. In these tables, the observed sample forecasts are derived from both recursive and MIMO techniques. Notably, no single distribution consistently emerges as the best fit for every month.
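Step 5 can be illustrated with two of the ten candidates; the sketch below (a simplified stand-in for the full ten-distribution comparison, with our own function names) fits Gaussian and log-normal distributions by maximum likelihood and picks the lower AIC:

```python
import math

def gaussian_aic(x):
    """AIC for a Gaussian fitted by maximum likelihood (2 parameters)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n          # ML variance estimate
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return 2 * 2 - 2 * loglik                        # AIC = 2k - 2*loglik

def lognormal_aic(x):
    """AIC for a log-normal: Gaussian ML fit on log(x) plus the
    Jacobian term -sum(log x) in the log-likelihood."""
    logs = [math.log(v) for v in x]
    n = len(x)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1) - sum(logs)
    return 2 * 2 - 2 * loglik

def best_distribution(x):
    """Return the candidate with the lowest AIC, as in step 5."""
    candidates = {"gaussian": gaussian_aic(x), "log-normal": lognormal_aic(x)}
    return min(candidates, key=candidates.get)
```

Right-skewed samples favor the log-normal fit, while roughly symmetric samples favor the Gaussian, mirroring why the best-fitted distribution varies from month to month.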

VII. PERFORMANCE COMPARISONS
The proposed forecast intervals will be compared with those generated by the SARIMA model, focusing on metrics such as average width, standard deviation of the width, percentage of coverage, and percentage of negative lower bounds. For point forecasts, the comparison will involve the root mean squared error (RMSE) and the coefficient of determination, R-squared (R²). The specifics of these criteria are outlined as follows: (1) Percentage of Interval Coverage (% Coverage): This metric reflects how often the forecast interval contains the actual value. (2) Percentage of Times with a Negative Lower Bound (% Neg. LB): This criterion assesses the frequency of intervals having a negative lower bound (LB), which is particularly important for data where negative LBs are not acceptable. (3) Width Average: The average width is calculated as (1/M) Σ_{i=1}^{M} (UB_i − LB_i), where UB_i and LB_i are the upper and lower forecast bounds for observation i, and M is the size of the test dataset. A lower average width indicates greater confidence in the point forecast.
(4) Root Mean Squared Error (RMSE): RMSE measures the average magnitude of the difference between the forecast points and actual values, calculated as RMSE = √[(1/M) Σ_{i=1}^{M} (y_i − ŷ_i)²], where M represents the size of the test dataset, y_i is the actual value, and ŷ_i is the forecast. (5) R-squared (R²): This metric quantifies the proportion of the variance in the actual values that is explained by the proposed model: R² = 1 − Σ_{i=1}^{M} (y_i − ŷ_i)² / Σ_{i=1}^{M} (y_i − ȳ)², where ȳ is the mean of the actual values.
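The criteria above can be computed straightforwardly; the following sketch (an illustrative helper of our own, not the paper's code) evaluates all of them for a test set:

```python
import math

def interval_metrics(actual, lower, upper, forecast):
    """Compute % coverage, % negative lower bounds, average interval
    width, RMSE, and R^2 for a test set of size M."""
    M = len(actual)
    coverage = 100 * sum(l <= y <= u
                         for y, l, u in zip(actual, lower, upper)) / M
    neg_lb = 100 * sum(l < 0 for l in lower) / M
    width_avg = sum(u - l for l, u in zip(lower, upper)) / M
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, forecast))
    rmse = math.sqrt(ss_res / M)
    y_bar = sum(actual) / M
    r2 = 1 - ss_res / sum((y - y_bar) ** 2 for y in actual)
    return {"coverage": coverage, "neg_lb": neg_lb,
            "width_avg": width_avg, "rmse": rmse, "r2": r2}
```

Note that coverage and width pull against each other: an arbitrarily wide interval always covers, so a method is only convincing when it keeps coverage high while narrowing the width.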

VIII. RESULTS
Upon reviewing the point and interval forecasts for durian export values, as detailed in Tables 3, 4, and 5, several key observations can be identified.

A. PROPOSED INTERVAL USING RECURSIVE AS INPUT
Table 3 highlights the notable strengths of the proposed forecasting method using the fitted distribution. The lower bounds for several months are negative, which is an important consideration. One month that stands out across all methods is August 2022, where the forecasted values deviated significantly from the actual value of 2,537.52 million THB. Each model predicted higher export values, with the MIMO-based forecast being the most optimistic at 13,874.82 million THB. External factors, which the models did not account for, could be responsible for this discrepancy. This instance is considered an irregular component of the time series, which is inherently unpredictable. Further investigation revealed that China, the primary export market for Thai durian, had implemented intermittent border closures as part of its zero-COVID policy during this period.
The forecast proposed using the MIMO input marginally outperforms the recursive-based method. However, both methods significantly exceed the SARIMA model in accuracy, particularly in capturing substantial data shifts, such as the notable increase in April 2023. As illustrated in Fig. 7, while all methods provide intervals that cover most of the actual values, the proposed methods' narrower intervals suggest greater precision. The SARIMA model sometimes struggled to adjust to sharp fluctuations in the data, often leading to wider forecast intervals. For the MIMO-based, recursive-based, and SARIMA methods, the RMSEs are 6,462.28, 8,238.33, and 11,804.65 million THB, respectively. Compared to the SARIMA model, the RMSEs for the MIMO- and recursive-based methods are reduced by approximately 45% and 30%, respectively. The average interval width for the MIMO-based method is slightly narrower than that for the recursive-based method; however, both proposed intervals are significantly narrower than the SARIMA model's interval. Additional criteria are summarized in Table 6.

IX. DISCUSSION
Given that our dataset consists of only 89 observations without explanatory variables, a method adept at efficiently managing small sample sizes and accurately capturing nonlinear patterns and seasonal components was essential. Additionally, external factors like governmental policies and unforeseen events, such as epidemics, can significantly impact durian exports, making reliance on point forecasts less adequate. Therefore, this study advocates for the use of forecast intervals as a more robust approach to account for the uncertainties inherent in these data.
The K-NN time series method was selected for its straightforward nature and ability to effectively manage the complex, nonlinear patterns often found in durian export data; such data characteristics are well suited to the K-NN time series model. While more advanced techniques like the light gradient boosting machine and ensemble models excel in point forecasting with extensive datasets, they tend to encounter difficulties in interval forecasting, particularly with limited data. Conversely, the SARIMA model, regularly employed as a standard in time series analysis for data displaying seasonal and trend components, offers a significant point of reference.
The proposed interval forecasting method independently identifies the best-fitted distribution for each month, based on ''observed forecasts'' derived from K-NN time series models with varying k values (k = 1, 2, …, k*). As such, each month's data is treated as an independent entity, leading to the possibility of each month following a different distribution. For example, using the MIMO technique, it was found that June 2022's export data was best represented by an inverse Weibull distribution, while August 2022's data closely fit a log-gamma distribution. These variations in distributions are a reflection of the unique characteristics of the ''observed forecasts,'' which originate from different k* values.
In applying the MIMO strategy with the K-NN method to durian export data, three critical parameters were focused upon to optimize the forecasting approach. The forecast horizon was set at twelve months to capture the typical yearly pattern. The parameter k needed to be determined to ensure the most accurate forecasts. However, our proposed method requires only the largest k, using all forecast values with different k values as the observed sample in the bootstrap process. Determining the optimal number of lags involved a trial-and-error approach, and for this study, 12 lags were employed as autoregressive variables, in line with the monthly data format. An alternative approach to selecting the number of lags involves analyzing the autocorrelation function (ACF) plot.
Additionally, a direct correlation between the variability of durian export values in different months and the width of the forecast intervals produced by the proposed method is observed. Fig. 8 displays a boxplot of durian exports for each month, highlighting that months like April and May have high variability in durian export values within the training dataset. This variability is reflected in the wider forecast intervals for these months, as depicted in Fig. 7, indicating greater uncertainty in the predictions. In contrast, months such as November and January, which show lower variability in the training data, are associated with narrower forecast intervals, suggesting increased confidence in these predictions.
Finally, this method is adaptable to other datasets, provided they exhibit essential characteristics such as trend and seasonal components, which are pivotal for the successful implementation of the K-NN time series method. For effective application to other datasets, several adjustments are necessary. Firstly, datasets with a trend component require a detrending process, as detailed in step 2 of our methodology. Secondly, determining the optimal largest k value, denoted as k*, is vital and varies according to the dataset's unique attributes. Lastly, the number of lags incorporated in the K-NN analysis should align with the data's frequency. These considerations ensure that the proposed interval forecasting remains robust and applicable across various datasets with similar characteristics.

X. CONCLUSION
The integration of the K-NN time series with bootstrapping presents a promising approach for interval forecasting. In this method, the recursive and MIMO techniques of the K-NN time series are employed as inputs. The fundamental process involves determining the optimal distribution of K-NN forecasted values, which are derived from various k parameter values. Once this optimal distribution is identified, it is used to generate a bootstrap sample of forecasts. The mean, 2.5th percentile, and 97.5th percentile of this sample are then utilized as the point forecast, lower bound, and upper bound, respectively. In our case study, when compared to the SARIMA model, a standard in time-series statistics, both the recursive- and MIMO-based intervals demonstrate superior performance, exhibiting tighter intervals than those of SARIMA. For point forecasts, the MIMO-based forecast slightly outperforms the recursive-based method, and both significantly exceed the accuracy of SARIMA's forecast.
For analysis purposes, the data was partitioned into training and testing sets. The testing set encompasses the latest 12 months, from June 2022 to May 2023, which means the training set spans 89 months, from January 2015 to May 2022. February 2016 recorded the lowest export value at approximately 98.67 million THB, while the peak value reached about 48,583.21 million THB in April 2023. The export values have a mean and standard deviation of roughly 4,814.87 million THB and 7,624.65 million THB, respectively. The notably high standard deviation, around 1.6 times the mean, underscores substantial fluctuations in durian export values. As shown in Fig. 2, the moving average reveals a clear upward trend. The dataset also showcases a marked seasonal component, with April and May consistently showing the highest seasonal indices and January and February the lowest.

FIGURE 1. Monthly durian export from January 2015 to May 2023.

FIGURE 2. Trends and seasonal components of durian exports.

FIGURE 3. Detrended training data for durian export values.
Moreover, in certain months, multiple distributions fit almost equally well. For instance, in June 2022, all five distributions yield similar AIC values, while in September, the AIC values of the five distributions vary considerably. 6. Generation of Point Forecast: In this phase, a large bootstrap sample, sized at 1,000,000, is generated from the distribution identified as the best fit, based on the lowest Akaike's information criterion (AIC) among the 10 candidate distributions. This substantial sample size is selected to more accurately represent the entire population. The procedure is repeated for each month under study. Representative examples of the fitted distributions for the detrended forecast values of durian exports are illustrated in Figs. 4-6. The average of the bootstrap sample serves as the point estimate for the detrended forecast. The actual forecast values are obtained by multiplying the detrended point forecasts by the corresponding predicted trend from (6). Additionally, the median of the bootstrap sample can serve as an alternative to the average in determining detrended forecast values. 7. Generation of Forecast Interval: The lower and upper bounds of the forecast interval are established by the 2.5th percentile (0.025 quantile) and the 97.5th percentile (0.975 quantile), respectively. The same multiplication by the predicted trend used to obtain actual forecast values is applied to both the lower and upper bounds.
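Steps 6-7 can be sketched as follows, under the simplifying assumption that the best-fitted distribution for a given month is Gaussian (the actual best fit varies by month, and the function and parameter names here are illustrative):

```python
import random
import statistics

def forecast_interval_from_fit(mu, sigma, trend_value,
                               n_boot=100_000, seed=1):
    """Steps 6-7 of the proposed method, assuming the best-fitted
    distribution of the detrended forecasts is Gaussian(mu, sigma):
    draw a large bootstrap sample, take its mean and 2.5th/97.5th
    percentiles, then multiply back by the predicted trend value."""
    rng = random.Random(seed)
    sample = sorted(rng.gauss(mu, sigma) for _ in range(n_boot))
    point = statistics.fmean(sample)                 # detrended point forecast
    lo = sample[int(0.025 * n_boot)]                 # 2.5th percentile
    hi = sample[int(0.975 * n_boot) - 1]             # 97.5th percentile
    # Undo the detrending: multiply by the predicted trend (Eq. (6)).
    return point * trend_value, lo * trend_value, hi * trend_value
```

Replacing `statistics.fmean` with the sample median gives the alternative point estimate mentioned above.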

FIGURE 4. Best-fitted distributions of detrended forecast values from the MIMO technique for durian exports each month from June 2022 to August 2022.


FIGURE 5. Best-fitted distributions of detrended forecast values from the MIMO technique for durian exports each month from September 2022 to January 2023.

FIGURE 6. Best-fitted distributions of detrended forecast values from the MIMO technique for durian exports each month from February 2023 to May 2023.

FIGURE 7. Comparison of actual export values with forecasted values from three different techniques: the proposed method using MIMO input, the proposed method using recursive input, and the traditional SARIMA approach for the test data.

TABLE 1. Top 5 distributions for detrended forecast values from the recursive technique.
Among the monthly forecast intervals for durian export values, the narrowest interval was observed in January 2023, while the widest occurred in May 2023.

TABLE 1. (Continued.) Top 5 distributions for detrended forecast values from the recursive technique.
B. PROPOSED INTERVAL USING MIMO AS INPUT

TABLE 2. Top 5 distributions for detrended forecast values from the MIMO technique.

TABLE 2. (Continued.) Top 5 distributions for detrended forecast values from the MIMO technique.

TABLE 3. Point and interval forecasts of durian export values from the proposed method using input from the recursive technique.

TABLE 4. Point and interval forecasts of durian export values from the proposed method using input from the MIMO technique.

TABLE 5. Point and interval forecasts of durian export values from SARIMA.
FIGURE 8. Boxplots of durian exports in each month.