Food Demand Prediction Using the Nonlinear Autoregressive Exogenous Neural Network

Food demand prediction is a significant issue for both business process improvement and sustainable development. Data science methods, including artificial intelligence methods, are often used for this purpose. The aim of this research is to develop models for food demand prediction based on a nonlinear autoregressive exogenous neural network. The research focuses on processed foods, such as bread or butter. For each product, model architectures differing in the number of hidden layers, the number of neurons in the hidden layers, and the size of the delay line were tested. The results show that the prediction performance differed slightly depending on the type of product. The R2 measure ranged from 96.2399% to 99.6477%, depending on the product. The proposed models can be used in a company's intelligent management system for the rational control of inventories and food production. This can also lead to a reduction in food waste.


I. INTRODUCTION
Food demand prediction is a critical issue for both businesses and sustainable development. Business aspects are mainly related to improving manufacturing, logistics, and supply chain processes, inventory cost reduction, and customer satisfaction. Sustainable development issues are mainly related to food loss and waste, and they have been drawing much attention in recent years. The Food and Agriculture Organization of the United Nations estimates that up to one-third of food produced globally, amounting to 1.3 billion tons of food per year, is lost and wasted [1]. Food loss occurs along the food supply chain from harvest to retail, and food waste occurs at retail and consumption levels [1]. In the whole food chain, beginning from direct production on farms through processing, ending with retail and wholesale, consumer food waste is the largest. It is estimated at 40-60%, while retailer waste is evaluated at the level of 10% [2], [3], [4]. The prevention of food waste is one of the most important issues worldwide, especially in the context of sustainable development. The amount of food wasted is not geographically specific but correlates with the country's development [5]. Food waste is responsible for economic, environmental, and social problems in many countries, and strategies for its reduction are crucial [2], [6], [7]. The United Nations, in its Agenda for Sustainable Development, included target 12.3, which calls for a 50% reduction in food waste by the year 2030 [8]. The European Commission is committed to fighting food waste and has included target 12.3 in its European Circular Economic Action Plan [9].
The associate editor coordinating the review of this manuscript and approving it for publication was Hiu Yung Wong.
Wasted food is defined as food that is unconsumed or discarded by the retailer because of its colour or appearance [10]. Some products delivered to the store are never sold because of the expiration date on the label, time spent on the shelf, or damage. Store managers expect a certain loss rate and monitor its level. Food whose expiration date is close is distributed as an ingredient of ready-to-eat products, discounted, or donated to food donation organisations. However, little is known about the implementation of food waste reduction activities and policies within stores. Managers indicate customer demands and legal and logistics issues as barriers to reducing food waste [10]. Such food waste levels and cited limitations indicate that modern food systems should be developed to determine and prevent food overproduction, over-abundance, and waste [4]. One of the functions of these systems is food demand prediction. There are two types of prediction methods: qualitative (based on the opinions of process specialists predicting demand) and quantitative (based on mathematical models, which use historical data of demand volume). Of the two, quantitative methods, including data science methods, present better prediction performance [11]. Predictive data science methods are very helpful for the demand prediction process. These methods automatically analyse the relationships and trends in the data and make future predictions based on current observations [12]. Traditional data science forecasting models, such as multiple regression, exponential smoothing, the Holt-Winters model (also called seasonal exponential smoothing), ARIMA, supervised regression and classification models, random forest, gradient boosting, and stochastic optimisation are often applied in food demand prediction [13], [14], [15].
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
However, several limitations of traditional methods should be mentioned: they have a relatively short "life cycle", they do not have the ability to learn, and, because the food market is very turbulent, the historical data they rely on becomes less valuable for forecasts [16]. In addition, traditional methods do not have the ability to generalise. The prediction is appropriate only for a given time period. When the characteristics of the food market change, new models must be developed. To overcome the limitations of traditional models, a machine-learning approach is applied, including deep learning models.
The aim of this study was to develop models for food demand prediction based on the nonlinear autoregressive exogenous neural network (NARXNN). We focused on processed food in the research (e.g. bread, butter). To the best of our knowledge, such a model has not yet been developed; therefore, it is the main contribution of the research presented in this paper. NARXNN is a special type of recurrent neural network (RNN), namely a many-to-one RNN, which usually provides better predictions than the traditional RNN (i.e. one-to-one RNN) because it uses the additional information contained in the values of the series of interest that have already been output before a given period [17]. This research focuses mainly on data science application issues related to food demand prediction. Supply chain management theory is treated as a supplementary issue.
The rest of the paper is organised as follows: The next section presents the state of the art in the considered field. Then, the materials and methods are presented. The last part presents the results of the experiments, conclusions, and future work.

II. STATE OF THE ART
A. PREMISES FOR FOOD DEMAND PREDICTION
Food demand prediction is critical for both businesses (in terms of optimising their strategies and processes) and societies (in terms of economic, environmental, and social policies). Moreover, demand prediction is an essential tool for increasing the speed of the decision-making process and lowering the risk observed in it [18]. For organisations operating in the foodservice industry, demand estimation impacts production effectiveness and resource planning [19]. As shown in [20], the inaccuracy of sales forecasts in the retail food industry is the main cause of wasted products and stockouts. Many scholars have investigated the problem of managing the inventory of perishable products [21], especially considering the growing share of perishables in retailing that are disposed of due to spoilage, which is reported to be approximately 15% [22]. According to [23], more accurate forecasts in the fresh food sector result in a reduction in both losses from products that reach their expiration dates and the costs of transportation and storage of refrigerated products. Food sales prediction is also extremely important in the case of products facing seasonal changes in demand, which may depend on many hidden contexts that are not always easily recognised [15]. At the same time, food demand prediction relates to the effectiveness of supply chain management, which is challenged by several shifts, such as the growing urbanisation level, accompanied by an increasing consumer demand for organic products, and the growth of the e-commerce distribution channel [24]. All these aspects are considered by many retail companies that have decided to include waste reduction not only in operational targets but also in performance indicators [25].
Predicting food demand is also crucial from a societal and environmental standpoint. The world population is estimated to reach 9.7 billion people in 2050 [26]. Feeding the rising population requires developing sustainable agricultural, economic, and conservation policies which would respect the environment [27] and manage food waste [28]. According to [29], prevention is the most favourable option in terms of food waste management. This option includes avoiding surplus food generation through food production and consumption. Some emerging movements against food waste call on food chain actors, especially retailers, to take responsibility for this problem [30]. More accurate food demand forecast methods could significantly contribute to this issue from both perspectives: the economic and managerial challenges observed by retailers, and the social and environmental impact of food waste. As reported by [31], data on the effectiveness of food waste prevention measures are still scarce, which is the gap we want to address.

B. METHODS FOR FOOD DEMAND PREDICTION
Scholars distinguish different forecast models, including both linear and nonlinear methods, for quantitative demand forecasting [32]. They also share a similar understanding that none of them is universal enough to be used in all situations and circumstances [33], [34]. Although food demand has traditionally been perceived as remaining stable on many occasions [35] and influenced by factors such as seasonality and perishability, we may witness an increasing demand volatility caused by a variety of factors, such as changing lifestyle choices [36], [37], blooming foodie culture [38], and social media influencers [39]. All these issues make predicting food demand more challenging. They open up research streams devoted to the comparisons and evaluations of traditional models [33], as well as the development of machine learning models.
Among the existing traditional models used for food demand forecasting, we can distinguish multiple regression, exponential smoothing, and the Holt-Winters model (also called seasonal exponential smoothing), ARIMA, supervised regression and classification models, random forest, gradient boosting, and stochastic optimisation.
The Holt-Winters (HW) model has been proven to be effective in the case of short-lifecycle dairy products [11]-[13]. The ARIMA model has also been applied and tested effectively, proving that it could be utilised to model and forecast the future demand for the purposes of manufacturing this type of food [40]. When the two models were compared for dairy products with a short lifecycle, the HW model obtained better results (higher prediction accuracy) [33].
Owing to the limitations of single forecasting models, researchers have attempted to use a mixed approach to obtain better results. As shown in [11], a mixed method combining three different models (4-week moving average, exponential smoothing, and ARIMA) is efficient in exploring demand prediction in the food industry. To provide a more advanced multi-region and multi-commodity analysis of food consumption in the long term, a partial equilibrium model was proposed [41] and solved using the dynamic recursive technique [39].
The need to provide more accurate forecasts encouraged researchers to develop alternative approaches, such as a judgmental-based approach where traditional mathematical forecasts were considered as a basis and developed further with the structured knowledge of the experts, which enables the adjustment of the initial forecasts, but also provides better initial data cleaning and outlier identification [14]. Another approach is the ensemble learning approach, which uses the dynamic integration of classifiers reflecting seasonal changes and fluctuations in consumer demands, which has been proven to perform better than the currently used baseline [15].
Researchers share the view that, in general, machine learning models result in better demand predictability than traditional models [23], [24]. The main advantages mentioned in previous research are better predictions and more flexibility, which is relevant when the estimation models are built not only on the basis of historical sales series but also when new variables are added, which increases data volume and analysis complexity [23].
Several researchers have investigated the advantages of machine learning models over traditional models in the food industry context. This advantage was discussed by Tsoumakas [43], who proved that machine learning techniques for sales prediction are reliable and efficient for accurate short-term forecasting, which enables inventory level minimisation, expired product reduction, and lost sales drop. Among the main benefits, the reduction of human bias, a higher degree of forecast precision, and flexibility to change variables were observed [23].
One of the methods used for food demand forecasting is the artificial neural network (ANN) prediction model developed by Agrawal and Schorling [43]. The efficiency of the model was tested for a perishable and refrigerated food convenience store chain. The results proved that although the ANN may suffer from interpretability problems, it is more accurate than the traditional estimation method (multinomial logit model). The ANN model is aimed at using previous data and a predefined demand estimate [19].
Another method used in demand forecasting is the support vector machine (SVM) [44]. In combination with other techniques, it reduced the losses from unsold dairy products after their expiration date.
The heterogeneous mixture learning technology developed by Ochiai [45] is based on algorithms useful in short-term demand estimation. Applied to a food grocery store chain, it achieved a significant reduction in unsold items.
A model that aggregates machine learning for demand forecasting and price-optimisation techniques was presented by Fujimaki et al. [46] and further tested in beverage retailing. It has shown greater reliability for decision-makers, and as a result, a revenue increase of 16%.
Deep learning models have been tested and confirmed for forecasting crude oil prices [47], photovoltaic power [48], and on-demand ride services [49]. Research results related to the food industry mention deep learning methods (convolutional neural network (CNN)-based food image recognition algorithm) used to derive food information (food type and portion size) from food images [50] or to propose an assistive calorie measurement system [51]. In [52], a time-dependent food distribution model and a weight optimisation algorithm were proposed to adapt the user's data to their eating habits. Deep learning has also been applied in the waste sorting process to automate some waste-handling tasks [53].

C. NARXNN APPLICATIONS
Considering the research related to NARXNN, there are many areas of implementation of this type of neural network. For example, it has been used for the prediction of water consumption [54], traffic condition prediction and monitoring of motorways [55], and energy forecasts [57].
Deep learning (DL) applications in agriculture employing the NARXNN model were analysed in [57] and [58]. Leaf area index (LAI) was estimated. In [59], a NARXNN model was applied to predict the LAI of rubber.
A NARX model called the NARXNN was used to estimate the LAI time series. The NARXNN proved to be a promising tool for time-series LAI estimation [60]. The authors of [61] presented weather prediction using three models: an RNN-based model named NARXnet, a case-based reasoning model (CBR), and a segmented CBR model. The structure of the input of the NARXnet meant that it could learn not only from historical data but also from previous predictions. NARXnet had an accuracy of 93.95%, outperforming the other two models significantly. Soil moisture (SM) was successfully estimated using a simplified NARXNN model, whose input was only the current features and the prediction it had given in the last time step [62].
In [63], a NARXNN model was used to predict soil moisture on an hourly basis. The predictions were compared with ground measurements. The experiments showed that the model is a promising tool for this task.
The study [65] predicted wheat yields using historical wheat data and related plantation areas, rainfall, and temperature. The spatial NN model yielded better results in terms of forecasting yield than the temporal nonlinear autoregressive neural network and the NARXNN model. The same study with the same result was also conducted in [65].
In [66], single-layer NARXNN models were used, which were designed to forecast sesame yield in the dry zone of Sri Lanka. The NARXNN was used as a useful tool for addressing future climate change by forecasting weather variables in the study area [67].
Two AI-based prediction models, a univariate MLFNN and a multivariate NARXNN, were developed to provide 1- to 7-day ahead offer price forecasts for a certain type of strawberry in Canada. The series-parallel NARXNN architecture with one hidden layer and ten neurons was used for the multivariate model. The results demonstrated the usefulness of including California's commodity yield variable as an additional predictor, which resulted in up to 39% improvement in future price forecasts obtained by the NARXNN multivariate model compared to the univariate MLFNN model [68].
Although we may find works where the demand forecasting model using the deep learning approach was used for the supply chain [69], to the best of our knowledge, it has not been tested in the food industry. Therefore, this is the first study to combine NARX neural networks with food demand forecasting.

III. MATERIALS AND METHODS
A. DATA
The data supplying the decision-making process were collected by the ProLogistica Soft company, which deals with modelling inventory management and demand forecasting systems for producers and distributors. They represented the volume of demand for food products within the chain of stores located in the Lower Silesian Voivodeship (Poland). The basic statistical features (minimum and maximum values, daily averages, and deviations) of the food demand data for each product are presented in Table 1. The company has not agreed to publish product names; therefore, they are marked with numerical identifiers. The samples were recorded over approximately 2 years (04.2017-06.2019) or 3.5 years (01.2016-06.2019) with irregular time steps. The average daily demand was characterised by different variabilities, expressed by the standard deviation. Depending on the product, it ranges from 42.0 (product id=1272) to 103.4 (product id=492) (Tab. 1).
The demand data profile of most products was characterised by an exponential distribution, which was confirmed by the Kolmogorov-Smirnov test. Only for product ID=492 was the obtained p-value below the significance level (p<alpha, alpha=0.01), which was the basis for rejecting the null hypothesis that the empirical and theoretical distributions are equal (Tab. 1). The daily variability of the demand data under investigation is shown in Fig. 1. The input data were first extracted and then aggregated (for each product separately) within a day. Subsequently, because a neural network approach was used, the data had to be normalised before training and testing the prediction model. Finally, the output had to be de-normalised to reflect the volume of demand.
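The normalisation and de-normalisation steps can be illustrated with a minimal sketch. The text does not state which scaling was used, so min-max scaling to [0, 1] is an assumption here, and the helper names `fit_minmax`, `normalise`, and `denormalise` as well as the demand values are hypothetical.

```python
import numpy as np

def fit_minmax(values):
    """Return the (min, max) pair used for scaling; assumed min-max scheme."""
    v = np.asarray(values, dtype=float)
    return float(v.min()), float(v.max())

def normalise(values, lo, hi):
    """Map demand values to [0, 1] before they are shown to the network."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def denormalise(scaled, lo, hi):
    """Map network outputs back to demand volume units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

# Hypothetical daily demand values for one product.
demand = np.array([120.0, 80.0, 200.0, 160.0])
lo, hi = fit_minmax(demand)
scaled = normalise(demand, lo, hi)
restored = denormalise(scaled, lo, hi)
```

In practice the scaling parameters would be fitted on the in-sample period only and reused for the out-of-sample period, so that no information from the forecast horizon leaks into training.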

B. PREDICTION SYSTEM
Models for describing a dynamic stochastic process in a discrete-time domain take various forms of difference equations. An interesting concept in time series forecasting is a hybrid model, that is, a nonlinear autoregressive exogenous model in combination with neural networks (NARXNN). It is a recurrent dynamic network owing to the feedback loop between the output and input. The value of the dependent output (y) is estimated based on the past values of y and past values of the independent (exogenous) input (x). The NARXNN model equation is as follows:

$$y(n) = f\big(y(n-1), y(n-2), \ldots, y(n-d);\; x(n-1), x(n-2), \ldots, x(n-d)\big)$$

In this study, it was assumed that x(n) denotes the discrete-time sequence, y(n) denotes the demand values in these time sequences, and d is the size of the tapped delay lines (TDLs). In this way, the ability of the NARXNN model to predict the value of the demand time series based on its past values was ensured. Past values were supplied from TDLs in which the past sequence values of x(n) and y(n) were stored. A three-layered network was used, with sigmoidal activation functions in the first and second hidden layers and a linear activation function in the third (output) layer. Learning was performed based on the Levenberg-Marquardt backpropagation algorithm.
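To make the role of the tapped delay lines concrete, the following sketch builds the regressor vectors that the function f receives for each time step n. It only illustrates the data layout implied by the model equation; it is not the authors' implementation, and the function name `make_tdl_regressors` and the toy series are hypothetical.

```python
import numpy as np

def make_tdl_regressors(x, y, d):
    """Build inputs [y(n-1..n-d), x(n-1..n-d)] and targets y(n) for n >= d."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rows, targets = [], []
    for n in range(d, len(y)):
        past_y = y[n - d:n][::-1]  # y(n-1), ..., y(n-d)
        past_x = x[n - d:n][::-1]  # x(n-1), ..., x(n-d)
        rows.append(np.concatenate([past_y, past_x]))
        targets.append(y[n])
    return np.array(rows), np.array(targets)

# Toy example: x is the discrete-time index, y the demand at that index.
x = np.arange(6)
y = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
X, t = make_tdl_regressors(x, y, d=2)
```

Each row of `X` holds 2d values, so a delay line of size d=4, 6, or 8 gives the first hidden layer 8, 12, or 16 inputs. Any regressor trained on `(X, t)` corresponds to the open-loop (series-parallel) form; for multi-step forecasting the predicted y values are fed back into the delay line in place of the observed ones.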

C. EXPERIMENTS
The research consisted of developing an accurate NARXNN demand forecasting model for each product analysed in this study. Accordingly, model architectures differing in the number of hidden layers and number of neurons in the hidden layers, as well as with different sizes of the delay line, were tested for a given product. Neural network topologies were first trained and then simulated for demand prediction. Therefore, two separate datasets (time series) were necessary: one for training and the other for simulation (prediction testing). The training dataset was a time series with demand values (in-sample period) that were shown to the neural network during training. The testing dataset, which was not previously shown to the neural network, represented only the time series for multi-step prediction testing (out-of-sample period). The model simulation for each product was carried out in both the short and long prediction periods defined by the ratio. The ratio was calculated as the percentage of the number of testing samples to the number of complete data samples. Two of its values were considered: 4% as the low ratio (short prediction period) and 10% as the high ratio (long prediction period). For example, the size of the complete demand dataset for the product ID = 492 was 339 samples, so at a low ratio, the last 14 samples constituted the out-of-sample period. Thus, the ratio determines the number of steps ahead in the multi-step forecast.
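The relation between the ratio and the length of the out-of-sample period can be sketched as follows. The text gives only the result for product ID=492 (339 samples, 14 testing samples at the 4% ratio); rounding to the nearest integer is an assumption that reproduces this figure, and `split_by_ratio` is a hypothetical helper name.

```python
def split_by_ratio(series, ratio):
    """Split a time series into in-sample and out-of-sample parts.

    The last round(len(series) * ratio) samples form the out-of-sample
    period (rounding rule assumed, not stated in the text).
    """
    n_test = round(len(series) * ratio)
    if n_test <= 0 or n_test >= len(series):
        raise ValueError("ratio leaves no samples for training or testing")
    return series[:-n_test], series[-n_test:]

# Placeholder series with the same length as the data for product ID=492.
series = list(range(339))
train_low, test_low = split_by_ratio(series, 0.04)
train_high, test_high = split_by_ratio(series, 0.10)
```

Because the split is chronological, the out-of-sample length is also the number of steps ahead in the multi-step forecast.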

D. PERFORMANCE PREDICTION MEASURES
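The best topology of the demand prediction model (at low and high ratio) for each product was determined based on the minimum value of the mean absolute error (MAE), mean absolute percentage error (MAPE), and root-mean-square error (RMSE), and then on the basis of the highest value of the determination coefficient (R-squared). R² measures the percentage of variation in the response variable Y explained by the explanatory variable X. MAE, MAPE, RMSE, and R-squared are defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where n is the number of out-of-sample observations, y_i and ŷ_i are the observed (actual) values and fitted values of the dependent variable Y for the ith case, respectively, and ȳ is the arithmetic mean of Y.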
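The four measures can be computed in a few lines. The sketch below assumes their standard definitions (with R-squared taken as one minus the ratio of the residual to the total sum of squares); the function name `prediction_metrics` and the sample values are hypothetical.

```python
import numpy as np

def prediction_metrics(y_true, y_pred):
    """Return MAE, MAPE (in %), RMSE, and R-squared for a forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / y_true))  # requires y_true != 0
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": float(mae), "MAPE": float(mape),
            "RMSE": float(rmse), "R2": float(r2)}

# Hypothetical observed and predicted demand values.
m = prediction_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```

MAPE is undefined when an observed value is zero, so days with zero demand would need separate handling before this measure is applied.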

IV. RESULTS
A. TRAINING PROCESS
The effects of training various nonlinear autoregressive exogenous neural network (NARXNN) models on the in-sample period dataset with low and high ratios are presented in Tables 2 and 3, respectively. The number of hidden layers, the number of neurons in the hidden layers, and the size of the delay line were varied to find the best network architecture for each product's demand data. These tables summarise the best results obtained after repeating the training process of each NARXNN architecture 20 times. The number of epochs needed to achieve them ranged from 6 to 177 at the low ratio of 4% (Tab. 2) and from 8 to 181 at the high ratio of 10% (Tab. 3). Thus, the fewer samples were used in the training process (or, equivalently, the more were reserved for the prediction process), the more epochs were needed for the tested model to generalise the data satisfactorily. The length of the learning process was influenced not only by the number of samples but also by the complexity of the model architecture, determined by the number of neurons or delayed inputs. This was observed in learning experiments conducted on the in-sample dataset at both low and high ratios. For example, regarding the number of neurons, the model for product ID=1347 with 20 neurons and 4 delays required 11 times (4% ratio, Tab. 2) or 21 times (10% ratio, Tab. 3) fewer epochs than the model with 10 neurons. A decrease in learning time after an increase in the number of neurons was observed in most of the trained models (except for product ID=1272) at a 4% ratio (Tab. 2). For example, regarding the number of delays, when trained on the 90% in-sample dataset, the model for product ID=492 with 20 neurons and 8 delays required 8 times fewer epochs than the corresponding model with 4 delays (Tab. 3).
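The search over architectures with repeated training can be outlined as a simple loop. This is a sketch under assumptions: `train_once` stands for any routine that trains one network and returns its training MSE, the candidate layouts and delay sizes are those mentioned in the results, and the stand-in trainer below is purely artificial so that the example runs.

```python
from itertools import product

def select_architecture(train_once, hidden_layouts, delays, n_restarts=20):
    """Train every (layout, delay) pair n_restarts times; keep the best run."""
    best = None
    for layout, d in product(hidden_layouts, delays):
        for restart in range(n_restarts):
            mse = train_once(layout, d, restart)
            if best is None or mse < best[0]:
                best = (mse, layout, d)
    return best

def fake_train_once(layout, d, restart):
    """Artificial stand-in for a real training run (not a neural network)."""
    return abs(sum(layout) - 20) * 10 + abs(d - 4) + restart

# Layouts as (neurons in hidden layer 1, neurons in hidden layer 2).
layouts = [(7, 3), (14, 6)]
best_mse, best_layout, best_delay = select_architecture(
    fake_train_once, layouts, delays=[4, 6, 8])
```

Repeating the training matters because each run starts from different random initial weights, so a single run may stop in a poor local minimum.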
However, it should be emphasised that too few epochs usually did not provide good learning outcomes. Only in one case, i.e. for product ID=1272 with the 90% in-sample dataset, did the minimum number of epochs yield the lowest MSE value (1033) (Tab. 2). Overall, depending on the product, the lowest MSE values ranged from 575 (ID=1325) to 7785 (ID=492) at a 4% ratio (Tab. 2) and from 330 (ID=1325) to 6925 (ID=492) at a 10% ratio (Tab. 3). On this basis, it should be noted that at both ratios, there is a clear difference in the achieved MSE values between product ID=492 and the other products.

B. PREDICTION PERFORMANCE
The prediction performance of the demand volume for the products under investigation was measured using RMSE, MAE, MAPE, and R². The results for the NARXNN models trained at low and high ratios are also included in Tables 2 and 3, respectively. Based on their analysis, it can be concluded that at both ratios, the lowest MSE values obtained in the learning process did not translate into equally good values of the above-mentioned indicators in the testing (prediction) process for all food products except ID=492. However, both during training and during forecasting, the tested neural network structures were observed to influence the demand forecast accuracy of all analysed products in both the short and long periods.
In the short period of the demand volume prediction, for product ID=1272 the best results according to MAE, MAPE, and RMSE were achieved by a model consisting of 10 neurons (7 in the first and 3 in the second hidden layer) and 6 delay inputs. For the other products, the best models had 20 neurons (14 in the first and 6 in the second hidden layer) and 4 delays (IDs of 492 and 1325) or 8 delays (ID of 1347). The optimal models provided forecasts with the lowest MAE, MAPE, and RMSE values ranging from 24.7 (ID=1325) to 138.4 (ID=492), from 0.105% (ID=1347) to 0.436% (ID=492), and from 25.3 (ID=1325) to 215.4 (ID=492), respectively, with the best simultaneous fit expressed by the R² determination coefficient ranging from 96.2% (ID=492) to 99.6% (ID=1347) (Tab. 2). Therefore, the deviations of the demand values estimated by the network from the actual values are relatively small, and the generalisation capability is quite satisfactory. For each product, the MAE and RMSE values did not differ significantly, which indicated that errors with very large values did not occur in the forecast period. This can also be confirmed by analysing, for all products, the plots in Fig. 2 of the actual (X) and predicted (Y) demand values, as well as the error plots (E=X-Y). For product ID=1347, the NARXNN model trained on the past demand data produced an almost flawless forecast (E<0.3%) throughout the low out-of-sample period (Fig. 2d). For the other products, the forecasts were equally accurate, but over a slightly shorter time interval, that is, excluding the interval corresponding to the last 3 samples (Figs. 2a, 2b, and 2c). Even in this interval, the most marked increase in the E value, observed for product ID=492, did not exceed 1.6% (Fig. 2a).
Finally, the error values in the last 3 forecast moments for IDs of 492, 1272, and 1325 contributed to the increase in the MAE, MAPE, and RMSE values. The values of MAE and RMSE in the demand prediction for product ID=492, which were several times higher than those for the other products, were additionally affected by the wide range of values as well as the large variation in demand data for this product. Nevertheless, for each product, a fairly good fit of the model data to the actual demand data was achieved. This was best reflected by the linear regression shown in Fig. 3.
Very low values of MAE, MAPE, and RMSE allowed for an almost perfect fit (R² > 0.989) of the model to the independent variable for the product ID=1272 (Fig. 3b) and ID=1347 (Fig. 3d). Larger deviations of the forecast from the ideal realisation of the explained variable (Y=X) were observed for the other two products, that is, ID=492 and ID=1325.
The prediction performance in MAE, MAPE, RMSE, and R² obtained at a high ratio can be considered satisfactory (Tab. 3). Considering the first 3 indicators, the best results were achieved with the models containing 10 neurons and 6 (ID=492) or 8 (ID=1347) delay inputs. Compared to these models, the most effective neural network structure for the demand predictions of the other products (ID=1272 and ID=1325) required twice as many neurons in total, with 6 delays being sufficient.
The MAE, MAPE, and RMSE values for these models ranged from 18.5 (ID=1325) to 133.4 (ID=492), from 0.248% (ID=1272) to 0.431% (ID=492), and from 24.5 (ID=1325) to 169.5 (ID=492), respectively, with the degree of fit expressed by R² ranging from 97.4% (ID=492) to 99.1% (ID=1347) (Tab. 3). Thus, there were slight differences between MAE and RMSE, which indicates that no large-scale errors occurred in the high-ratio prediction. Nevertheless, in the long term, these differences were higher than in the shorter period for most products (IDs of 1272, 1325, and 1347).
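The reasoning that a small gap between MAE and RMSE indicates the absence of very large individual errors can be checked on a toy example. The two error series below are hypothetical and chosen so that both have the same MAE; only the second contains one large error.

```python
import numpy as np

def mae_rmse(errors):
    """Return (MAE, RMSE) for a series of forecast errors E = X - Y."""
    e = np.asarray(errors, dtype=float)
    return float(np.mean(np.abs(e))), float(np.sqrt(np.mean(e ** 2)))

# Errors of equal magnitude: RMSE equals MAE.
even_mae, even_rmse = mae_rmse([10, -10, 10, -10])

# Same mean absolute error, but concentrated in one large error:
# RMSE rises well above MAE because errors are squared before averaging.
spiky_mae, spiky_rmse = mae_rmse([1, -1, 1, -37])
```

RMSE is never smaller than MAE, and the two coincide only when all errors have the same magnitude, so the ratio RMSE/MAE is a quick indicator of how unevenly the errors are spread over the forecast period.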
The plots of the observed and forecasted demand data, as well as the plots of forecast errors for all the researched products at a high ratio, are presented in Fig. 4. They showed that all of the developed NARX neural models underestimated or overestimated the demand volumes at most of the analysed time moments. However, even in the worst case, the error E did not exceed 0.7% (ID=1325) to 1.3% (ID=492), depending on the product. Errors close to the maximum values of E were observed only at the end of the forecast period for ID=492 (Fig. 4a), ID=1325 (Fig. 4c), and ID=1347 (Fig. 4d), as well as in the initial forecast period for ID=1272 (Fig. 4b). However, in the latter case, an almost flawless prediction was observed in the middle period of March 2019 to June 2019.
Except for the demand forecast for product ID=492, higher values of the maximum errors E were recorded in the long than in the short prediction interval. Nevertheless, extending the period from low to high did not result in a drastic deterioration in the overall prediction performance for all products.

FIGURE 2. Demand volume prediction with error prediction at the low ratio obtained by the best NARX neural network model architectures for four food products: (a) product id=492, (b) product id=1272, (c) product id=1325, and (d) product id=1347.
There was no regular relationship between the prediction effects at high and low ratios. On the one hand, compared to the low ratio, MAE and RMSE at the high ratio even decreased by 3.6% and 21.3%, respectively, for product ID=492 and by 25.1% and 3.2%, respectively, for product ID=1325. On the other hand, these errors increased by 49.8% and 66%, respectively, for product ID=1272 and by 213% and 277%, respectively, for product ID=1347 (Tables 2 and 3).
FIGURE 3. Comparison between actual values and NARX neural network model predictions at the low ratio for four food products: (a) product id=492, (b) product id=1272, (c) product id=1325, and (d) product id=1347.
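The percentage changes quoted above can be read as relative changes of an error measure between the two ratios. The sketch below assumes that the low-ratio value serves as the baseline, which the text implies but does not state explicitly. The helper name `relative_change_percent` and the input values are hypothetical.

```python
def relative_change_percent(low_ratio_value, high_ratio_value):
    """Percentage change of an error measure when moving from the low to the high ratio.

    Assumption: the low-ratio value is the baseline (denominator).
    """
    if low_ratio_value == 0:
        raise ValueError("baseline value must be non-zero")
    return 100.0 * (high_ratio_value - low_ratio_value) / low_ratio_value

# Hypothetical MAE values, not taken from Tables 2 and 3.
mae_low, mae_high = 40.0, 30.0
change = relative_change_percent(mae_low, mae_high)  # -25.0, i.e. a 25% decrease
```

Under this reading, an increase of 213% means that the high-ratio error is roughly 3.13 times the low-ratio error, not 2.13 times.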
The plots of the observed demand volumes versus the NARXNN simulated values for all samples at the high ratio are shown in Fig. 5. Comparing them with those obtained at the low ratio (Fig. 3), it can be concluded that good fits of the model data to the actual data were again achieved in all cases. As at the low ratio, the best fit at the high ratio (R² > 0.988) was obtained for product IDs 1272 (Fig. 5b) and 1347 (Fig. 5d). Thus, contrary to MAE, MAPE, and RMSE, the extension of the forecast period for these two products did not worsen the R² score. For the other two products, that is, ID=492 (Fig. 5a) and ID=1325 (Fig. 5c), an improvement in the NARXNN reproduction of the demand variable was noted, with R² increasing slightly, by no more than 0.012% (Tab. 3). Moreover, for these two products, the noticeable deviations of some of the predicted samples from the determined regression line did not decrease the prediction accuracy assessed by MAE, MAPE, and RMSE.
Taking into account the results of all measures, the developed models provided the most accurate demand forecasts for products with IDs 1347 and 1272, followed by products with IDs 1325 and 492, regardless of the number of steps ahead tested. It can be assumed that this was caused by the smaller number of samples for product IDs 1325 (n=296) and 492 (n=339) (Tab. 1) than for the others (n>450). Moreover, in the case of the latter product (ID=492), the overall data samples showed the highest standard deviation of the mean and failed the one-sample Kolmogorov-Smirnov test for a normal distribution (Tab. 1), which resulted in inferior forecast accuracy. The factors mentioned above negatively affected the data generalisation in the training process.
FIGURE 4. Demand volume prediction with error prediction at the high ratio obtained by the best NARX neural network model architectures for four food products: (a) product id=492, (b) product id=1272, (c) product id=1325, and (d) product id=1347.
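A one-sample Kolmogorov-Smirnov check of normality, as mentioned above, can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the procedure used in the study: the normal distribution is fitted to the sample itself, and the decision uses the asymptotic 5% critical value 1.358/√n, which is strictly valid only for a fully specified distribution (with estimated parameters the check becomes conservative). The function name and both samples are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_normality_check(sample, coefficient=1.358):
    """One-sample KS check against a normal distribution fitted to the sample.

    Returns (D, critical_value, passes). The default coefficient is the
    asymptotic 5% critical-value constant (an assumption of this sketch).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are required")
    fitted = NormalDist(mean(sample), stdev(sample))
    d = 0.0
    for i, x in enumerate(sorted(sample)):
        cdf = fitted.cdf(x)
        # Largest gap between the empirical and the fitted CDF on either side of x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    critical = coefficient / math.sqrt(n)
    return d, critical, d <= critical

# Synthetic illustration: evenly spaced normal quantiles vs. a two-level series.
n = 200
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
two_level = [0.0] * 100 + [1.0] * 100

d_normal, crit_normal, normal_ok = ks_normality_check(normal_like)
d_two, crit_two, two_level_ok = ks_normality_check(two_level)
```

Here `normal_ok` is true and `two_level_ok` is false, which mirrors the distinction drawn in Tab. 1 between series that pass and fail the normality test.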
Overall, we obtained satisfactory predictive results for all the NARXNN models assessed. However, depending on the type of product, the prediction efficiency expressed by the RMSE, MAE, MAPE, and R² indices differed slightly. Nevertheless, a comparison with the results of other studies suggests that NARXNN has an advantage over other models. In one study on sales forecasting in the fashion industry [31], which used various shallow techniques, such as decision trees (DT), random forest (RF), support vector regression (SVR), artificial neural networks (ANN), and linear regression (LR), as well as deep learning (DL) approaches, the R² coefficient ranged from 0.568 (LR) to 0.756 (RF), with MAPE ranging from 0.451 (LR) to 0.345 (RF). In turn, in predicting the time series of footwear sales during the out-of-sample period [30], the ETS and ARIMA models obtained minimum MAPE values of 12.5% and 14%, respectively.
FIGURE 5. Comparison between actual values and NARX neural network model predictions at the high ratio for four food products: (a) product id=492, (b) product id=1272, (c) product id=1325, and (d) product id=1347.
Based on the results obtained in this study, it can be concluded that the developed NARX models based on artificial intelligence can be used to create highly accurate food demand forecasts. This is crucial for managing short shelf-life products such as fresh food. Consequently, food retailers can effectively reduce food waste [70]. Therefore, the developed models can have a positive influence on inventory management. It can be estimated that the number of days with a prediction higher than the actual demand is almost equal to the number of days with a prediction lower than the actual demand. Therefore, fluctuations in food waste and profits can be reduced. Consequently, this can have a positive impact on inventory management in the supply chain. In addition, it may have a positive impact on research on sustainable development. However, the lack of supply volume data does not allow for an assessment of the influence of food demand prediction on food waste reduction.
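The balance between over- and under-prediction days mentioned above can be checked with a simple count, as in the sketch below. The function name `prediction_bias_counts` and the daily values are hypothetical and serve only to illustrate the comparison.

```python
def prediction_bias_counts(actual, predicted):
    """Count days on which the forecast was above, below, or equal to actual demand."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must be of equal length")
    over = sum(1 for a, p in zip(actual, predicted) if p > a)
    under = sum(1 for a, p in zip(actual, predicted) if p < a)
    exact = len(actual) - over - under
    return {"over": over, "under": under, "exact": exact}

# Illustrative daily demand and forecasts (not the study's data).
actual = [100, 120, 90, 110, 105]
predicted = [102, 118, 90, 113, 101]
counts = prediction_bias_counts(actual, predicted)
```

Roughly equal `over` and `under` counts suggest that the model has no systematic bias towards overstocking or understocking, although the counts alone say nothing about the size of the individual errors.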

V. CONCLUSION
The hybrid concept combining the nonlinear autoregressive exogenous model with a neural network (NARXNN) is an effective technique for building prediction models of time series. This article presents a novel application of NARXNN to accurately forecast the demand for selected foods. The proposed approach can be practically applied as a component of a company's intelligent management system [71]. It can support the rational control of food inventory and production while reducing waste and costs in the supply chain. The main limitation of the developed models is that they cannot be applied to small datasets (below 100 rows of data).
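To make the role of the delay line concrete, the sketch below shows how the regressor vectors of a NARX model can be assembled from past outputs and past exogenous inputs. It assumes a single exogenous variable, the same delay for both signals, and regressors that start at lag 1. The function name `build_narx_samples` and the `price` series are hypothetical, and the sketch covers only the data preparation, not the neural network itself.

```python
def build_narx_samples(y, x, delay):
    """Build (regressor, target) pairs for a NARX model.

    Regressor at time t: [y[t-1], ..., y[t-delay], x[t-1], ..., x[t-delay]]
    Target at time t:    y[t]
    """
    if len(y) != len(x):
        raise ValueError("output and exogenous series must have equal length")
    if delay < 1 or delay >= len(y):
        raise ValueError("delay must be at least 1 and shorter than the series")
    samples = []
    for t in range(delay, len(y)):
        past_y = [y[t - k] for k in range(1, delay + 1)]
        past_x = [x[t - k] for k in range(1, delay + 1)]
        samples.append((past_y + past_x, y[t]))
    return samples

# Illustrative series: demand is the predicted variable, price a hypothetical exogenous input.
demand = [10, 12, 11, 13, 14]
price = [5, 5, 4, 4, 5]
samples = build_narx_samples(demand, price, delay=2)
```

Each additional step of delay removes one usable training sample, so a series of n rows yields only n minus the delay samples, which is one reason why very short series are difficult to model.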
Future research can focus on using stacked autoencoder pretraining in the developed models [72] to process small datasets. In addition, research using demand and supply data aggregated at the shop level should allow for verification of the developed models with respect to the prediction of demand in other areas (such as drinks or fuel).