Research on Data-Driven Fresh Produce Joint Distribution Network Optimization Under Distribution Center Sharing

To address the low efficiency and poor resource utilization of urban last-mile fresh produce distribution, and taking into account the seasonal characteristics of fresh produce logistics demand, an innovative data-driven rolling adjustment framework model that follows seasonal changes is proposed for optimizing the fresh produce joint distribution network. The cycles of the framework are divided according to seasonal changes, and each cycle consists of four steps: fresh produce logistics demand prediction, fresh produce joint distribution network optimization, data collection, and parameter adjustment of the prediction model, thereby achieving data-driven optimization of the fresh produce joint distribution network. A catastrophe adaptive genetic algorithm with variable neighborhood search (CAGA-VNS) is developed to solve the fresh produce joint distribution network optimization model. Finally, several numerical experiments are conducted to validate the model and algorithm. The results demonstrate that: 1) the rolling adjustment framework model provides effective fresh produce distribution network optimization decisions when fresh produce demand changes with the seasons; 2) the CAGA-VNS algorithm is more stable, with the lowest difference percentage being 9.16%; and 3) the distribution center sharing strategy effectively improves resource utilization, reducing the total distribution cost by 19.46% and the travel distance by 14.72%.


I. INTRODUCTION
With the development of big data science, the traditional management decision paradigm has changed, and a new paradigm has emerged: the data-driven management decision paradigm [1]. In the past decades, companies have invested heavily in information technologies and digitalization. Increased computing power and the explosion of data have changed the way organisations capture data, analyse information, and make decisions. These changes give the operations management community opportunities to develop new models for data-driven decision-making, which may become increasingly applicable to logistics and supply chain management. However, research on how to integrate big data analysis with operations research (OR) models to drive business performance improvements is still in its infancy [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Liandong Zhu.
Data-driven demand prediction of fresh produce, and distribution optimization based on that prediction, have been shown to be of great value to fresh produce distribution [3]. Optimization models for fresh produce distribution decisions are often optimal only in theory, because they assume that demand is deterministic or follows a known statistical distribution. In real-world fresh produce distribution, however, the distribution process can be divided into multiple stages through corresponding information collection, and fresh produce demand can be predicted with the help of data science as distribution proceeds. Hence, it is necessary to address how to make fresh produce joint distribution network optimization decisions under limited distribution resources when fresh produce demand and its associated feature information are acquired periodically.
This paper is specifically motivated by the need to make fresh produce distribution network optimization decisions in real-world fresh produce distribution. Logistics demand for fresh produce is growing sharply; however, the last-mile fresh produce logistics distribution network has become one of the bottlenecks in the development of the fresh produce industry and cannot adequately serve existing demand. There are two main reasons. On the one hand, unlike other produce, fresh produce has short shelf lives and changes in maturity and with the seasons, so enterprises must adapt quickly to demand changes driven by this seasonality. On the other hand, the dispersion and small batch sizes of urban last-mile customers make the logistics distribution of agricultural products more difficult, leading to unreasonable transportation such as repeated and roundabout transportation.
Joint distribution is an effective means of solving this problem: it improves the distribution efficiency of fresh produce by integrating distribution resources. This, in turn, places higher requirements on the optimization of the urban last-mile fresh produce distribution network. To further improve distribution efficiency and respond to constantly changing fresh produce demand, moving rural fresh produce quickly to the dining table through an accurate and fast logistics distribution network is an urgent problem [4].
To address the above challenges, the main contribution of this study is the design of an innovative data-driven urban last-mile fresh produce joint distribution network. Considering the seasonality of fresh produce logistics demand, this paper formulates, from a data-driven perspective, a rolling adjustment framework model for fresh produce joint distribution network optimization that follows seasonal changes. In each decision-making cycle, the framework includes the following parts: fresh produce logistics demand prediction, fresh produce joint distribution network optimization, data collection and processing, and parameter adjustment of the prediction model. More specifically, in each decision-making cycle the agricultural demand prediction model is fitted first. A mixed integer programming model of the fresh produce joint distribution network optimization problem (FPJDOP) is then formulated and solved based on the demand prediction. Once the optimal FPJDOP decisions are derived, they are applied in a rolling-horizon manner: only the optimal solution for the immediately next cycle is implemented, and the uncertain parameters of the prediction model are updated to obtain the demand prediction for the following cycle. For the joint distribution network optimization model, based on the distribution center sharing strategy, the total cost is minimized by optimizing the number of distribution centers used and the vehicle delivery distances. To solve the optimization model, this paper proposes a catastrophe adaptive genetic algorithm with variable neighborhood search.
The rest of this paper is organized as follows. Section II reviews the literature and identifies the research gaps. Section III introduces the data-driven rolling adjustment framework model for fresh produce joint distribution network optimization and presents the mathematical formulation of the fresh produce joint distribution network. Section IV presents the solution approach for the formulation in Section III, in which a catastrophe adaptive genetic algorithm with variable neighborhood search is developed. Section V provides the computational results and analyses the performance of the solution approach. Finally, Section VI summarizes our major contributions and outlines future research directions.

II. LITERATURE REVIEW
Given that this work focuses explicitly on data-driven fresh produce joint distribution network optimization under distribution center sharing, the relevant literature is grouped into three main streams: fresh produce distribution network optimization, joint distribution network optimization, and data-driven optimization. Based on an analysis of the research gaps in the existing literature, the main contributions of this study are then presented.

A. FRESH PRODUCE DISTRIBUTION NETWORK OPTIMIZATION
Optimization of fresh produce distribution logistics networks has been shown to be of great value to fresh produce distribution [5], [6], [7], [8]. Motivated by reducing carbon emissions and promoting the development of fresh produce e-commerce, Chen et al. [9] studied the fresh produce multi-compartment vehicle routing problem with time windows (MCVRPTW) and proposed a variable neighborhood search (VNS) approach whose key steps are local search and shaking. Han et al. [10] explored the impact of the agricultural Internet of Things (IoT) and the special requirements of each distribution channel on the joint planning of production and distribution of fresh produce. Ma et al. [11] discussed a perishable food location-routing problem with conflict and coordination (PFLRP-CC) between a fresh food seller and a hired transportation company and developed a hybrid algorithm to solve it. Zhang et al. [12] established a multi-objective optimization model with objectives on distribution cost, freshness of fresh produce and carbon emissions, and solved it with the main objective method and the fruit fly algorithm. Wei and Wang [13] designed a fuzzy C-means clustering-improved simulated annealing (FCM-ISA) algorithm to solve the continuous location problem of fresh produce distribution centers. Bortolini et al. [14] extended the problem to multi-modal fresh produce distribution networks with operating cost, carbon footprint and delivery time objectives; a dedicated tool, the Food Distribution Planner, was applied to the distribution of fresh fruits and vegetables from a set of Italian producers to several European retailers. Jaigirdar et al. [15] investigated a multi-objective optimization model of a distribution network for perishable goods, solved with the CPLEX solver and a weighted sum method. Golestani et al. [16] formulated a mixed-integer linear programming (MINLP) model for the hub location problem, considering multi-item and multi-temperature joint distribution for perishable produce, and adopted the GAMS software and the ε-constraint method to solve this bi-objective model.
Notably, existing publications address only the perishability and short shelf lives of fresh produce and do not consider the importance of seasonality in fresh produce distribution. For example, in northern China fresh produce is harvested once a year, whereas in southern China it is harvested twice a year or even more often. If demand is not predicted accurately, the fresh produce distribution network cannot be configured reasonably. To fill this gap, and considering the seasonality of fresh produce, this paper establishes a rolling adjustment framework model that optimizes the fresh produce joint distribution network from a data-driven perspective.

B. JOINT DISTRIBUTION NETWORK OPTIMIZATION
Sheng et al. [17] focused on rural e-commerce logistics distribution and introduced a joint distribution strategy that shares customers and vehicles to address simultaneous pick-up and delivery. Wang et al. [18] formulated a mathematical model of the multi-depot green vehicle routing problem based on a vehicle sharing strategy and designed an algorithm combining the Clarke and Wright savings heuristic, the sweep algorithm and multi-objective particle swarm optimization. Li et al. [19] found shared depot resources more beneficial than an unshared-depot strategy for the multi-depot vehicle routing problem. Wang et al. [20] investigated a collaborative multiple-center vehicle routing problem with simultaneous delivery and pick-up (CVRPSDP) in which vehicles and customers are shared, and solved it with a hybrid algorithm combining k-means and the non-dominated sorting genetic algorithm-II (NSGA-II). To address low loading rates and high distribution costs, Wang [21] proposed a joint distribution pattern based on cargo resources and remaining vehicle resources and built an improved genetic algorithm with dynamic parameters to solve the model. Across the published articles in this domain, the essence of joint distribution is the integration of resources such as depots, vehicles, customers or cargo.

C. DATA DRIVEN OPTIMIZATION
In recent years, data-driven optimization approaches have been widely used in management decision-making and operations research, where data resources and big data analysis help improve efficiency or reduce costs. Liu et al. [22] highlighted the fuzzy boundaries and time-varying decision scenarios of an expected epidemic outbreak and proposed an innovative data-driven decision framework for optimizing the epidemic-logistics network. Under this framework, the entire emergency response process could be converted into an interactive evolution process of data learning and resource optimization. Du et al. [23] noted that cholera transmission shows clear spatial variation and proposed a data-driven optimization approach that determines the optimal intervention resource allocation for each period and each community in a rolling-horizon manner. At each period, they integrated single-period model parameter fitting and scenario-based stochastic programming to make decisions under uncertainty with newly acquired system understanding. Du [24] argued that epidemic detection and healthcare resource allocation interact and therefore require joint decision optimization, and proposed a multi-stage joint decision-making approach for data-driven outbreak detection and dynamic allocation of healthcare resources. Xiang [25] extended the study in [22] by considering the government's limited resources and the need for emergency procurement from external suppliers. Focusing on the choice of the optimal time to start emergency procurement, Xiang designed a rule for judging the optimal stopping time of epidemic observation (that is, the time to start procurement), derived the boundary characteristics and influencing factors of the optimal stopping time through theoretical analysis, and built a data-driven model based on the cyclic decision-making idea of ''epidemic prediction, emergency effect comparison, stop-time judgment, and parameter update''. In inventory management, Clausen and Li [26] introduced a data-driven methodology based on the empirical risk minimisation (ERM) principle: they formulated a big-data-driven dynamic order-up-to level inventory model that takes multiple features into account without classical distributional assumptions, and designed a machine learning algorithm to solve it. Addressing unreasonable order assignments and long distribution distances, Xiong and Yan [27] used Meituan and Eleme as examples to analyse the dynamic real-time optimization mechanism and the algorithm optimization mechanism of intelligent order dispatching on delivery platforms. To tackle uncertain delivery times in last-mile food delivery, Chu et al. [28] proposed a data-driven optimization approach that combines machine learning with the capacitated vehicle routing problem in a new smart predict-then-optimize framework, and designed an efficient mini-batch gradient method and heuristic algorithms to solve the joint order assignment and routing problem. However, data-driven optimization of fresh produce distribution networks remains uncommon.

D. GAP ANALYSIS AND CONTRIBUTIONS
To sum up, research on fresh produce distribution network optimization, joint distribution, and data-driven optimization has achieved certain results, but some research gaps remain: (1) no study has combined fresh produce joint distribution network optimization under a distribution center sharing strategy with a data-driven optimization approach to improve fresh produce delivery efficiency; (2) no study has incorporated the impact of seasonality in fresh produce logistics demand into fresh produce distribution network optimization; (3) existing studies typically use sensitivity analysis to examine the impact of fresh produce demand on distribution network optimization. However, fresh produce demand prediction and fresh produce joint distribution network optimization form a multi-stage decision-making problem with mutual influence and coupling, and therefore need to be optimized jointly. In response to these gaps, an innovative data-driven rolling adjustment framework model that follows seasonal changes is proposed for optimizing the fresh produce joint distribution network. The cycles of the framework are divided according to seasonal changes, and each cycle includes four steps: fresh produce logistics demand prediction, fresh produce joint distribution network optimization, data collection, and parameter adjustment of the prediction model. Based on the distribution center sharing strategy, a catastrophe adaptive genetic algorithm with variable neighborhood search (CAGA-VNS) is proposed. The crossover and mutation operators are set adaptively to avoid premature convergence, while a dynamic catastrophe mechanism and neighborhood search structures are introduced to improve solution quality. Numerical experiments are conducted to test the effectiveness and superiority of the algorithm.
111156 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

III. FRAMEWORK MODEL FORMULATION
This section explains the proposed data-driven rolling adjustment framework model for the fresh produce joint distribution network. The framework involves four steps, as shown in Figure 1. First, support vector regression (SVR) optimized by the grey wolf optimizer (GWO) and combined with nonlinear principal component analysis (NLPCA) is used to forecast the seasonal logistics demand of fresh produce for the next cycle. Second, based on the forecasting results, the CAGA-VNS algorithm is designed to optimize the fresh produce joint distribution network (FPJDOP). Third, the real-world fresh produce distribution network is operated according to the FPJDOP solution, and new actual data on fresh produce logistics demand and the corresponding network data are collected and recorded. Finally, the new data are incorporated into the prediction model, and its parameters are adjusted to improve prediction accuracy.
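The four-step cycle can be read as a rolling-horizon loop. The sketch below is an illustrative skeleton under stated assumptions, not the authors' implementation: the hypothetical `seasonal_naive_forecast` stands in for the NLPCA-GWO-SVR predictor (it simply reuses the demand observed one season earlier), and `optimize` and `observe` are caller-supplied placeholders for CAGA-VNS and real-world data collection.

```python
from typing import Callable, Dict, List

Demand = Dict[str, float]  # customer id -> logistics demand in one cycle


def seasonal_naive_forecast(history: List[Demand], season_length: int = 4) -> Demand:
    """Hypothetical stand-in for NLPCA-GWO-SVR: reuse the demand observed one
    full seasonal cycle ago, or the latest cycle if history is too short."""
    if len(history) >= season_length:
        return dict(history[-season_length])
    return dict(history[-1])


def rolling_adjustment(
    history: List[Demand],
    num_cycles: int,
    optimize: Callable[[Demand], object],
    observe: Callable[[int], Demand],
) -> List[object]:
    """Run predict -> optimize -> collect -> update for each decision cycle.

    Only the plan for the current cycle is executed; the observed demand is
    appended to `history` (mutated in place) so the next forecast uses it.
    """
    plans = []
    for t in range(num_cycles):
        forecast = seasonal_naive_forecast(history)  # step 1: demand prediction
        plan = optimize(forecast)                    # step 2: network optimization
        plans.append(plan)                           # execute this cycle's plan only
        actual = observe(t)                          # step 3: data collection
        history.append(actual)                       # step 4: update predictor data
    return plans
```

In the paper's setting, step 4 would also re-tune the predictor's parameters; the naive forecaster here has none, so appending the observed cycle is its only update.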

A. A BRIEF REVIEW OF GREY WOLF OPTIMIZER (GWO)
The GWO is a swarm intelligence algorithm that simulates the group hunting behaviour of grey wolf packs. A wolf pack has a social hierarchy; to model it, the GWO regards the fittest solution in each iteration as the alpha (α) wolf, and the second- and third-best solutions as the beta (β) and delta (δ) wolves, respectively. The remaining candidate solutions are regarded as ϕ wolves. In the GWO algorithm, hunting is guided by the α, β and δ wolves, which the ϕ wolves follow.
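A mathematical model of the wolves' encircling behaviour is represented by the equations:

$$\vec{D} = \left|\vec{H}\cdot\vec{X}_p(t) - \vec{X}(t)\right| \tag{1}$$

$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A}\cdot\vec{D} \tag{2}$$

$$\vec{A} = 2\vec{a}\cdot\vec{r}_1 - \vec{a} \tag{3}$$

$$\vec{H} = 2\vec{r}_2 \tag{4}$$

where $\vec{D}$ denotes the distance between the prey and the wolf, t is the current iteration, $\vec{X}_p(t)$ is the position of the prey at iteration t, $\vec{X}$ is the position vector of a wolf, $\vec{r}_1$ and $\vec{r}_2$ are random vectors with components in [0,1], the components of $\vec{a}$ decrease linearly from 2 to 0 over the iterations, and $\vec{A}$ and $\vec{H}$ are coefficient vectors (one pair for each of the wolves α, β and δ), updated by (3) and (4).
The mathematical model of hunting behaviour is represented by the equations:

$$\vec{D}_\alpha = \left|\vec{H}_1\cdot\vec{X}_\alpha - \vec{X}\right|,\quad \vec{D}_\beta = \left|\vec{H}_2\cdot\vec{X}_\beta - \vec{X}\right|,\quad \vec{D}_\delta = \left|\vec{H}_3\cdot\vec{X}_\delta - \vec{X}\right| \tag{5}$$

$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1\cdot\vec{D}_\alpha,\quad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2\cdot\vec{D}_\beta,\quad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3\cdot\vec{D}_\delta \tag{6}$$

$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3} \tag{7}$$

where Eq. (5) gives the distances between a ϕ wolf and the α, β and δ wolves, and the position of the prey is estimated by (6) and (7).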
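To make the update rules concrete, here is a minimal GWO sketch that minimizes a function over a box. It is a generic textbook-style implementation, not the paper's code; the function name, default pack size, iteration count and the clipping of positions to the bounds are illustrative assumptions. Each line of the inner loop is tagged with the equation it implements.

```python
import random
from typing import Callable, List, Sequence, Tuple


def gwo_minimize(
    f: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_wolves: int = 20,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize f over a box with a basic grey wolf optimizer (n_wolves >= 3)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_x, best_f = None, float("inf")
    for it in range(max_iter):
        # Social hierarchy: the three fittest wolves are alpha, beta and delta.
        ranked = sorted(wolves, key=f)
        if f(ranked[0]) < best_f:
            best_x, best_f = list(ranked[0]), f(ranked[0])
        leaders = ranked[:3]
        a = 2.0 - 2.0 * it / max_iter  # decreases linearly from 2 to 0
        new_wolves = []
        for x in wolves:
            new_x = []
            for d, (lo, hi) in enumerate(bounds):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a   # Eq. (3)
                    H = 2.0 * rng.random()           # Eq. (4)
                    D = abs(H * leader[d] - x[d])    # Eq. (5)
                    total += leader[d] - A * D       # Eq. (6)
                # Eq. (7): average of the three estimates, clipped to the box.
                new_x.append(min(max(total / 3.0, lo), hi))
            new_wolves.append(new_x)
        wolves = new_wolves
    for x in wolves:  # also evaluate the final positions
        if f(x) < best_f:
            best_x, best_f = list(x), f(x)
    return best_x, best_f
```

For SVR tuning, `bounds` would describe the search box for the penalty factor and kernel bandwidth (for example in log space), and `f` would return a validation error.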

B. FRESH PRODUCE LOGISTICS DEMAND FORECASTING BY NLPCA-GWO-SVR
To improve the forecasting performance for fresh produce logistics demand, we first consider the factors related to fresh produce, customers and logistics distribution enterprises that affect fresh produce logistics demand. In this fresh produce joint distribution network, features such as season, fresh produce price, average wage, average delivery time and damaged fresh produce ratio are extracted for each customer as inputs to the NLPCA-GWO-SVR prediction model. These input features may have complex nonlinear correlations, and the correlations among them may cause information overlap; that is, applying SVR directly may suffer from the ''curse of dimensionality'' in the input variables. This paper therefore uses NLPCA to reduce the dimensionality of the SVR input features and eliminate their correlations. Moreover, because the predictive accuracy of SVR depends heavily on the penalty factor and the kernel parameters, the GWO algorithm is used to optimize the penalty factor and the bandwidth parameter of the SVR kernel function. The result is the NLPCA-GWO-SVR prediction model. Owing to its strength in learning and modelling nonlinear and complex relationships, SVR has been deployed in various disciplines [29], [30]. Due to space limitations, the concepts of NLPCA and SVR are not discussed here; more details on NLPCA, GWO and SVR can be found in [31]. Figure 2 illustrates the flowchart of the NLPCA-GWO-SVR prediction model.
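The coupling between GWO and the predictor is easiest to see through the quantity GWO minimizes: validation error as a function of the penalty factor c and kernel bandwidth σ. The sketch below makes two labeled substitutions to stay short and self-contained: linear PCA (via SVD) replaces NLPCA, and RBF kernel ridge regression replaces ε-insensitive SVR, with 1/c as the ridge weight. The function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def pca_reduce(X_train, X_other, n_components):
    """Linear PCA via SVD (a stand-in for NLPCA), fitted on training rows only."""
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    W = vt[:n_components].T
    return (X_train - mean) @ W, (X_other - mean) @ W


def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between rows of A and B with bandwidth sigma."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def validation_error(log_c, log_sigma, X_tr, y_tr, X_val, y_val, n_components=2):
    """Validation MSE for penalty c and bandwidth sigma (searched in log space).

    Kernel ridge regression replaces epsilon-SVR here; its regularisation
    weight 1/c plays the role of SVR's penalty factor (larger c means
    weaker regularisation).
    """
    c, sigma = np.exp(log_c), np.exp(log_sigma)
    Z_tr, Z_val = pca_reduce(X_tr, X_val, n_components)
    y_mean = y_tr.mean()
    K = rbf_kernel(Z_tr, Z_tr, sigma)
    coef = np.linalg.solve(K + np.eye(len(y_tr)) / c, y_tr - y_mean)
    pred = rbf_kernel(Z_val, Z_tr, sigma) @ coef + y_mean
    return float(np.mean((pred - y_val) ** 2))
```

A GWO routine would call `validation_error` with each wolf's position as `(log_c, log_sigma)`; searching in log space keeps both parameters positive and spans several orders of magnitude evenly.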

C. FRESH PRODUCE JOINT DISTRIBUTION NETWORK OPTIMIZATION MODEL FORMULATION
1) PROBLEM DESCRIPTION
To improve the efficiency of urban last-mile fresh produce distribution and reduce the waste caused by the repeated allocation of distribution resources, a distribution center resource sharing strategy is introduced. Figure 3(a) shows a typical fresh produce distribution network without shared centers, which deploys four distribution centers and eight vehicles to serve customers.
As Figure 3(a) shows, unreasonable distribution such as cross transportation and roundabout transportation may occur, customer time windows are easily violated, and the freshness of fresh produce may even be reduced. In the FPJDOP, the decisions to be made include: (i) determining the number of joint distribution centers to open from the potential distribution centers;

2) ASSUMPTIONS
(i) Each distribution center has the same type of vehicles, and the driving speed of vehicles between nodes is constant; (ii) the geographical location and service time window of each customer node are known; (iii) each order of an enterprise at a given customer node cannot be split; (iv) each vehicle serves at most one route, which must start and end at the same depot; (v) vehicle capacity is limited, and customer demand cannot exceed vehicle capacity.
According to the above description and assumptions, the parameters are defined in Table 1 and the variables in Table 2.

3) OBJECTIVE COST ANALYSIS
The total objective cost consists of five parts:
1) Fixed cost of opening common distribution centers, which depends on the number of distribution centers selected.
2) Fixed cost of vehicles, which depends on the number of vehicles used.
3) Travel cost, which for simplicity considers only the effect of distance.
4) Time window violation cost, which depends on the arrival times of vehicles.
5) Damage cost, which is related to the change in maturity of fresh produce; no damage cost is incurred as long as maturity stays within a given limit.
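The cost formulas themselves are not reproduced in this section, so the sketch below evaluates the five parts for a set of routes under simple, explicitly assumed forms: per-center and per-vehicle fixed costs, travel cost proportional to Euclidean distance, a linear penalty for arriving before or after the time window, and a damage cost charged only on maturity above a limit, with maturity assumed to grow linearly with time on board. All names in `CostParams` are hypothetical and do not follow the paper's Table 1 notation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class CostParams:
    """Illustrative unit costs (hypothetical names, not the paper's notation)."""
    f_dc: float            # fixed cost per opened distribution center
    f_veh: float           # fixed cost per used vehicle
    c_dist: float          # travel cost per unit distance
    c_early: float         # penalty per time unit of early arrival
    c_late: float          # penalty per time unit of late arrival
    c_damage: float        # cost per unit of maturity above the limit
    maturity_limit: float
    maturity_rate: float   # maturity gained per time unit on board (assumed linear)
    speed: float = 1.0
    service_time: float = 0.0


def total_cost(routes: List[Tuple[str, List[str]]], coords: Dict[str, Point],
               windows: Dict[str, Tuple[float, float]], p: CostParams) -> Dict[str, float]:
    """Evaluate the five cost parts for routes given as (depot, customers)."""
    used = [(dc, seq) for dc, seq in routes if seq]  # empty routes use no vehicle
    parts = {"fixed_dc": p.f_dc * len({dc for dc, _ in used}),
             "fixed_vehicle": p.f_veh * len(used),
             "travel": 0.0, "time_window": 0.0, "damage": 0.0}
    for dc, seq in used:
        clock, here = 0.0, dc
        for cust in seq:
            leg = math.dist(coords[here], coords[cust])
            parts["travel"] += p.c_dist * leg
            clock += leg / p.speed  # arrival time at this customer
            early, late = windows[cust]
            parts["time_window"] += (p.c_early * max(0.0, early - clock)
                                     + p.c_late * max(0.0, clock - late))
            parts["damage"] += p.c_damage * max(0.0, p.maturity_rate * clock - p.maturity_limit)
            clock = max(clock, early) + p.service_time  # wait if early, then serve
            here = cust
        parts["travel"] += p.c_dist * math.dist(coords[here], coords[dc])  # return leg
    parts["total"] = sum(parts.values())
    return parts
```

For one depot at the origin serving customers at (3, 4) and (6, 8), the route length is 20, so with `c_dist=2.0` the travel part is 40.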

4) MATHEMATICAL MODELING
For each decision cycle t, the mathematical model of FPJDOP is defined as follows.
In the above model, the objective function (13) minimizes the total cost in each decision cycle: the first component is the fixed cost of opening common distribution centers, the second is the fixed cost of vehicles, the third is the travel cost, the fourth is the time window violation cost, and the last is the damage cost. Constraint (14) ensures that each customer is served by exactly one vehicle. Constraint (15) ensures that vehicle capacity is not exceeded. Constraint (16) requires a vehicle to leave every node it arrives at. Constraint (17) states that each vehicle departs from at most one distribution center. Constraint (18) allows vehicles to be dispatched to serve customers only from a selected distribution center. Constraint (19) forces each vehicle, if used, to depart from and return to the same distribution center. Constraint (20) eliminates subtours.
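When a solution is stored as one (distribution center, ordered customer list) pair per vehicle, a representation assumed here for illustration, constraints (16), (17), (19) and (20) hold by construction: each vehicle has a single center, its route starts and ends there, and a sequence cannot contain a subtour. The hypothetical checker below therefore only tests (14), (15) and (18).

```python
from typing import Dict, List, Set, Tuple

Route = Tuple[int, List[int]]  # (distribution center, customers in visiting order)


def check_solution(routes: Dict[int, Route], customers: Set[int],
                   demand: Dict[int, float], capacity: float,
                   selected_dcs: Set[int]) -> List[str]:
    """Return human-readable violations of constraints (14), (15) and (18)."""
    violations = []
    served = [c for _, seq in routes.values() for c in seq]
    for c in sorted(customers):  # (14): each customer served by exactly one vehicle
        count = served.count(c)
        if count != 1:
            violations.append(f"customer {c} served {count} times")
    for k, (dc, seq) in sorted(routes.items()):
        load = sum(demand[c] for c in seq)
        if load > capacity:  # (15): vehicle capacity
            violations.append(f"vehicle {k} load {load} exceeds {capacity}")
        if seq and dc not in selected_dcs:  # (18): only selected centers dispatch
            violations.append(f"vehicle {k} leaves unselected center {dc}")
    return violations
```

An empty list means the solution is feasible with respect to these three constraints; a GA could use the count of violations as a penalty term.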

D. DATA COLLECTION
Based on the FPJDOP model and its solution methodology, enterprises adjust the actual fresh produce joint distribution network according to the best solution, that is, they determine which joint distribution centers to use, how to allocate customers, and how to route vehicles. After the actual joint distribution network is implemented, new feature data such as delivery times are generated, and other features such as seasonality are known in advance; these are fed into the NLPCA-GWO-SVR model to re-forecast fresh produce logistics demand. Because there are errors between the actual and predicted logistics demand, the parameters of the prediction model need to be adjusted using data-driven approaches.

E. PARAMETERS ADJUSTMENT OF THE PREDICTION MODEL
Under the rolling adjustment framework model established in this paper, in each decision cycle, once the fresh produce joint distribution network optimization result (Section III-C) has been executed and the data collected (Section III-D), the indicator data related to agricultural product logistics demand within that cycle are available, and the parameters (c, σ) of the NLPCA-GWO-SVR prediction model, namely the penalty factor and the kernel bandwidth, need to be adjusted. Following the rolling time-domain prediction framework of [32], the actual logistics demand in decision cycle t is added as a new input to the NLPCA-GWO-SVR model when predicting the logistics demand in decision cycle t+1.

IV. SOLUTION APPROACH OF THE ROLLING ADJUSTMENT FRAMEWORK MODEL
A. SOLUTION APPROACH OF THE MATHEMATICAL MODEL OF FPJDOP
The genetic algorithm (GA) is a highly parallel algorithm with strong global search ability, derived from the ideas of natural selection and evolution in the biological world, and it is widely applied to NP-hard problems [33], [34]. However, GA tends to fall into local optima. To overcome this drawback, a catastrophe adaptive genetic algorithm with variable neighborhood search (CAGA-VNS) is proposed, which is better suited to solving the fresh produce joint distribution network model established in this paper. Figure 4 illustrates the flowchart of the CAGA-VNS algorithm.

1) ENCODING AND DECODING
Chromosomes in the CAGA-VNS population are encoded with natural numbers. Each chromosome consists of three substrings. Figure 5 illustrates the coding process for an example with 9 customers, 3 vehicles and 3 potential distribution centers.
1) Substring1 indicates which vehicle serves each customer. Substring1 has N loci, each holding a value randomly generated from 1 to K (K is the maximum number of vehicles that can be selected). If the value at the nth locus is k, the customer at that locus is served by vehicle k, where the customer at each locus is given by the value of the corresponding gene in substring2.
2) Substring1 and substring2 correspond one-to-one and together determine the delivery route of each vehicle. Substring2 has N loci, each holding a customer index from 1 to N (N is the number of customers); since every customer must be assigned exactly once, substring2 is a random permutation of 1 to N.
3) Substring3 indicates which distribution center each vehicle belongs to. Because substring1 refers to K vehicles, substring3 has K loci, and each value is randomly generated from 1 to R (R is the total number of potential distribution centers). If the value at the kth locus is r, distribution center r is selected and vehicle k belongs to it: vehicle k starts from distribution center r and returns to it. If r does not appear in substring3, potential distribution center r is not selected.
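The decoding rules above can be sketched as a short function. Two points are assumptions for illustration: a vehicle visits its customers in the left-to-right order of their loci, and the example values below are invented, not those of Figure 5. Following the text, the set of selected centers is simply the set of values appearing in substring3.

```python
from typing import Dict, List, Set, Tuple


def decode(sub1: List[int], sub2: List[int], sub3: List[int]
           ) -> Tuple[Dict[int, Tuple[int, List[int]]], Set[int]]:
    """Turn the three substrings into routes and the set of selected centers.

    sub1[n]   : vehicle (1..K) serving the customer at locus n
    sub2[n]   : customer (1..N) at locus n, a permutation of 1..N
    sub3[k-1] : distribution center (1..R) that vehicle k belongs to
    Assumption: a vehicle visits its customers in left-to-right locus order.
    """
    routes: Dict[int, Tuple[int, List[int]]] = {}
    for vehicle, customer in zip(sub1, sub2):
        depot = sub3[vehicle - 1]
        routes.setdefault(vehicle, (depot, []))[1].append(customer)
    selected = set(sub3)  # a center absent from substring3 is not selected
    return routes, selected


# 9 customers, 3 vehicles, 3 potential centers (invented example values).
routes, selected = decode([1, 2, 1, 3, 2, 3, 1, 2, 3],
                          [4, 7, 1, 9, 2, 5, 8, 3, 6],
                          [2, 2, 3])
```

Here vehicles 1 and 2 both leave center 2, vehicle 3 leaves center 3, and center 1 is not opened, which is how sharing a center between vehicles shows up in the encoding.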

2) INITIALIZING THE POPULATION AND FITNESS
Chromosomes are generated randomly. Each chromosome corresponds to a feasible solution, and its fitness value is calculated by (21), reflecting the model's objective of minimizing the total cost.
In (21), F_t represents the total cost and fit is the fitness value of a chromosome. The smaller the total cost, the higher the fitness value and the better the chromosome's genes.

3) GENETIC OPERATION
a: CROSSOVER OPERATION
Because the chromosome designed in this paper consists of three substrings with different value ranges, crossover is performed on each substring separately, using OX (order) crossover for substring1 to substring3; substring2 is taken as an example.

b: MUTATION OPERATION
A chromosome is chosen at random from the population to undergo two-point mutation. The mutation rate p_m is determined by a cloud generator according to the overall level of the population and of the chosen chromosome. A random number rand ∈ [0,1] is generated; if rand < p_m, the chosen chromosome is mutated by randomly choosing two genes, which yields the new individual C′. Figure 7 shows the mutation process.
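Order crossover on a permutation substring such as substring2, followed by a two-point mutation, can be sketched as below. Two assumptions are labeled: the cloud-generator adaptive rate is not reproduced, so `p_m` is passed in as a plain number, and the two-point mutation is taken to be a swap of the two chosen genes.

```python
import random
from typing import List, Optional, Tuple


def order_crossover(p1: List[int], p2: List[int], rng: random.Random,
                    cut: Optional[Tuple[int, int]] = None) -> List[int]:
    """Classic OX: copy p1[i..j], then fill the rest in p2's cyclic order from j+1."""
    n = len(p1)
    i, j = cut if cut is not None else sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(p1[i:j + 1])
    # Genes of p2 read cyclically from position j+1, skipping those already copied.
    donors = [p2[(j + 1 + s) % n] for s in range(n) if p2[(j + 1 + s) % n] not in kept]
    # Empty positions, also visited cyclically from j+1.
    slots = [(j + 1 + s) % n for s in range(n - (j - i + 1))]
    for pos, gene in zip(slots, donors):
        child[pos] = gene
    return child


def swap_mutation(chrom: List[int], p_m: float, rng: random.Random) -> List[int]:
    """With probability p_m, exchange two randomly chosen genes (assumed operator)."""
    child = list(chrom)
    if rng.random() < p_m:
        a, b = rng.sample(range(len(child)), 2)
        child[a], child[b] = child[b], child[a]
    return child
```

With parents `[1, 2, 3, 4, 5, 6, 7, 8, 9]` and `[9, 3, 7, 8, 2, 6, 5, 1, 4]` and cut points 3 and 6, OX yields `[3, 8, 2, 4, 5, 6, 7, 1, 9]`. Both operators keep a permutation valid, which is why they suit substring2; substring1 and substring3 allow repeated values and would need range-preserving variants.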

c: DYNAMIC CATASTROPHIC MECHANISM
When the genetic algorithm has iterated many times without any change in its optimal solution, the catastrophe mechanism is invoked. The catastrophe counter is initialized to ch = 0, and a critical value CH is preset. If the optimal solution changes, ch is reset to 0; otherwise ch = ch + 1. When ch ≥ CH, the catastrophe mechanism is triggered: the SN chromosomes with the lowest fitness in the population are eliminated, and SN randomly generated chromosomes are added to the current population to improve its diversity. SN is not constant but changes with the number of iterations, and the number of eliminated chromosomes is calculated by (22), where SN is the number of chromosomes to be eliminated, SN_0 is the preset catastrophe scale, λ is a controlling parameter, iter_GA is the current GA iteration, and iter_GA_max is the maximum number of GA iterations.
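The counter logic and the replacement step can be sketched as follows. Because Eq. (22) is not reproduced here, `catastrophe_size` uses a hypothetical exponentially shrinking schedule in SN_0, λ and the iteration ratio; resetting the counter after a catastrophe fires is also an assumption.

```python
import math
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def catastrophe_size(sn0: int, lam: float, it: int, it_max: int) -> int:
    """Hypothetical schedule standing in for Eq. (22): SN shrinks from SN_0
    as iter_GA approaches iter_GA_max (at least one chromosome is replaced)."""
    return max(1, round(sn0 * math.exp(-lam * it / it_max)))


class CatastropheCounter:
    """Tracks ch: reset when the best cost improves, incremented otherwise."""

    def __init__(self, threshold: int):
        self.CH = threshold
        self.ch = 0
        self.best: Optional[float] = None

    def update(self, best_cost: float) -> bool:
        """Return True when a catastrophe should be triggered."""
        if self.best is None or best_cost < self.best:
            self.best, self.ch = best_cost, 0
            return False
        self.ch += 1
        if self.ch >= self.CH:
            self.ch = 0  # assumption: restart counting after a catastrophe
            return True
        return False


def apply_catastrophe(population: List[T], fitness: Sequence[float], sn: int,
                      new_individual: Callable[[], T]) -> List[T]:
    """Replace the sn lowest-fitness chromosomes with freshly generated ones."""
    worst = set(sorted(range(len(population)), key=lambda i: fitness[i])[:sn])
    return [new_individual() if i in worst else ind for i, ind in enumerate(population)]
```

With CH = 3, four consecutive generations with the same best cost trigger one catastrophe on the fourth; a strictly better cost resets the counter.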

d: SELECTION OPERATION
A combination of elitist retention and roulette wheel selection is used to choose the individuals entering the next generation. This preserves population diversity while inheriting the optimal solution, improves the convergence speed of the population, and enhances the search ability of the algorithm. Specifically, the individuals in the parent population are sorted by fitness from high to low, the top 10% are copied directly into the offspring population, and the rest are selected by roulette wheel with selection probability

$$P(x_i) = \frac{\mathrm{fit}(x_i)}{\sum_{j}\mathrm{fit}(x_j)} \tag{23}$$

where fit(x_i) is the fitness value of chromosome x_i, P(x_i) is the selection probability of chromosome x_i, and the sum runs over all chromosomes in the population.
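Elitist retention plus roulette wheel selection can be sketched with the standard library. The fitness definition is an assumption (the reciprocal of total cost, which satisfies "smaller cost, higher fitness"); the exact form of Eq. (21) may differ. Roulette spins are drawn with `random.Random.choices`, whose `weights` argument implements proportional selection.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def roulette_probabilities(fitness: Sequence[float]) -> List[float]:
    """Proportional (roulette wheel) selection probabilities P(x_i)."""
    total = sum(fitness)
    return [f / total for f in fitness]


def select_next_generation(population: List[T], costs: Sequence[float],
                           rng: random.Random, elite_frac: float = 0.1) -> List[T]:
    """Copy the top 10% directly, then fill the rest by roulette wheel."""
    fitness = [1.0 / c for c in costs]  # assumption: fitness = 1 / total cost
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    n_elite = max(1, round(elite_frac * len(population)))
    elites = [population[i] for i in order[:n_elite]]
    spins = rng.choices(population, weights=roulette_probabilities(fitness),
                        k=len(population) - n_elite)
    return elites + spins
```

For two chromosomes with total costs 1 and 3, the fitness values are 1 and 1/3, giving selection probabilities 0.75 and 0.25.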

e: VARIABLE NEIGHBORHOOD SEARCHING
Neighborhood structures are key to improving the solution efficiency and quality of the CAGA-VNS algorithm.
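The four neighborhood structures described with Figure 8 (NS1 to NS4) can be sketched in Python together with a simple VNS loop. This is a hedged illustration: NS4 is implemented under one reading of "based on NS3, reverse the two chosen genes" (gene j ends up immediately before gene i), the loop samples one neighbor per structure and restarts from NS1 after any improvement, and `max_iter` stands in for iter_VNS_max. Function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Route = List[int]
Move = Callable[[Route, random.Random], Route]


def ns1_swap_max_min(r: Route, rng: random.Random) -> Route:
    """NS1: exchange the positions of the maximum- and minimum-valued genes."""
    s = list(r)
    i, j = s.index(max(s)), s.index(min(s))
    s[i], s[j] = s[j], s[i]
    return s


def ns2_random_swap(r: Route, rng: random.Random) -> Route:
    """NS2: swap two randomly selected genes."""
    s = list(r)
    i, j = rng.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s


def ns3_insert(r: Route, rng: random.Random) -> Route:
    """NS3: pick positions i < j, move gene j right after gene i (genes between shift right)."""
    s = list(r)
    i, j = sorted(rng.sample(range(len(s)), 2))
    s.insert(i + 1, s.pop(j))
    return s


def ns4_insert_reversed(r: Route, rng: random.Random) -> Route:
    """NS4 (one reading): NS3 move with the two chosen genes reversed, so gene j precedes gene i."""
    s = list(r)
    i, j = sorted(rng.sample(range(len(s)), 2))
    s.insert(i, s.pop(j))
    return s


def vns(route: Route, cost: Callable[[Route], float], moves: Sequence[Move],
        rng: random.Random, max_iter: int = 100) -> Tuple[Route, float]:
    """Simple VNS: try structures in order, restart from the first after an improvement."""
    best, best_cost = list(route), cost(route)
    for _ in range(max_iter):                 # plays the role of iter_VNS_max
        k = 0
        while k < len(moves):
            cand = moves[k](best, rng)
            cand_cost = cost(cand)
            if cand_cost < best_cost:
                best, best_cost = cand, cand_cost
                k = 0                         # improvement: go back to NS1
            else:
                k += 1                        # no improvement: next neighborhood
    return best, best_cost
```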

B. SOLVING PROCESS OF THE ROLLING ADJUSTMENT FRAMEWORK MODEL
Based on the designed CAGA-VNS hybrid algorithm and the rolling horizon algorithm, the optimal solution of the static problem in each decision cycle is solved sequentially. When the static problem of the next decision cycle is solved, the relevant parameters from the previous decision cycle are used as input parameters of the prediction model for the next decision cycle. The detailed steps of the proposed rolling adjustment framework model are as follows:
Step1: Initialize the parameters of the CAGA-VNS algorithm and set the initial decision cycle t = 1.
Step2: Randomly divide the original real-world instance set into three subsets: a training set, a validation set, and a test set.
The training subset contains 70 percent of the instances [3] and is used to optimize the SVR parameters. The validation subset contains 15 percent of the instances and is used to monitor the validation error in each training epoch. The remaining instances are used as test data for the NLPCA-GWO-SVR prediction model. Finally, the predicted logistics demand is used as input for optimizing the fresh produce joint distribution network.
Step3: Set the initial iteration number of the CAGA-VNS algorithm to gen1 = 1 and randomly generate the chromosomes of the CAGA algorithm. Calculate the fitness value of each chromosome and record the current optimal solution.
Step6: If ch ≥ CH, use equation (22) to calculate the number of lowest-fitness chromosomes to be eliminated; otherwise, record the optimal solution x and set gen2 = 0.
Step9: If gen2 ≥ iter_VNS_max, update the optimal solution and go to Step10; otherwise, go to Step7.
Step10: Implement the above optimal solution in the current decision cycle and collect the actual fresh produce logistics demand data. If t ≤ T, go to Step2 for the next decision cycle and adjust the parameters of the NLPCA-GWO-SVR prediction model according to the newly collected data; otherwise, stop the rolling adjustment framework model.
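Step2 and the outer loop closed by Step10 can be sketched in Python as below. This is a hedged illustration, not the authors' code: the function names, the single scalar demand per cycle, and the callable interfaces (`fit_predictor`, `optimize_network`, `observe_demand`) are hypothetical stand-ins for NLPCA-GWO-SVR training, CAGA-VNS optimization, and field data collection, and rounding of the 70/15/15 shares is an assumption.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_instances(data: Sequence[T], rng: random.Random, train: float = 0.70,
                    val: float = 0.15) -> Tuple[List[T], List[T], List[T]]:
    """Step2: random split into training, validation and test sets (70/15/15)."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    n_train = round(train * len(data))
    n_val = round(val * len(data))
    tr = [data[i] for i in idx[:n_train]]
    va = [data[i] for i in idx[n_train:n_train + n_val]]
    te = [data[i] for i in idx[n_train + n_val:]]
    return tr, va, te


def rolling_adjustment(T_cycles: int,
                       history: List[float],
                       fit_predictor: Callable[[List[float]], Callable[[int], float]],
                       optimize_network: Callable[[float], object],
                       observe_demand: Callable[[int], float]) -> List[object]:
    """Run the four steps of each decision cycle t = 1..T_cycles (names hypothetical)."""
    plans = []
    for t in range(1, T_cycles + 1):
        predictor = fit_predictor(history)          # refit on all data collected so far
        forecast = predictor(t)                     # 1) demand prediction for cycle t
        plans.append(optimize_network(forecast))    # 2) network optimization, e.g. CAGA-VNS
        history.append(observe_demand(t))           # 3) data collection (mutates history)
    return plans                                    # 4) adjustment happens via the refit
```

Usage: with a toy predictor that repeats the last observed demand, each cycle's plan reflects the data collected in the previous cycle, which mirrors how the framework feeds cycle t's observations into cycle t+1's prediction.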

V. NUMERICAL EXPERIMENTS
To verify the effectiveness of the CAGA-VNS algorithm, Experiment 1 is designed on a standard benchmark (source: http://www.vrp-rep.org/). Experiment 2, based on data from an actual enterprise, is then carried out to validate the correctness of the rolling adjustment framework model proposed in this paper.
111162 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
where (Ex, En, He) are the controlling parameters of p_c and p_m, f̄ is the average fitness value of all chromosomes in the current generation, f_max is the optimal fitness value of the current generation, f′ is the larger fitness of the two individuals selected for crossover, f is the fitness of the individual selected for mutation, k1 = k3 = 0.9, k2 = k4 = 0.4, and C1 ∼ C4 can be calculated by:

To validate the performance of the algorithm, the proposed CAGA-VNS is compared with the classical GA and the catastrophe adaptive genetic algorithm (CAGA). Each of the three algorithms is run 10 times, and the results are shown in Table 4. The last row of Table 4 gives the percentage difference between each run and the optimal value, which is calculated by (26):
where cr_rt is the result of the rt-th run.

2) COMPARATIVE ANALYSIS
As shown in Table 4, the comparison of GA and CAGA indicates that the dynamic catastrophic mechanism and the adaptive crossover and mutation probabilities enable the GA to obtain better solutions within the same runtime. CAGA-VNS has a clear advantage: the best solution it obtains is only 5543.08. Moreover, a smaller difference percentage indicates a more stable algorithm. CAGA-VNS is the most stable of the three, with the lowest difference percentage (9.16%), followed by CAGA (10.51%) and GA, which has the highest (10.87%). Therefore, the algorithm designed in this paper performs best of the three.
For the fresh produce joint distribution network optimization problem, CAGA-VNS is therefore more competitive.

B. EXPERIMENT 2: THE CORRECTNESS OF THE ROLLING ADJUSTMENT FRAMEWORK MODEL
1) REAL-WORLD CASE
For the real-world case, we consider a representative fresh produce sales company in Shenyang, China (Enterprise A), a highly trusted enterprise with a wide business scope. However, Enterprise A serves its customers through independent delivery, which results in high costs, low distribution efficiency, and low customer satisfaction. Therefore, Enterprise A partners with Enterprise B to perform joint distribution. Table 5 summarizes part of the customer node data in the fresh produce distribution network, and Table 6 shows the information on the potential distribution centers. D1∼D4 are potential distribution centers belonging to Enterprise A, and D5∼D10 are potential distribution centers belonging to Enterprise B.

2) EXPERIMENTAL RESULTS
a: PREDICTION RESULT ANALYSIS
We use the fresh produce demand data of the two enterprises from 2012 to 2021 (2012Q1∼2021Q4) as the training and test set, with a cycle length of one quarter, giving 40 decision cycles in total. Table 7 shows part of the actual fresh produce demand data for all customers served by the two companies in 2020Q1.
Figure 9 compares the original demand with the forecasts produced by NLPCA-GWO-SVR.
Taking customer No. 1 as an example, the NLPCA-GWO-SVR model is used to predict demand over the decision cycles 2012Q1∼2021Q4. In Figure 9, the blue line with solid squares represents the collected actual fresh produce logistics demand, and the orange solid line with multiplication signs represents the prediction. Comparing the predicted demand with the actual, changing demand shows that, with its parameters adjusted in each decision cycle, the NLPCA-GWO-SVR model tracks the actual fresh produce logistics demand well. At the same time, a certain buffer exists, which can help cope with fresh produce logistics demand in emergencies.
To investigate the superiority of the NLPCA-GWO-SVR forecasting model, two widely used error measures, the root mean squared error (RMSE) and the mean absolute percentage error (MAPE), are employed for performance comparison together with the coefficient of determination R². They are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_j(t) - \hat{y}_j(t)\right)^2}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_j(t) - \hat{y}_j(t)}{y_j(t)}\right|$$

$$R^2 = 1 - \frac{\sum_{t=1}^{n}\left(y_j(t) - \hat{y}_j(t)\right)^2}{\sum_{t=1}^{n}\left(y_j(t) - \bar{y}_j(t)\right)^2}$$

where y_j(t) is the ground truth value of the logistics demand of customer j in decision cycle t, ŷ_j(t) is the predicted counterpart, ȳ_j(t) is the mean value of the original demand, and n is the number of evaluated samples. Table 8 compares these RMSE, MAPE, and R² values with those of the baseline methods on the same training and test sets. As Table 8 shows, the proposed NLPCA-GWO-SVR model achieves an R² of 0.8032, indicating that the forecasts are close to the ground truth values. It also achieves the lowest RMSE (7.5161) and MAPE (0.1849), outperforming baseline models such as BP and RF, which suggests that it captures complex nonlinear relationships between demand and its influencing factors. The well-trained NLPCA-GWO-SVR model, with its high prediction accuracy, can therefore supply the demand parameters that are further used in FPJDOP.
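The three evaluation metrics can be computed with a short Python sketch that follows the definitions above. MAPE is returned as a fraction rather than a percentage, matching how the text reports values such as 0.1849; the ground-truth values must be nonzero for MAPE to be defined. The function names are illustrative.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction; y must contain no zeros."""
    return sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```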
Furthermore, Figures 10, 11, and 12 present box plots of R², RMSE, and MAPE over five prediction runs; none of them contains outliers. Figure 10 shows that the NLPCA-GWO-SVR prediction model has the highest R² among the five prediction models, although the height of its box indicates relatively high variability. Figure 11 shows the RMSE of the five prediction models: NLPCA-GWO-SVR again achieves the lowest RMSE, and its short box indicates a very stable RMSE distribution; Figure 12 shows the same pattern for MAPE. In short, the prediction model established in this paper achieves good prediction accuracy, fits the data well, and captures the overall trend of the original samples.
To verify the robustness of the proposed NLPCA-GWO-SVR prediction model under highly variable fresh produce logistics demand, a further experiment with highly fluctuating signals is carried out on 300 randomly generated fresh produce demand samples. Figure 13 shows the ground truth fresh produce logistics demand and the corresponding forecasts. In the testing period, the NLPCA-GWO-SVR prediction model achieves an RMSE of 2.1045 and a MAPE of 0.1175, which shows that it retains good fitting ability.

b: JOINT DISTRIBUTION CENTER SETTINGS FOR DIFFERENT DECISION CYCLES
Figure 14 shows the number of distribution centers selected in the 10 decision cycles from 2018Q1 to 2020Q2. For example, in cycle 2018Q1, five distribution centers (D2, D4, D6, D7, D9) are selected according to the optimization results, covering 35%, 22%, 18%, 15%, and 10% of the fresh produce logistics demand, respectively. In cycle 2018Q4, only four distribution centers (D5, D7, D9, D10) need to be selected, covering 47%, 34%, 11%, and 8% of the demand, respectively. Thus, the distribution center sharing strategy proposed in this paper reduces the number of distribution centers and improves their utilization rate. In the urban last-mile joint distribution network for fresh produce, setting up joint distribution centers not only covers customers' logistics demand better but also makes full use of joint distribution to achieve economies of scale and improve the utilization rate of distribution centers. Therefore, as fresh produce logistics demand changes with the seasons, the distribution network must be adjusted accordingly, helping enterprises save costs while delivering fresh produce to customers quickly.

c: DISTRIBUTION ROUTING ARRANGEMENTS FOR DIFFERENT DECISION CYCLES
Table 9 displays the optimal distribution routes and costs for cycles 2018Q4 and 2019Q4, in which different optimal numbers of distribution centers are selected. In the table, the number 0 separates the routes of different vehicles from the same distribution center. In cycle 2018Q4, four distribution centers are selected (D1, D5, D7, and D9); in cycle 2019Q4, three are selected (D2, D7, and D9). Apart from D7 and D9, the distribution centers selected in the two cycles differ. Therefore, it is necessary to reselect distribution centers and re-optimize routes in each cycle.
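Reading a route sequence in the Table 9 encoding can be sketched in Python: the sequence for one distribution center is split into per-vehicle routes at each 0, and each route's length is then computed from coordinates such as those in Tables 5 and 6. The assumptions are that every vehicle starts and ends at its distribution center and that distances are Euclidean; the function names and the customer IDs in the example are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def split_vehicle_routes(sequence: List[int]) -> List[List[int]]:
    """Split one distribution center's customer sequence into vehicle routes at each 0."""
    routes: List[List[int]] = []
    current: List[int] = []
    for node in sequence:
        if node == 0:
            if current:
                routes.append(current)
            current = []
        else:
            current.append(node)
    if current:
        routes.append(current)
    return routes


def route_length(depot: Point, route: List[int], coords: Dict[int, Point]) -> float:
    """Euclidean length of depot -> customers -> depot (assumes each vehicle returns)."""
    path = [depot] + [coords[c] for c in route] + [depot]
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))
```

For example, the hypothetical sequence `[3, 7, 0, 12, 5, 9]` would describe two vehicles serving customers 3 and 7, and 12, 5, and 9, respectively.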

d: COMPARISON BETWEEN DISTRIBUTION CENTER SHARING STRATEGY AND NON-SHARED STRATEGY
With the above algorithm parameters unchanged and using data from cycle 2018Q3, this paper compares distribution network optimization under the distribution center sharing strategy (Strategy A) with distribution network optimization under the independent distribution mode (Strategy B). Each strategy is run 10 times, and the average values are taken as the final results, shown in Table 10. TD is the total delivery distance (km), TC is the total cost (yuan), and VN is the number of delivery vehicles.
As Table 10 shows, in terms of total travel distance TD, the joint distribution strategy designed in this paper (Strategy A) outperforms traditional independent distribution (Strategy B), reducing travel distance by 14.72%. In terms of TC, Strategy A reduces the total cost by 19.46% relative to Strategy B, while using fewer distribution centers and improving their utilization rate. Regarding delivery vehicles, Strategy A uses 20% fewer vehicles than Strategy B. Overall, the joint distribution strategy designed in this paper clearly outperforms the traditional independent distribution strategy: it simultaneously reduces the number of distribution centers and vehicles used, shortens the distribution distance, and lowers the cost.
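The percentages above read as relative reductions with Strategy B as the baseline ("compared to the latter"). Assuming that definition, which the text does not state explicitly, each saving for a metric X would be computed as:

```latex
\Delta_X = \frac{X_B - X_A}{X_B} \times 100\%, \qquad X \in \{TD,\ TC,\ VN\}
```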
As shown in Figure 15, when the itemized costs of the joint distribution network based on distribution center sharing are compared with those of the traditional independent distribution network, every cost item is lower under the sharing strategy. For the opening cost of distribution centers, the sharing strategy saves 20% compared with independent distribution, because it uses fewer distribution centers and improves their utilization rate. For the time window penalty cost, the sharing strategy reduces violations of customer time windows to a certain extent. For the cost of fresh produce damage, the sharing strategy helps enterprises deliver fresh produce to end customers quickly while reducing fresh produce losses.

VI. CONCLUSION
Research on fresh produce joint distribution network optimization has become an essential field in agricultural logistics management. Seasonal and maturity-related changes in fresh produce demand make urban fresh produce distribution significantly different from the distribution of industrial products. With this in mind, an innovative data-driven rolling adjustment framework model is presented for optimizing the fresh produce distribution network. Each decision cycle under this framework encompasses four sequential steps: forecasting fresh produce logistics demand, optimizing the fresh produce joint distribution network, collecting and processing data, and adjusting the parameters of the prediction model. The main theoretical contributions of this paper can be summarized as follows.
First, from a data-driven perspective, a rolling adjustment framework model for optimizing the agricultural joint distribution network is constructed; unlike traditional one-off distribution network optimization, it is adjusted cycle by cycle and is therefore broadly applicable. Second, implementing the joint distribution network based on the distribution center sharing strategy allows urban last-mile fresh produce distribution resources to be allocated rationally according to fresh produce logistics demand and improves the utilization rate of distribution centers. This offers a new approach to optimizing the urban last-mile logistics distribution network for fresh produce in China and provides decision references for related enterprises.
Several directions remain for further research. One promising path is to optimize the fresh produce distribution network in a low-carbon context, since global warming, caused mainly by carbon dioxide emissions, poses a growing threat to the environment and human beings. Another is to introduce disruption management into the fresh produce distribution network to cope with disruptions.

FIGURE 1. The data-driven rolling adjustment framework model of fresh produce joint distribution network optimization.

Figure 3(b) illustrates the implementation and effect of distribution center resource sharing, where only three distribution centers and six vehicles are deployed. Each customer is served by the closest distribution center, which improves distribution center utilization and eliminates long trips and empty-vehicle trips.
(ii) Reassigning all customers of each enterprise;
(iii) Arranging vehicle routes for each joint distribution center.

2) MODEL ASSUMPTIONS
To represent the real-world conditions of the fresh produce joint distribution network as closely as possible, the model assumptions are as follows: S = H ∪ G (i, j ∈ S), where H = {i | i = 1, 2, . . ., N} is the set of customers, G = {r | r = 1, 2, . . ., R} is the set of potential distribution centers, V = {k | k = 1, 2, . . ., } is the set of vehicles, and T = {t | t = 1, 2, . . ., } is the set of decision cycles.

FIGURE 3. Comparison of before and after achieving joint distribution under distribution center sharing.

Figure 8(a) shows the original route, and Figures 8(b)∼(e) show the routes produced by the different neighborhood structures. Neighborhood Structure 1 (NS1) selects the genes with the maximum and minimum values and exchanges their positions. Neighborhood Structure 2 (NS2) selects two genes randomly and swaps their positions. Neighborhood Structure 3 (NS3) randomly selects two genes from left to right, shifts the second gene next to the first, and moves the genes between them one position to the right. Neighborhood Structure 4 (NS4) builds on NS3 and reverses the two chosen genes.

FIGURE 9. Comparison between the original demand and the forecasting counterpart.

FIGURE 10. R² of the five prediction models.

FIGURE 11. RMSE of the five prediction models.

FIGURE 12. MAPE of the five prediction models.

FIGURE 13. The forecasting result with highly fluctuating signals.

FIGURE 14. The number of joint distribution centers set up in different decision cycles.

FIGURE 15. Comparison of costs under sharing and unsharing strategies.

TABLE 3. p_c & p_m.

TABLE 4. Comparison of the three mentioned algorithms.

TABLE 5. Coordinates and time windows of customers.

TABLE 6. Coordinates and cost of potential distribution centers.

TABLE 7. Logistics demand data of fresh produce.

TABLE 8. Performance comparison of different forecasting models.

TABLE 9. The optimal distribution routes and costs in different cycles.

TABLE 10. Comparison of computational results of the two strategies.