Two-Stage Optimization Model of Agricultural Product Distribution in Remote Rural Areas

Due to the lack of systematically optimized logistics networks in many remote rural areas, the online sales of agricultural products in these areas suffer from high cost, high damage rates and slow delivery. To address these logistics problems, this paper proposes a two-stage layout optimization model of agricultural product joint distribution centres based on the geographical features of remote rural areas. The number and location of distribution centres are selected in two stages to optimize the logistics network. Chengkou County, located in Chongqing city, China, is selected as the study area. In the first stage, AP clustering is carried out using the distance between the logistics nodes of villages and towns, and the source transfer stations for agricultural products are obtained using the correlation between the nodes; these stations are regarded as alternative locations for the second-level logistics nodes. In the second stage, a joint distribution centre optimization model with the lowest cost is constructed. A fruit fly optimization algorithm is used to select the optimal locations from the alternative locations as the second-level logistics nodes and to obtain the specific delivery path. Optimizing the logistics network of remote rural areas can reduce the logistics costs of agricultural products in these areas, promote online sales of agricultural products, and provide the government and logistics companies with theoretical references for opening up new markets in remote areas.

INDEX TERMS AP clustering, fruit fly optimization algorithm, joint distribution centres, two-stage logistics layout optimization.

NOTATIONS
One can build the joint distribution model by setting the following parameters and variables:
 - required area for a single distribution centre;
$C_{conB}$ - annual construction cost of distribution centre B;
$C_w$ - individual distribution centre facility and operating costs per year;
$D$ - maximum capacity of the agricultural product distribution centre;
$Dis_{AB}$ - spatial distance from A to B;
$Dis_{BC}$ - spatial distance from B to C;
$q_{AB}$ - annual average agricultural product traffic from A to B;
$q_{BC}$ - annual average agricultural product traffic from B to C;
$t$ - fixed life of each agricultural product distribution centre;
$AS$ - average annual total supply in the region;
$\theta$ - agricultural product corruption rate;
$p$ - average selling price of agricultural products;
$h$ - unit cost per delivery vehicle;
$h_{AB}$ - unit cost per delivery vehicle from A to B;
$h_{BC}$ - unit cost per delivery vehicle from B to C;
$x_B$ - Boolean variable marking whether to build a distribution centre in B:
$$x_B = \begin{cases} 1, & \text{setting up a distribution centre in candidate } B \\ 0, & \text{otherwise} \end{cases}$$
$y_{AB}$ - Boolean variable marking whether to transport agricultural products from A to B:
$$y_{AB} = \begin{cases} 1, & \text{transport agricultural products from } A \text{ to } B \\ 0, & \text{otherwise} \end{cases}$$
$y_{BC}$ - Boolean variable marking whether to transport agricultural products from B to C:
$$y_{BC} = \begin{cases} 1, & \text{transport agricultural products from } B \text{ to } C \\ 0, & \text{otherwise} \end{cases}$$

ABBREVIATIONS
AP - affinity propagation
FOA - fruit fly optimization algorithm

I. INTRODUCTION
The logistics industry integrates transportation, warehousing, freight forwarding, information and other industries, thereby playing an important role in supporting the development of the national economy. Logistics operations are significant economic activities for competitive businesses, and with the increasing use of e-commerce and cargo transportation, current logistics problems have attracted the attention of many production companies and scholars. Taking China as an example, according to the "China Cold Chain Logistics Development Report (2018)", with the application and popularization of the Internet, the transaction scale of China's fresh agricultural product e-commerce market has maintained a growth rate of more than 50% for five consecutive years. However, there are many problems in China's agricultural product logistics, especially in remote areas. These problems are mainly reflected in the high cost and high loss rate of agricultural product logistics. According to the data, the logistics costs of rural areas with convenient transportation are 2-3 times those of urban areas, and the logistics costs of remote villages are 4-5 times the urban costs. The circulation costs of agricultural products in some regions generally account for 30%-40% of the total cost, and those of fresh products account for more than 60%. The logistics cost of e-commerce for fresh agricultural products accounts for 25%-40% of the sales price, which is much higher than the logistics cost of other products; the distribution cost alone accounts for 35% of the total logistics cost [1]. Agricultural product logistics costs account for 70% of the cost of perishable goods, whereas according to international standards the logistics cost of perishable goods does not exceed 50% of their total cost [2]. According to statistics, the average loss rates of fresh agricultural products, such as fruits and vegetables, can reach 25%-30% because of picking, transportation and storage logistics, while the loss rates of fruits and vegetables in developed countries are less than 5%, and only 1% in the United States [3].
Based on the characteristics of the Internet era and of remote rural agricultural product logistics in the context of rural revitalization, this paper selects the remote rural agricultural product logistics model, considers how to optimize the layout of logistics nodes in remote areas, and constructs the corresponding optimization model. This paper establishes a two-stage multi-logistical node location model. In the first stage, the first-level source transfer stations are obtained by a clustering method and used as alternative locations for the second-level logistics nodes. In the second stage, an optimization model of the joint distribution centres for online sales of agricultural products in remote rural areas is constructed. The locations are screened, and the final joint distribution centres, which serve as the second-level logistics nodes, are calculated using the fruit fly optimization algorithm. In the process of layout optimization, the specific geographical situation of the scattered agricultural products in remote villages is considered, so that the joint distribution centres established through the two stages correspond more closely to the actual situation. In the site selection, not only are the loss cost, transportation cost, operation cost and other factors fully considered in the established location model, but route optimization is also carried out when the distribution centres are selected.
With the development of location theory, numerous methods have been proposed for the location of logistics distribution centres. Because different algorithms and factors are considered, many types of logistics distribution centre location problems exist. Based on the economic benefits of the logistics system and the specific case, the problem studied here can be framed as a two-stage multi-logistics distribution centre location problem based on cost minimization. Therefore, this paper aims to determine the number and location of the two levels of logistics nodes and to plan the transportation routes.
The logistics network optimization problem in this paper minimizes the cost of the whole logistics system by solving the following sub-problems:
- How to determine the number and location of the source transfer stations;
- How to determine the location of the joint distribution centres when the number of distribution centres needed to be built is known;
- How to optimize the distribution routing after determining the number and location of the joint distribution centres.
For the first sub-problem, because the construction cost of the source transfer stations is not high, the focus is on how to solve the location to minimize the distribution cost, which depends mainly on the distance between logistics nodes. Thus, solving the first sub-problem requires minimizing the total distance between logistics nodes by site selection.
For the second and third sub-problems, it is necessary to consider the characteristics of traffic conditions and infrastructure in remote villages and a series of constraints such as customer demand and vehicle load restrictions. Finally, a distribution centre optimization model with the lowest total cost can be built and solved using a suitable heuristic algorithm.
Because the distribution centres in the county are far from the demand points outside the county, the demand points outside the county are served only by the distribution centres outside the county. Therefore, the main service receivers of the distribution centres in the county are the demand points inside the county and the distribution centres outside the county. The overall distribution flow chart is constructed accordingly, as shown in Fig. 1.

II. LITERATURE REVIEW
Because of increasing logistics costs, many scholars have realized the importance of joint distribution. Kazemi et al. [4] noted that joint distribution not only enables enterprises to enjoy long-term economic benefits but also improves the operational efficiency of the entire supply chain, serving as an effective solution to distribution planning. Sheng [5] constructed a four-in-one rural e-commerce alliance operation mode, "third-party logistics + postal logistics + passenger transport logistics + grassroots logistics", based on long tail theory.
In logistics studies, many papers have been published on the location problem of distribution centres and the distribution routing problem. In 1929, Harold Hotelling studied the location of two competing suppliers on a straight line. Kuehn and Hamburger [6] proposed a heuristic algorithm to solve the warehouse location problem. Many scholars have carried out in-depth research on the location of distribution centres since then, including establishing various mathematical models [7] and solving algorithms [8], [9]. Various forms of location models have been developed, which can be categorized into analytic, continuous and network models [10]. The continuous location model is calculated with the centre of gravity method [11], [12]. Aiken [13] proposed several discrete location models, such as dynamic programming models, linear programming models [14]-[16], and 0-1 integer programming models [17], when studying the location model of a distribution centre. An in-depth analysis and comparison of these models shows that they all aim to achieve the minimum total cost of the selected facilities; they differ in the forms of their objective functions and constraints. The Delphi method, which is based on qualitative analysis, is commonly used in the fuzzy evaluation method [18], [19]. Chen [20] made location decisions for fuzzy multi-objective decision-making data and put forward a multi-objective optimization decision-making method. Klapita and Švecová [21] dealt with a possible method of finding the optimal location of logistics centres at uncertain costs represented by fuzzy numbers to minimize the complete costs of a system. Turskis and Zavadskas [22] presented a newly developed ARAS-F method to select the most suitable site for a logistics centre among a set of alternatives. The proposed methodology was validated with an illustrative example, the selection of a logistics centre location. Xu et al. [23] solved the single-plant and the multi-plant location decision problems by establishing conceptual and mathematical models.
For the computation of location optimization models and distribution routing optimization, the gravity centre location model is mainly calibrated by the iterative method. Various heuristic algorithms are used to solve the NP-hard problems, mainly including the ant colony algorithm [24], the genetic algorithm [4], the particle swarm optimization algorithm [25], [26] and corresponding improvements [27], and the fruit fly optimization algorithm [28]. Hiassat et al. [29] considered the characteristics of perishable goods, established an optimization model of the distribution centre inventory problem, and used a genetic algorithm and a local search heuristic method to solve it. Song and Ko [30] considered the characteristics of cold chain cars and established a nonlinear mathematical model. Bo [31] established a mathematical model of logistics distribution centre location, optimized the solution to realize the optimal allocation of distribution paths, and proposed a fruit fly optimization algorithm based on a Logistic chaotic system. Mulloorakam et al. [32] considered the combined objective capacitated vehicle routing problem (CVRP) based on the genetic algorithm. Simsir and Ekmekci [33] used the artificial bee colony (ABC) algorithm to produce low-cost solutions based on few parameters. Habibi et al. [34] presented a location inventory routing problem (LIRP) optimization model to reduce the total cost. Xu and Yin [35] introduced the ant colony algorithm and the particle swarm optimization algorithm, which provided a reference for the algorithm in this paper.
Tuzun and Burke [36] used a two-stage tabu search algorithm to solve the location-routing problem (LRP), separating the location problem from the routing problem and solving them in two stages. Wu et al. [37] used a simulated annealing algorithm to solve the LRP of multiple distribution centres and multiple vehicles. Albareda-Sambola et al. [38] used a simulated annealing algorithm to solve the LRP. Albareda-Sambola et al. [38] used tabu search to study a deterministic LRP with a single time window. Qureshi et al. [39] used the tabu search algorithm to solve the LRP location allocation problem and a simulation system to solve the vehicle routing problem. Sadjady and Davoudpour [40] formulated a two-echelon supply chain network design problem as a mixed-integer programming model and solved it using a Lagrangian-based heuristic algorithm. In the literature on logistics location, there are many studies on agricultural product logistics location. The study in [41] treated the location problem with non-linear transportation costs in the distribution system of agricultural products and solved it by the branch and bound method. Hwang [42] assumed that the decay rate of goods follows an exponential function and proposed a two-stage random coverage location model for perishable goods. Gong et al. [43] proposed a location model of a perishable goods distribution centre based on particle swarm optimization (PSO), considering the safety inventory and the capacity limitation of a distribution centre.
Although the above scholars conducted research on logistics networks, including distribution centre locations, common distribution models, optimized distribution paths, etc., there is room for improvement. First, most of the location optimization models established at present are for single-level distribution centres; there are few studies on multi-level distribution centre location models, and their algorithm processes are complex and computationally intensive. A multi-level distribution centre system can better complete the distribution task and is more convenient and efficient for remote areas. The multi-level distribution centre serves as a bridge between the single-level distribution centre and the delivery point. Second, these studies solved the layout problem or the distribution path problem separately, and research that considers the two issues together is lacking. Third, there are few studies on the logistics network optimization of remote rural areas as a specific case. Therefore, it is necessary, important, and innovative to study the layout of agricultural product logistics and the distribution routes of fresh agricultural products in remote areas.

III. AP CLUSTERING ALGORITHM
This study is mainly designed to optimize the location of distribution centres in remote areas that have relatively scattered residential areas and wide geographical areas. Therefore, in the first stage, a clustering algorithm must be used to cluster these isolated villages into several regions and designate a source transfer station in each region.
Common clustering algorithms include the k-means algorithm, the SOM neural network, the FCM algorithm, hierarchical clustering, etc. However, for most clustering methods the number of clusters is difficult to determine. The AP clustering algorithm does not need the number of clusters to be specified in advance, which simplifies the selection of rural logistics centre nodes and makes it more suitable for the complex situation of the rural areas of China. Therefore, the AP clustering algorithm is used to optimize the location of the source transfer stations.
The basic idea of using the AP clustering algorithm to select the location of the agricultural product logistics centres is as follows. According to the correlation between the rural logistics nodes (in this section, the AP clustering algorithm measures the correlation between nodes based on the spatial distance between them), the logistics nodes in the rural logistics system are first clustered; the exemplars obtained are then regarded as the logistics centres, and the other nodes within each cluster are regarded as the service objects of its exemplar.

A. PROCESS DESCRIPTION
First, the distance between the nodes is derived from the position information of the logistics nodes to form the node distance matrix. Second, the distance matrix is preprocessed so that it better reflects the relationship between the logistics nodes. Finally, the logistics nodes are clustered on the preprocessed data; the exemplars obtained are taken as the locations of the logistics centres and the number of clusters as the number of logistics centres. The flow chart is shown in Fig. 2.

B. DESCRIPTION OF AP CLUSTERING ALGORITHMS

1) DISTANCE MATRIX ACQUISITION
First, we obtain the spatial distance between the nodes according to the position information of the rural nodes and construct the distance matrix. Due to the inadequacy of rural transportation facilities, the geometric distance between nodes does not necessarily reflect the cost of logistics transportation. Therefore, this paper uses the length of the road between nodes to represent the "spatial distance" between nodes. Node i and Node j represent any two logistics nodes, and Dis (i, j) represents the distance between them. The distance matrix of n rural logistics nodes, reflecting the spatial distance between the logistics nodes, is DisD, and its matrix element is DisD (i, j).

2) DISTANCE MATRIX PREPROCESSING
The AP clustering algorithm clusters all nodes by measuring the correlation between nodes according to the value of the matrix elements. The larger the value of the matrix elements is, the larger the correlation between the nodes. However, the distance matrix reflects the opposite: the larger the original value of the matrix is (the larger spatial distance between the nodes), the smaller the correlation between the nodes, which is different from the input data required by the AP clustering algorithm. Therefore, the distance matrix must be preprocessed first.
The element values of the distance matrix are processed as shown in formula (1).
After preprocessing, the values of the elements of the distance matrix, which range between 0 and 1, can reflect the correlation between the nodes.
Because rural areas are generally broad, the distance between the logistics nodes in the rural logistics system may vary widely. The clustering results obtained by directly inputting the distance matrix into the AP clustering algorithm may not be ideal [45]. Therefore, the preprocessed distance matrix is processed again using a "mapping mechanism" so that the distances between the nodes are mapped into a relatively small range.
The mapping rule is as follows: add an adjustment coefficient b (a small value) to the matrix element, and then take the logarithm to base a to generate the new value f (i, j) of the matrix element and the mapped distance matrix [f (i, j)]. During the test, this paper assumes a = 2, b = 1, as shown in formula (2).
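The two preprocessing steps can be sketched in Python. Formula (1) is not reproduced in this text, so the normalisation used below (one minus the distance divided by the largest distance) is only an assumed stand-in that satisfies the stated properties: values between 0 and 1, with larger values meaning higher correlation. The mapping step follows the rule described above, $f(i, j) = \log_a(s(i, j) + b)$ with a = 2 and b = 1, where s(i, j) denotes the preprocessed element. The function names and the sample distances are hypothetical.

```python
import math

def normalise(dist):
    """Turn a distance matrix into a 0-1 correlation matrix.

    ASSUMPTION: 1 - d / d_max stands in for the paper's formula (1),
    which is not available in this text.
    """
    d_max = max(max(row) for row in dist)
    return [[1.0 - d / d_max for d in row] for row in dist]

def log_map(sim, a=2.0, b=1.0):
    """Mapping step: f(i, j) = log_a(sim(i, j) + b)."""
    return [[math.log(s + b, a) for s in row] for row in sim]

# Hypothetical road distances (km) between three township nodes.
dis_d = [
    [0.0, 12.0, 48.0],
    [12.0, 0.0, 36.0],
    [48.0, 36.0, 0.0],
]
f = log_map(normalise(dis_d))
```

With a = 2 and b = 1, an input in [0, 1] is mapped to [0, 1] again, so the mapped matrix keeps the same bounds while compressing differences between large correlations.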

3) CLUSTERING OF LOGISTICS NODES
The preprocessed distance matrix [f (i, j)] is used as the input data of the AP clustering algorithm, and the output is the set of cluster centre points. The clustering process on the input matrix is described in detail below. The AP clustering algorithm regards all n logistics nodes as potential cluster centres and passes two types of information between nodes: responsibility and availability. If node k is a candidate node, S (i, k) is the correlation between node i and candidate k. R (i, k) is the degree of responsibility, the numerical information sent from logistics node i to candidate k, which describes how well suited node k is to serve as the logistics distribution centre of node i. A (i, k) is the degree of availability, the numerical information sent from candidate k to logistics node i, which describes how appropriate it is for node i to select node k as its logistics distribution centre. [R (i, j)] is the responsibility matrix, and [A (i, j)] is the availability matrix.
The calculation process of the node responsibility value and the availability value is shown in formulas (3), (4) and (5).
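Formulas (3)-(5) are not reproduced in this text. For reference, the standard affinity propagation updates (the Frey and Dueck formulation), written with the symbols used above, are given below; that the paper's formulas coincide with them is an assumption.

```latex
\begin{align}
R(i,k) &= S(i,k) - \max_{k' \neq k}\bigl\{A(i,k') + S(i,k')\bigr\} \\
A(i,k) &= \min\Bigl\{0,\; R(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, R(i',k)\}\Bigr\}, \quad i \neq k \\
A(k,k) &= \sum_{i' \neq k} \max\{0, R(i',k)\}
\end{align}
```

Node k is chosen as an exemplar when $A(k,k) + R(k,k) > 0$, and each remaining node joins the exemplar with which it has the highest correlation.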
The iterative process continuously updates the responsibility and availability values of each point until R (i, k) and A (i, k) remain unchanged, generating multiple high-quality exemplars (the final logistics centres).
Since the execution process of the AP clustering algorithm is prone to oscillation, the damping factor γ is introduced to adjust the calculation speed and avoid oscillation. In the current iteration process, the update result of responsibility value R (i, k) and the availability value A (i, k) of the logistics node i is obtained by weighting the result of the previous iteration. The value of γ is generally larger than or equal to 0.5 and less than 1. In this calculation, it is set to 0.5. If the number of iterations reaches the set number of times, the calculation process is terminated; otherwise, it continues to iterate. The calculation process is shown in formulas (6) and (7).
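The damped message passing described above can be illustrated with a small pure-Python sketch. It uses the standard affinity propagation updates, with each new value weighted as (1 - γ) × new + γ × previous. The function name, the stopping rule based on an unchanged exemplar set, the toy similarity (negative squared distance) and the preference value on the diagonal are all assumptions made for illustration; they are not taken from the paper.

```python
def affinity_propagation(S, damping=0.5, max_iter=500, convergence_iter=15):
    """Cluster n nodes from an n x n similarity matrix S (list of lists).

    S[k][k] is the preference of node k to become an exemplar.
    Returns, for every node, the index of the exemplar it is assigned to.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    last, stable = None, 0
    for _ in range(max_iter):
        # Responsibility update, damped with the previous value.
        for i in range(n):
            for k in range(n):
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        # Availability update, damped with the previous value.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from nodes other than k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # A node is an exemplar when its self-evidence is positive.
        exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
        stable = stable + 1 if exemplars == last else 0
        last = exemplars
        if exemplars and stable >= convergence_iter:
            break
    if not last:
        return [None] * n
    # Exemplars label themselves; other nodes join the most similar exemplar.
    return [i if i in last else max(last, key=lambda k: S[i][k]) for i in range(n)]

# Hypothetical example: six nodes on a line forming two obvious groups.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
pref = -50.0  # assumed preference value placed on the diagonal
S = [[pref if i == j else -(p - q) ** 2 for j, q in enumerate(points)]
     for i, p in enumerate(points)]
labels = affinity_propagation(S)
```

The number of clusters is not passed in; it emerges from the preference values and the similarities, which is the property that motivates the choice of AP clustering in this section.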
After the distance matrix of the logistics nodes is clustered, the logistics nodes are divided into N clusters according to the correlation between them. Each cluster is recorded as $C_N$ ($N < n$), where $C_N^i$ is a node in the logistics system and $C_N^c(i, j)$ is the location of the logistics centre, which is a suitable logistics centre for $C_N^1$, $C_N^2$ and the other logistics nodes.

IV. FOA
The FOA is a swarm intelligence optimization algorithm based on the food-finding behaviour of the fruit fly, proposed by Pan [44]. It is an abstraction of the fruit fly's powerful smell-based and vision-based searches for food. It has the advantages of group collaboration, information sharing, easy programming, and fast search speed. The second stage selects several of the source transfer stations, which serve as the alternative distribution centres, as the joint distribution centres. This is a non-deterministic polynomial (NP) problem. Therefore, a heuristic algorithm is a more convenient way to solve the multi-distribution centre location selection problem.

A. SOLUTION ALGORITHM
Step 1: Initialize the population size, Sizepop, and the maximum number of iterations, Maxgen, and randomly initialize the position of the fruit fly group ($X\_axis$, $Y\_axis$). The random initial fruit fly swarm location is shown in Fig. 3.
Step 2: Taking the initial position as the starting point, give each individual fly a random flying direction and optimization step (RandomValue), which is used as the search distance:
$$X_i = X\_axis + RandomValue, \qquad Y_i = Y\_axis + RandomValue$$
Step 3: Estimate the distance to the origin (Dist) first, and then calculate the smell concentration judgement value (S), which is the reciprocal of the distance:
$$Dist_i = \sqrt{X_i^2 + Y_i^2}, \qquad S_i = 1 / Dist_i$$
Step 4: Substitute the smell concentration judgement value (S) into the smell concentration judgement function (fitness function) to find the smell concentration ($Smell_i$) at the location of each individual fruit fly:
$$Smell_i = Function(S_i)$$
Step 5: Determine the fruit fly with the optimal smell concentration (the maximal value) among the fruit fly swarm and record the optimal individual smell concentration (bestSmell) and the index of the current optimal individual (bestIndex):
[bestSmell, bestIndex] = max(Smell)
Step 6: Keep the best smell concentration value (bestSmell) and the corresponding X, Y coordinates; at that moment, the other individuals of the fruit fly swarm fly towards the optimal fruit fly through visual observation.
Step 7: Start the iteration: repeat Steps 2-5 while the number of iterations is less than the maximum number of iterations, then judge whether the smell concentration is superior to that of the previous iteration; if so, implement Step 6; otherwise, keep the previous best smell concentration. The algorithm ends when the maximum number of iterations is reached.
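A minimal Python sketch of the basic FOA loop described above is given here, assuming a maximisation problem. The initial location range, the step range of ±1, the fixed random seed and the toy fitness function are illustrative assumptions; the fitness function actually used for the location model is formula (12), which is not reproduced in this text.

```python
import math
import random

def foa(fitness, sizepop=20, maxgen=200, seed=0):
    """Basic fruit fly optimization: maximise fitness(S), with S = 1 / Dist."""
    rng = random.Random(seed)
    # Step 1: random initial swarm location (range is an assumption).
    x_axis, y_axis = rng.uniform(0, 10), rng.uniform(0, 10)
    best_smell, best_s = -math.inf, None
    for _ in range(maxgen):
        flies = []
        for _ in range(sizepop):
            # Step 2: random direction and search distance for each fly.
            x = x_axis + rng.uniform(-1, 1)
            y = y_axis + rng.uniform(-1, 1)
            # Step 3: distance to the origin and its reciprocal.
            dist = math.hypot(x, y)
            if dist == 0.0:
                continue  # avoid division by zero
            s = 1.0 / dist
            # Step 4: smell concentration from the fitness function.
            flies.append((fitness(s), x, y, s))
        if not flies:
            continue
        # Step 5: fly with the best (largest) smell concentration.
        smell, x, y, s = max(flies)
        # Steps 6-7: move the swarm only if this generation improved.
        if smell > best_smell:
            best_smell, best_s = smell, s
            x_axis, y_axis = x, y
    return best_s, best_smell

# Hypothetical fitness with its maximum at S = 0.5 (distance 2 from the origin).
best_s, best_smell = foa(lambda s: -(s - 0.5) ** 2)
```

Because the swarm location only changes when a generation improves on the best value found so far, the recorded best smell concentration never gets worse from one iteration to the next.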

B. DISTRIBUTION CENTRE LOCATION OPTIMIZATION BASED ON FOA
The FOA optimizes the location of the logistics distribution centres as follows. Fig. 4 shows the flow chart of optimization.
Step 1: Set the population size of the IFOA (improved fruit fly optimization algorithm) algorithm Sizepop, the maximum number of iterations Maxgen;
Step 2: According to the fitness function formula (12), calculate the fitness function value of the individual fruit fly and find the position and optimal value of the individual and global optimal individual of the fruit fly;
Step 3: Update the speed and position of the fruit fly population;
Step 4: Calculate the fitness and update the position and speed at the same time;
Step 5: If gen > Maxgen, save the optimal solution; otherwise gen = gen + 1, and go to Step 2;
Step 6: Select the best location and optimal distribution range of the corresponding logistics distribution centre according to the optimal location.
VOLUME 8, 2020

V. MATHEMATICAL FORMULATION

A. HYPOTHESES
- The specific location of each supply location and the number of suppliers are known;
- There are transportation costs that are proportional to the transportation volume and transportation distance;
- The construction cost of each source transfer station is the same;
- The construction area of each distribution centre is the same;
- The damage costs of agricultural products in warehouse storage and the storage costs of distribution centres are not considered;
- Demand outweighs supply;
- The maximum storage per distribution centre is the same and known;
- It is assumed that goods of each township logistics node are only transported to one source transfer station, goods of each source transfer station are only transported to one distribution centre, and demand points can only be served by one distribution centre.
The specific steps for calculating the number of distribution centres are as follows: Use linear regression to predict the annual average agricultural products and the average annual agricultural commodity rate in the region in the next decade to calculate the total annual agricultural product supply. Then, the annual average total supply of agricultural products is divided by the maximum storage capacity of the distribution centre to obtain the number of distribution centres.
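The calculation of the number of distribution centres can be sketched as follows. The historical figures are invented for illustration, supply is assumed to be output multiplied by the commodity rate, and rounding the quotient up to the next integer is also an assumption; the text only states that the average supply is divided by the maximum storage capacity.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def number_of_centres(years, output, rate, horizon, capacity):
    """Forecast supply over `horizon` years and divide by the capacity D."""
    a_o, b_o = linear_fit(years, output)  # trend of agricultural output
    a_r, b_r = linear_fit(years, rate)    # trend of the commodity rate
    future = range(years[-1] + 1, years[-1] + 1 + horizon)
    # ASSUMPTION: annual supply = predicted output * predicted commodity rate.
    supply = [(a_o + b_o * y) * (a_r + b_r * y) for y in future]
    avg_supply = sum(supply) / horizon
    # ASSUMPTION: round up so that the capacity covers the average supply.
    return math.ceil(avg_supply / capacity), avg_supply

# Hypothetical history: output in thousand tonnes, constant commodity rate.
years = [2013, 2014, 2015, 2016, 2017]
output = [100.0, 110.0, 120.0, 130.0, 140.0]
rate = [0.5, 0.5, 0.5, 0.5, 0.5]
n_centres, avg_supply = number_of_centres(years, output, rate, horizon=10, capacity=30.0)
```

In this invented example the forecast average supply is 97.5 thousand tonnes per year, so a capacity of 30 thousand tonnes per centre gives four centres.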

B. JOINT DISTRIBUTION CENTRE LOCATION MODEL
The objective of the joint distribution centre model is to minimize the total cost. The objective function and constraints are written as follows:
The fixed cost of the joint distribution centre could be calculated according to the following equation:
The transportation cost from the source transfer stations to the joint distribution centres can be expressed as follows:
$$\sum_{B=1}^{M} \sum_{A=1}^{N} q_{AB} h_{AB} Dis_{AB} \tag{16}$$
The transportation cost from the joint distribution centres to the demand points can be expressed as follows:
The damage cost of agriculture products transported from the source transfer stations to the joint distribution centres and to the demand points can be represented as follows, respectively.
Then, the objective function minimizes the sum of the construction and operation costs of the joint distribution centres and the distribution and damage costs from the source transfer stations to the joint distribution centres, which is calculated as follows:
The number of joint distribution centres selected from the alternative centres is expressed as follows:
The quantity of goods transported to the joint distribution centres does not exceed the maximum storage capacity, which is written as follows:
It is assumed that the goods of each township logistics node are only transported to one source transfer station, and the goods of each source transfer station are only transported to one distribution centre as follows:
The total amount of agricultural products transported from the source transfer stations to the joint distribution centres is equal to the total agricultural product supply in the area, which is as follows:
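To make the structure of the model concrete, the following Python sketch evaluates a simplified version of the objective for every possible choice of M centres and keeps the cheapest feasible one. It is a deliberately reduced illustration and not the paper's model: only the station-to-centre transportation cost of formula (16) is taken from the text, the fixed cost is collapsed into one assumed annual figure per centre, the damage cost and the centre-to-demand-point leg are omitted, each station is simply assigned to its nearest open centre, and exhaustive enumeration replaces the FOA. All names and numbers are hypothetical.

```python
from itertools import combinations

def best_layout(dist, supply, m, capacity, unit_cost, fixed_cost):
    """Pick m centres among the candidate stations at minimum cost.

    dist[a][b]  road distance from station a to candidate b (Dis_AB)
    supply[a]   annual volume shipped from station a (q_AB once assigned)
    unit_cost   transport cost per unit volume and distance (h_AB, constant here)
    fixed_cost  ASSUMED annual construction plus operating cost per centre
    """
    n = len(supply)
    best = None
    for centres in combinations(range(n), m):
        load = {b: 0.0 for b in centres}
        transport = 0.0
        assign = []
        for a in range(n):
            # Each station ships to exactly one centre (nearest open one).
            b = min(centres, key=lambda c: dist[a][c])
            load[b] += supply[a]
            transport += supply[a] * unit_cost * dist[a][b]
            assign.append(b)
        if any(v > capacity for v in load.values()):
            continue  # violates the maximum storage capacity D
        total = m * fixed_cost + transport
        if best is None or total < best[0]:
            best = (total, centres, assign)
    return best

# Hypothetical example: four stations along one road (positions in km).
pos = [0, 10, 50, 60]
dist = [[abs(p - q) for q in pos] for p in pos]
supply = [8, 2, 3, 7]
total, centres, assign = best_layout(dist, supply, m=2, capacity=12,
                                     unit_cost=1, fixed_cost=100)
```

With eight candidate stations, as in the case study, the number of combinations is small enough to enumerate; a heuristic such as the FOA becomes relevant once routing and the remaining cost terms are included.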

VI. CASE STUDY

A. BACKGROUND DESCRIPTION
Chengkou County is located in the northernmost part of Chongqing, at the junction of the three provinces (cities) of Chongqing, Shaanxi and Sichuan in China. The county covers an area of 3,292 square kilometres. There are two subdistrict offices in Gecheng and Fuxing and 31 townships. There are 184 administrative villages in 22 communities. In 2017, the registered population reached 253,000, of which the agricultural population was 216,500. Chengkou County has an excellent ecological environment and outstanding agricultural resources. The agricultural land area is very large, and the output is high. There are many agricultural products, such as free-range chickens, potatoes and freshwater fish, as well as Chengkou free-range chickens, honey, artichokes, etc. However, the level of economic development of Chengkou County is low. In 2016, the GDP of Chengkou County was only 4.512 billion yuan, about one-twentieth of that of Wan County, which had the highest GDP at 89.739 billion yuan. At the same time, the agricultural commodity rate in Chengkou County is low, far lower than the city's average level and far below that of Dadukou District, which has the highest agricultural commodity rate in the city. Only Wuxi County and Wushan County have a lower agricultural commodity rate than Chengkou County. The specific data are shown in Fig. 5 [46].
Generally, Chengkou County has the following agricultural logistics problems:
- Insufficient investment in agricultural product logistics and a lack of large-scale specialized logistics enterprises.
- Insufficient logistics infrastructure: there are no railways or national highways in Chengkou County, which seriously hinders the external distribution of agricultural products.
- Remote geographical location: agricultural products spend a long time in transit, so decay and deterioration easily occur.
- Weak logistics awareness among farmers: they are reluctant to hand over the logistics business to professional logistics enterprises, resulting in product dispersion, low transportation efficiency and high logistics costs.
- Lack of a distribution mode for fresh agricultural products.
The status quo of agricultural product logistics still cannot adapt to agricultural development and has become the bottleneck of agricultural development in Chengkou County.

B. OPTIMIZATION OF SOURCE TRANSFER STATION LOCATION
To solve the related problems of Chengkou County logistics, the established model and method are used. Therefore, it is necessary to address the first-stage problem, that is, to determine the location of source transfer stations in towns and villages.

1) SPATIAL DISTANCE ACQUISITION OF TOWNSHIP LOGISTICS NODES
Information on the rural logistics nodes in the 31 townships of Chengkou County was collected from the relevant data sources and is shown in Table 1 and Fig. 6.
Table 2 shows part of the spatial distances (unit: km) between the 31 township logistics nodes shown in Fig. 6. The details are shown in Appendix A. To make the experimental results more realistic, the actual distance is used instead of the straight-line distance between the nodes.
To make the matrix better reflect the correlation between nodes, the spatial distances between the rural logistics nodes shown in Table 2 and Appendix A are preprocessed to obtain the matrix shown in Table 3 and Appendix A.

2) ACQUISITION OF THE SIMILARITY MATRIX
Preprocessing the distance matrix yields the similarity matrix shown in Table 3.
The matrix data shown in Table 3 and Appendix A are the input to AP clustering. The matrix element values reflect the ''correlation'' of the logistics nodes: the larger the value, the higher the correlation.
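The exact preprocessing is not spelled out in this section. As a minimal sketch, assuming the common AP convention that similarity is the negative squared distance and that the diagonal (the "preference") is the median of the off-diagonal similarities, a distance matrix could be converted as follows. The function name `distance_to_similarity` and the three-node distances are hypothetical and are not taken from Table 2.

```python
from statistics import median


def distance_to_similarity(dist):
    """Convert a symmetric distance matrix (km) into an AP similarity matrix.

    Assumption: similarity = -(distance ** 2), so closer nodes get larger
    values; the diagonal (preference) is the median off-diagonal similarity.
    """
    n = len(dist)
    sim = [[-(dist[i][k] ** 2) for k in range(n)] for i in range(n)]
    off_diag = [sim[i][k] for i in range(n) for k in range(n) if i != k]
    preference = median(off_diag)
    for i in range(n):
        sim[i][i] = preference
    return sim


# Hypothetical distances (km) between three township nodes.
toy_dist = [
    [0.0, 3.0, 10.0],
    [3.0, 0.0, 8.0],
    [10.0, 8.0, 0.0],
]
toy_sim = distance_to_similarity(toy_dist)
```

With this convention a larger (less negative) entry means a stronger correlation between two nodes, which matches the reading of Table 3 given above.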

3) ANALYSIS AND DISCUSSION OF CLUSTERING RESULTS
In this paper, Python 3.7.3 is used to compute the results. The data are processed with the AP clustering algorithm, with the maximum number of iterations set to 'max_iter' = 500, the damping factor to 'damping' = 0.5, and the convergence parameter to 'convergence_iter' = 15. The experiments were run on a Windows system with an Intel(R) Core(TM) i5-5200U CPU at 2.20 GHz. The results show that the original 31 township logistics nodes are divided into 8 clusters; the clustering results are shown in Fig. 6, and the resulting locations of the 8 logistics centres in the rural logistics system are shown in Fig. 7.
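To illustrate what the three settings control, the following is a plain-Python sketch of the standard AP message-passing loop on a precomputed similarity matrix. It is not the authors' program (Appendix B indicates they used scikit-learn), and the six one-dimensional node positions and the negative squared distance similarity are assumptions made only for this toy example.

```python
from statistics import median


def affinity_propagation(sim, damping=0.5, max_iter=500, convergence_iter=15):
    """Affinity propagation on a precomputed similarity matrix.

    sim[i][k] is the similarity of node i to candidate exemplar k and
    sim[k][k] is the preference. Returns (exemplars, labels).
    """
    n = len(sim)
    resp = [[0.0] * n for _ in range(n)]   # responsibilities r(i, k)
    avail = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    exemplars, previous, stable = [], None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')], damped.
        for i in range(n):
            total = [avail[i][k] + sim[i][k] for k in range(n)]
            best = max(range(n), key=total.__getitem__)
            second = max(total[k] for k in range(n) if k != best)
            for k in range(n):
                rival = second if k == best else total[best]
                new = sim[i][k] - rival
                resp[i][k] = damping * resp[i][k] + (1 - damping) * new
        # a(i,k) collects positive support for k from the other nodes, damped.
        for k in range(n):
            pos = [max(0.0, resp[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, resp[k][k] + support - pos[i])
                avail[i][k] = damping * avail[i][k] + (1 - damping) * new
        exemplars = [k for k in range(n) if resp[k][k] + avail[k][k] > 0]
        # Stop once the exemplar set is unchanged for convergence_iter rounds.
        stable = stable + 1 if exemplars and exemplars == previous else 0
        previous = exemplars
        if stable >= convergence_iter:
            break
    labels = []
    for i in range(n):
        if not exemplars:
            labels.append(-1)
        elif i in exemplars:
            labels.append(i)
        else:
            labels.append(max(exemplars, key=lambda e: sim[i][e]))
    return exemplars, labels


# Hypothetical node positions (km along one road): two obvious groups.
positions = [0, 1, 2, 10, 11, 12]
sim = [[-float((p - q) ** 2) for q in positions] for p in positions]
off_diag = [sim[i][k] for i in range(6) for k in range(6) if i != k]
for i in range(6):
    sim[i][i] = median(off_diag)  # preference, assumed to be the median

exemplars, labels = affinity_propagation(sim)
```

Here `damping` blends each new message with the previous one to avoid oscillation, `max_iter` caps the number of rounds, and `convergence_iter` is the number of consecutive rounds with an unchanged exemplar set required before stopping.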

C. OPTIMIZATION OF JOINT DISTRIBUTION CENTRE LOCATION
In the first stage, 8 source transfer stations were selected by the AP clustering algorithm in Chengkou County, Chongqing. The second stage uses the fruit fly optimization algorithm to select the joint distribution centres. First, the 8 source transfer stations are used as both the supply points and the candidate locations of the joint distribution centres, and the centres of neighbouring counties and the next-level distribution centre are used as the demand points (seven demand points in total) to construct the joint distribution centre location model. The distribution path diagram is shown in Fig. 9.
The following data are used. The values of the relevant model parameters are shown in Table 4. The distances between the alternative distribution centres are shown in Table 5. The distances between the alternative distribution centres and the demand points are shown in Table 6. The annual supply of agricultural products at the supply points is shown in Table 7. The unit land price, average annual construction cost and capacity limit of each alternative distribution centre are shown in Table 8. The source transfer stations of Bashan town, Miaoba town, Mingtong town, Xianyi town, Beiping township, Xiuqi town, Longtian township and Huang'an township are marked as X1, X2, X3, X4, X5, X6, X7 and X8, respectively. The seven demand points of Chengkou County, Wuxi County, Kaizhou, Wanyuan city, Kaijiang County, Dazhou and Wanzhou are marked as Y1, Y2, Y3, Y4, Y5, Y6 and Y7, respectively. They are shown in Table 9.
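The cost fragment that survives in Appendix B suggests that the first-leg cost combines a transport term (distance times traffic) with a damage term driven by the corruption rate θ and the selling price p. As a sketch under that reading, the cost of one assignment of supply points to distribution centres could be evaluated as below. The function name, the per-supply-point flow vector and all numbers are hypothetical and do not come from Tables 4 to 8.

```python
def first_leg_cost(dist, flow, assign, price, theta):
    """Transport plus damage cost from supply points to their centres.

    dist[a][b]  distance from supply point a to candidate centre b (km)
    flow[a]     annual traffic leaving supply point a
    assign[a]   index of the centre serving supply point a
    price       average selling price p
    theta       corruption rate per unit distance
    """
    transport = 0.0
    damage = 0.0
    for a, b in enumerate(assign):
        d = dist[a][b]
        transport += d * flow[a]
        # Damage term as in the appendix fragment: q * p * (1/(1-theta)**d - 1).
        damage += flow[a] * price * (1.0 / (1.0 - theta) ** d - 1.0)
    return transport + damage


# Hypothetical instance: three supply points, two candidate centres.
toy_dist = [[0, 10], [10, 0], [4, 6]]
toy_flow = [100, 200, 50]
toy_assign = [0, 1, 0]  # points 0 and 2 ship to centre 0, point 1 to centre 1

cost_no_spoilage = first_leg_cost(toy_dist, toy_flow, toy_assign, 5.0, 0.0)
cost_with_spoilage = first_leg_cost(toy_dist, toy_flow, toy_assign, 5.0, 0.01)
```

With θ = 0 the damage term vanishes and only the transport term remains; any positive θ makes longer legs disproportionately more expensive, which is why distant supply points weigh heavily in the location decision.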

VII. RESULTS AND DISCUSSION
Three of the eight source transfer stations need to be chosen as the logistics distribution centres. The fruit fly population size is set to Sizepop = 20. Fig. 10 shows the optimization result over 2000 iterations of the algorithm, and Fig. 11 shows the total cost and distribution routing. Table 8 shows the final location and distribution path from the source transfer stations to the joint distribution centres. Table 10 shows the distribution path from the joint distribution centres to the demand points.
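For readers unfamiliar with the method, the basic fruit fly optimization loop, as it is commonly described, can be sketched as follows. This is a generic illustration on a hypothetical one-variable objective, not the authors' MATLAB implementation; how they encode the site selection is not described in this section, and the search ranges, the seed and the toy objective are assumptions.

```python
import math
import random


def foa_minimize(objective, sizepop=20, max_gen=2000, seed=0):
    """Basic fruit fly optimization of a scalar objective (minimization)."""
    rng = random.Random(seed)
    # Random initial swarm location.
    x_axis, y_axis = rng.uniform(0, 1), rng.uniform(0, 1)
    best_smell, best_s = float("inf"), None
    history = []
    for _ in range(max_gen):
        gen_best = None
        for _ in range(sizepop):
            # Smell-based search: each fly takes a random step from the swarm.
            x = x_axis + rng.uniform(-1, 1)
            y = y_axis + rng.uniform(-1, 1)
            dist = math.hypot(x, y)
            if dist == 0.0:
                continue
            s = 1.0 / dist        # smell concentration judgement value
            smell = objective(s)  # fitness of the candidate solution
            if gen_best is None or smell < gen_best[0]:
                gen_best = (smell, s, x, y)
        # Vision-based step: the swarm moves only if the record improves.
        if gen_best is not None and gen_best[0] < best_smell:
            best_smell, best_s, x_axis, y_axis = gen_best
        history.append(best_smell)
    return best_s, best_smell, history


def toy_objective(s):
    """Hypothetical objective with its minimum at s = 2."""
    return (s - 2.0) ** 2


best_s, best_smell, history = foa_minimize(toy_objective, sizepop=20, max_gen=200)
```

The `history` list plays the role of an iteration curve such as the one in Fig. 10: it never increases, and long flat stretches indicate that the swarm has stopped improving.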
As shown in Fig. 9, after 2000 iterations the total cost of the selected joint distribution centre locations reaches its optimal value of 2,190,402 yuan, or approximately 2.2 million yuan. Based on the geographical factors, the final iteration result is taken as the final site selection plan: Xianyi town, Xiuqi town and Huang'an township are the preferred sites. The agricultural products of Mingtong town are distributed to Xianyi town, the agricultural products of Longtian township and Miaoba town are distributed to Xiuqi town, and the agricultural products of Bashan town and Beiping township are distributed to Huang'an township. The agricultural products of Xianyi town are distributed to Wuxi County, Kaizhou and Dazhou. The agricultural products of Xiuqi town are distributed to Kaijiang County. The agricultural products of Huang'an township are distributed to Chengkou County, Wanyuan city and Wanzhou. The details are shown in Table 10 and Table 11.
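Because three of eight candidates are selected, only 56 layouts exist, so a heuristic result of this kind can in principle be cross-checked by exhaustive enumeration. The sketch below does this for a toy instance with four candidates and two centres. It is a brute-force baseline, not the fruit fly optimization algorithm, and the simplified cost (fixed cost plus flow times distance to the nearest open centre), the function names and all numbers are assumptions for illustration.

```python
from itertools import combinations


def layout_cost(open_sites, dist, flow, fixed):
    """Fixed cost of the open centres plus flow-weighted distance to the
    nearest open centre for every supply point (simplified cost model)."""
    cost = sum(fixed[b] for b in open_sites)
    for a in range(len(flow)):
        cost += flow[a] * min(dist[a][b] for b in open_sites)
    return cost


def best_layout(n_open, dist, flow, fixed):
    """Enumerate every choice of n_open sites and return the cheapest one."""
    candidates = combinations(range(len(fixed)), n_open)
    return min(candidates, key=lambda s: layout_cost(s, dist, flow, fixed))


# Hypothetical instance: 4 candidate sites that are also the supply points.
toy_dist = [
    [0, 2, 9, 10],
    [2, 0, 8, 9],
    [9, 8, 0, 3],
    [10, 9, 3, 0],
]
toy_flow = [10, 30, 20, 5]
toy_fixed = [100, 120, 110, 90]

chosen = best_layout(2, toy_dist, toy_flow, toy_fixed)
chosen_cost = layout_cost(chosen, toy_dist, toy_flow, toy_fixed)

# Number of layouts in the case study: choose 3 of 8 candidates.
n_layouts = len(list(combinations(range(8), 3)))
```

For a problem of this size, enumeration gives the exact optimum of whatever cost function is supplied, which makes it a convenient reference when judging whether a heuristic has converged prematurely.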
The study finds that the final joint distribution centres are close to the county centre and have convenient transportation. This indicates that the logistics network optimization designed in this way for online sales of remote rural agricultural products is feasible, and it supports the theoretical and practical significance of the paper.

VIII. CONCLUSION
This study found that the final joint distribution centres can meet the conditions of close distance to a county centre and convenient transportation. The model shows that a logistics network optimization for online sales of remote rural agricultural products designed in this manner is successful. Rural logistics construction is important and necessary for rural revitalization. The logistics of online sales of agricultural products is a key part of rural logistics construction. For remote villages with relatively inadequate infrastructure, it is extremely important to address the problem of online sales of agricultural products. Against the background of rural revitalization, this paper analyses the problem of agricultural products in remote rural areas, combining the characteristics of agricultural products with the logistics characteristics of the Internet era and of remote rural areas. This paper proposes a logistics distribution model for agricultural products in remote villages. On the basis of fully considering the distribution characteristics of agricultural products in remote rural areas and the principle and influencing factors of site selection, logistics network optimization is carried out for the first kilometre of remote villages. The first stage clusters the township logistics nodes to obtain alternative joint distribution centres. The second stage builds a joint distribution centre location model for remote rural agricultural products based on the existing agricultural product distribution centre location model. Combined with specific cases, the AP clustering algorithm is used to cluster the township logistics nodes to obtain the source transfer stations. Finally, the source transfer stations are regarded as the alternative distribution centres, and the fruit fly optimization algorithm and MATLAB software are used to solve the problem.
The joint distribution centre location model for remote rural agricultural products offers a reference for similar problems in other remote areas. This paper still has limitations. First, it lacks field investigation and expert consultation. The model mainly considers economic benefits and does not account for natural conditions, so the calculated locations may not be the final sites. Future research should use a variety of methods and consider more factors to analyse the location problem more comprehensively. Second, the fruit fly optimization algorithm is prone to premature convergence and has low precision. Future work that applies the fruit fly optimization algorithm to related problems should improve it by combining it with other heuristic algorithms.

CONFLICTS OF INTEREST
The authors declare no conflicts of interest.

APPENDIX B
Affinity Propagation Clustering Main program

from sklearn.cluster import AffinityPropagation
from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
import numpy as np
import xlrd

# Define input matrix function
def excel(path):
[2]))  # Create whether to distribute matrix
    c = (Dis_AB * q_AB * fa.T).T  # c is the transport scheme matrix
    b = sum(sum(c))  # Total distribution costs
    aa = q_AB * p
    bb = (1 - cita) ** Dis_AB
    cc = 1 / bb - 1
    dd = (aa * cc * fa.T).T  # Damage cost matrix for each location
    ff = sum(sum(dd))  # Total damage cost
    gg = ff + b
    return gg, fa

cost_AB()

# Define the function to be solved (named fun1)
def fun1(

He is the author of six books, more than 50 articles, and more than six software copyrights. His research interests include supply chain and logistics management, digital management, and modeling and simulation of complex systems.
Dr. Zhang was a recipient of the Outstanding Paper Award from the China Society of Logistic, the BaoGong Logistics Award, the Humanities and Social Sciences Award, Heilongjiang, and the Beijing Teaching Achievement Award from the Beijing Technology and Business University.
HUIXIA FENG was born in Xinzhou, China, in 1996. She received the bachelor's degree in management from the Shanxi University of Finance and Economics, in 2018.
Since 2019, she has been a Postgraduate Research Student with the School of E-Business and Logistic, Beijing Technology and Business University. Her research interests include supply chain and logistics management, digital management, and modeling and simulation of complex systems.
Dr. Feng was a recipient of the Third Prize in the Mathematical Contest in Modeling, Shanxi, in 2016, and the Freshman Admission Scholarship from BTBU, in 2019.
HONGMEI WANG received the bachelor's degree in management from the Business School, Beijing Technology and Business University, in 2019.
She is currently a Postgraduate Research Student with Beijing Jiaotong University. Her specialization is logistics management. Her research interests include supply chain management and data mining and analysis. She was a recipient of the Second Prize of the University Data Analysis Contest twice in a row, the Third Prize in Thesis Writing Contest during the undergraduate period, and the Freshman Admission Scholarship from BJTU, in 2019.