Joint Distribution Center Location Problem for Restaurant Industry Based on Improved K-Means Algorithm With Penalty

The location of joint distribution center (JDC) is of great importance in urban freight system. This work aims to minimize the total cost, including the fixed cost, the transportation cost, and the penalty cost for the missed deliveries, considering the joint distribution willingness (JDW) of restaurant and the coverage radius of JDC. The integer programming model is formulated to select the optimal location of JDC and opens no more than k-JDCs. To solve the problem, an improved K-means (I-K-means) algorithm combining local search with penalty is proposed. The delivery problems obtained by freight survey from 114 restaurants in Beijing, China are taken as a case study. The formulation in this paper can reduce the cost of deliveries in restaurant industry, decrease the number of freight vehicles on road, and promote the joint distribution mode in urban freight system.


I. INTRODUCTION
Owing to the development of online food orders, the demand for suppliers in the restaurant industry is growing sharply [1]. The great amount of freight demand put high requirements on the distribution efficiency. Most suppliers distribute food materials directly to restaurants. The mode of direct distribution is shown in Fig. 1 (a). The direct distribution requires a large number of vehicles, which are partially loaded and occupy the road resource. The joint distribution can significantly save the transportation cost, increase the load utilization of vehicles, and reduce number of establishments. The concept of joint distribution organized for the restaurant industry is illustrated in Fig. 1 (b). JDC can provide the logistics activities such as sorting, warehousing, and customized packaging. The location of JDC significantly affects the efficiency of logistic system and should balance the interests of participators involved in the joint distribution alliance [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Rashid Mehmood . The joint distribution willingness (JDW) can reflect the attitude of receivers to participate in the joint distribution. This paper aims to fill the gap and gain insight into JDW of restaurants to determine a cost-effective location of JDC.
The formulation assumes that each JDC has a coverage radius, which is determined by JDW of restaurant and the geospatial distance between JDC and restaurant. If the restaurant locates far from JDC or the JDW of restaurant is low, JDC will miss these deliveries. The missed delivery will incur a penalty cost and the restaurant not be served will be seemed as 'punished'. To ensure that the food materials could arrive on time from JDC to restaurants, these materials are transported from supplier to JDCs by trucks, and then distributed from JDCs to restaurants within the coverage radius by vans.
The contributions of this paper are as follows: (1) constructs the restaurant's JDW function and define the coverage radius of JDC; (2) formulates the location selection model considering the fixed cost, the transportation cost, and the penalty cost for the missed deliveries; (3) designs the I-K-means algorithm combining local search with penalty to identify the optimal location of JDC. The magnitude of the proposed model is that it can help stakeholders to determine the location of JDC to save the total cost. The remainders of this paper are organized as follows: The related works are reviewed in Section II. The methodology of determining the location of JDC is proposed in section III. I-K-means algorithm combining local search with penalty is presented in section IV. Section V conducts the numerical experiments using the actual freight survey data from 114 restaurants in Beijing. Finally, a summary of the work and an outlook on future research are put forward in section VI.

II. RELATED WORKS
In order to study the location problem of JDC for restaurant industry, the related literatures are classified into four categories: food supply chain, joint distribution, the distribution center location problem, and solution algorithms.

A. FOOD SUPPLY CHAIN
A well-designed supply chain can improve the efficiency of the logistic system and serve the customers in a timely manner [3]. The design and the management of supply chain are complicated with the characteristics of the products. Donk et al. [4] put forward that the food supply chain should provide high volume and fast moving products that were accessible to consumers. Ala-Harja and Helo [5] pointed that the hub-and-spoke structure could describe the distribution network of the food supply chain well. The important hubs such as distribution centers (DCs) could help the logistic system achieve a high utilization. Aghazadeh [6] indicated that the food supply chain was a dynamic environment and receivers had high expectations for the food safety and quality. Beske et al. [7] believed that DCs played an important role in food supply chain and the customers had high expectations for the food safety. Bosona and Gebresenbet [8] developed a coordinated distribution system to guarantee the food quality. Due to the perishable nature and temperature sensitivity of food material, the receivers had high requirements on the delivery time to reduce the losses of food materials [9]. In the food supply chain, the description of distance cannot be simply expressed by the spatial distance. The time-based distance and psychological-based distance should be considered to describe the food accessibility [10]. Wu et al. [11] deemed the time deadline as a constraint to optimize the perishable food supply chain network. Hubley [12] used the spatial terms to map all the demographic points of interest to define the food accessibility to receivers. According to the amount and time of the food available to consumers, Yeager and Gatrel [13] weighted the food outlets to describe the food accessibility.

B. JOINT DISTRIBUTION
As an indispensable activity in the supply chain, transportation occupied nearly 50% in the total logistics costs [14]. In order to acquire a more guaranteed-quality logistics and reduce the total cost, the joint distribution strategies could be adopted among cooperators, and in some cases, even competitors [15]. In the joint logistics system, multiple carriers could form alliances to optimize their transportation operations by sharing the vehicle capacities and order requests [16]. Through resource sharing, the joint distribution could be a new business mode for transportation industry [17]. Sun et al. [18] pointed that the joint distribution strategy could serve multiple customers on the same route, which increased the truck utilization and reduced the total operational cost. Frisk et al. [19] put forward that the long transport distances, increasing fuel prices and environmental concern made it important to improve the transportation planning and reduce costs. The cost savings of the joint distribution were in the range of 5-15%. Wang et al. [20] indicated that the traditional logistics distribution network assumed that each logistics facility was willing to collaborate. They designed an appropriate two-echelon logistics joint distribution network optimization structure with a reasonable cooperation mechanism considering the logistics participants' willingness. Holguín-Veras and Sánchez-Dían [21] found that the amount, time, destination, and mode of deliveries to receivers influenced the cargo consolidation in the joint distribution system.

C. DISTRIBUTION CENTER LOCATION PROBLEM
The distribution center location problem is an instance in the facility location problem. DCs should meet the requirements of receivers with the shortest time and the lowest cost. A suitable location of DC can increase the receivers' satisfaction, minimize the operation cost, and promote the sustainable development of logistic system [22]. Dell'Olio et al. [23] discovered that the factors influencing the receivers' acceptance towards DC included the distance, the tax reductions, and the time goods delivered to receivers. Oum and Park [24] proposed that the most important factors effecting the site of consolidated DCs were the geographical location, VOLUME 8, 2020 the market size, the region accessibility, and the modern logistics service level. Heuvel et al. [25] pointed that the accessibility was an important factor in freight system. Good accessibility to DC could meet the demand of receivers in more markets [26]. Konuk [27] pointed the price influenced the receivers' buying intentions. Roth and Bösener, [28] stated that the price was a primary factor influencing the receivers' satisfaction. Heydari and Norouzinasab [29] deemed the price as the main factor to analyze the receivers' demand towards the supplier. Zeithaml [30] analyzed the relationship between the price, quality and value of products from the customer's point of view. Marshall [31] mentioned that the price sensitivity varied with the expected price and actual price.
In terms of the model formulation of distribution center location, Li et al. [32] proposed a mixed-integer linear programming model to minimize the total cost, including the economics cost, environmental cost, and socio-economic cost to determine the location of urban distribution center. Casas-Ramírez et al. [33] presented a bi-level capacitated facility location model. The aim was to minimize the fixed cost and distribution cost of the facilities and maximize the customers' preferences towards the facilities. Wu et al. [34] deemed the receivers' delivery time as a main constrain to select the location of DCs. Kolen and Tamir [35] put forward that there were three types of costs including the set up cost, transportation cost, and penalty cost in the facility location problem. Wang et al. [36] proposed a k-facility location problem with linear penalties. They opened no more than k facilities and payed the penalty cost for the non-served clients. To limit the influence of far-away receivers on the location of DC, the coverage radius was used as a criterion in the location selection model [37]. Bhattacharya and Nandy [38] raised that the coverage radius of facility could maximize the number of users it served. Miyagawa [39] calculated the maximum joint distribution distances between two facilities to analyze their location reliability.

D. SOLUTION METHODS
The DC location problem is a combinatorial NP-hard problem. The methods used to solve this problem can be categorized as the exact methods, the heuristics and the metaheuristics. The exact methods include the Bender decomposition [40] and branch-and-bound [41]. The optimal solutions of the linear programming [42], integer programming [43], mixed integer programming [44] and quadratic programming [45] location problems can be obtained. The exact methods are suitable for the small-scale problems. The heuristics and the meta-heuristics can solve the large scale problem in a fast and effective computational speed. The heuristics and meta-heuristics mainly include the clustering method [8], local search [46], Tabu search [47], double-annealing [48], Lagrange relaxation [49], genetic algorithm [50], and artificial neural network method [51].
The K-means algorithm is one of the most common and widely used partitioning-based clustering algorithms due to its simplicity and ease of use [52]. It takes the distance as the similarity evaluation index [53]. Peña et al. [54] put forward that the initialization methods was important for the K-means algorithm. Razi [55] used the K-means algorithm to solve the facility location problem and selected a Pareto optimal location of the maintenance stations. Javadi and Shahrabi [56] adopted the K-means algorithm to investigate the facility location problem. Considering the geographical barriers, they evaluated the distance functions based on the demand point's allocation, the logistics cost, and the response time of facility. The local search heuristic is good at avoiding convergence to a locally optimal solution with a high computational efficiency. Kanungo et al. [57] presented a local search approximation algorithm by swapping the centers in and out to find a simple and practical solution for the K-means clustering problem. Zhang [58] proposed a 2 + √ 3 + τ polynomial-time approximation algorithm based on local search approach to solve the K-facility location problem. Zhang et al. [59] improved the approximation ratio to 25+τ by using multi-swap operation to solve the K-means problem with penalties.
Mainly, many researchers have studied the food supply chains problem and the joint distribution strategy. The distance between DCs and receivers and the delivery time of food materials were regarded as the main constraints in most distribution center location models. It is necessary to consider the heterogeneous difference of receivers to participate in the joint distribution.

III. METHODOLOGY
To reflect the subjective differences of restaurants towards the joint distribution strategy, a function of JDW is constructed in this section. The coverage radius of JDC and the relationship between the cost in the freight system are also be analyzed. Then the location model is formulated to minimize the fixed cost, the transportation cost, and the penalty cost.

A. JOINT DISTRIBUTION WILLINGNESS
The JDW can reflect the restaurant's acceptance of the joint distribution mode. It can describe the impedance relationship between JDC and restaurant. The function of JDW is constructed considering the price and the delivery time of the food materials provided by JDC. The price satisfaction is sensitive to the actual price with different elasticity. Without loss of generality, suppose that restaurant i has the highest acceptable price a max i and the lowest acceptable price a min i . The actual price of food materials provided by JDC j is a j . If a j is lower than a min i , the price satisfaction will obey a concave function distribution but not be greater than 1. The price satisfaction decrease linearly with the actual price if a j is within (a min i , a max i ). The price satisfaction value will be 0 if a j exceeds a max i . Fig. 2 (a) shows the relationship between the price satisfaction and the actual price of food materials.
Another key variable influencing JDW is the delivery time of the food materials. The restaurant has a high requirement of the delivery time in order to guarantee the food quality. Assume that the delivery time of the food materials complies with the hard time window requirements. Given the time window (t min i , t max i ) of restaurant i to accept the delivery of food materials. If the delivery time t ij from JDC j to restaurant i is earlier than t min i , the time satisfaction will be 1. If t ij is within (t min i , t max i ), the time satisfaction will decrease linearly with the delay of the delivery time. The time satisfaction will be 0 if t ij is later than t max i . The relationship between the time satisfaction and the delivery time is shown in Fig. 2 (b).
The increase of actual price and delivery time of food materials will decrease the JDW. Formula (1) defines the value of JDW by weighting two components. Formula (2) and formula (3) represent the satisfaction function of the food materials' price and delivery time. The range of parameters are defined in formula (4).
where ε ij is the JDW of restaurant i to accept the delivery service of JDC j, u ij is the price satisfaction function, v ij is the time satisfaction function, α, r 1 are the coefficients of the function.

B. RELATIONSHIP BETWEEN COSTS
The total cost in the logistic system mainly includes the fixed cost and transportation cost, which are related to the number and location of JDCs. The relationship between these costs are shown in Fig. 3 (a). The fixed cost is linear with the number of JDCs. The transportation cost occupies a large part in the total cost. Increasing the number of JDCs can reduce the transportation distance between JDCs and restaurants. But this will increase the distribution frequency and then increase the transportation cost. Whether to increase the number of JDCs to improve the distribution efficiency, or to reduce the number of JDCs to achieve the scale economies is a key matter in the facility location problem. In order to ensure the profitability of the joint distribution strategy, each JDC is given a coverage radius. The coverage radius influences the profit rate of JDC. The profit rate is the ratio of the operation revenue and cost generated by the joint distribution service, which is formulated in formula (5). The operation revenue is proportional to the capacity utilization of JDC. The transportation cost is proportional to the coverage radius of JDC, which is defined in formula (6). The relationship between the profit rate and coverage radius is shown in Fig. 3 (b). If the profit rate is higher than 1, the coverage radius range is within (R 1 , R 2 ). The coverage radius R * indicates that the profit rate of JDC reaches the maximum. When JDC is break-even, the coverage radius is expressed in formula (7). The penalty cost means the loss of profit if JDC misses the delivery to restaurant. It is proportional to the freight demand of the restaurant, which is defined in formula (8).
where Q is the capacity of JDC. θ is the utilization of capacity. b is the unit profit of the capacity. f is the fixed cost of opening JDC. c is the transportation cost. δ is the unit VOLUME 8, 2020 transportation cost, R is the coverage radius of JDC, p is the penalty cost, q is the food materials demand of restaurant.

C. MODEL FORMULATION 1) PROBLEM DESCRIPTION
In this model, there are m potential JDCs and n restaurants with different freight demand and JDW. The joint distribution is provided by the third-part logistics service company. The total cost includes the fixed cost for opening JDC, the transportation cost for providing the joint distribution service to restaurant and the penalty cost for missing the delivery.

2) ASSUMPTIONS
To facilitate modeling, several assumptions are made as follows: x JDCs provide one generic product to restaurants. y The location, the freight demand, and the JDW of each restaurant are given.
z The fixed cost of JDC in different location is the same.
{ The restaurant not be served will incur a penalty cost. | JDC can supply generic product to multiple restaurants. The restaurant can only receive the generic product from one JDC at most.
} The road condition and the unit transportation cost between JDCs and restaurants are the same.
The capacity of JDC can meet the freight demand of restaurants.
The number of opened JDCs has a limit. JDC has a coverage radius constraint. The assumptions x−{ are designed to simplify the model. The assumptions |−} are set to guarantee the single source of the food materials transported from JDC to restaurants. The assumptions~− constrain the service capability of JDC.

3) DEFINATION AND DESCRIPTION OF SYMBOLS
Indices, sets, parameters and decision variables are given in Table 1 as follows:

4) OBJECTIVE FUNCTION AND CONSTRAINTS
The objectives of the model are to determine: which locations should be set as JDCs; which restaurants are served by JDCs; and which restaurants are not served to minimize the total cost. The formulation of the model is shown in formula (9).
The constraints of the model are as follows: x ij ≤ y j , ∀i ∈ I , j ∈ J (11) m j=1 y j ≤ k (12) l ij x ij ≤ γ j , ∀i ∈ I , j ∈ J (14) Formula (9) represents the total cost including the transportation cost, the fixed cost and the penalty cost. Constrain (10) ensures that the restaurant should be served or punished. Constrain (11) guarantees that it exists restaurant be served by the opened JDC. Constrain (12) limits the number of JDCs. Constrain (13) limits the capacity of JDC. Constrain (14) restricts the coverage radius of JDC. Constrain (15)-(17) represent the 0-1 property of the decision variables. Constrain (18) represents the relationship between the sets of restaurants.

IV. THE IMPROVED K-MEANS ALGORITHM COMBINING LOCAL SEARCH WITH PENALTY A. ALGORITHM DESCRIPTION
Euclidean distance is one of the most popular cost function used in K-means algorithm. In order to divide the points into k clusters and minimize the sum of impedance distance within a cluster, the function in formula (19) are used as the cluster criterion in I-K-means algorithm. min i∈I j∈J The I-K-means algorithm has an effective iterative performance and a fast convergence speed. The local search can calculate the change of the objective function accurately. It can jump out of the local optimal solution to guarantee the optimality of the cluster result. The procedures of the I-K-means algorithm and local search with penalty are shown in Fig. 4.

B. SELECT THE INITIAL CLUSTER CENTERS
Two most important matters in I-K-means algorithm are the determination of k-value and the choice of initial cluster centers. The k-value represents the number of JDCs located within the research scope. It is determined by the total food materials demand of restaurants and the supply capacity of JDC in formula (20).
where ϕ is a parameter used to adjust the value of k to be integer and it ranges from 0 to 1. In order to obtain the scattered initial cluster centers and avoid the local optimization of the algorithm, the joint distribution distance is used to limit the distance between the initial cluster centers, which is calculated in formula (21).
where S represents the area of the research region. The steps of selecting the initial cluster centers are as follows: Step 1: Select a point randomly in the study area as the first cluster center to join in the set of initial JDCs.
Step 2: Select another point randomly and calculate the Euclidean distance from this point to the last cluster center. If the distance is greater than γ , this point is deemed as the second JDC; otherwise, select another point until the number of the initial JDCs is up to k.

C. CLUSTER AND ITERATE
The steps of the cluster and iteration are in follows: Step 1: Calculate the penalty cost p i , the impedance distance l ij from restaurant i to the nearest JDC j, and the transportation cost c ij . If l ij is smaller than γ j and c ij is smaller than p i , restaurant i will be served by JDC j. Otherwise, restaurant i will be punished.
Step 2: Calculate the total cost in formula (9).
Step 3: Determine whether the number of iterations reached the maximum. If not, go back to step 1 and keep iterating.

D. LOCAL SEARCH
The fastest descent strategy is used to select the optimal solution in a neighborhood of the local search operation. The neighborhood of a solution P is defined as Neighbour(P). The strategy aims to find the solution P satisfying P, P = arg max P ∈Neighbour(P) (C(P) − C(P )). This means that there is C(P ) < C(P) for any P ∈ Neighbour(P). The single-swap operation of the local search is used to swap the cluster center in and out. After these operations, the solution with the minimum total cost can be output. VOLUME 8, 2020

V. CASE STUDY AND NUMERICAL ANALYSIS A. FREIGHT SURVEY DATA
To test the model and I-K-means algorithm, the data used in this case study are from the real freight survey of 114 restaurants in Beijing, China. The location and the JDW of the restaurants were obtained from the survey. The data about the freight demand of restaurants can be obtained from our previous study [60]. The values of other parameters used in the algorithm are as follows: f = 15000, Q = 5000, b = 0.2, δ = 0.1, λ = 0.05, k = 4, gen max = 100. The unit of the cost is RMB. The unit of the distance is kilometer. The unit of the fright demand is kilogram. Fig. 5 shows the locations of JDCs and cluster results of restaurants. The pentacles with yellow color represent the location of JDCs. The restaurants are classified into five clusters and they are served by four JDCs. The points with red color represent the restaurants be punished. The points with other colors represent the restaurants served by four JDCs.

B. RESULT ANALYSIS
In order to analyze the stability and convergence of I-K-means algorithm, the total cost under different iteration numbers are shown in Fig. 6 (a). When the number of iterations is more than 80 times, the value of the total cost is stable.
The I-K-means algorithm combining local search has a fast convergence speed to the optimal solution and can avoid misclassification to guarantee the accuracy of cluster results.
The k-value is varied from 2-10 to analyze it impact on total cost. The total cost under different k-values are shown in Fig. 6 (b). When the k is 4, the total cost reaches the minimum.
The cluster result of I-K-means algorithm can make the similarity within the same cluster be the largest and the similarity among different clusters be the smallest. Two evaluation indicators are calculated to compare the cluster results between I-K-means algorithm and K-means algorithm. The indicator representing the mean intra-cluster distance in cluster j is defined in formula (22).
The mean intra-cluster distance after normalization is calculated in formula (23).
The indicator representing the mean inter-cluster distance among different clusters is defined in formula (24).
The mean inter-cluster distance after normalization is shown in formula (25). Table 2 shows the comparison of the indicators calculated in formula (23) and (25). Comparing with the traditional K-means algorithm, the intra-cluster distance in I-K-means algorithm is increased by 16.89% and the inter-cluster distance is decreased by 13.15%. The cluster performance of I-K-means algorithm is better than K-means algorithm.
In order to analyze the influence of JDW value on the location of JDC, four distribution functions including the uniform distribution, normal distribution, linear distribution and limit distribution are used to generate JDW value from 0-1. The locations of restaurants under different JDW values   are shown in Fig. 7. The blue color represents the a low JDW value and the red color represents a high JDW value.
After calculating formula (9) using I-K-means algorithm. The locations of JDCs and the cluster results of restaurants under different JDW values are shown in Fig. 8. The cluster result of restaurants are represented by the points with different colors. The pentacles represent the locations of JDCs.
It can been seen from Fig. 7 and Fig. 8 that for the restaurants with high JDW and located near JDC, JDCs are more inclined to provide joint distribution service to them. Due to the high cost to improve the JDW or the high transportation cost, JDCs will miss the deliveries of the restaurants with low JDW value and located far from JDC. However, if the restaurants have high JDW, JDCs can provide joint distribution service to them within their coverage radius.

C. MODEL APPLICATION
The proposed model and I-K-means algorithm can significantly improve the cluster results. The increased intra-cluster distance and reduced inter-cluster distance can save the transportation cost by nearly 30%. In practice, there involves many stakeholders of the joint logistics system in the restaurant industry. The scale economies of the joint distribution mode will increase as the number of participators increase.

VI. CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS
A novel model for the location selection problem is formulated to minimize the fixed cost, the transportation cost and the penalty cost. I-K-means algorithm is proposed to calculate the optimal solution of the model. The main conclusions of this work include: 1) The price and delivery time of food materials are considered in the integer programming model to reflect the willingness of restaurant to accept the joint distribution mode. 2) Numerical tests based on the survey data show that it is effective to use I-K-means algorithm to obtain a more optimal solution than K-means algorithm. The k-value and JDW value have influence on the location of JDCs. The results can improve the logistics efficiency, increase the potential market for joint distribution mode, and reduce the transportation cost in the restaurant industry. This can give guidance for the joint distribution center location problem in other industries.
The location problem of JDC in reality is far more complex than in the theoretical research. In the future research, the demand of food material can be considered in an uncertain environment. Other factors such as the drivers' payment and the routing planning can also be introduced in the model.