A Stochastic Local Search Algorithm for the Partial Max-SAT Problem Based on Adaptive Tuning and Variable Depth Neighborhood Search

The Partial Max-SAT (PMSAT) problem is an optimization variant of the well-known Propositional Boolean Satisfiability (SAT) problem. It holds an important place in theory and practice because a large number of real-world problems, such as timetabling, planning, routing, bioinformatics, and fault diagnosis, can be encoded into it. Stochastic local search (SLS) methods can solve many real-world problems that often involve large-scale instances at reasonable computation costs while delivering good-quality solutions. In this work, we propose a novel SLS algorithm called adaptive variable depth SLS for PMSAT problem solving based on a dynamic local search framework. Our algorithm exploits two algorithmic components of an SLS method: parameter tuning and neighborhood search. Our first contribution is the design of an adaptive parameter tuner that searches for the best parameter setting for each instance by considering its features. The second contribution is a variable depth neighborhood search (VDS) algorithm adapted for the PMSAT problem, which our empirical evaluation shows to be more efficient than single neighborhood search. We conducted our experiments on the PMSAT benchmarks from MaxSAT Evaluation 2014 to 2019, including more than 3600 instances encoded from a broad range of domains such as verification, optimization, graph theory, automated reasoning, and pseudo-Boolean problems. Our experimental results show that the AVD-SLS solver, which implements our algorithm, outperforms state-of-the-art PMSAT SLS solvers in most benchmark classes, including random, crafted, and industrial instances. Furthermore, AVD-SLS reports remarkably better results on the weighted benchmark and shows competitive results with several well-known hybrid PMSAT solvers.


I. INTRODUCTION
The Partial Max-SAT (PMSAT) problem is an optimization variant of the Propositional Boolean Satisfiability (SAT) problem, which is a fundamental problem in computer science and artificial intelligence [1], [2]. The PMSAT problem is an NP-hard problem that is important for the theory and practice of a range of applications, including timetabling [3], scheduling [4], planning [5], routing [6], software debugging [7], and bioinformatics [8]. Indeed, many optimization problems can be naturally expressed as a PMSAT problem. PMSAT asks to find an assignment to the Boolean variables of a given Boolean formula expressed in Conjunctive Normal Form (CNF) that satisfies all hard (mandatory) clauses and the maximum number of soft (non-mandatory) clauses. The Maximum Boolean Satisfiability (Max-SAT) problem is a specialization of the PMSAT problem in which all clauses are soft and the goal is to satisfy the maximum number of clauses. PMSAT is the designation given to the Max-SAT problem with hard and soft clauses in 1996 by Miyazaki et al. [9].
The associate editor coordinating the review of this manuscript and approving it for publication was Xujie Li.
There are two state-of-the-art approaches for solving PMSAT problem: exact methods and stochastic local search (SLS) methods. There are also hybrid methods that combined both exact and SLS methods [10]- [17]. Exact methods (also known as complete methods) implicitly enumerate all the solutions of the considered instance of an optimization problem using a search tree to explore the entire search space and to prove the satisfiability. Recently, almost all Max-SAT and PMSAT solvers that participated in MaxSAT Evaluation (MSE) are SAT-based [11], [16], [18]- [21], where a given problem instance is solved through successive calls to a SAT solver. Exact methods can provide optimal solutions to small or medium-sized problems with reasonable computational costs. However, many real-world applications, especially engineering and industrial applications, often involve far larger scales which exact methods cannot handle, hence the need for SLS methods [22]- [26].
SLS methods (also known as incomplete methods) have become more popular because of their ability to provide high-quality solutions to large-size problems with reasonable computation costs. An SLS method is a local search method that incorporates a stochastic (i.e., random) property. Local search methods are general methods that are widely used to solve hard combinatorial optimization problems [27]-[31]. A local search method is defined by four main components: search space, neighborhood relation, objective function, and move method [30]. Each component may have one or more parameters that determine its functioning [32]. In a local search method for a propositional Boolean problem, starting from a complete assignment, a neighborhood solution is obtained by flipping the truth value of one variable (1-flip) or a small set of variables (k-flip). At each step, the neighborhood is examined for a truth assignment that decreases the number (or total weight) of unsatisfied clauses. If such an assignment is found, the algorithm flips the value of the corresponding variable (or set of variables) and continues the search until a stopping criterion is met [33], [34].
Our investigation shows that there are three main state-of-the-art PMSAT SLS-based methods: the distinction-based method [35], the configuration checking-based method [36], and the dynamic local search method [37]. These SLS methods are built around two algorithmic components: a variable-pick heuristic and a weighting scheme. However, compared to the breakthrough progress of SLS methods on random and crafted benchmark instances, the performance of SLS methods on industrial benchmark instances lags far behind, especially on weighted industrial benchmark instances. Those instances are often large and involve huge neighborhoods or highly complex structures.
In this paper, we introduce a novel SLS algorithm named adaptive variable depth SLS, that employs an extended framework of dynamic local search method. We chose the dynamic local search SLS method, because it showed competitive performance on unweighted industrial benchmarks in MSE 2018. Our method is based on the results of our study of the state-of-the-art PMSAT SLS methods' strengths and limitations. A problem of fundamental interest and practical importance is how to exploit SLS method components with respect to constraints in order to manage high complexities and improve algorithm performance [38]. In this work, we propose a novel method based on two components of an SLS method: parameter tuning and neighborhood search.
First, we propose an adaptive parameter tuner that searches for the best possible parameter values per instance. We studied the relationship between parameter settings and features of input instances, and found that the parameter setting is related to some features of the input instance. Second, we propose a variable depth neighborhood search (VDS) method [39] adapted for PMSAT SLS to explore very large neighborhoods. The VDS method defines a dynamically determined number of moves in sequence that varies from one iteration to another (i.e., variable k-flips), where for each step leading to a different trial solution, the compound move that yields the best trial solution is the one chosen [40]. Almost all SLS methods proposed for solving the PMSAT problem are based on a single neighborhood definition and a single (1-flip) move, and their performance declines when exploring huge neighborhoods for large-scale problem instances (i.e., industrial instances).
To evaluate the performance of our novel algorithm, we implemented the AVD-SLS solver. We compared AVD-SLS with the state-of-the-art PMSAT SLS solvers and PMSAT hybrid solvers that participated in the MSE 2014-2019, on a broad range of random, crafted and industrial benchmarks. AVD-SLS reported the best results compared with all of the PMSAT SLS solvers on weighted crafted and industrial benchmark instances. Compared to PMSAT hybrid solvers, AVD-SLS shows a competitive performance. Based on our experimental evaluation study, AVD-SLS ranked among the top three best PMSAT solvers for MSE 2015, 2016, 2017, 2018 on the weighted benchmark, and was also among the top three best PMSAT solvers for MSE 2014 and 2017 on the unweighted benchmark.
The remainder of this paper is organized as follows. The next section introduces preliminary knowledge. The related work is presented in Section III. Then, we introduce our AVD-SLS method in detail in Section IV. The experimental evaluation study is presented in Section V. Finally, we elaborate on the results and conclude the paper in Section VI and Section VII, respectively.

II. PRELIMINARIES
In this section, we briefly introduce the main notations, definitions, and background knowledge.
A Propositional Variable $v_i$ is a Boolean variable assigned a truth value (true or false).
A Literal $l_i$ is a propositional variable ($v_i$) or its negation ($\neg v_i$).
Propositional Operators are logical connectives defined as a set of three operators {¬, ∧, ∨} that represent the negation, conjunction, and disjunction operators respectively.
A Clause $c_j$ is a disjunction of literals defined by $c_j = \bigvee_{i=1}^{k} l_i$ (e.g., $c_j = l_1 \lor l_2 \lor \dots \lor l_i \lor \dots \lor l_k$), where $k$ represents the number of literals in the clause. A clause $c_j$ is said to be satisfied if $c_j$ evaluates to true, that is, if at least one literal $l_i \in c_j$ is assigned true; otherwise $c_j$ is said to be unsatisfied.
If $c_j$ is a hard (mandatory) clause, then it must be satisfied. Otherwise, $c_j$ is a soft clause that may or may not be satisfied. A Weighted Clause is a pair $(c_j, w_j)$, where $c_j$ is a clause and $w_j$ is an associated positive number that represents the cost of leaving $c_j$ unsatisfied.
A Unit Clause $c_j$ is a clause that contains only one literal.
A Conjunctive Normal Form (CNF) formula $F$ consists of a conjunction of clauses defined by $F = \bigwedge_{j=1}^{m} c_j$ (e.g., $F = c_1 \land c_2 \land \dots \land c_j \land \dots \land c_m$), where $m$ represents the number of clauses in the CNF formula $F$.
The Set of All Variables appearing in $F$ is defined by $V(F) = \{v_1, v_2, \dots, v_n\}$, where $n$ represents the number of variables in $F$.
An Assignment $\alpha$ of a given CNF formula $F$ sets the truth value of each variable $v_i \in V(F)$.
The Cost of an assignment $\alpha$ is defined by $cost(\alpha)$, which denotes the number (or total weight) of the unsatisfied clauses under the current assignment $\alpha$. We say that $\alpha_1$ is better than $\alpha_2$ if $cost(\alpha_1) < cost(\alpha_2)$.
The Score of a variable $v_i$ is defined by $score(v_i)$, which denotes the increment of the number (or total weight) of satisfied clauses obtained by flipping the truth value of a selected variable $v_i$ from true to false or vice versa. The score is computed as $score(v_i) = make(v_i) - break(v_i)$. The property $make(v_i)$ is defined as the total number (or total weight) of clauses that would become satisfied if variable $v_i$ is flipped, and $break(v_i)$ is the total number (or total weight) of clauses that would become unsatisfied if variable $v_i$ is flipped.
The Hard Score of a variable $v_i$ is defined by $hscore(v_i)$, which denotes the increment of the number (or total weight) of satisfied hard clauses obtained by flipping the truth value of a selected variable $v_i$.
The Soft Score of a variable $v_i$ is defined by $sscore(v_i)$, which denotes the increment of the number (or total weight) of satisfied soft clauses obtained by flipping the truth value of a selected variable $v_i$.
A Decreasing Variable $v_i$ is a variable with $score(v_i) > 0$; if $score(v_i) < 0$, then $v_i$ is said to be an increasing variable. $hscore(v_i) > 0$ denotes a hard decreasing variable $v_i$, and $sscore(v_i) > 0$ denotes a soft decreasing variable $v_i$.
A Non-decreasing Variable $v_i$, written 0-score, denotes that the number (or total weight) of unsatisfied clauses is not changed by flipping the truth value of a selected variable $v_i$. 0-hscore denotes a non-decreasing hard variable.
A Configuration Checking of a variable $v_i$ is the state that indicates whether any variable $v_j \in N(v_i)$, $i \neq j$, has been selected and flipped since $v_i$ was last flipped, where $N(v_i)$ denotes the set of neighboring variables of $v_i$. If at least one variable $v_j \in N(v_i)$, $i \neq j$, has been flipped since $v_i$ was last flipped, then the configuration of $v_i$ is changed and $configuration(v_i) = 1$; otherwise, $configuration(v_i) = 0$.
The (Weighted) Partial Maximum Boolean Satisfiability (PMSAT) problem is to find an assignment of truth values to $V(F)$ that satisfies a designated subset of the clauses, called hard clauses, and minimizes the number (or weight) of unsatisfied soft clauses. An assignment $\alpha$ for the PMSAT problem is feasible if and only if it satisfies all hard clauses. Minimizing (maximizing) the number of unsatisfied (satisfied) soft clauses is a measure of solution quality.
The (Weighted) Maximum Boolean Satisfiability (Max-SAT) problem is to find an assignment of truth values to $V(F)$ that minimizes (maximizes) the number (weight) of unsatisfied (satisfied) clauses.
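The definitions above can be made concrete with a small sketch. The `Clause` record, the DIMACS-style integer literals, and the function names below are illustrative assumptions, not part of the original formulation; the code evaluates each clause naively rather than incrementally, which a real solver would avoid.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Clause:
    literals: List[int]  # assumed DIMACS-style: 3 means v3, -3 means NOT v3
    weight: int = 1      # cost of leaving the clause unsatisfied
    hard: bool = False   # True for a hard (mandatory) clause


def satisfied(clause: Clause, alpha: Dict[int, bool]) -> bool:
    """A clause is satisfied if at least one of its literals is true."""
    return any(alpha[abs(lit)] == (lit > 0) for lit in clause.literals)


def cost(formula: List[Clause], alpha: Dict[int, bool]) -> int:
    """Number (or total weight) of unsatisfied clauses under alpha."""
    return sum(c.weight for c in formula if not satisfied(c, alpha))


def feasible(formula: List[Clause], alpha: Dict[int, bool]) -> bool:
    """An assignment is feasible iff every hard clause is satisfied."""
    return all(satisfied(c, alpha) for c in formula if c.hard)


def score(formula: List[Clause], alpha: Dict[int, bool], v: int,
          hard: Optional[bool] = None) -> int:
    """make(v) - break(v); hard=True gives hscore, hard=False gives sscore."""
    flipped = {**alpha, v: not alpha[v]}
    make = brk = 0
    for c in formula:
        if hard is not None and c.hard != hard:
            continue
        before, after = satisfied(c, alpha), satisfied(c, flipped)
        if after and not before:
            make += c.weight   # clause becomes satisfied by the flip
        elif before and not after:
            brk += c.weight    # clause becomes unsatisfied by the flip
    return make - brk
```

For the toy formula with one hard clause $(v_1 \lor v_2)$ and two soft clauses $(\neg v_1, 2)$ and $(\neg v_2, 3)$ under $v_1 = true$, $v_2 = false$, flipping $v_1$ makes the soft clause of weight 2 satisfied but breaks the hard clause, so $v_1$ is soft decreasing and hard increasing at the same time.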

III. RELATED WORK
Our investigation shows that there are three main state-of-the-art SLS-based methods for PMSAT problem solving: the distinction-based method [35], the configuration checking-based method [36], and the dynamic local search method [37]. These SLS methods are built around two algorithmic components: a variable-pick heuristic and a weighting scheme. SLS methods differ in the variable-pick heuristic selected, as they employ priority-based variable-pick heuristics differently based on the scores of variables. Clause weighting is a weighting scheme that makes SLS methods dynamic [41]. The weights are adjusted during the search to determine the search trajectory while maintaining reasonable weight differentials [42]. The following paragraphs review each of the state-of-the-art SLS methods.
The distinction-based method was proposed by Cai et al. in 2014 [35], which distinguishes between hard and soft clauses by maintaining a weighting scheme for hard clauses only. The weighting scheme for hard clauses is called the hard pure additive weighting scheme (HPAWS), that was inspired by the pure additive weighting scheme (PAWS) algorithm proposed by Thornton et al. [41] to solve SAT problem. PAWS is a dynamic local search algorithm that has been used in many SLS algorithms [43]- [46]. Whenever the search stagnates and no improving neighborhood solutions are found, the weights of the unsatisfied (hard) clauses are adjusted to help escape from the local minimum. Additive weighting schemes show better scalability than other weighting schemes such as DLM [29] and SAPS [47]. HPAWS incorporates a diversification probability, called the smoothing parameter (sp is a real number that controls the hard clause weights), to decide whether to increase the weights of unsatisfied hard clauses or to decrease the weights of satisfied hard clauses.
Based on the concept of separation between hard and soft scores, the distinction-based method uses a multi-level priority-based approach for the variable-pick heuristic. Hard decreasing variables (hscore > 0) have the highest priority, and the hard variable with the greatest hard score may be selected. In the second priority level come the non-decreasing hard variables (0-hscore) that are also soft decreasing (sscore > 0), and the variable with the greatest soft score may be selected. The third level indicates that the search stagnates, and two actions take place. First, the weights of the hard clauses are updated according to HPAWS. Second, a random unsatisfied hard clause is chosen, if any; otherwise, a random unsatisfied soft clause is chosen. From the chosen clause, either a random variable v or the variable with the greatest soft score is selected, based on a given random walk probability wp (a real number that controls the activation of the random walk heuristic). The distinction-based method is designed with a simple neighborhood definition and a 1-flip move at each step of the search.
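The three priority levels described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the dictionaries `hscore` and `sscore` (precomputed scores per variable), and the clause lists are hypothetical, the HPAWS weight update of the third level is omitted, and applying wp to the clause chosen in the third level regardless of whether it is hard or soft is our reading of the description.

```python
import random
from typing import Dict, List, Optional


def pick_variable(hscore: Dict[int, int], sscore: Dict[int, int],
                  unsat_hard: List[List[int]], unsat_soft: List[List[int]],
                  wp: float, rng: random.Random) -> Optional[int]:
    # Level 1: hard decreasing variables, greatest hard score first.
    hard_dec = [v for v in hscore if hscore[v] > 0]
    if hard_dec:
        return max(hard_dec, key=lambda v: hscore[v])
    # Level 2: 0-hscore variables that are soft decreasing.
    soft_dec = [v for v in hscore if hscore[v] == 0 and sscore[v] > 0]
    if soft_dec:
        return max(soft_dec, key=lambda v: sscore[v])
    # Level 3: stagnation. The clause weight update would happen here.
    pool = unsat_hard if unsat_hard else unsat_soft
    if not pool:
        return None  # nothing is unsatisfied, so there is nothing to repair
    clause_vars = [abs(lit) for lit in rng.choice(pool)]
    if rng.random() < wp:
        return rng.choice(clause_vars)                # random walk step
    return max(clause_vars, key=lambda v: sscore[v])  # greedy step
```

Ties are broken by list order here for simplicity; a solver would more likely break ties randomly or by variable age.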
There are six PMSAT SLS solvers based on the distinction method: Dist [35], [48], HS-Greedy, Dist1, Dist2, Distr, and DistUP [48]. The parameters were tuned either with an automatic offline parameter tuning tool [42] or manually based on the researcher's experience. Furthermore, some solvers adopted the Best from Multiple Selections (BMS) heuristic proposed in [49]. Generally, BMS is a probabilistic sampling method that randomly selects t candidate decreasing variables and returns the best one. The source code of the distinction-based solvers is not available online.
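A minimal sketch of the BMS idea is given below. The function name and signature are hypothetical, and sampling with replacement is an assumption; the text above only states that t candidates are drawn at random and the best one is returned.

```python
import random
from typing import Callable, List, Optional


def bms_pick(candidates: List[int], score: Callable[[int], float],
             t: int, rng: random.Random) -> Optional[int]:
    """Sample t decreasing variables at random and return the best sampled one."""
    if not candidates or t <= 0:
        return None
    sample = [rng.choice(candidates) for _ in range(t)]  # with replacement (assumed)
    return max(sample, key=score)
```

The appeal of this scheme is that the cost of one pick depends on t rather than on the number of decreasing variables, which matters when the candidate list is very large.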
The configuration checking-based method was first proposed by Cai et al. in 2011 [50]. The configuration checking-based method, similar to the distinction-based method, uses HPAWS as the weighting scheme for hard clauses and maintains separate scores for hard and soft clauses. The variable-pick heuristic is based on the configuration checking property of variables. The configuration checking property serves two purposes: to avoid cycling and to promote diversification [50]. The configuration checking property is inspired by the tabu search mechanism [36], [50].
The configuration checking-based method starts the search with a probability p to decide whether to perform a random move that is biased towards the satisfaction of hard clauses or to switch to a greedy mode. In greedy mode, the configuration checking-based method uses a multi-level priority-based approach for the variable-pick heuristic. The highest priority is for the hard decreasing variables (hscore > 0) whose configuration changed, and the hard variable with the greatest hard score may be selected. If there are no hard decreasing variables, the weights of the hard clauses are updated according to HPAWS. Then, the decreasing variable (score > 0) with the greatest score whose configuration changed may be selected, if any. Otherwise, when the search stagnates, a variable v is chosen from a random unsatisfied hard clause c, if any, or else from a randomly selected unsatisfied soft clause c. The configuration checking-based method is similar to the distinction-based method, as it implements a simple neighborhood definition and a 1-flip move at each step of the search.
However, different priority levels have been adopted in the literature [36], [51], [52]. For example, the hard clause state-based configuration checking (HCSCC) [51] is a forbidding mechanism similar to the one proposed by Luo et al. [53], but it emphasizes hard clauses by using a configuration that consists of the states of hard clauses instead of neighboring variables (i.e., it captures the global effect of flipping a selected variable v). This variable-pick heuristic improves the results on some real-world weighted PMSAT problem instances [51].
There are six SLS solvers proposed to solve the PMSAT problem that are based on the configuration checking-based method, including configuration checking local search (CCLS) [36], configuration checking with emphasis on hard clauses (CCEHC) [51], and hard neighboring variables with configuration checking (HNVCC) [52]. HNVCC is based on CCEHC with a new forbidding strategy [45] for hard variables only. Details of the solvers CCMPA and SC2016 were not published. The parameters were tuned either with an automatic offline parameter tuning tool [42] or manually based on the researcher's experience. Furthermore, some solvers adopted the Best from Multiple Selections (BMS) heuristic. The source code of the configuration checking-based solvers is not available online.
The dynamic local search method was proposed by Lei and Cai in 2018 [37]. Unlike the distinction-based and configuration checking-based methods, the dynamic local search method uses one weighting scheme (called Weighting-PMS) for both hard and soft clauses based on PAWS [41], but the increments of the soft clauses' weights are limited by a maximum weight threshold ζ to avoid favoring soft clauses over hard clauses, which may mislead the search. Each hard clause is initially associated with a weight of one, which is then updated during the search by a constant amount ($h\_inc$), whereas the original input weights are used for the soft clauses. In addition, dynamic local search uses a simpler variable-pick heuristic, which considers three break levels when updating the scores of the variables. The SATLike solver [37], [54], [55] is based on this dynamic local search method and was the first SLS solver to compete on industrial benchmarks [56]. The parameters were tuned manually based on the researcher's experience. Furthermore, the Best from Multiple Selections (BMS) heuristic was adopted by SATLike. The source code of the SATLike solver is available online at the MaxSAT Evaluation (MSE) 2018 website.
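The weight update described for Weighting-PMS can be illustrated with a short sketch. It covers only what the paragraph above states: unsatisfied hard clauses grow by a constant h_inc, and soft clause weights stop growing once they reach the threshold ζ. The function name, the list-based layout, and the soft increment of 1 are assumptions made for illustration.

```python
from typing import List


def update_weights(weights: List[int], is_hard: List[bool],
                   unsatisfied: List[int], h_inc: int, zeta: int) -> None:
    """Increase the weights of currently unsatisfied clauses in place."""
    for j in unsatisfied:
        if is_hard[j]:
            weights[j] += h_inc   # hard clauses: constant increment
        elif weights[j] < zeta:
            weights[j] += 1       # soft clauses: capped by zeta (step of 1 assumed)
```

Capping the soft weights keeps an accumulation of soft clause weight from outweighing a violated hard clause, which is the concern the threshold ζ is meant to address.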

IV. ADAPTIVE VARIABLE DEPTH SLS ALGORITHM
The novel algorithm proposed in this work extends the state-of-the-art dynamic local search framework by incorporating two main components of an SLS method: parameter tuning and neighborhood search.
In local search algorithms, even relatively minor deviations from optimal parameter settings (i.e., parameter values) can lead to a substantial increase in the expected time at which a given problem instance is solved; this sensitivity seems to increase with the size and hardness of an instance to be solved [57]. In PMSAT SLS solvers, two main methods are used to tune parameters: manual tuning (i.e., based on a researcher's experience) and automatic offline parameter tuning [58]. The former is confronted with a serious difficulty; that is, a parameter setting that results from the tuning of some problem instances may not be applicable to other (difficult) instances of the same class. This means that this process is time consuming. By contrast, automatic offline parameter tuning methods determine parameter settings on the grounds of a representative set of benchmark instances (i.e., per-set tuning) during a training phase before algorithm deployment. Generally, these methods entail significant computational cost and experimentation effort, but their outcomes are nevertheless often reusable in a wide range of similar problems [59]. Some PMSAT SLS solvers [48], [51], [52] have been incorporated with an automatic offline parameter tuner called the sequential model-based algorithm configuration (SMAC) [60].
In contrast to automatic offline methods, adaptive parameter tuning is aimed at dynamically establishing the parameter values of an algorithm on the basis of feedback derived during its run. Such tuning is easier to deploy because limited user intervention is required. However, automatic adaptive methods may suffer from two major drawbacks: overspecialization if developed for a specific algorithm or problem and the requirement for additional parameters, which may expand the parameter domain that also requires tuning [58], [61].
Several online tuners have been proposed in the literature. McAllester et al. [62] proposed a tuning method for six different variable-pick heuristics based on a WalkSAT algorithm to solve the SAT problem. They found that when the noise parameter p is tuned well for each heuristic, the performance will be approximately the same. They called this a noise level invariant. They suggested that given the best performance of a heuristic, p can be tuned for the other heuristics for a given measure of performance. The proposed tuning method is based on the statistical properties of the search: mean and variance of the violation count (i.e., invariant ratio). Their empirical study showed that the best parameter tuning method is related to both the problem instance and the underlying algorithm.
Patterson and Kautz [63] extended the work of McAllester et al. [62] and proposed the Auto-WalkSAT tuning method. This method is based on the standard deviation of the invariant ratio and Brent's method [64]. The tuner searches the invariant ratio space using two mathematical methods: recursive bracketing and parabolic interpolation. Then, the optimized p-value is used during the search. The experimental evaluation showed that the best parameter tuning method is related to whether p should be decreased or increased during the tuning process.
Hoos [65] proposed a simple self-tuning noise mechanism for the noise parameter p. This adaptive noise mechanism was applied to different variants of the WalkSAT algorithm. The basic idea is to increase the value of p when the search stagnates, leading to more diversification. The empirical results showed that in some cases, the method significantly improves the results in comparison to the basic WalkSAT. Similarly, the reactive scaling and probabilistic smoothing (RSAPS) method, proposed by Hutter et al. [47], is also based on a simple self-tuning mechanism, but for the smoothing parameter $p_{smooth}$ that controls the weighting scheme. The experimental evaluation showed that the RSAPS method performed better than the basic WalkSAT.
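The self-tuning idea can be sketched as a single update rule. The text above only says that p is raised when the search stagnates; the specific increase and decrease formulas and the default step φ = 0.2 below follow the commonly cited form of adaptive noise and should be read as assumptions rather than as the exact rule used by any solver discussed here.

```python
def adapt_noise(p: float, stagnating: bool, phi: float = 0.2) -> float:
    """Return the updated noise level p in [0, 1].

    stagnating: True if no improvement was observed over the last window
    of search steps (how the window is measured is left to the caller).
    """
    if stagnating:
        return p + (1.0 - p) * phi   # push p towards 1: more diversification
    return p - p * phi / 2.0         # decay p more slowly: more greediness
```

The asymmetry (a larger step up than down) lets the search escape a stagnation phase quickly and then return to greedier behavior gradually.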
Two other research studies were based on Hoos's method [65]. Li et al. [66] proposed adaptG$^2$WSAT, which combines the adaptive noise tuning in [65] with look-ahead for a variable-pick heuristic to solve the SAT problem. The empirical results show that the adaptG$^2$WSAT method significantly improved the results in comparison to G$^2$WSAT and other variants of the basic WalkSAT. Furthermore, the adaptive memory-based local search (AMLS) method, which Lu and Hao [67] proposed for solving the Max-SAT problem, employs a variable-pick heuristic based on a tabu mechanism. The parameters $p$ and $p_w$ are both adaptively tuned based on Hoos's method [65]. Moreover, the authors proposed an additional random perturbation operator for diversification. AMLS showed better results for some Max-SAT instances compared to the basic WalkSAT. However, the empirical evaluation showed that the initial setting of some other parameters may need to be better tuned.
Prestwich [68] proposed a reinforcement learning algorithm to automate the tuning process of the parameters using the average-reward method. The basic idea is to learn better noise levels against objective function values. That method considers the average progress towards a solution, which is called gain optimality. The experimental evaluation showed that this method achieves a better performance than other learning methods, such as Q-learning.
Based on the discussion presented above, we developed a novel adaptive parameter tuner (Section IV-A) that considers Hoos's tuning method [65] and a set of features for an input PMSAT problem instance for initial parameter setting selection.
How a neighborhood is defined and explored is a critical factor that affects the performance of local search algorithms. The larger the neighborhood, the better the local optima but the higher the computational cost. The key is to improve neighborhood search for large problem instances without explicitly evaluating all neighbors each time a search is trapped in a local minimum [69]. State-of-the-art large neighborhood search (LNS) methods have shown outstanding results in solving various problems from different domains. An LNS method can explore complex neighborhoods and find better candidate solutions in each iteration, thereby guiding searches toward more promising search paths [70], [71]. LNS methods are grouped into different categories, such as variable neighborhood search (VNS) and variable depth neighborhood search (VDS). In this work, we put forward a VDS algorithm for PMSAT problem solving. A VDS algorithm can search deeper neighborhoods in a heuristic way using only one parameter that controls the depth of the neighborhood search [39], [70]. Our proposed VDS algorithm is presented in Section IV-B.

A. ADAPTIVE PARAMETER TUNING
Building an adaptive parameter tuner for a PMSAT SLS algorithm is a challenging task for two main reasons: the time constraint involved and the number of parameters needed to be tuned. Our aim is to design a simple yet efficient tuning algorithm. In our proposed algorithm, we build an adaptive tuner for the underlying dynamic local search SLS solver to tune nine parameters. Four parameters are used in the weighting scheme, two random walk parameters used by a variable pick heuristic, one parameter for the BMS heuristic, one parameter that controls the depth of the VDS algorithm, and one parameter used by the search algorithm for diversification.
We have studied the relationship between the parameters' values and some features of input instances on the basis of a subset of weighted and unweighted benchmarks from MSE 2017 and 2018 (see Appendix A and Appendix B). For many instances, some parameter settings are more closely related to some features of an input instance. One of the most important features for our adaptive tuner is the hard ratio, which is the ratio of the number of hard clauses to the number of hard variables of an input PMSAT problem instance. Fig. 1 shows the main steps of the proposed adaptive parameter tuner. Starting from initial default parameter values and the extracted features of the input PMSAT problem instance (F), the tuner calls the underlying solver to solve F and find a solution (α). The tuner then checks the quality of the output solution, and if a feasible solution is found, the tuner stops and the adaptive variable depth SLS search starts. Otherwise, the tuner re-tunes each parameter separately and calls the solver again until a feasible solution is found or the time limit is reached. In this re-tune step, the tuner looks for improved solutions; a parameter whose change results in a better solution is marked as a sensitive parameter of F and added to the list of sensitive parameters for future calls of the adaptive parameter tuner.
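The tuning loop described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the `solve` callback, the `candidates` table of trial values per parameter, the returned tuple, and the rule of stopping when a full pass brings no improvement are all assumptions introduced here.

```python
import time
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def hard_ratio(num_hard_clauses: int, num_hard_vars: int) -> float:
    """Ratio of hard clauses to hard variables (the feature named in the text)."""
    return num_hard_clauses / num_hard_vars if num_hard_vars else 0.0


def adaptive_tune(solve: Callable[[Params], Tuple[bool, float]],
                  params: Params,
                  candidates: Dict[str, List[float]],
                  time_limit: float) -> Tuple[Params, List[str], bool]:
    """solve(params) is assumed to return (feasible, cost) for the instance."""
    deadline = time.monotonic() + time_limit
    sensitive: List[str] = []
    feasible, best_cost = solve(params)
    improved = True
    while not feasible and improved and time.monotonic() < deadline:
        improved = False
        for name, values in candidates.items():   # re-tune one parameter at a time
            for value in values:
                if time.monotonic() >= deadline:
                    return params, sensitive, False
                trial = {**params, name: value}
                trial_feasible, trial_cost = solve(trial)
                if trial_cost < best_cost:         # better solution: mark as sensitive
                    params, best_cost, improved = trial, trial_cost, True
                    if name not in sensitive:
                        sensitive.append(name)
                if trial_feasible:
                    return trial, sensitive, True
    return params, sensitive, feasible
```

In such a design, `hard_ratio` (and other instance features) would typically be used to choose the initial `params` and the candidate values, so that the first call to `solve` already starts from a setting suited to the instance.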

B. VARIABLE DEPTH NEIGHBORHOOD SEARCH (VDS)
As presented in Section III, state-of-the-art PMSAT SLS algorithms rely on a single neighborhood search with a single (1-flip) move in each iteration. However, this technique fails to explore huge neighborhoods when solving large-sized PMSAT problem instances. To tackle this limitation, we propose a variable depth neighborhood search (VDS) [39] adapted for PMSAT problem solving. To the best of our knowledge, this is the first research work to adapt VDS for PMSAT problem solving.

Algorithm 1 VDS
Input: α*, a selected variable $v_i$ by adaptive variable depth SLS algorithm
Output: return α*, cost*
// let α* the best solution found so far
// let cost the number (or total weight) of unsatisfied clauses of α and cost* is the cost of α*
// let BEST a list of decreasing neighborhood variables
1: BEST ← $v_i$
2: while depth > 0 do
3:   if BEST ≠ ∅ then
4:     $v_{best}$ ← remove a random variable from BEST
5:     α ← α with $v_{best}$ flipped
6:     BEST ← BEST ∪ decreasing neighborhood variables of $v_{best}$
7:   end if
8:   depth ← depth − 1
9: end while
10: return α*, cost*

The VDS (Algorithm 1) maintains a list called BEST that keeps the decreasing neighborhood variables [line 1] of a selected variable $v_{best}$. Starting with a variable $v_i$ selected by the adaptive variable depth SLS algorithm (Algorithm 2), VDS flips the selected variable and then adds all of its decreasing neighborhood variables to the BEST list [lines 4-6]. In the next iteration, a random variable from the BEST list is selected, removed from the list, and flipped, and all of its decreasing neighborhood variables are added to the BEST list. The process is repeated until the depth is reached [line 2] or the BEST list becomes empty [line 3]. Finally, VDS returns the best solution (α*) found. We believe that our proposed VDS algorithm is simple and is naturally similar to the distinction-based and configuration checking-based methods, which maintain a list of hard decreasing variables, but it does not use any additional data structures for VDS.
Algorithm 1 has a time complexity of O(depth) and a memory complexity of O(depth).
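The VDS procedure described in this section can be sketched in a few lines. This is an illustrative sketch under stated assumptions: clauses are lists of DIMACS-style integer literals with one weight each, the neighbors of a variable are taken to be the variables that share a clause with it, the best assignment is updated after every flip, and costs are recomputed from scratch for clarity, so the sketch does not reflect the complexity stated above.

```python
import random
from typing import Dict, List, Set, Tuple

Assignment = Dict[int, bool]


def cost(clauses: List[List[int]], weights: List[int], alpha: Assignment) -> int:
    """Total weight of clauses with no true literal under alpha."""
    return sum(w for c, w in zip(clauses, weights)
               if not any(alpha[abs(lit)] == (lit > 0) for lit in c))


def score(clauses, weights, alpha: Assignment, v: int) -> int:
    """Decrease in cost obtained by flipping v (positive means decreasing)."""
    flipped = {**alpha, v: not alpha[v]}
    return cost(clauses, weights, alpha) - cost(clauses, weights, flipped)


def neighbors(clauses: List[List[int]], v: int) -> Set[int]:
    """Variables sharing at least one clause with v (assumed neighborhood)."""
    out: Set[int] = set()
    for c in clauses:
        clause_vars = {abs(lit) for lit in c}
        if v in clause_vars:
            out |= clause_vars
    out.discard(v)
    return out


def vds(clauses, weights, alpha: Assignment, v_i: int, depth: int,
        rng: random.Random) -> Tuple[Assignment, int]:
    alpha = dict(alpha)
    best_alpha, best_cost = dict(alpha), cost(clauses, weights, alpha)
    best = [v_i]                                # the BEST list
    while depth > 0 and best:
        v = best.pop(rng.randrange(len(best)))  # remove a random variable
        alpha[v] = not alpha[v]                 # flip it
        current = cost(clauses, weights, alpha)
        if current < best_cost:                 # remember the best solution seen
            best_alpha, best_cost = dict(alpha), current
        for u in sorted(neighbors(clauses, v)): # add decreasing neighbors
            if u not in best and score(clauses, weights, alpha, u) > 0:
                best.append(u)
        depth -= 1
    return best_alpha, best_cost
```

On the chain $(v_1), (\neg v_1 \lor v_2), (\neg v_2 \lor v_3)$ with weights 1, 2, 1 and all variables false, a single flip of $v_1$ makes the cost worse, but following decreasing neighbors for three moves reaches a zero-cost assignment, which a 1-flip search would not accept as an improving step.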

C. ADAPTIVE VARIABLE DEPTH SLS ALGORITHM
The complete adaptive variable depth SLS algorithm is presented in this section. Our novel algorithm extends the framework of the state-of-the-art dynamic local search algorithm SATLike [37] by using two main components: adaptive parameter tuning and variable depth search (VDS). The adaptive variable depth SLS (Algorithm 2) works as follows. First, an initial assignment α is randomly generated with unit propagation-based decimation preprocessing [line 1]. Second, the adaptive parameter tuner is activated [lines 8-9] before the search begins, as described in Section IV-A.
Then the search begins; based on the variable-pick heuristic, a decreasing variable v is selected, if any. Otherwise, the weighing scheme is activated, and the best variable v is chosen from c a randomly selected unsatisfied hard clause, if any. If not, it is chosen from a randomly selected unsatisfied VOLUME 9, 2021  soft clause [lines [11][12][13][14][15][16][17][18][19][20][21]. The selected variable v is then sent to VDS (Algorithm 1) [line 22]. VDS explores the (huge) neighborhood deeply with a variable number of moves (k − flips) at each iteration as described in Section IV-B. If no feasible solution is found before the cutoff -time is reached, the adaptive parameter tuner is re-activated to re-tune the parameters [lines [8][9]. Algorithm 2 has a time complexity of O(N ) and a memory complexity of O(N + C), where N represents the total number of variables, and C represents the total number of clauses.
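The variable-pick step described above (lines 11-21 of Algorithm 2) can be illustrated with a short Python sketch. It is a simplified illustration, not the solver's code: we assume that a decreasing variable is one with a positive score, that the step of each variable's last flip is stored in a dictionary, and that the caller updates the clause weights whenever the returned flag is true. The name pick_variable and its arguments are hypothetical.

```python
import random

def pick_variable(scores, last_flip, unsat_hard, unsat_soft, rng=random):
    """Return (variable, stuck).

    scores:     var -> score (assumed: score > 0 means flipping lowers the cost)
    last_flip:  var -> search step at which the variable was last flipped
    unsat_hard: currently unsatisfied hard clauses (lists of signed ints)
    unsat_soft: currently unsatisfied soft clauses (lists of signed ints)
    stuck is True when no decreasing variable exists, i.e. the caller
    should update the clause weights before flipping.
    """
    def best(candidates):
        # Greatest score first; ties go to the least recently flipped variable.
        return max(candidates, key=lambda v: (scores[v], -last_flip[v]))

    decreasing = [v for v, s in scores.items() if s > 0]
    if decreasing:
        return best(decreasing), False

    # Local optimum: prefer an unsatisfied hard clause, else a soft one.
    pool = unsat_hard if unsat_hard else unsat_soft
    if not pool:
        return None, True  # every clause is satisfied; nothing to pick
    clause = rng.choice(pool)
    return best({abs(lit) for lit in clause}), True
```

The returned variable would then be handed to VDS, which performs the actual flips.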

V. EXPERIMENTAL EVALUATION OF ADAPTIVE VARIABLE DEPTH-BASED SLS SOLVER
In this section, we empirically evaluate our algorithm presented in Section IV by implementing a PMSAT SLS solver that is based on it. The solver is called AVD-SLS, and it is evaluated on both unweighted and weighted PMSAT benchmarks from MaxSAT Evaluation (MSE) 2014-2019. In this evaluation, we refer to the unweighted PMSAT benchmark as PMS and to the weighted PMSAT benchmark as WPMS. The reported results are then compared to the results of the PMSAT solvers that participated in MSE 2014-2019. In this experimental evaluation, we do not consider the published results of state-of-the-art PMSAT SLS solvers for two main reasons. The first reason is the inconsistency of published results. A solver was run more than once on each benchmark with different results reported for each run, although all adopted the MSE rules including the "run once" rule, as described in Section V-A under Evaluation Methodology. Moreover, detailed per-instance results are not available for comparing the number of solved instances, the best solutions found, and the best family results. The second reason is that the state-of-the-art SLS solvers that participated in MSE 2014-2019 showed better results than the published ones.

Algorithm 2 Adaptive Variable Depth SLS
Input: PMSAT problem CNF instance F, initial parameter setting, cutoff-time
Output: α*, cost* if α* is a feasible solution; otherwise print "No solution found"
// let α* be the best solution found so far
// let cost* be the number (or total weight) of unsatisfied clauses of α*
1: α ← a randomly generated complete initial assignment of F with UP-based decimation preprocessing
2: α* ← α
3: cost* ← +∞
4: while elapsed time < cutoff-time do
5:   if ∄ unsatisfied hard clauses and cost(α) < cost* then
6:     α* ← α, cost* ← cost(α)
7:   end if
8:   if tuner activated then
9:     new parameter setting ← tuner(current parameter setting, α*)
10:  end if
11:  if ∃ decreasing variables then
12:    v ← a variable with the greatest score, breaking ties in favor of the one that is least recently flipped
13:  else
14:    update weights of clauses
15:    if ∃ unsatisfied hard clauses then
16:      c ← a randomly selected unsatisfied hard clause
17:    else
18:      c ← a randomly selected unsatisfied soft clause
19:    end if
20:    v ← the variable from c with the greatest score, breaking ties in favor of the one that is least recently flipped
21:  end if
22:  VDS(α*, v)
23: end while
24: return α*, cost*

A. EXPERIMENTAL SETUP
Our experiments were conducted on Shaheen, a supercomputer consisting of a 36-rack Cray XC40 system (see https://www.hpc.kaust.edu.sa/content/shaheen-ii). The front-end environment runs SUSE Linux Enterprise Server 15. The system has 6,174 dual-socket compute nodes based on 16-core Intel Haswell processors running at 2.3GHz. Each node has 128GB of DDR4 memory running at 2300MHz. The AVD-SLS solver was implemented in C++ and compiled by g++ with the '-O3' option. The execution environment for the MSE 2014-2019 benchmarks is shown in Table 1. The time to find the best solution is not considered here, since the processing time is machine-dependent. For the same reason, it has not been considered by the MSEs since 2017.

1) BENCHMARKS
In this experimental evaluation, we consider all the benchmarks from the incomplete track of MSE 2014-2019 for two time limits: 60 and 300 CPU seconds. The MSEs are affiliated events of the International Conference on Theory and Applications of Satisfiability Testing (SAT), and they have been held every year since 2006 [72]. The MSEs are devoted to empirically evaluating MaxSAT and PMSAT solvers and to publishing public benchmarks. There are two main tracks in the MSE: the complete track and the incomplete track. The complete track includes all complete solvers that are based on exact methods, and the incomplete track includes hybrid and local search solvers. Table 2 and Table 3 Table 15 -Appendix B. We have excluded these two subsets of instances from this evaluation study; thus, a total of 3476 instances were used in this evaluation: 1665 instances from the PMS benchmark and 1811 from the WPMS benchmark.

2) EVALUATION METHODOLOGY
For each MSE, we follow the same methodology adopted in the incomplete track:
• Each solver is executed once on each instance within a time limit, which is set to 60 CPU seconds for MSE 2017-2019 benchmarks and to 300 CPU seconds for MSE 2014-2019 benchmarks.
• In each run, the solver successively prints the best solution it has found so far.
• The total number of instances in each benchmark is denoted by #inst. and the number of families by #families.
• For each solver on each benchmark, we report the number of solved instances denoted by #sol., within parentheses the number of best solutions denoted by (#wins), and the score. The best result is presented in bold for each benchmark and each evaluation criterion.
In this work, we consider three evaluation criteria for each MSE result: number of solved instances, number of best solutions found, and score. The score is based on the cost of the output solutions and measures how far the solutions found by a solver are from the best ones, taking into account the number of solved instances. Solvers with a higher number of best solutions and solved instances have higher scores.
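To make the three criteria concrete, the following Python sketch computes #sol., #wins, and an average score for one solver on one benchmark. The per-instance formula used here, (best cost + 1) / (solver cost + 1) with 0 for an unsolved instance, is our assumption about the incomplete-track score and is not stated in the text above; the function names and the data layout (dictionaries keyed by instance name, with None for an unsolved instance) are likewise hypothetical.

```python
def instance_score(solver_cost, best_cost):
    """Assumed per-instance score: 1.0 when the solver matches the best
    known cost, approaching 0 as its cost grows, 0.0 if unsolved."""
    if solver_cost is None:
        return 0.0
    return (best_cost + 1) / (solver_cost + 1)

def benchmark_summary(results, best_known):
    """Return (#sol., #wins, average score) over all instances in best_known.

    results:    instance -> cost found by the solver, or None if unsolved
    best_known: instance -> best cost found by any solver
    """
    solved = sum(1 for i in best_known if results.get(i) is not None)
    wins = sum(1 for i in best_known
               if results.get(i) is not None and results[i] <= best_known[i])
    total = sum(instance_score(results.get(i), best_known[i]) for i in best_known)
    return solved, wins, total / len(best_known)
```

Under this assumed formula an unsolved instance contributes zero to the average, which is consistent with the statement that the score takes the number of solved instances into account.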

4) INITIAL PARAMETER SETTING FOR ADAPTIVE TUNER
The values of the following nine parameters are adjusted by the adaptive tuner, starting from initial default values adopted from [37] and from our experimental evaluation on tuning our solver on a subset of benchmark instances from MSE 2017-2018 (Appendix A and Appendix B). However, based on our experimental evaluation, we set the initial parameter values for some PMS and WPMS instances differently, according to the hard ratio of the instance (i.e., the ratio of the number of hard clauses to the number of hard variables). We found that, for some ranges of hard ratios, the PMS initial parameter values with max_non_improve_flip = 65M and depth = 3 are better as initial values. The hard ratio ranges found to work better with these initial values are 1.9 to 9.5 for WPMS instances and 2 to 6.25 for PMS instances. Then, the adaptive tuner adjusts the initial values to guide the search towards better (feasible) search regions. The adaptive tuner is called before the search begins and then whenever max_non_improve_flip is reached and no feasible solution has been found. Based on our experiments, we set the maximum time limit of each call to the adaptive tuner to 10% of the time limit.
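The hard-ratio rule above can be sketched in Python as follows. This is a minimal sketch under assumptions: "hard variables" is read as the distinct variables occurring in hard clauses, the range bounds are treated as inclusive, and 65M is read as 65 million. The names hard_ratio, uses_alternative_init, and ALTERNATIVE_OVERRIDES are hypothetical and do not come from the solver.

```python
# Hard-ratio ranges reported in the text, per benchmark type.
HARD_RATIO_RANGES = {"PMS": (2.0, 6.25), "WPMS": (1.9, 9.5)}

# Initial values reported to work better inside those ranges
# (assumption: 65M means 65 million flips).
ALTERNATIVE_OVERRIDES = {"max_non_improve_flip": 65_000_000, "depth": 3}

def hard_ratio(hard_clauses):
    """Number of hard clauses divided by the number of distinct
    variables occurring in them (assumed reading of 'hard variables')."""
    hard_vars = {abs(lit) for clause in hard_clauses for lit in clause}
    return len(hard_clauses) / len(hard_vars) if hard_vars else 0.0

def uses_alternative_init(hard_clauses, kind):
    """True if the instance falls in the reported range for its kind
    ('PMS' or 'WPMS'); bounds are assumed to be inclusive."""
    low, high = HARD_RATIO_RANGES[kind]
    return low <= hard_ratio(hard_clauses) <= high
```

An instance for which uses_alternative_init returns true would start from the alternative values before the tuner adjusts them further.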

B. PERFORMANCE EVALUATION OF AVD-SLS SOLVER
In this section, we evaluate the performance of the AVD-SLS solver on both PMS and WPMS, and on each class: random, crafted, and industrial instances. Then, we further focus on the evaluation results based on benchmark families. Based on the results, we classify the performance of AVD-SLS into five different classes, as shown in Table 4 and Table 5. We consider that AVD-SLS fails to solve a benchmark family, and categorize the benchmark family as a hard family, if AVD-SLS is unable to solve more than 50% of the benchmark's instances within 300 CPU seconds. Fig. 2 shows the overall performance results of AVD-SLS on both PMS and WPMS. For PMS, the total number of solved instances is 1492 (89.55%), and the average time to find the best solution is below 50 CPU seconds. For WPMS, the total number of solved instances is 1774 (97.96%), and the average time to find the best solution is also below 50 CPU seconds. The results indicate that AVD-SLS is, in general, an efficient solver w.r.t. time and percentage of solved instances. Moreover, AVD-SLS performs remarkably better on WPMS than on PMS.

1) AVD-SLS RESULTS ON PMS
For the random class, AVD-SLS was able to solve all the instances from random PMS except one instance that was never solved by any solver. The average time to solve all random PMS instances is 2.35 seconds. Table 4 shows that AVD-SLS has an excellent performance on random families, which were completely solved by AVD-SLS.
For the crafted class, AVD-SLS was able to solve 94.13% of the instances from crafted PMS. The average time to solve crafted PMS instances is 44.34 seconds. Table 4 shows that AVD-SLS has an excellent performance on 88.5% of crafted families, which were completely solved by AVD-SLS. However, AVD-SLS fails to solve many instances from the reversi family (multi-agent endgame). Excluding the pseudo-Boolean-primes family (as it was never solved by any solver), Fig. 3 shows that the number of solved instances from the reversi family increases with time.
For the industrial class, AVD-SLS was able to solve 81.26% of the instances from industrial PMS, and the average time to solve them is 63.49 seconds. Table 4 shows that AVD-SLS has an excellent performance on 55.17% of industrial families. However, five industrial families fall into the fail performance class. AVD-SLS was unable to solve any instances from the atcoss-mesat (air traffic controller shift scheduling) and circuit trace compaction families. The Des (diagnosis of discrete event systems) and atcoss-sugar families also showed very poor results. The hs-timetabling family, however, has only two instances, of which only one is solved, which results in 50%. As shown in Fig. 3, the number of solved hard instances from industrial families grows only modestly, or not at all, as the time increases.

2) AVD-SLS RESULTS ON WPMS
For the random class, AVD-SLS was able to solve all the instances from random WPMS except one instance that was never solved by any solver. The average time to solve all random WPMS instances is 2.11 seconds. Table 5 shows that AVD-SLS has an excellent performance on random families, which were completely solved by AVD-SLS.
For the crafted class, AVD-SLS was able to solve 97.95% of the instances from crafted WPMS. The average time to solve crafted WPMS instances is 54.74 seconds. Table 5 shows that AVD-SLS has an excellent performance on 92% of crafted families, which were completely solved by AVD-SLS. However, AVD-SLS was unable to solve more than 50% of the mip-lib-mps family from the PB domain. Nevertheless, two miplib instances had never been solved before, and Fig. 4 shows that the number of solved instances from the miplib family increases as the time increases.
For the industrial class, AVD-SLS was able to solve 96.20% of the instances from industrial WPMS, and the average time to solve them is 86.30 seconds. Table 5 shows that AVD-SLS has an excellent performance on 72.72% of industrial families. However, none of the three instances of the robot navigation family was solved by AVD-SLS. As shown in Fig. 4, the number of solved hard instances from the industrial families robot navigation and shift design grows only modestly, or not at all, as the time increases.
In concluding this section, we show that AVD-SLS has excellent performance results on both PMS and WPMS. However, AVD-SLS performs remarkably better on WPMS than on PMS, especially on industrial families. Out of 125 benchmark families, AVD-SLS shows excellent performance results on about 80% of the benchmark families, which were completely solved by AVD-SLS. On the other hand, few hard instances were solved over time.

C. COMPARISON OF AVD-SLS WITH STATE-OF-THE-ART SLS SOLVERS
AVD-SLS is compared to each SLS solver that participated in MSE 2014-2019. The results on PMS are shown in Table 6 and Table 7. The detailed results by family-based benchmarks for PMS instances are shown in Appendix C. We highlight here the best and worst results of AVD-SLS on family-based benchmarks compared to the results of the SLS solvers.
For PMS 2014: Most SLS solvers were able to solve the random instances, and AVD-SLS was the best SLS solver for all three evaluation criteria except for the total number of solved crafted instances, where the Dist solver was able to solve two more instances. Dist was the second best solver and is competitive on the random and crafted benchmarks. However, AVD-SLS is a prominent solver on the merged benchmark for all three evaluation criteria. For the family-based results shown in Table 16 (Appendix C), AVD-SLS has the best number of solved instances and average score with 20 best families, followed by Dist with 10 best families. Moreover, only Dist and AVD-SLS were able to solve a number of instances from the atcoss-sugar family. All SLS solvers in this MSE failed to solve any of the instances of the atcoss-mesat and circuit-trace-compaction families.
For PMS 2015: Most SLS solvers were able to solve the random instances. For the crafted and industrial benchmarks, AVD-SLS, Dist2, and DistUP were competitive. For the merged benchmark, the Dist2 solver was able to solve one more instance, but AVD-SLS has the best number of best solutions and the best score. Dist2 was the second best solver and is competitive on the random and crafted benchmarks. For the family-based results shown in Table 17 (Appendix C), AVD-SLS has the best number of solved instances and average score with 20 best families, followed by Dist2 with 10 best families. Moreover, all Distinction-based solvers, CCEHC, and AVD-SLS were able to solve a number of instances from the atcoss-sugar family, where CCEHC has the best results. For the Des family, most SLS solvers, including AVD-SLS, were able to solve a number of instances, where DistUP has the best results. As in MSE 2014, all SLS solvers in this MSE failed to solve any of the instances of the atcoss-mesat and circuit-trace-compaction families.
For PMS 2016: Similar to PMS 2014 and 2015, most SLS solvers were able to solve the random instances. For all benchmarks, AVD-SLS was the best solver for all three evaluation criteria. Dist was the second most competitive solver. For the family-based results shown in Table 18 (Appendix C), AVD-SLS has the best number of solved instances and average score with 17 best families, followed by Dist with 10 best families. Moreover, Dist, CCEHC, and AVD-SLS were able to solve a number of instances from the atcoss-sugar family, where CCEHC has the best results. For the Des family, most SLS solvers, including AVD-SLS, were able to solve a number of instances, where Dist has the best results. As in MSE 2014 and 2015, all SLS solvers in this MSE failed to solve any of the instances of the atcoss-mesat and circuit-trace-compaction families.
For PMS 2017: AVD-SLS was the best solver for all three evaluation criteria. The results of CCEHC improved as the time limit increased to 300 seconds. In this MSE, AVD-SLS is a prominent solver. For the family-based results shown in Table 19 (Appendix C), AVD-SLS has the best number of solved instances and average score with 14 best families, followed by CCEHC with 6 best families. Moreover, AVD-SLS and CCEHC were able to solve a number of instances from the atcoss-sugar family. All SLS solvers in this MSE failed to solve any of the instances of the atcoss-mesat, Des, and close-solutions (from the Satisfiability domain) families.
For PMS 2018: AVD-SLS was the best solver for all three evaluation criteria. The number of best solutions found increased by more than approximately 1.7 times for both solvers as the time limit increased to 300 seconds. It is remarkable that the number of solved instances does not increase proportionally as the number of best solutions found increases. For the family-based results shown in Table 20 (Appendix C), AVD-SLS has the best number of solved instances and average score with 14 best families, followed by SATLike with 8 best families. AVD-SLS was able to solve a number of reversi instances that were never solved by SATLike. Moreover, AVD-SLS and CCEHC were able to solve a number of instances from the atcoss-sugar family. Both AVD-SLS and SATLike failed in this MSE to solve any of the instances of the atcoss-mesat, Des, and close-solutions families.
For PMS 2019: AVD-SLS is a competitive solver, although no other SLS solver participated in this evaluation. Its scores and number of best solutions are better than those of some well-known hybrid solvers. In this evaluation, AVD-SLS has the best number of solved instances and average score with 7 best families, as shown in Table 21 (Appendix C). As AVD-SLS is the only SLS solver in this MSE, the results shown in Table 7 (section PMS_2019) are based on the best solution found so far for each instance by any SLS solver during the past MSE 2014-2018. If an instance was never tested before by any SLS solver, we set the solution found by AVD-SLS as the best solution found so far (about 12% of the unweighted benchmark instances were never tested before by SLS solvers). In this evaluation, AVD-SLS failed to solve any of the instances of the atcoss-mesat and pseudo-Boolean-primes families, which were never solved by any SLS solver. Three new benchmark families were included in this MSE, on which AVD-SLS performs very well, with reported scores ranging from 0.860 to 0.987, as shown in Table 21 (Appendix C).
• the colored part (e.g. light violet for AVD-SLS) represents the percentage (%) of total number of best solutions found by a solver.
From this summary, we show that AVD-SLS has the best results throughout MSE 2014-2019 w.r.t. the three MSE evaluation measures: number of solved instances, score, and number of best solutions found. AVD-SLS reports the best performance results on many graph-theory families (such as max-clique, maxcut, ramsey, and set-covering), on some verification families (such as mbd and bcp-syn), and on some optimization problems such as uaq (User Authorization Query problem). However, the hardest PMS families for AVD-SLS are atcoss-mesat, circuit trace compaction, and robot navigation. Nevertheless, competitive PMSAT SLS solvers such as AVD-SLS, SATLike, Dist, and CCEHC were able to solve some instances from other hard benchmark families such as atcoss-sugar, des, close-solutions, and reversi.
Next, the results on WPMS are shown in Table 8 and Table 9. The detailed results by family-based benchmarks for WPMS problem instances are shown in Appendix D. We highlight here the best and worst results of AVD-SLS on family-based benchmarks compared to the results of the SLS solvers.
For WPMS 2014: AVD-SLS was the best SLS solver for all three evaluation criteria on the industrial and merged benchmarks. However, its number of best solutions found for the random benchmark is the worst among all solvers. For the crafted benchmark, Dist has the best score, whereas AVD-SLS was the best solver for the other two evaluation criteria. For the family-based results shown in Table 22 (Appendix D), AVD-SLS has the best number of solved instances and best average score with 19 best families, followed by Dist and CCMPA with 6 best families. We found that all SLS solvers reported the best results for the two auctions benchmark families. Moreover, AVD-SLS was able to solve more instances from the hard pseudo-miplib-mps and hs-timetabling benchmark families than the other SLS solvers. For the upgrade-ability problem family, AVD-SLS was the only solver that solved all instances, none of which were solved by the other SLS solvers. It is remarkable that neither Dist nor Dist-r was able to solve any instance from the frb, ramsey, and maxcut benchmark families.
For WPMS 2015: Most SLS solvers were able to solve the random instances, but AVD-SLS, as in MSE 2014, has the worst number of best solutions among all solvers. For the crafted benchmark, CCEHC has the best score. However, AVD-SLS was the best solver on the crafted benchmark for the other two evaluation criteria and was the best solver for all three evaluation criteria on the industrial and merged benchmarks. For the family-based results shown in Table 23 (Appendix D), AVD-SLS has the best number of solved instances and best average score with 17 best families, followed by CCEHC with 7 best families. As in MSE 2014, all SLS solvers reported the best results for the two auctions benchmark families. Moreover, AVD-SLS was able to solve more instances from the hard hs-timetabling benchmark family than the other SLS solvers. Both HS-Greedy and AVD-SLS were able to solve all instances from the timetabling family.
For WPMS 2016: For the random benchmark, only three solvers have the best results: CCEHC, Ramp, and SC2016. For the crafted benchmark, CCEHC has the best score. However, AVD-SLS was the best solver on the crafted benchmark for the other two evaluation criteria and was the best solver for all three evaluation criteria on the industrial and merged benchmarks. For the family-based results shown in Table 24 (Appendix D), AVD-SLS has the best number of solved instances and best average score with 22 best families, followed by Ramp with 7 best families. As in MSE 2014 and 2015, all SLS solvers reported the best results for the two auctions benchmark families. For the abstraction-refinement family, AVD-SLS was the only solver that solved all instances, none of which were solved by the other SLS solvers. HS-Greedy, which was the best solver for the hard hs-timetabling benchmark family, was able to solve more instances from it than AVD-SLS. However, AVD-SLS was the only solver that was able to solve all instances from the relational-inference family. As in MSE 2014, neither Dist nor Dist-r was able to solve any instance from the frb, ramsey, and maxcut benchmark families.
For WPMS 2017: AVD-SLS was the best solver for all three evaluation criteria. The results of all solvers improved only slightly as the time limit increased to 300 seconds. In this MSE, AVD-SLS is a prominent solver. Generally, many good solutions were found by the SLS solvers, as indicated by the improved scores. For the family-based results shown in Table 25 (Appendix D), AVD-SLS has the best number of solved instances and best average score with 14 best families, followed by CCEHC with 6 best families. Moreover, AVD-SLS was the only solver that was able to solve all instances from the min-width family. AVD-SLS also solved a number of instances from the hard pseudo-miplib-mps and shift design benchmark families, none of which were solved by the other SLS solvers.
For WPMS 2018: AVD-SLS was the best solver for all three evaluation criteria. The number of best solutions found approximately doubled for AVD-SLS and increased 4.4 times for SATLike as the time limit increased to 300 seconds. As for PMS, it is also remarkable that the number of solved instances does not increase proportionally when the number of best solutions found doubles or more. For the family-based results shown in Table 26 (Appendix D), AVD-SLS has the best number of solved instances and best average score with 11 best families, whereas SATLike was the best for 7 families. For the causal-discovery family, AVD-SLS was the best and found optimal solutions for all instances (zero cost). Also, AVD-SLS was able to solve more instances from the hard hs-timetabling benchmark family. The hardest families, which were never solved by SLS solvers in this MSE, are the robot navigation and pseudo-miplib-mps families.
For WPMS 2019: AVD-SLS is a competitive solver, although no other SLS solver participated in this evaluation. Its scores and number of best solutions are better than those of some well-known hybrid solvers. In this evaluation, AVD-SLS has the best number of solved instances and average score with 5 best families, as shown in Table 27 (Appendix D). As AVD-SLS is the only SLS solver in this MSE, the results shown in Table 9 (section WPMS_2019) are based on the best solution found so far for each instance by any SLS solver during the past MSE 2014-2018. If an instance was never tested before by any SLS solver, we set the solution found by AVD-SLS as the best solution found so far (about 20% of the weighted benchmark instances were never tested before by SLS solvers). The hardest families, which were never solved by SLS solvers in this MSE, are the robot navigation and shift design families.
Fig. 6 shows the performance comparison on MSE 2014-2018 crafted and industrial instances, in which PMSAT SLS solvers participated. For each column of each solver per year:
• length of a column represents the percentage (%) of total solved instances per year.
• the colored part (e.g. light violet for AVD-SLS) represents the percentage (%) of total number of best solutions found by a solver.
From this summary, we show that AVD-SLS has the best results throughout MSE 2014-2019 w.r.t. the three MSE evaluation measures: number of solved instances, score, and number of best solutions found. AVD-SLS reports the best performance results on many graph-theory families (such as maxcut and set-covering), on realizability optimization problems such as power distribution, and on other optimization problems such as the random network, min-width, abstraction refinement, causal discovery, and CSG families. However, the hardest WPMS families for AVD-SLS are robot navigation and shift design. Competitive PMSAT SLS solvers such as AVD-SLS, SATLike, Dist, and CCEHC were able to solve some instances from other hard benchmark families such as pseudo-miplib-mps and hs-timetabling.
We conclude from this experimental evaluation of AVD-SLS and the PMSAT SLS solvers that participated in MSE 2014-2019, on both PMS and WPMS with more than 3400 instances, that AVD-SLS reported the best results and outperforms all PMSAT SLS solvers, based on the evaluation criteria presented in this study, which were adopted from MSE 2014-2019. Generally, AVD-SLS improves the quality of the solutions found when the time limit is increased. The minimum score for AVD-SLS in this evaluation, as shown in Fig. 5 and Fig. 6, ranges from 0.820 to 0.900. This evaluation study shows that AVD-SLS performs remarkably better on WPMS instances.
The results on PMS are shown in Table 10 and Table 11. In this section, we plot the distribution of scores per instance for each MSE, the same kind of figure used by the MSEs since 2017. In each figure, the results of each solver are represented by a curve, where each point represents the score of a solved or unsolved instance. We encoded the results of each solver with a unique color, where the results of AVD-SLS are represented in violet in all figures. We ordered the PMSAT solvers in the legend from the best-scored to the worst-scored.
For PMS 2014: Both AVD-SLS and Dist have the best results on random instances. However, AVD-SLS has the best number of best solutions for the crafted benchmark. AVD-SLS is competitive with a number of hybrid solvers. Based on the score results shown in Fig. 7, AVD-SLS is the third best solver in this evaluation after WPM-2014-in and optimax2-rn-i. For the family-based results shown in Table 16 (Appendix C), AVD-SLS has the best average score for 4 benchmark families.
For PMS 2015: Most SLS solvers have the best results on random instances. AVD-SLS is competitive with a number of hybrid solvers. Based on the score results shown in Fig. 8, AVD-SLS is ranked the fourth best solver in this evaluation. For the family-based results shown in Table 17 (Appendix C), AVD-SLS has the best average score for 2 benchmark families.
For PMS 2016: For the random benchmark, almost all SLS solvers solve the random instances with best solutions. For the crafted benchmark, AVD-SLS was able to solve the maximum number of instances. Based on the score results shown in Fig. 9, AVD-SLS is ranked in the fifth position in this evaluation. For the family-based results shown in Table 18 (Appendix C), AVD-SLS has the best average score for 4 benchmark families.
For PMS 2017: For the 60-second time limit, AVD-SLS has the best number of best solutions and is ranked first based on the score measure. For 300 seconds, the results of the maxroster solver improved significantly as the time increased, and AVD-SLS is ranked as the second best solver, as shown in Fig. 10. For the family-based results shown in Table 19 (Appendix C), AVD-SLS has the best average score for 3 benchmark families.
For PMS 2018: AVD-SLS is competitive with a number of hybrid solvers, and the best number of best solutions is achieved by AVD-SLS for both time limits. AVD-SLS is ranked as the third best solver for the 60-second time limit and fourth for the 300-second time limit, as shown in Fig. 11; the results of the maxroster solver improved significantly as the time increased. For the family-based results shown in Table 20 (Appendix C), AVD-SLS has the best average score for 4 benchmark families.
For PMS 2019: Similar to the MSE 2018 results, AVD-SLS is competitive with a number of hybrid solvers, and the best number of best solutions is achieved by AVD-SLS for both time limits. The increase in the number of best solutions found by AVD-SLS as the time limit increases is remarkable. Based on the score results shown in Fig. 12, AVD-SLS was competitive with a number of hybrid solvers. For the family-based results shown in Table 21 (Appendix C), AVD-SLS has the best average score for 7 benchmark families.
Next, the results on the WPMS benchmark are shown in Table 12 and Table 13.
For WPMS 2014: Only the SAT4J-ms-inc solver was unable to solve the random instances. AVD-SLS shows competitive results and has the best number of best crafted solutions. For industrial instances, AVD-SLS shows competitive results with the best performing hybrid solvers, as shown in Fig. 13, where AVD-SLS is ranked as the fourth best solver. For the family-based results shown in Table 22 (Appendix D), AVD-SLS has the best average score for 8 benchmark families.
For WPMS 2015: AVD-SLS is ranked first based on the score results shown in Fig. 14 and was able to solve the maximum number of instances on all classes. For the family-based results shown in Table 23 (Appendix D), AVD-SLS has the best average score for 5 benchmark families.
For WPMS 2016: Results similar to those of MSE 2015 are reported for AVD-SLS. AVD-SLS was able to solve the maximum number of instances on all classes and is ranked here as the third best solver after WPM3-2015-in, based on the score results shown in Fig. 15. For the family-based results shown in Table 24 (Appendix D), AVD-SLS has the best average score for 7 benchmark families.
For WPMS 2017: For the 60-second time limit, AVD-SLS has the best number of best solutions and is ranked as the best solver. For the 300-second time limit, AVD-SLS is ranked third based on the score results shown in Fig. 16. For the family-based results shown in Table 25 (Appendix D), AVD-SLS has the best average score for 2 benchmark families.
For WPMS 2018: AVD-SLS is competitive with a number of hybrid solvers and is ranked here as the third best solver for the 300-second time limit, as shown in Fig. 17, while it is the best solver for the 60-second time limit based on the score measure and the number of best solutions. For the family-based results shown in Table 26 (Appendix D), AVD-SLS has the best average score for 5 benchmark families.
For WPMS 2019: In this evaluation, no SLS solver participated. A number of well-known hybrid solvers participated in this evaluation, such as TT-Open-WBO-Inc, Loandra, LinSBPS2018, and variants of the Open-WBO solver. However, AVD-SLS was able to compete with a number of solvers on the number of best solutions found, as shown in Fig. 18. For the family-based results shown in Table 27 (Appendix D), AVD-SLS has the best average score for 5 benchmark families.
We conclude this experimental evaluation of AVD-SLS and PMSAT solvers that participated in MSE 2014-2019 on both PMS and WPMS with more than 3400 instances, that AVD-SLS is a competitive solver as shown in Fig.s         domains under three classes: random, crafted, and industrial instances.
Furthermore, the MSEs represent the evolution of PMSAT solver development, which can be traced and studied throughout the years. State-of-the-art solvers have shown great advancements in solving PMSAT problem instances. However, the evaluation results of MSE 2014-2019 show that many benchmark instances related to weighted industrial applications are still beyond the capacity of existing PMSAT SLS solvers. The present work is a contribution that aims to advance the development of PMSAT SLS solvers. Our proposed solver incorporates an adaptive parameter tuner and a variable depth neighborhood search (VDS) method adopted for solving the PMSAT problem, which were combined with the dynamic local search solver SATLike. To the best of our knowledge, our proposed components have never been adopted previously for PMSAT solving by SLS methods.
In this study, we evaluate the performance of our AVD-SLS solver, which is implemented based on the proposed algorithm presented in Section IV. First, we evaluate the performance of AVD-SLS on all unweighted and weighted instances. Second, as the main goal of this paper, we compare the results obtained by AVD-SLS to those obtained by state-of-the-art SLS solvers that participated in MSE 2014-2019. Finally, we compare the results of AVD-SLS to those obtained by all state-of-the-art PMSAT solvers that participated in MSE 2014-2019, including many well-known hybrid solvers. Almost all hybrid solvers are SAT-based solvers [18].
We classify the performance of the AVD-SLS solver in this study based on the percentage of solved instances per benchmark family into five different classes, as shown in Table 4 and Table 5: excellent (100%), very good (80-99%), good (66-79%), poor (51-65%), and fail (0-50%). We consider that AVD-SLS has failed to solve a benchmark family, and categorize the benchmark family as hard, if AVD-SLS is unable to solve more than 50% of the benchmark's instances within 300 CPU seconds.
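The categorization above can be sketched as a small function. This is an illustrative sketch rather than the authors' code: the name `classify_family` is hypothetical, and the handling of fractional percentages that fall between the listed integer ranges (for example 79.5%) is an assumption.

```python
def classify_family(solved: int, total: int) -> str:
    """Map the share of solved instances in a benchmark family to a class.

    Assumption: the listed ranges are read as lower bounds, so a fractional
    value such as 79.5% falls into the class below the next threshold.
    """
    if total <= 0 or not 0 <= solved <= total:
        raise ValueError("need 0 <= solved <= total and total > 0")
    pct = 100.0 * solved / total
    if solved == total:
        return "excellent"   # 100%
    if pct >= 80:
        return "very good"   # 80-99%
    if pct >= 66:
        return "good"        # 66-79%
    if pct > 50:
        return "poor"        # 51-65%
    return "fail"            # 0-50%: the family is treated as hard


def is_hard_family(solved: int, total: int) -> bool:
    """A family is hard when no more than 50% of its instances are solved."""
    return classify_family(solved, total) == "fail"
```

Under this reading, a family with two instances of which only one is solved (50%) falls into the fail class and is therefore counted as hard.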

A. DISCUSSION OF THE RESULTS ON PMS
AVD-SLS shows excellent performance on random instances like most SLS solvers, whereas the hybrid solvers were not competitive in this class.
For the crafted class, AVD-SLS is competitive with all PMSAT solvers with regard to the number of best solutions found. However, the Dist and Dist2 SLS solvers solved a few more instances than AVD-SLS from MSE 2014 and 2015 PMS. Furthermore, based on our categorization of AVD-SLS performance, AVD-SLS shows excellent performance in 23 crafted families. Only one benchmark family is hard: reversi. We found that some reversi instances can be solved when the time limit is increased. We did not count the pseudo-Boolean-primes benchmark family as hard here, because it was never solved by any solver.
For the industrial class, AVD-SLS shows excellent performance in 16 benchmark families and is the best SLS solver on this class. Moreover, AVD-SLS is competitive with the PMSAT solvers with regard to the number of best solutions found. AVD-SLS has five hard benchmark families: atcoss-mesat, atcoss-sugar, circuit-compaction, des, and hs-timetabling. The hs-timetabling benchmark family has only two instances, and AVD-SLS solved one; we believe this result does not reflect the performance of AVD-SLS in this family. On the other hand, instances from the atcoss-mesat and circuit-compaction families were never solved by any SLS solver. In atcoss-mesat, we found very large instances with hundreds of thousands to millions of variables and clauses. Meanwhile, the circuit-compaction instances have only a few thousand variables and clauses, but have complex structures. Such hard instances may require more pre-processing techniques or structure extraction methods for them to be solved by SLS solvers. Most SLS solvers only consider unit propagation (UP) for pre-processing, as is the case for our solver AVD-SLS.
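As an illustration of the unit propagation step mentioned above, the following minimal sketch applies UP to the hard clauses of a formula. It is not the AVD-SLS implementation: the DIMACS-style integer literal encoding, the function name `unit_propagate`, and the choice to propagate over hard clauses only are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Clause = List[int]  # DIMACS-style literals: v means x_v, -v means NOT x_v


def unit_propagate(
    hard: List[Clause],
) -> Optional[Tuple[Dict[int, bool], List[Clause]]]:
    """Repeatedly fix the variables forced by unit hard clauses.

    Returns the forced assignment and the simplified hard clauses,
    or None if propagation derives an empty (unsatisfiable) clause.
    """
    assignment: Dict[int, bool] = {}
    clauses = [list(c) for c in hard]
    while True:
        # Pick any clause with exactly one literal.
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            return assignment, clauses
        assignment[abs(unit)] = unit > 0
        simplified: List[Clause] = []
        for c in clauses:
            if unit in c:
                continue  # clause is satisfied by the forced literal
            reduced = [lit for lit in c if lit != -unit]
            if not reduced:
                return None  # conflict: every literal was falsified
            simplified.append(reduced)
        clauses = simplified
```

For instance, `unit_propagate([[1], [-1, 2], [-2, 3, 4], [1, 5]])` fixes variables 1 and 2 to true and leaves the single clause `[3, 4]`. The sketch rescans all clauses after every forced literal, which is quadratic in the worst case; practical solvers typically use occurrence lists or watched literals instead.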
This evaluation study shows that AVD-SLS is the best solver among SLS solvers (Fig. 5 and Fig. 6) and that it is competitive with a number of well-known hybrid solvers, as shown in Figs. 7 to 12. For example, AVD-SLS is among the top three solvers for MSE 2014 and MSE 2017. It is remarkable that, for most of the solved instances, AVD-SLS is able to find the best solution or a near-best solution. AVD-SLS may be improved to solve more hard instances by means of new pre-processing techniques [74], [75] that reduce the dimensionality of large instances, by extracting complex structures such as Boolean gates from the input CNF formula F [76], or by using both. Almost all complete and hybrid PMSAT solvers incorporate many pre- or co-processing techniques to reduce the dimensionality of large instances.

B. DISCUSSION OF THE RESULTS ON WPMS
Like most SLS solvers, AVD-SLS shows excellent performance on random instances with regard to the number of solved instances, but it is able to find only a few best solutions. Only a few hybrid solvers were able to solve all random instances.
For the crafted class, AVD-SLS performs competitively with all PMSAT solvers with regard to the number of best solutions found and is the best SLS solver on this class. Furthermore, based on our categorization of the performance of AVD-SLS, it demonstrates excellent performance in 23 crafted families. In fact, only one benchmark family is hard: miplib. We found that some miplib instances can be solved when the time limit is increased.
For the industrial class, AVD-SLS shows excellent performance in 16 benchmark families and is the best SLS solver on this class. Moreover, AVD-SLS performs competitively with the PMSAT solvers with regard to the number of best solutions found. AVD-SLS has two hard benchmark families: robot navigation and shift design. We found that the robot navigation instances are large, with hundreds of thousands of variables and clauses, whereas the shift design instances are very large, with hundreds of millions of variables and clauses.
As discussed in the PMS results, such hard instances may need more pre-processing techniques or structure extraction methods for them to be solved by SLS solvers.
This evaluation study shows that AVD-SLS is the best solver among SLS solvers (Fig. 5 and Fig. 6) and that it is competitive with a number of well-known hybrid solvers, as shown in Figs. 13 to 18. For MSE 2015 to MSE 2018, AVD-SLS is among the top three solvers. It is remarkable that, for most of the solved instances, AVD-SLS is able to find the best solution or a near-best solution. As discussed in the PMS section above, AVD-SLS may be improved to solve more hard instances by means of new pre-processing techniques [74], [75] that reduce the dimensionality of large instances, by extracting complex structures [76], or by using both. Almost all complete and hybrid PMSAT solvers incorporate many pre- or co-processing techniques to reduce the dimensionality of large instances.

VII. CONCLUSION
In this work, we have presented an adaptive variable depth SLS algorithm for the PMSAT problem. In this novel algorithm framework, we have proposed two main components: an adaptive parameter tuner and a VDS algorithm adopted for the PMSAT problem. This work provides a comprehensive evaluation of the AVD-SLS solver implemented based on our proposed algorithm with the use of MSE 2014-2019 benchmarks. AVD-SLS was evaluated on more than 3,600 instances from MSE weighted and unweighted benchmarks.
As expected, the experimental evaluation study in Section V demonstrates that our solver AVD-SLS is a highly competitive SLS solver. AVD-SLS proves that PMSAT SLS solvers have the capability to compete with hybrid PMSAT solvers in both PMS and WPMS. Generally, AVD-SLS improves the quality of solutions when the time limit is increased. This emphasizes the critical role of each SLS method's component.
Furthermore, AVD-SLS outperforms all PMSAT SLS solvers that participated in MSE 2014-2019 on crafted and industrial benchmarks with regard to three evaluation criteria: number of solved instances, number of best solutions, and score measure. AVD-SLS performs remarkably better in WPMS than in PMS. For example, if we compare the rankings of AVD-SLS and SATLike in MSE 2018, AVD-SLS ranked fourth on PMS (Fig. 11) and third on WPMS (Fig. 17), whereas SATLike ranked fifth and seventh, respectively.
Our investigation shows that state-of-the-art PMSAT SLS solvers are built around two important algorithmic components: a variable-pick heuristic and a weighting scheme. However, some hard benchmark families contain instances that are very large and/or have complex structures, which are still beyond the capacity of existing SLS solvers. Based on the evaluation results in Section V, we found that the general framework of an SLS method contains additional algorithmic components that can be exploited to improve state-of-the-art SLS solvers. In this study, we selected two components, namely, parameter tuning and VDS for large neighborhood search. Other algorithmic components include new pre-processing techniques, neighborhood definition and search methods, and other diversification techniques such as resets and restarts.
In our proposed algorithm, we designed the adaptive parameter tuner based on our study of the features of an input instance. In this study, we considered the problem size features, variable-clause features, and balance features [77]. More features may be included in the future to improve the adaptive tuner, such as variable-graph features, clause-graph features, and local search probes including the minimum fraction of unsatisfied clauses per run, the number of steps to the best local minimum per run, etc. In addition, more heuristics may be adopted with the VDS algorithm to facilitate its improvement, such as the highest cumulative score heuristic [78].
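To make the three feature groups concrete, the sketch below computes a few simple representatives of each from a CNF formula. It is a hedged illustration only: this section does not list the exact features used by the tuner, so the selection here, the feature names, the DIMACS-style literal encoding, and the function `instance_features` are all assumptions.

```python
from statistics import mean
from typing import Dict, List


def instance_features(num_vars: int, clauses: List[List[int]]) -> Dict[str, float]:
    """Compute a few illustrative size, variable-clause, and balance features."""
    if num_vars <= 0 or not clauses or any(len(c) == 0 for c in clauses):
        raise ValueError("need num_vars > 0 and a non-empty list of non-empty clauses")
    pos = [0] * (num_vars + 1)  # positive occurrences per variable
    neg = [0] * (num_vars + 1)  # negative occurrences per variable
    for clause in clauses:
        for lit in clause:
            if lit == 0 or abs(lit) > num_vars:
                raise ValueError(f"literal {lit} is out of range")
            if lit > 0:
                pos[lit] += 1
            else:
                neg[-lit] += 1
    degrees = [pos[v] + neg[v] for v in range(1, num_vars + 1)]
    n_clauses = len(clauses)
    return {
        # Problem size features
        "num_vars": float(num_vars),
        "num_clauses": float(n_clauses),
        "clause_var_ratio": n_clauses / num_vars,
        # Variable-clause features (node degrees in the variable-clause graph)
        "mean_var_degree": mean(degrees),
        "mean_clause_size": mean(len(c) for c in clauses),
        # Balance features
        "mean_clause_pos_frac": mean(
            sum(1 for lit in c if lit > 0) / len(c) for c in clauses
        ),
        "mean_var_pos_frac": mean(
            pos[v] / d for v, d in enumerate(degrees, start=1) if d > 0
        ),
        "binary_clause_frac": sum(1 for c in clauses if len(c) == 2) / n_clauses,
    }
```

A tuner could feed such a feature vector to a rule set or a learned model that selects parameter values per instance; how AVD-SLS maps features to parameters is described in the algorithm section rather than here.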
Several algorithmic components of an SLS algorithm can be exploited to improve its performance. The results of the evaluation study emphasize many of these components, such as preprocessing techniques, parameter tuning, neighborhood definition and search algorithms, diversification techniques, and exploitation of the structure of instances.
Nevertheless, SAT and PMSAT solvers have been shown to be competitive for solving hard constrained combinatorial problems in many different domains, using various techniques while speeding up solving. These domains include automated software and hardware engineering problems, such as fault testing, detection, and diagnosis [79], [80], upgradability [81], and circuit design diagnosis [82], which are encoded as SAT and PMSAT problems and solved by SAT solvers [83] and PMSAT solvers [84]-[86]. However, robustness and correctness are essential criteria, since these solvers are used as core decision engines and optimization methods [87]. Automated software engineering approaches, such as combinatorial testing (CT), can be used as systematic techniques that detect faults and failures in the software under testing (SUT) [83].
Thank you to the team of the Supercomputing Laboratory at King Abdullah University of Science & Technology (KAUST) for their support and kindness. For computer time, this research used SHAHEEN HPC, one of the resources of the Supercomputing Laboratory at KAUST in Thuwal, Saudi Arabia.

APPENDIX A UNWEIGHTED BENCHMARK USED FOR INITIAL TUNING
See Table 14.

APPENDIX B WEIGHTED BENCHMARK USED FOR INITIAL TUNING
See Table 15.

APPENDIX C UNWEIGHTED FAMILY-BASED BENCHMARKS
See Tables 17-21.

APPENDIX D WEIGHTED FAMILY-BASED BENCHMARKS
See Tables 22-27.