An Effective SAT Solver Utilizing ACO Based on Heterogeneous Systems

This paper presents new parallel strategies for preprocessing and solving the Boolean satisfiability (SAT) problem on heterogeneous systems of multicore CPUs and many-core Graphics Processing Units (GPUs) using Open Multi-Processing (OpenMP) and NVIDIA Compute Unified Device Architecture (CUDA). We propose highly efficient parallel techniques for SAT simplification: a variable elimination method based on the Davis-Putnam-Logemann-Loveland (DPLL) splitting rule, executed with a shared-memory model on a multicore CPU platform, while the subsumption-based clause elimination and the pure-literal removal techniques are executed entirely on the CUDA framework. We demonstrate how the proposed heterogeneous preprocessing makes an evolutionary SAT solver more efficient, yielding significant speed-ups and improved solution quality. The evolutionary SAT solver is parallelized with an Ant Colony Optimization (ACO) scheme using CUDA. We perform thorough benchmarks to evaluate our preprocessor and solver implementations on various random SAT formulas. The proposed H-SAT preprocessor achieves a speed-up of up to 15x over the sequential implementation, with reductions of up to 49% and 43% in the numbers of literals and clauses of the original CNF, respectively, and H-SAT improves the solvability of the ACO solver by up to 100% in some cases.


I. INTRODUCTION
For a multitude of reasons, interest in Boolean satisfiability is growing, as more and more problems are now solved faster by SAT solvers than by other approaches. This is hardly surprising, because satisfiability lies at the intersection of logic, fault diagnosis [1]-[3], automatic program testing [4]-[6], real-time automatic debugging systems [7], biological systems [8], [9], and computer engineering studies in general [10]-[13]. In particular, many problems stemming from these areas have various SAT translations or encodings, and numerous techniques are available for solving the resulting SAT instances with improved performance.
In particular, several modern evolutionary solvers have been implemented for heterogeneous parallel architectures with prior simplification. The techniques generally used for such computational problems, which can produce significantly faster algorithms, fall into two classes: complete and incomplete (heuristic) techniques.
The associate editor coordinating the review of this manuscript and approving it for publication was Daniel Grosu.
Although no known algorithm solves the SAT problem efficiently, or optimally, for all possible formulas, some problems such as circuit design [16], [17] and automated theorem proving [18] can be solved rather efficiently using incomplete or empirical SAT solvers. Such solvers are not guaranteed to be effective on all SAT instances; empirically, however, they generally perform well for many practical applications. Moreover, the solution quality of such solvers, i.e., how many trials are required to find the best possible solution, can be further improved by applying variable and clause elimination to reduce the input formulas.
Modern solvers and preprocessors such as Non-increasing Variable Elimination Resolution (NiVER) [19] and SatELite [20] are based on the DPLL algorithm [21]. Subsumption, unit propagation, and pure-literal removal [22], [20], [23], [24] are the best-known simplification methods used in most DPLL-based SAT preprocessors. Like other simplifiers, NiVER accepts a Conjunctive Normal Form (CNF) formula as input and outputs a CNF with fewer or the same number of variables, by resolving away variables that have a limited number of occurrences, i.e., variables that appear only a few times in the SAT formula.
NiVER eliminates the least constrained variables, whereas our variable elimination technique targets the Extreme Variated Variables (EVVs), i.e., the variables that appear in the largest number of clauses. These variables are assigned using the splitting rule of the DPLL algorithm, following the work of Moritz and Springer [25]. SatELite extends NiVER with subsumption elimination. Splitting a CNF on a particular variable yields a large number of subsumed clauses and pure literals; the clauses that contain a pure literal or that are subsumed should be detected and removed immediately to save memory and further simplify the output CNF.
Currently, multicore and many-core multiprocessors are becoming prevalent in SAT solving, and several major solvers have been introduced to minimize solving time, as in [26], [27], and [28]-[30]. The former works used a portfolio of competing sequential solvers, obtained through careful variations of the standard DPLL algorithm, running on the CPU. The other is a parallel version of the MiniSat solver [31]-[35] that uses a farm strategy, in which a master process splits the original formula using guiding paths (assumptions) and sends the resulting sub-formulas to slaves.
There may be multiple sub-formulas per slave, but each slave receives one at a time. When a slave finishes its share, it sends its results to the master and waits for more work. The master keeps sending work to the slaves while no solution has been found and sub-formulas remain to be solved. The latter work introduces a 3-SAT solver that uses CUDA to implement a deterministic strategy on the GPU. Since every clause is exactly 3 literals long, it picks a clause and tests 3 combinations of assignments for its literals: the first literal is true; the first is false and the second is true; and the first and second are false and the third is true. It targets random instances, which are usually hard to solve even when small, because they lack internal structure that could be exploited.
So far, none of these approaches is a parallel incomplete SAT solver enhanced with a parallel preprocessor on a heterogeneous multicore/many-core architecture. This paper presents an efficient and fast parallel heuristic SAT solver with the H-SAT preprocessor. The solver applies the ACO algorithm [36]-[39], based on Springer's work [25], and is implemented with CUDA on the GPU [40], [41]. Our H-SAT preprocessor uses OpenMP to implement an efficient variable elimination method [42], [43] that makes full use of the multicore CPU through a shared-memory model, together with fast parallel subsumption and pure-literal elimination algorithms based on the Single-Instruction Multiple-Thread (SIMT) shared-memory architecture, executed entirely on the GPU with CUDA.
The main contribution of this article is to apply variable elimination, subsumption, and pure-literal elimination on a CPU-GPU system using parallel SIMD architectures to obtain a well-simplified SAT CNF suited to our solver, which uses the Max-Min Ant System (MMAS) method for SAT solving [44]-[47], so that the solver only has to process simplified formulas.

II. BACKGROUND AND RELATED WORK
This section reviews the SAT problem and how it can be preprocessed using the DPLL splitting rule (variable elimination), the clause subsumption elimination algorithm, and the pure-literal elimination algorithm, and outlines the basic MMAS procedure for SAT solving. For further details, we refer the reader to the original works of Subbarayan and Pradhan [19]; Eén and Biere [20]; Zhang [48]; Stützle [44]; Moritz and Springer [25]; Villagra and Barán [45]; and Youness et al. [46].

A. BOOLEAN SATISFIABILITY PROBLEM
The Boolean, or propositional, satisfiability problem can be stated in one sentence: given a Boolean formula, decide whether Boolean values can be assigned to the propositional variables of the formula so that the formula evaluates to true. The formula is satisfiable if such an assignment exists; otherwise, it is unsatisfiable.
For a combinatorial problem to be solved with state-of-the-art SAT solvers, it usually has to be encoded into CNF: a conjunction of clauses $\bigwedge_i c_i$, where each clause $c_i$ is a disjunction of k literals $\bigvee_m l_m$, and each literal $l_m$ is either a Boolean variable $v$ or its negation $\bar{v}$. There are variants of k-SAT formulas such as 3-SAT (k = 3), which also belongs to the class of NP-complete problems; 3-SAT restricts every clause to exactly 3 literals. CNF is a simple form, easy to implement, and has a common file format. A CNF file format for SAT problems was conceived and adopted in the DIMACS Challenge [49]. This popular format enabled the compilation of SAT benchmark problems on the SATLIB website, as well as the periodic SAT solver competitions [50], which stimulated much research into efficient algorithms and implementations.
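To make the CNF definition concrete, here is a small C++ sketch that represents literals as DIMACS-style signed integers (+v for $v$, −v for $\bar{v}$) and checks whether a given assignment satisfies a formula. The type and function names (`Clause`, `Formula`, `literalTrue`, `satisfies`) are illustrative assumptions, not taken from the paper.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// A clause is a list of signed DIMACS-style literals: +v means v, -v means NOT v.
using Clause = std::vector<int>;
using Formula = std::vector<Clause>;

// assignment[v] holds the truth value of variable v (index 0 is unused).
bool literalTrue(int lit, const std::vector<bool>& assignment) {
    const bool value = assignment[std::abs(lit)];
    return lit > 0 ? value : !value;
}

// A CNF is true iff every clause (a disjunction) has at least one true literal.
bool satisfies(const Formula& f, const std::vector<bool>& assignment) {
    for (const Clause& c : f) {
        bool clauseTrue = false;
        for (int lit : c) {
            if (literalTrue(lit, assignment)) { clauseTrue = true; break; }
        }
        if (!clauseTrue) return false;  // one false clause falsifies the CNF
    }
    return true;
}
```

For example, $(v_1 \lor \bar{v}_2) \land (v_2 \lor v_3)$ is written `{{1, -2}, {2, 3}}` and is satisfied by $v_1 = v_2 = $ true, $v_3 = $ false.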
There are two main classes of algorithms for solving SAT instances. The first class consists of complete algorithms, which are guaranteed to terminate with a correct decision on whether the CNF is satisfiable or unsatisfiable; DPLL and Conflict-Driven Clause Learning (CDCL) algorithms belong to this class [26], [21], [51]-[55]. The second class consists of incomplete algorithms, which guarantee neither that a satisfying assignment will be reported within a preset time limit nor that an unsatisfiable formula will be declared as such, but which can often find a solution faster than a complete algorithm. Our parallel ACO-SAT solver on the GPU [46] is based on incomplete approaches and is briefly reviewed in this paper.
In the DPLL splitting method, the SAT formula is factored (split) by choosing a variable $v$, which generates two simplified formulas; each factored formula can in turn be factored by another variable [25], [56], [57]. The chosen variable $v$ is assigned the value true to obtain one of the two new formulas and the value false to obtain the other. Suppose the CNF formula can be written as
$S = \bigwedge_{i} (v \lor C_i) \land \bigwedge_{j} (\bar{v} \lor C_j) \land S_r$,
where the $C_i$ and $C_j$ are clauses in which neither $v$ nor $\bar{v}$ appears, and $S_r$ is the set of clauses in which neither $v$ nor $\bar{v}$ appears. Then we obtain the two formulas $S' = \bigwedge_{i} C_i \land S_r$ (for $v$ = false) and $S'' = \bigwedge_{j} C_j \land S_r$ (for $v$ = true).
The formula $S$ is unsatisfiable if and only if both $S'$ and $S''$ are unsatisfiable, and neither $S'$ nor $S''$ contains $v$. The splitting rule is applied repeatedly: when $v$ is set to true, the clauses containing the positive literal $v$ are removed because they are now satisfied, and the negative literal $\bar{v}$ is removed from every clause in which it occurs (symmetrically when $v$ is set to false). A factored formula is unsatisfiable if an empty clause is produced and satisfiable if no clauses remain.
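The following C++ sketch applies one step of the splitting rule described above to a formula of signed-integer clauses: it fixes variable $v$ to a value, drops the satisfied clauses, deletes the falsified literal from the rest, and reports a conflict if an empty clause appears. It is a sequential illustration under these representation assumptions (the names `split` and `SplitResult` are hypothetical), not the paper's Algorithm 1.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Clause = std::vector<int>;
using Formula = std::vector<Clause>;

// Result of fixing one variable: the reduced formula plus a flag that is set
// when an empty clause was produced (this branch is unsatisfiable).
struct SplitResult {
    Formula formula;
    bool conflict = false;
};

// DPLL splitting rule for variable v with the given value: clauses containing
// the now-true literal are dropped (satisfied), and the now-false literal is
// deleted from the remaining clauses.
SplitResult split(const Formula& f, int v, bool value) {
    const int trueLit = value ? v : -v;
    SplitResult r;
    for (const Clause& c : f) {
        if (std::find(c.begin(), c.end(), trueLit) != c.end()) continue;  // satisfied
        Clause reduced;
        for (int lit : c)
            if (lit != -trueLit) reduced.push_back(lit);  // drop the false literal
        if (reduced.empty()) r.conflict = true;            // empty clause: UNSAT branch
        r.formula.push_back(reduced);
    }
    return r;
}
```

Calling `split` once with `value = true` and once with `value = false` produces the two factored formulas $S''$ and $S'$; calling it again on each result factors by the next variable.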
We introduce a new sequential implementation of this method that uses dynamic programming to overcome the pitfalls of recursion and the resulting out-of-memory exceptions (see Algorithm 1). Dynamic programming is an optimization approach that converts a complicated problem into a series of overlapping sub-problems solved without recursion, which takes far less time than traditional recursive methods.
The input parameter of the procedure, Pro, is a vector of the variables chosen to split the input formula. Each variable $v$ taken from this vector is stored in a CPU register (step 4) to accelerate the operations performed on $v$ by minimizing system memory traffic. Note that the formula and propositional vectors are updated and reallocated in place inside the procedure, which is a major advantage of using dynamic programming concepts. A new algorithm, called parameters_settings (Algorithm 2), calculates the number of variables (parameters) in a propositional. It selects the most appropriate number of EVVs according to their occurrences in the formula and a user-defined limit called max_nop.
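Algorithm 2 is not reproduced in this section, so the following C++ sketch only illustrates the EVV idea described above: count how often each variable occurs (polarity ignored), order the variables from most to least frequent, and keep at most `max_nop` of them. The function name `selectEVVs` and the tie-breaking rule (smaller variable index first) are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

using Clause = std::vector<int>;
using Formula = std::vector<Clause>;

// Hypothetical helper: return up to maxNop variables ordered by how often
// they occur in the formula (polarity ignored), most frequent first.
std::vector<int> selectEVVs(const Formula& f, int numVars, int maxNop) {
    std::vector<int> occur(numVars + 1, 0);
    for (const Clause& c : f)
        for (int lit : c) ++occur[std::abs(lit)];

    std::vector<int> vars;
    for (int v = 1; v <= numVars; ++v)
        if (occur[v] > 0) vars.push_back(v);

    // Stable sort keeps smaller variable indices first among ties.
    std::stable_sort(vars.begin(), vars.end(),
                     [&](int a, int b) { return occur[a] > occur[b]; });
    if (static_cast<int>(vars.size()) > maxNop) vars.resize(maxNop);
    return vars;
}
```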

C. SUBSUMPTION
Let lit(C) denote the set of literals in clause C of a CNF formula. Given clauses $C_1$ and $C_2$, if $lit(C_1) \subseteq lit(C_2)$, then $C_1$ subsumes $C_2$. A subsumed clause is redundant and can be removed without changing the Boolean function represented by the CNF formula. The main drawback of variable elimination is that it produces many extra clauses that subsume, or are subsumed by, other clauses. Since redundant clauses consume memory and time during SAT solving, it is preferable to detect and remove the subsumed clauses immediately after the variable elimination phase is completed.
In modern SAT preprocessors such as SatELite, whenever a new clause is added to the formula, it is checked against the existing clauses in the database to see whether it subsumes any of them. This check is called backward subsumption [20]; it can also be applied during SAT solving and is now present in most SAT solvers. Furthermore, the new clause is checked against the existing clauses to see whether it is subsumed by any of them; this check is referred to as forward subsumption [20].
In our implementation, we introduce a straightforward brute-force algorithm that suits SIMT architectures and performs very fast when executed on these parallel platforms. The sequential and parallel techniques are presented in Algorithms 3 and 6, respectively.
First, the sequential algorithm sorts the clauses of the input formula by size, then performs a brute-force search for subsumed clauses, flagging them so that they can all be removed later. If the formula has n clauses, the algorithm has a complexity of $O(n^2)$, which is high compared with its counterparts in current state-of-the-art preprocessors; conversely, the parallel time can drop to O(1) pair checks per worker if each pair of clauses is assigned to a separate worker. With that granularity, our algorithm is likely to outperform existing counterparts when implemented and executed on SIMT architectures such as the GPU.
Algorithm 3 (steps 6 to 15 only):
            for (L ∈ C)   // L is a group of literals
                if (L ⊂ C)
                    subsume = false
                else
                    Remove(C)
                end if
            end for
        end if
    end for
end for
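For reference, here is a self-contained C++ sketch of the sequential brute-force idea described above (sort by size, flag subsumed clauses, remove them at once). The names `subsumes` and `removeSubsumed` are illustrative, and the sketch is not a transcription of Algorithm 3. Each (i, j) pair check is independent of the others, which is what makes a one-thread-per-pair GPU mapping possible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Clause = std::vector<int>;
using Formula = std::vector<Clause>;

// True if every literal of a also occurs in b, i.e. lit(a) is a subset of lit(b).
bool subsumes(const Clause& a, const Clause& b) {
    for (int lit : a)
        if (std::find(b.begin(), b.end(), lit) == b.end()) return false;
    return true;
}

// O(n^2) brute-force pass: sort by size, flag every clause subsumed by an
// earlier unflagged clause, then drop all flagged clauses at once.
Formula removeSubsumed(Formula f) {
    std::stable_sort(f.begin(), f.end(),
                     [](const Clause& x, const Clause& y) { return x.size() < y.size(); });
    std::vector<bool> removed(f.size(), false);
    for (std::size_t i = 0; i < f.size(); ++i) {
        if (removed[i]) continue;  // anything it subsumes is subsumed by its subsumer
        for (std::size_t j = i + 1; j < f.size(); ++j)
            if (!removed[j] && subsumes(f[i], f[j])) removed[j] = true;
    }
    Formula out;
    for (std::size_t i = 0; i < f.size(); ++i)
        if (!removed[i]) out.push_back(f[i]);
    return out;
}
```

Sorting by size first means a clause only needs to be compared with clauses at or after its position, since a longer clause can never subsume a shorter one.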

D. ELIMINATION OF PURE-LITERALS
The pure-literal rule was originally introduced alongside unit-clause propagation in the DPLL procedure. Unit propagation, or Boolean Constraint Propagation (BCP), takes a unit clause containing the literal $l$, removes every other clause that contains $l$ (since it is satisfied), and deletes the complementary literal $\bar{l}$ from every clause in which it occurs.
In the pure-literal rule, a literal $l$ in a CNF formula S is called pure if and only if its complement $\bar{l}$ does not occur in S. A pure literal can be set to true without affecting satisfiability, which amounts to removing all clauses that contain it. Because this can make other literals pure, the method has to be iterated to produce an equisatisfiable formula with no pure literals. This is called pure-literal removal [24]. For example, if the literal b is pure in a CNF formula S, pure-literal removal yields a new formula S' that contains none of the clauses in which b occurs. The new parallel pure-literal elimination algorithm is described in detail in the next section. It consists of two principal steps:
1: detecting each variable in all clauses, recording where it occurs, counting its occurrences (polarity excluded), and summing its signed values (polarity included); see Algorithm 7 (detector algorithm).
2: checking the purity of every variable and removing the clauses that contain any variable found to be pure, as in Algorithm 8 (implementer algorithm).
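The two steps above can be sketched sequentially in C++. The detector computes, for each variable, an occurrence count (polarity ignored) and a signed sum (polarity included); a variable that occurs is pure exactly when the absolute value of its sum equals its count, i.e., all of its occurrences have the same sign. The implementer then drops every clause containing a pure variable, and the loop repeats until no pure literal remains. The function name `removePureLiterals` is hypothetical, and this is a host-side illustration rather than Algorithms 7 and 8 themselves.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

using Clause = std::vector<int>;
using Formula = std::vector<Clause>;

// Repeatedly remove every clause containing a pure literal until none is left.
Formula removePureLiterals(Formula f, int numVars) {
    bool changed = true;
    while (changed) {
        changed = false;
        // Detector: count[v] ignores polarity, sum[v] adds +1 / -1 per occurrence.
        std::vector<int> count(numVars + 1, 0), sum(numVars + 1, 0);
        for (const Clause& c : f)
            for (int lit : c) {
                ++count[std::abs(lit)];
                sum[std::abs(lit)] += lit > 0 ? 1 : -1;
            }
        // Implementer: keep only clauses without any pure variable.
        Formula kept;
        for (const Clause& c : f) {
            bool hasPure = false;
            for (int lit : c) {
                const int v = std::abs(lit);
                if (count[v] > 0 && std::abs(sum[v]) == count[v]) { hasPure = true; break; }
            }
            if (hasPure) changed = true; else kept.push_back(c);
        }
        f.swap(kept);  // removals may create new pure literals: iterate
    }
    return f;
}
```

In the formula `{{1, 2}, {-2, 3}, {2}}`, variables 1 and 3 are pure on the first pass; removing their clauses leaves `{{2}}`, which makes 2 pure on the second pass, so the whole formula is removed.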

E. MAX-MIN ANT SYSTEM
The Max-Min Ant System (MMAS) procedure for SAT has three main phases for finding the best possible solution: constructing candidate solutions with the ant colony, updating the pheromones, and blurring the pheromones [25]. All phases are repeated until the termination condition is met. The ant colony consists of m artificial ants, where m is a user-defined parameter.
Each ant j constructs its candidate solution by comparing a random value with a probability given by the random proportional rule for selecting a literal $l \in L$, positive or negative, where $L$, with $|L| = 2n$, is the set of the n variables and their complements in a SAT formula with b clauses. The random proportional rule depends on the attractiveness of the pheromone and on the EVV heuristic [25], [46], i.e., variables that appear in more clauses are more desirable to the ants.
In this rule, $ph_{lj}$ is the amount of pheromone currently deposited by ant j on literal $l$; $evv_{lj}$ is the extreme variated variables (EVV) heuristic value of literal $l$ for ant j; and α and β are user-defined parameters that control the relative influence of $ph_{lj}$ and $evv_{lj}$. When an ant has chosen a candidate solution, the candidate is evaluated to compute the number of clauses it satisfies in the SAT formula and its evaluation quality. The evaluation quality is the number of satisfied clauses weighted by the clause weights computed by the heuristic weight-adaptation rule [45], [46]. Heuristic weight adaptation increases the importance of unsatisfied clauses in each evaluation period.
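The random proportional rule itself is not reproduced in the text above. A common ACO form, assumed here rather than taken from the paper, weights each literal by $ph^{\alpha} \cdot evv^{\beta}$ and normalizes over the two literals of a variable, so that $p_j(v) = \frac{ph_{v}^{\alpha}\, evv_{v}^{\beta}}{ph_{v}^{\alpha}\, evv_{v}^{\beta} + ph_{\bar{v}}^{\alpha}\, evv_{\bar{v}}^{\beta}}$ is the probability of assigning $v$ = TRUE. A minimal C++ sketch of that assumed form (the function name `probTrue` is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Assumed form of the random proportional rule for one variable v:
// the probability of choosing the positive literal v (i.e. v = TRUE) is its
// weight ph^alpha * evv^beta divided by the total weight of v and NOT v.
double probTrue(double phPos, double evvPos, double phNeg, double evvNeg,
                double alpha, double beta) {
    const double wPos = std::pow(phPos, alpha) * std::pow(evvPos, beta);
    const double wNeg = std::pow(phNeg, alpha) * std::pow(evvNeg, beta);
    return wPos / (wPos + wNeg);
}
```

Raising α makes the ants follow the pheromone trails more greedily, while raising β gives more weight to the EVV heuristic.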
After all ants have searched for solutions, the pheromone values of all literals and their bounds must be updated. To avoid stagnation of the population or convergence to local optima, every pheromone amount is reduced by a user-defined factor called the evaporation rate ρ [36], [37], [45]. This lets the ants forget bad assignments. All pheromone amounts and their bounds are then updated, where ρ is the evaporation rate (0 ≤ ρ ≤ 1), $\tau_l$ is the pheromone level of literal $l$, and $i^*$ is the current best assignment in cycle i. The amount of pheromone deposited is directly proportional to the objective function f(x), i.e., the chosen assignment x multiplied by its evaluation quality. The pheromone bounds are updated according to [36], [45].
VOLUME 8, 2020
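The update equations are not reproduced in the text above, so the following C++ sketch assumes a common MMAS form: each trail is evaporated as $\tau_l \leftarrow (1-\rho)\,\tau_l$, the literals chosen by the best assignment receive a deposit equal to that assignment's quality (a proportionality constant of 1 is assumed), and every trail is clamped to $[\tau_{min}, \tau_{max}]$. The bounds are passed in rather than recomputed, and the function name is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed MMAS-style update for all literal trails tau[l]:
// evaporate by rho, deposit bestQuality on literals in the best assignment,
// then clamp each trail to [tauMin, tauMax].
void updatePheromones(std::vector<double>& tau, const std::vector<bool>& inBest,
                      double rho, double bestQuality, double tauMin, double tauMax) {
    for (std::size_t l = 0; l < tau.size(); ++l) {
        double t = (1.0 - rho) * tau[l];          // evaporation
        if (inBest[l]) t += bestQuality;          // deposit (constant of 1 assumed)
        tau[l] = std::min(std::max(t, tauMin), tauMax);
    }
}
```

The clamping step is what distinguishes MMAS from a plain ant system: no trail can vanish below $\tau_{min}$ or dominate above $\tau_{max}$, which helps avoid stagnation.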

III. CPU-GPU IMPLEMENTATIONS
This section presents the parallel methodology of the H-SAT preprocessing scheme and the ACO solver on the CPU and GPU using OpenMP and CUDA, as shown in Fig. 1. The flow diagram in Fig. 2 summarizes the workflow of the H-SAT preprocessor implementation on both the multicore CPU and the many-core GPU. The ACO solver can process either a simplified CNF produced by H-SAT or the original input CNF, so we benchmark the solving performance of the ACO solver on both inputs and compare the solving times of both runs.
At the beginning of the ACO procedure, we set the algorithm parameters (m, α, β, etc.), parse the SAT instance represented in a DIMACS file (simplified or original), allocate memory for the data structures and grids, and create CUDA streams [58], i.e., sequences of instructions that execute in issue order on the GPU. Furthermore, a CUDA random number generator (RNG) from the cuRAND library [59], which ships with the NVIDIA SDK and offers several high-quality RNG schemes, is set up. The host side (CPU + system memory) initializes the pheromone concentrations, the EVV heuristic, and the probabilities. We copy all data synchronously to the GPU global memory only once, to avoid communication overhead with the CPU.

A. THE CNF PARSER
The SAT formula, which is usually represented in CNF using the DIMACS format [49], must first be parsed before the preprocessing stage can begin. In this format, each clause is written as a sequence of signed integers, where a negative value denotes a negated variable; for example, (1 −4 2) stands for the clause $(v_1 \lor \bar{v}_4 \lor v_2)$. The CNF parser reads the number of variables and clauses from the text file, then reads each clause and stores it in a separate Standard Template Library (STL) vector [60].
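A minimal C++ sketch of such a parser, assuming the standard DIMACS conventions (comment lines starting with `c`, a `p cnf <variables> <clauses>` header, and each clause terminated by `0`). The function `parseDimacs` is illustrative, not the paper's parser, and it does no error handling beyond stopping at the `%` end marker found in some SATLIB files.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Clause = std::vector<int>;
using Formula = std::vector<Clause>;

// Minimal DIMACS CNF reader: skips 'c' comment lines, reads the
// "p cnf <vars> <clauses>" header, then collects 0-terminated clauses,
// each stored in its own STL vector.
Formula parseDimacs(std::istream& in, int& numVars, int& numClauses) {
    Formula f;
    Clause current;
    std::string token;
    numVars = numClauses = 0;
    while (in >> token) {
        if (token == "c") {                 // comment: skip the rest of the line
            std::getline(in, token);
        } else if (token == "p") {          // header: p cnf <vars> <clauses>
            std::string fmt;
            in >> fmt >> numVars >> numClauses;
        } else if (token == "%") {          // end marker used by some SATLIB files
            break;
        } else {
            const int lit = std::stoi(token);
            if (lit == 0) {                 // 0 terminates the current clause
                f.push_back(current);
                current.clear();
            } else {
                current.push_back(lit);
            }
        }
    }
    if (!current.empty()) f.push_back(current);  // tolerate a missing final 0
    return f;
}
```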
Each clause vector is pushed into the STL 2-D formula vector. We use 1-D host-side arrays (CPU + system memory) to store the formula clauses and their sizes, and we copy this data only once to device memory (GPU global memory) to perform the subsumption and pure-literal removal computations on the GPU. The transfer overhead is no more than a few microseconds and can be neglected. Fig. 2 outlines the H-SAT workflow.

B. SAT SETTINGS
This phase applies the DPLL splitting rule to the formula; steps 3 to 5 are performed on each CPU core (Fig. 3):
1: counting the number of occurrences of each variable in the formula (occurrence subroutine).
2: calculating the number of variables to eliminate (parameters_settings subroutine).
3: generating the combinations of the chosen variables (Pro_Parallel subroutine).
4: applying the splitting algorithm (slit subroutine).
5: checking the reduced formulas for their number of literals and solvability (local Solvable_Min selection routine).
In steps 1 and 2, the CPU sequentially performs the occurrence and parameters_settings subroutines. The occurrence subroutine performs two operations: the number of occurrences of each variable in the original formula is counted over all clauses and stored in a vector (occur), and the variables are then sorted by their number of occurrences.
After completion, the sorted variables are transferred to the EVV vector. The parameters_settings subroutine (Algorithm 2) calculates the number of variables to eliminate. Each eliminated variable produces two propositionals, $v$ and $\bar{v}$, so eliminating k variables yields $g = 2^k$ new propositionals, i.e., candidate variable combinations, to be removed from the initial formula. Let q be the number of CPU cores; we decompose the g combinations into q tasks, each of which sequentially runs an elimination package of ω = g/q combinations. The result of each package is a simplified, solvable formula that is saved in a global array indexed by the thread ID ($thr_{id}$).
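A small C++ sketch of the task decomposition described above: the $g = 2^k$ combinations of k eliminated variables are split into q contiguous packages of about ω = g/q combinations each. Handling a remainder when q does not divide g is an assumption added for generality, and the helper `makePackages` is hypothetical; in the preprocessor, each package would be processed by one OpenMP thread.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: split the g = 2^k combinations into q contiguous
// [begin, end) packages of about omega = g / q combinations each; the first
// g % q packages get one extra combination so nothing is left out.
std::vector<std::pair<std::uint64_t, std::uint64_t>> makePackages(unsigned k, unsigned q) {
    const std::uint64_t g = std::uint64_t{1} << k;
    const std::uint64_t omega = g / q, extra = g % q;
    std::vector<std::pair<std::uint64_t, std::uint64_t>> packages;
    std::uint64_t begin = 0;
    for (unsigned t = 0; t < q; ++t) {
        const std::uint64_t end = begin + omega + (t < extra ? 1 : 0);
        packages.emplace_back(begin, end);
        begin = end;
    }
    return packages;
}
```

With k = 3 eliminated variables and q = 4 cores, each core receives ω = 2 of the 8 combinations.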
Two logical threads are attached to each core; we create the p threads with the OpenMP runtime [61], [42] and bind each thread, by its index $thr_{id}$ (0 ≤ $thr_{id}$ < 2q), to a physical core $y_i$. This is done with the KMP_AFFINITY environment variable [62] set to scatter mode, which distributes the threads across the whole system as evenly as possible. The granularity is set to core, so that each thread can migrate to any thread context within its core. After many tests on multiple SAT CNFs, this setup showed the best timing efficiency. The task operations are described in detail in Algorithm 4.
For the subroutines in steps 3 through 5 of Fig. 3, we allocate a separate, non-shared address space for each worker. In step 3, the Pro_Parallel subroutine is executed in parallel (Fig. 3) with different iterators i to generate binary-like combinations of variables, as described in Algorithm 5.
This operation is very fast because it requires little memory traffic and uses only shift and logical AND operations.
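The shift-and-AND decoding mentioned above can be sketched as follows, assuming that bit j of a combination index i selects the value of the j-th chosen variable (1 for TRUE, 0 for FALSE). The function `decodeCombination` is a hypothetical illustration of that idea, not a transcription of Algorithm 5.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoder: bit j of the combination index i gives the value of
// the j-th chosen variable (1 = TRUE, 0 = FALSE), using only shifts and ANDs.
// The result lists the literal each chosen variable is fixed to.
std::vector<int> decodeCombination(std::uint64_t i, const std::vector<int>& chosenVars) {
    std::vector<int> literals;
    for (std::size_t j = 0; j < chosenVars.size(); ++j) {
        const bool value = (i >> j) & 1u;
        literals.push_back(value ? chosenVars[j] : -chosenVars[j]);
    }
    return literals;
}
```

For chosen variables {7, 3, 9}, index 5 (binary 101) decodes to the literals {7, −3, 9}; iterating i over a package's range enumerates all of its combinations without any recursion.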
After all steps are complete, the Global procedure passes the reduced formulas stored in the global array to the Solvable_Min selection phase, which determines which formula has the minimum number of literals and whether it is solvable. A solvable formula is then returned, with its information stored in the min_r_Form and C_S arrays, through the successive calls to the slit procedure. Among the solvable formulas, the last simplified CNF produced by the SAT settings procedure takes precedence over the others.
Subsumption check (steps 3 to 13 of the listing):
    if ((C && new_C) < No_C) && (∼delete(new_C) && limit(new_C) > limit(C)) then
        while (par = 1 to limit(C))
            literal = clauses(C * width + par);
            if (∼find(Form(new_C), literal, limit(new_C))) then
                break
            end if
        end while
        if (par == limit(C)) then
            delete(new_C) = true
        end if
    end if

C. SUBSUMPTION ELIMINATION
To perform the subsumption test, the host launches a device kernel (Algorithm 6 gives the parallel algorithm, and Fig. 4 visualizes the kernel execution) on a 2-D grid of blocks (sets of threads that can run in parallel) of size ceil(N/32) along x and ceil(N/32) along y; each block has 32 × 32 threads, and each thread index along x and along y corresponds to one clause of the formula, so that each thread examines one pair of clauses. Beforehand, the Allocate & Adjust step (Fig. 2) flattens the simplified 2-D SAT formula produced by the SAT settings procedure into a linear array.
A separate array stores the limit variable (the number of literals k in each clause), which marks the head and tail of each clause in the mapping array.
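A hedged C++ sketch of this flattening, assuming a fixed row stride `width` equal to the longest clause (in the spirit of the `clauses (C * width + par)` indexing used in the subsumption pseudocode) and zero padding, since 0 is never a valid DIMACS literal. The struct and function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Clause = std::vector<int>;
using Formula = std::vector<Clause>;

// 1-D layout a GPU kernel can index: clause c occupies
// clauses[c * width] .. clauses[c * width + limit[c] - 1].
struct FlatFormula {
    int width = 0;               // longest clause length (row stride)
    std::vector<int> clauses;    // numClauses * width entries, 0-padded
    std::vector<int> limit;      // number of literals in each clause
};

FlatFormula flatten(const Formula& f) {
    FlatFormula out;
    for (const Clause& c : f)
        out.width = std::max(out.width, static_cast<int>(c.size()));
    out.clauses.assign(f.size() * out.width, 0);   // 0 is never a DIMACS literal
    for (std::size_t c = 0; c < f.size(); ++c) {
        out.limit.push_back(static_cast<int>(f[c].size()));
        for (std::size_t p = 0; p < f[c].size(); ++p)
            out.clauses[c * out.width + p] = f[c][p];  // head of clause c is c * width
    }
    return out;
}
```

A fixed stride wastes some memory on padding but lets a thread locate any clause with one multiplication, which keeps the kernel's index arithmetic simple.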
At launch time, the clause array is loaded into global device memory, while the limit array is loaded into the device's constant memory for faster reads [63]. With this configuration (Algorithm 6), the GPU's Streaming Multiprocessors (SMs) reached up to 99 percent of their maximum load.
As discussed earlier, we can reduce the algorithm's time complexity to O(1) by launching as many threads as needed to cover all N × N clause pairs. The delete array in
In the pure-literal elimination kernels, each variable v, from 1 to M, is mapped to a y-thread index. The x-threads then search for v in every clause to locate its positions and count its occurrences (polarity excluded), and they also accumulate its signed value, +1 or −1 (polarity included), using atomicAdd operations [64]; that is, the memory location is locked for a few cycles while each thread completes its update.
Algorithm 7 summarizes the parallel detector kernel described above. In this algorithm, the detect array is a Boolean matrix whose rows correspond to absolute variable values and whose columns correspond to clause positions. The matrix is initialized to zeros, and if clause x contains variable y, the element at index (y, x) is set to 1. Fig. 5 illustrates the parallelization of the detector.
The subsequent implementer kernel (Algorithm 8) uses the same launch configuration as the detector kernel. The x-threads check the purity of each variable using the statistics gathered by the detector kernel (counter, Sum, Detect). If a variable is pure, all clauses that contain it are marked in the delete array and removed from the SAT formula. Fig. 6 gives a visual example of the implementer's parallel execution.

E. MMAS PROCEDURE
This subsection presents our parallel implementation of the MMAS procedure for SAT on the GPU using CUDA (see Fig. 7).
First, we parse the reduced SAT CNF produced by the H-SAT preprocessor and then execute each routine on that formula. As stated previously, the MMAS system has three main stages. In the main stage, we construct the artificial ant colony, which drives the stages described below. Stage 1: searching for a candidate solution (ChooseSolution procedure).
The ChooseSolution subroutine runs on the GPU: it launches a kernel that computes the probability $p_j(l)$, compares a random number $r_l$ with $p_j(l)$, and returns a positive assignment (TRUE) if $r_l \le p_j(l)$ and a negative assignment (FALSE) otherwise.
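A host-side C++ stand-in for the sampling part of ChooseSolution, assuming the per-variable probabilities $p_j(l)$ have already been computed. `std::mt19937` replaces the cuRAND generator used on the GPU, and $r = 1 - u$ with $u \in [0, 1)$ mimics a value in (0, 1]; the function name `chooseSolution` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Host-side stand-in for the ChooseSolution kernel: for each variable, draw
// r in (0, 1] and assign TRUE iff r <= p (the probability of TRUE).
std::vector<bool> chooseSolution(const std::vector<double>& pTrue, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);  // samples [0, 1)
    std::vector<bool> assignment(pTrue.size());
    for (std::size_t v = 0; v < pTrue.size(); ++v) {
        const double r = 1.0 - u(rng);                   // maps to (0, 1]
        assignment[v] = r <= pTrue[v];
    }
    return assignment;
}
```

Because r is never 0, a variable with $p = 0$ is always FALSE and one with $p = 1$ is always TRUE.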
The intrinsic functions executed by the Special Function Units (SFUs) of the GPU [58] are used to compute $p_j(l)$, which requires power and division operations. SFUs execute transcendental and graphics interpolation instructions with a minimal number of instructions, at the cost of reduced floating-point precision. This allowed us to cut the number of floating-point and integer instructions to 84% and 54%, respectively, lowering the execution time of the ChooseSolution subroutine by a factor of 2.68 [46].
A sequence $r_n$ of uniformly distributed floating-point values in the range (0.0, 1.0] (0.0 excluded) is generated at the beginning of the MMAS run using the curandGenerateUniform function of the cuRAND library. Then, we generate new sequences for the subsequent ants, overlapping the generation function with other GPU kernels, as shown in Fig. 5, to exploit the GPU's concurrent kernel execution capability. Up to 32 kernels can execute concurrently when they are assigned to different streams.
1) The evalFullCand kernel initializes a 2-D grid of blocks with ceil(m/32) blocks along the x-axis and ceil(n/32) along the y-axis; each block has 32 × 32 threads, and each thread handles one literal. The SAT formula is represented as a complete clause table [13], i.e., a matrix q of n × m elements with $q \in \mathbb{B}^{n \times m \times 2}$, where $\mathbb{B}$ is the set of Boolean bits. The candidate solution is represented as an array $d \in \mathbb{B}^{1 \times m}$.
To test the candidate d, we compute $q_{ij} d_j$ bitwise for each clause, with $1 \le i \le n$ and $1 \le j \le m$; d satisfies q if and only if each matrix $u_i \in \mathbb{B}^{2 \times m}$ resulting from this operation contains at least one TRUE. q is loaded into GPU global memory during initialization (as stated previously), and d is loaded into shared memory during kernel execution to benefit from fast, regular data access.
2, 3) The kernels evalSol1 and evalSol2 compute the satisfied clauses as $s \in \mathbb{B}^{1 \times n}$, where $s_i = 1$ if and only if $u_i$ contains at least one TRUE, and the evaluation quality E, a weighted sum over the satisfied clauses (equations (4) and (5)), where $w_i$ is the clause weight, $1 \le i \le k$, and k is the number of satisfied clauses. The summations in equations (4) and (5) are computed with the parallel reduction algorithm of Harris [65]. Following Brent's theorem [66], each thread first adds O(log N) elements into shared memory, and the tree-based reduction [67] is then applied in shared memory. We modified the scheme to combine the final partial sums with an atomic-add operation (see Fig. 8), which supports the dot product in (4) while preserving the efficiency of the scheme (O(N/log N) threads, each performing O(log N) additions).
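The following sequential C++ sketch mirrors the evaluation described above: it flags the satisfied clauses ($s_i$), counts them (k), and computes E as the dot product of s with the clause weights, assuming that E is the sum of the weights of the satisfied clauses. The pairwise `treeSum` imitates the halving pattern of a shared-memory tree reduction, but it is a serial illustration, not the Harris kernel or the paper's atomic-add variant; all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

using Clause = std::vector<int>;
using Formula = std::vector<Clause>;

struct Evaluation {
    std::vector<int> s;   // s[i] = 1 iff clause i has a true literal
    int k = 0;            // number of satisfied clauses
    double E = 0.0;       // assumed quality: sum of weights of satisfied clauses
};

// Tree-style reduction: fold the upper half of the active range onto the
// lower half until one element remains, as a block-level reduction would.
double treeSum(std::vector<double> x) {
    if (x.empty()) return 0.0;
    for (std::size_t n = x.size(); n > 1; n = (n + 1) / 2) {
        const std::size_t half = (n + 1) / 2;
        for (std::size_t i = 0; i + half < n; ++i) x[i] += x[i + half];
    }
    return x[0];
}

Evaluation evaluate(const Formula& f, const std::vector<bool>& assignment,
                    const std::vector<double>& w) {
    Evaluation ev;
    std::vector<double> sat(f.size()), weighted(f.size());
    for (std::size_t i = 0; i < f.size(); ++i) {
        int ok = 0;
        for (int lit : f[i]) {
            const bool val = assignment[std::abs(lit)];
            if (lit > 0 ? val : !val) { ok = 1; break; }
        }
        ev.s.push_back(ok);
        sat[i] = ok;
        weighted[i] = ok * w[i];   // dot-product term s_i * w_i
    }
    ev.k = static_cast<int>(treeSum(sat));
    ev.E = treeSum(weighted);
    return ev;
}
```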
In the next stage, the updatePheromones subroutine updates all pheromone concentrations $\tau_l$ and their bounds ($\tau_{min}$, $\tau_{max}$) according to the best solution found by the ant colony and its evaluation quality.
This subroutine launches a GPU kernel with as many threads as the m variables, using 512 threads per block; each thread is mapped to one pheromone trail, performs the evaporation and deposit steps, and enforces the pheromone bounds.
Finally, the blurPheromones subroutine blurs all pheromones $\tau_l$ by adding the value $r_l \cdot ph_l$ to each pheromone amount, where $r_l$ is a random number such that $-max_i \le r_l \le max_i$, and $max_i$, called the maximum deviation parameter, is computed from the base blurring µ and the decay factor σ as defined in [25]. This strategy has given excellent results in selecting candidate solutions, with blurring applied only in periodic cycles. The blurPheromones procedure launches a kernel similar to the updatePheromones kernel, but the evaporation and deposit steps are replaced by the blurring step.
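As a hedged host-side C++ sketch of the blurring step just described, each trail is perturbed by a uniform random fraction of itself. Because the formula for $max_i$ in terms of µ and σ is given in [25] and not reproduced here, the sketch simply takes it as the parameter `maxDev`; the function name mirrors the subroutine name, but the code is not the paper's kernel.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Blur step: tau_l += r_l * tau_l with r_l drawn uniformly from
// [-maxDev, maxDev). maxDev stands for max_i, whose formula (from mu and
// sigma) is not reproduced here, so it is passed in as a parameter.
void blurPheromones(std::vector<double>& tau, double maxDev, std::mt19937& rng) {
    std::uniform_real_distribution<double> r(-maxDev, maxDev);
    for (double& t : tau) t += r(rng) * t;
}
```

Each trail therefore stays within a factor of $1 \pm max_i$ of its previous value, which perturbs the search without erasing what the colony has learned.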

IV. PERFORMANCE BENCHMARKING
In this section, we report the benchmarks obtained by running our preprocessor and ACO solver implementations on a large number of random SAT CNFs, and we compare them with the sequential implementation of our algorithms. The benchmarks cover the following efficiency criteria:
1. The running times of the parallel H-SAT preprocessor implementation versus the serial implementation.
2. The execution times of the parallel ACO SAT implementation versus the serial implementation.
3. The speed-ups obtained over the serial equivalents.
4. Statistics on the reductions, as percentages, of literals and clauses relative to the initial SAT instances.
5. A comparison of solving times with and without H-SAT preprocessing.
6. A comparison of solution quality with and without H-SAT preprocessing.
We implemented SAT factoring in our H-SAT preprocessor using OpenMP 2.0 with C++, compiled with Microsoft Visual Studio 2013 (VS2013), running on an Intel Core i7 3770K with 4 cores and 4 threads (one thread per core) at 3.9 GHz and 8 GB of memory. The Advanced Vector Extensions (AVX) instruction set [68] is exploited to speed up host-side program execution. Intel AVX is a 256-bit extension to the Intel Streaming SIMD Extensions (SSE) instruction set, designed to improve the performance of data-intensive applications through wider vectors, a new extensible syntax, and rich functionality. Our subsumption and pure-literal removal kernels, together with the MMAS SAT solver, are programmed in CUDA C++ and run on an NVIDIA GeForce GTX with 2880 processing cores (15 multiprocessors with 192 processing cores each) at 1 GHz and 6 GB of memory [69]. The NVIDIA SDK is used on Windows 10 x64 with CUDA Toolkit v6.5. The GPU binary code is optimized for CUDA Compute Capability 3.5 [63] to take advantage of NVIDIA's Kepler GK110 architecture [70].
The sequential version was implemented in C++ and run with a single thread on a single core operating at 3.9 GHz.
We generated 14 benchmark sets of random k-SAT formulas (5 instances each) in DIMACS format using the ToughSAT library [71]. Each set is named by its SAT type followed by the number of variables in its instances. For example, the benchmark set 5_sat_600 contains 5-SAT (5-CNF) instances with 600 variables each. All solving times for the k-SAT sets are averaged over 30 distinct runs, with 6,000 iterations of the algorithm per run. We used three CUDA pseudo-random number generators in the GPU implementation of the ACO SAT solver, PSEUDO_MTGP32, PSEUDO_XORWOW, and PSEUDO_MRG32K3A [59]; the timing for each generator is averaged over 10 distinct runs, giving 30 runs in total.
Table 1 presents the results for the variable elimination (VE) step with max_nop set to 12, the subsumption elimination (SE) kernel, and the parallel pure-literal removal (PLR) algorithm, together with the speed-ups achieved over the sequential implementation of these methods for the selected CNF instances. The results in Table 1 show that the parallel SE kernel on the GPU runs up to 4.75x faster than its sequential counterpart, and that the parallel VE on the multicore CPU achieves a speed-up factor of 3.12x. Our new parallel PLR algorithm achieves a speed-up of up to 15x.
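The PLR step removes clauses that contain a pure literal, i.e. a literal whose complement never occurs in the formula; setting such a literal to true satisfies those clauses without affecting satisfiability. The following is a minimal sequential C++ sketch of that rule, not the authors' CUDA kernel; the function name removePureLiterals and the DIMACS-style integer encoding of literals are our assumptions. In a GPU version, the occurrence-marking pass is a natural candidate for data parallelism (for example, one thread per literal occurrence), but that mapping is likewise an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// DIMACS-style clause: literal +v means x_v, -v means NOT x_v (v >= 1).
using Clause = std::vector<int>;

// Sequential pure-literal removal (illustrative, not the paper's CUDA kernel).
// Deletes every clause that contains a pure literal and repeats until no
// pure literal is left. Returns the pure literals in the order found;
// setting each of them to true satisfies all deleted clauses.
std::vector<int> removePureLiterals(std::vector<Clause>& clauses, int numVars) {
    std::vector<int> pureLits;
    bool changed = true;
    while (changed) {
        // Occurrence pass: record which polarities of each variable occur.
        std::vector<char> pos(numVars + 1, 0), neg(numVars + 1, 0);
        for (const Clause& c : clauses) {
            for (int lit : c) {
                int v = std::abs(lit);
                assert(v >= 1 && v <= numVars);
                if (lit > 0) pos[v] = 1; else neg[v] = 1;
            }
        }
        // A variable that occurs in only one polarity yields a pure literal.
        for (int v = 1; v <= numVars; ++v) {
            if (pos[v] && !neg[v]) pureLits.push_back(v);
            else if (neg[v] && !pos[v]) pureLits.push_back(-v);
        }
        auto isPure = [&](int lit) {
            int v = std::abs(lit);
            return lit > 0 ? (pos[v] && !neg[v]) : (neg[v] && !pos[v]);
        };
        // Filter pass: keep only clauses without a pure literal.
        std::vector<Clause> kept;
        for (Clause& c : clauses) {
            bool hasPure = false;
            for (int lit : c) {
                if (isPure(lit)) { hasPure = true; break; }
            }
            if (!hasPure) kept.push_back(std::move(c));
        }
        changed = kept.size() != clauses.size();
        clauses = std::move(kept);
    }
    return pureLits;
}
```

The outer loop repeats because removing clauses can make other literals pure: in (x1 ∨ x2)(¬x2 ∨ x3)(¬x1 ∨ x3), eliminating the pure literal x3 leaves (x1 ∨ x2), in which x1 and x2 are now pure.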
Figures 9 and 10 show the percentages of clause and literal reductions, respectively, achieved by each simplification method of our H-SAT preprocessor relative to the initial formula. Fig. 11 shows the total reductions due to all simplifications combined, reaching up to 43% and 48.5% in the number of clauses and literals, respectively. The prefix R_C denotes the clause reduction in a set, and the prefix R_L denotes the literal reduction. Table 2 lists our solver's solving times with and without H-SAT preprocessing. The results show that, when the preprocessor is applied to large SAT instances, the ACO solver solves them efficiently in a short time. In Table 2, the notation Excess means that the solver exceeded the maximum number of iterations or trials while attempting to solve the SAT formulas in a set. Fig. 12 shows the solution characteristics of the H-SAT solver without the preprocessor on set S2. Similarly, Fig. 13 shows the solution characteristics of the H-SAT solver on the same set with the preprocessor. In graphs 3, 4, and 5 of Fig. 13, we observe faster convergence to the solution than in those of Fig. 12.
VOLUME 8, 2020
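Assuming R_C and R_L are the percentage decreases in the clause count and in the total number of literal occurrences relative to the original CNF (the exact literal-counting convention is our assumption), both follow R = 100 * (N0 - N1) / N0, where N0 and N1 are the counts before and after preprocessing. A minimal C++ sketch, with the hypothetical helper names measure and reductionPercent:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// DIMACS-style clause: nonzero integers, +v / -v for x_v / NOT x_v.
using Clause = std::vector<int>;

struct FormulaSize {
    std::size_t clauses;
    std::size_t literals;  // assumed: total literal occurrences (sum of clause lengths)
};

// Counts clauses and literal occurrences of a CNF formula.
FormulaSize measure(const std::vector<Clause>& formula) {
    FormulaSize s{formula.size(), 0};
    for (const Clause& c : formula) s.literals += c.size();
    return s;
}

// Percentage reduction of a count relative to the original formula.
double reductionPercent(std::size_t before, std::size_t after) {
    assert(before > 0 && after <= before);
    return 100.0 * static_cast<double>(before - after) /
           static_cast<double>(before);
}
```

For a formula that shrinks from 4 clauses with 10 literal occurrences to 2 clauses with 4, this gives R_C = 50% and R_L = 60%.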

V. EXPERIMENTAL RESULTS OF REDUCTIONS
As a generic problem, SAT has been attacked with many solution approaches, which authors have implemented on many platforms to solve instances faster. Libraries such as SATLIB are therefore used as test banks that supply benchmarks for these solvers. Moreover, modern SAT solvers depend heavily on heuristics, so benchmarking is essential for assessing the performance of different solvers. However, meaningful benchmarking is not straightforward. Figures 14 and 15 show the percentages of literal and clause reductions relative to the initial formulas obtained by H-SAT and by SatELite. The figures show that H-SAT outperforms SatELite by 13.5% and 24% in the additional literals and clauses removed, respectively, while also being 4.41 times faster. H-SAT also achieved 31% and 60.7% more clause and literal reductions, respectively, than SatELite. On average, H-SAT reduces the number of clauses and literals of the initial instances by 13.7% and 14.2%, respectively.
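Much of the clause reduction compared here comes from subsumption, which both H-SAT's SE step and SatELite apply: a clause C subsumes a clause D when every literal of C also occurs in D, so D is redundant and can be deleted without changing satisfiability. The sketch below is a naive quadratic, sequential C++ version for illustration only; it is neither the paper's GPU kernel nor SatELite's occurrence-list implementation, and the names subsumes and eliminateSubsumed are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// DIMACS-style clause: nonzero integers, +v / -v for x_v / NOT x_v.
using Clause = std::vector<int>;

// True if clause a subsumes clause b (every literal of a occurs in b).
// Both clauses must be sorted, as std::includes requires.
bool subsumes(const Clause& a, const Clause& b) {
    return a.size() <= b.size() &&
           std::includes(b.begin(), b.end(), a.begin(), a.end());
}

// Naive O(n^2) subsumption elimination: drops every clause subsumed by
// another clause; of two identical clauses only the first is kept.
std::vector<Clause> eliminateSubsumed(std::vector<Clause> clauses) {
    // Normalize: sort literals and drop duplicate literals in each clause.
    for (Clause& c : clauses) {
        std::sort(c.begin(), c.end());
        c.erase(std::unique(c.begin(), c.end()), c.end());
    }
    std::vector<bool> removed(clauses.size(), false);
    for (std::size_t i = 0; i < clauses.size(); ++i) {
        if (removed[i]) continue;
        for (std::size_t j = 0; j < clauses.size(); ++j) {
            if (i == j || removed[j]) continue;
            if (subsumes(clauses[i], clauses[j])) removed[j] = true;
        }
    }
    std::vector<Clause> kept;
    for (std::size_t i = 0; i < clauses.size(); ++i)
        if (!removed[i]) kept.push_back(clauses[i]);
    return kept;
}
```

Because subsumption is transitive, a clause removed by a clause that is itself later removed is still subsumed by some kept clause, so the result is equisatisfiable with the input and contains no subsumed clause.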

VI. CONCLUSION
In this paper, we presented a new, efficient parallel heuristic SAT solver with a parallel preprocessor for heterogeneous CPU-GPU systems. We developed a variable elimination technique that performs SAT factoring on the most constrained variables of a SAT formula using the CPU's multicore architecture.
We have shown how the subsumption and pure-literal eliminations implemented in our H-SAT preprocessor can benefit from the massive data parallelism of the CUDA GPU platform. Our benchmarks show speed-ups of up to 15x over the sequential implementation and significant reductions of up to 43% and 49% in clauses and literals, respectively, compared with the initial CNF.
With the help of our parallel preprocessor, we have shown that a metaheuristic SAT solver can efficiently solve large CNF formulas. In our implementation, we exploited numerous features of the host multicore and device SIMT architectures, such as multiple cores running at high CPU frequencies, shared memory, and concurrent kernel execution.
We also used atomic operations, two-dimensional grids with a large number of threads running in parallel, and fast constant memory. Likewise, we exploited the CPU's built-in AVX instruction set and the GPU's special function units (SFUs) to produce efficient, fast executable code.