Generative Adversarial Construction of Parallel Portfolios

Abstract—Automatic algorithm configuration methods have proven very effective, and there is growing research interest in utilizing them for automatic solver construction, which has resulted in several notable approaches. A basic assumption of these approaches is that the given training set sufficiently represents the target use cases, so that the constructed solvers generalize well. However, this assumption does not always hold in practice, since in some cases only scarce and biased training data is available. This paper studies construction approaches for parallel algorithm portfolios that are less affected in such cases. Unlike previous approaches, the proposed approach simultaneously considers instance generation and portfolio construction in an adversarial process, in which the former aims to generate instances that are challenging for the current portfolio, while the latter aims to find a new component solver for the portfolio that better solves the newly generated instances. Applied to two widely studied problem domains, the Boolean satisfiability problem (SAT) and the traveling salesman problem (TSP), the proposed approach identified parallel portfolios with much better generalization than those generated by existing approaches when the training data was scarce and biased. Moreover, the generated portfolios could even rival state-of-the-art manually designed parallel solvers.


I. INTRODUCTION
Many high-performance algorithms for solving computationally hard problems, ranging from exact methods such as mixed integer programming solvers to heuristic methods such as local search and metaheuristics, involve a large number of free parameters that need to be carefully tuned to achieve their best performance [1]-[4]. In many cases, finding performance-optimizing parameter settings is done manually in an ad-hoc way. However, manual tuning has two main disadvantages [5]-[8]: (i) it requires considerable human effort; (ii) it typically explores only a few parameter settings, leading to performance that is far from optimal. As a result, there have been many attempts at automated parameter tuning (see [6] for a comprehensive review), usually referred to as automatic algorithm configuration (AAC) [9].
Here, a configuration of a parameterized algorithm refers to a complete setting of its parameters, such that the algorithm's behavior on a given problem instance is completely specified (up to the randomization of the algorithm itself). In the last few years, with several high-performance algorithm configurators (i.e., AAC methods) such as ParamILS [6], GGA [10], irace [8] and SMAC [11] being proposed, AAC has become very effective.
As a consequence, there has recently been increasing research interest in utilizing these methods to automatically construct effective solvers for a given application. The key idea is to parameterize many aspects of the algorithms, thus obtaining a large space of algorithms as the configuration space, from which effective algorithm configurators are used to identify high-performance algorithms. Unlike the manual solver-design paradigm, which usually relies on considerable effort by human experts, automatic solver construction involves much less human effort, at the cost of large budgets of computational time for configuration. This is acceptable (and even appealing) since computing power has been rapidly becoming much cheaper. Indeed, such approaches have been demonstrated to be both practical and effective for constructing sequential solvers [12], [13], sequential portfolios [14]-[16] (i.e., algorithm portfolios with selectors/scheduling), and parallel portfolios [17], [18].
Generally, all these approaches require a training set (i.e., a set of problem instances from the problem domain of interest) for constructing the solvers, particularly for evaluating the solvers during the construction process. Moreover, an indispensable assumption of these approaches is that the training set is a good representative of the target use cases [19], such that the "trained" solvers generalize well to instances outside the training set. In practice, given a specific application, it can usually be expected that some data, i.e., the instances previously encountered in this application, are available as training data. However, in at least two cases such a training set might not be sufficiently representative, which can have a major impact on the applicability of the constructed solvers. First, only a limited number of instances may have been accumulated, which can hardly cover all possible target cases. Second, the accumulated instances may be outdated and thus fail to reflect the properties of current cases. These two cases are not rare and have been discussed in different areas of the literature. For example, it has been reported that in combinatorial optimization some commonly used benchmark instances are not necessarily challenging [20], are narrowly defined [21], or are distinct from real-world instances [22]; in research areas closely tied to real-world applications, such as logistics, there are also concerns that instances proposed decades ago no longer represent today's real-world cases due to the continual growth of big cities [23], [24].
Intuitively, generating additional instances appears to be a natural way to handle this issue. However, generating good training data is also non-trivial in practice. Recall that the ultimate goal of having a representative training set is to achieve good generalization of the constructed solvers. Thus the notion of a "representative training set" depends on the specific solvers considered, while the latter are to be constructed based on the former. In other words, it is very difficult to obtain a concrete definition of representativeness in advance, which would be crucial for evaluating a given training set and thereby generating a representative one. This difficulty could be alleviated by allowing some redundancy in the training set; in the extreme case, one could obtain perfect generalization if all possible target instances were included in the training set. However, this would lead to overwhelming costs both for obtaining the training set and for constructing the solvers.
This paper studies construction approaches for parallel portfolios that are less affected by non-representative training data. The term "parallel portfolio" [25], [26] refers to a portfolio/set of solvers that are run independently in parallel when solving a problem instance (see Section III-A). As a form of solver, parallel portfolios have several important advantages. First, exploiting parallelism has become very important in designing efficient solvers for computationally hard problems, given the great development and wide adoption of parallel computing architectures [27] (e.g., multi-core CPUs) over the last decade. Parallel portfolios employ parallel solution strategies and thus can easily make effective use of modern hardware. Second, utilizing several different solvers (as in parallel portfolios) is a simple yet effective strategy for solving computationally hard problems. This idea has also been realized in the form of sequential portfolios [28], [29], which try to select the best solver for a given problem instance, and adaptive solvers such as adaptive parameter control [30]-[33], reactive search [34], [35] and hyper-heuristics [36]-[38], which seek to dynamically determine the best solver setting while solving a problem instance. In principle, all these methods need mechanisms (e.g., selection or scheduling) to appropriately allocate computational resources to different solvers, while parallel portfolios do not require any extra resource allocation, since each solver is simply assigned the same amount of resources. Third, a parallel portfolio can easily be converted into a sequential portfolio by using algorithm selection methods [39] to build selectors over the solvers in the portfolio, which means the portfolios generated by construction approaches (e.g., the approach proposed in this paper) could be further used for constructing sequential portfolios.
In this paper we propose a novel approach called Generative Adversarial Solver Trainer (GAST) for the automatic construction of parallel portfolios. Unlike existing construction approaches, GAST generates additional training instances and constructs a parallel portfolio with a dynamically changing training set. More specifically, GAST puts instance generation and portfolio construction into an adversarial game. Instance generation aims to generate hard problem instances that cannot be solved well by the current portfolio, while portfolio construction aims to find a new component solver for the portfolio that better solves these challenging instances. Competition in this game drives the portfolio to satisfactorily solve more and more problem instances, leading to ever better generalization performance. To the best of our knowledge, this is the first work that simultaneously considers solver construction and instance generation. In the experiments, compared with previous approaches, GAST consistently built parallel portfolios with much better generalization across different experimental scenarios, and the portfolios could even reach the performance level of parallel solvers designed by human experts.
The remainder of this paper is organized as follows. Section II reviews related work. Section III first describes the problem of parallel portfolio construction and then presents the general framework of GAST. Section IV instantiates GAST for TSP and SAT. Section V demonstrates the advantages of GAST through comparisons against other portfolio construction methods in data-scarce and data-biased scenarios, and also compares the portfolios generated by GAST against state-of-the-art manually designed parallel solvers. Finally, Section VI draws conclusions and discusses future work.

A. Automatic Solver Construction
Investigations of automatic solver construction were initiated by work on automatic algorithm configuration (AAC) [9]. A number of algorithm configurators (i.e., AAC methods), including ParamILS [6], GGA [10], irace [8] and SMAC [11], have been developed in the past decade. All these methods share a common iterative search framework, i.e., candidate configurations are generated and tested iteratively. The biggest difference between them lies in how candidate configurations are generated. ParamILS and GGA utilize direct search methods, namely an iterated local search algorithm and a gender-based genetic algorithm respectively, to search the configuration space, while SMAC and irace both rely on learned meta-models to guide the sampling of the configuration space. With effective algorithm configurators available, later work turned to automatically constructing sequential solvers. One prominent example is SATenstein [12], in which ParamILS was used to construct an effective SAT solver based on a highly parameterized solver framework. Another example is AutoMOEAs [13], in which high-performance multi-objective evolutionary algorithms (MOEAs) for multi-objective permutation flow-shop problems were built by irace with a configuration space defined on a highly parameterized MOEA framework.
By considering more complicated solver structures, research evolved into automatic portfolio construction (APC), i.e., the target object is no longer a single solver but a portfolio of solvers chosen from a configuration space. Such a setting means that the search space considered by APC is generally much larger than that of constructing a single sequential solver, providing more degrees of freedom in the resulting solvers and hopefully leading to better performance. According to how the resulting portfolios are used to solve a new problem instance, APC developed along several directions. Cedalion [16] is a notable approach for constructing portfolios with scheduling for planning problems; it runs its component planners sequentially with pre-allocated time budgets. For portfolios with selectors, which select a single best solver from the component solvers to solve a given problem instance, there are two representative approaches dubbed Hydra [14] and ISAC [15]. Hydra constructs a portfolio iteratively by finding, in each iteration, a configuration that maximizes the marginal contribution to the current portfolio, while ISAC clusters the training instances based on features and independently runs an algorithm configurator on each cluster. The basic ideas of Hydra and ISAC were later adapted to constructing parallel portfolios, resulting in two new approaches, PARHYDRA and CLUSTERING [17]. Another key approach for constructing parallel portfolios is PCIT [18], which also adopts an instance-grouping strategy like CLUSTERING but adjusts the grouping by transferring instances between subsets during the construction process. Note that how candidate portfolios are evaluated during construction depends on how the resulting portfolios will be used; the latter should therefore be taken into account in the design of an APC approach.
As mentioned above, all current investigations of automatic solver construction require that a training set be given, and it is assumed that the training set is a (representative) part of the target use cases. Hence, it is unsurprising that most of the above approaches were evaluated on well-investigated computationally hard problems, such as planning problems [16], SAT [12], [14], [15], [17], [18], and TSP [18], for which quite a few benchmark suites exist. For these approaches, the training and test sets for empirical studies were usually obtained by randomly and evenly splitting an existing benchmark set into two disjoint sets, such that the training instances represent the test instances well. However, as discussed above, such a setting is not always appropriate, since in some cases only scarce and biased training instances are available.

B. Problem Instance Generation
The lack of instances, though less discussed in the context of automatic solver construction, has attracted much attention from the perspective of empirical evaluation of solvers. In this area, the main goal is to automatically generate problem instances with diverse characteristics such as hardness and problem features. Various instance generation methods have been proposed for problem domains such as TSP [40]-[45], SAT [40], [46], job-shop scheduling problems [47], constraint satisfaction problems (CSP) [40], [48], graph-coloring problems [21] and bin-packing problems [49]. The generated instances are usually further used for comprehensive analysis of the strengths and weaknesses of existing solvers [40]-[43], [45], [48], [49], algorithm performance prediction [41], [42], [45] and algorithm enhancement [44], [47].

C. Generative Adversarial Networks
The general idea of GAST is similar to Generative Adversarial Networks (GANs) [50]. GANs also maintain an adversarial game, in which a discriminator is trained to distinguish real samples from fake samples synthesized by a generator, and the generator is trained to deceive the discriminator by producing ever more realistic samples. However, there are several key differences between GAST and GANs. First, their overall goals differ. GANs focus on generative models that capture the distribution of complicated real-world data. For GAST, the main goal is to build powerful parallel portfolios (analogous to the discriminative models in GANs), while the instance generation module and the generated instances are more like by-products. Second, the domains to which GAST and GANs are applicable differ. Currently, GANs (and the more general idea of adversarial learning) are mostly successfully applied to vision-related domains, such as image generation [51], [52], image dehazing [53], style transfer [54], [55], image classification [56] and clustering [57], [58]. In comparison, GAST is proposed for problem-solving domains such as planning and optimization. Third, the main technical issues in the two areas differ. Those faced by GANs are the difficulties of modeling complex and large-scale real-world data sets (e.g., the mode-collapse problem) and of optimizing the large-scale deep neural networks used in GANs. It has been observed that appropriate hyper-parameters are crucial for GANs to work well, and much effort [59]-[61] has been dedicated to overcoming these difficulties. For GAST, the main difficulties lie in two aspects: (i) how to generate useful instances for portfolio construction; (ii) how to appropriately integrate instance generation into the portfolio construction process, such that the portfolio's generalization performance keeps improving.

A. Parallel Portfolios
A parallel portfolio with k component solvers is denoted as a k-tuple c_1:k = (c_1, ..., c_k), in which c_i represents the i-th component solver of c_1:k. When solving a problem instance, all component solvers of c_1:k, i.e., c_1, ..., c_k, are run independently in parallel until some termination condition is met. The termination condition may vary according to the problem domain considered and the performance metric of interest. When a decision problem (e.g., SAT) is considered, all component solvers are terminated once any of them outputs an answer to the instance, i.e., SATISFIABLE or UNSATISFIABLE. In this case, the runtime needed by c_1:k to solve the instance is the runtime of the best component solver on this instance. Moreover, a cutoff time, i.e., a maximum runtime, is usually introduced to prevent the solution process from running prohibitively long when no component solver can solve the instance. On the other hand, if an optimization problem (e.g., TSP) is considered, the termination condition depends on the performance metric of interest. If the metric is the runtime needed to find a solution of acceptable quality (e.g., within a predefined gap to the optimum), the termination condition is that any component solver finds such a solution; as with decision problems, a cutoff time can be introduced to bound the solution process. If the metric is the quality of the best solution found within a time budget, each component solver is terminated when the time budget is exhausted, and the best solution among those found by the component solvers is returned as the output of c_1:k.
Overall, the performance of c_1:k on an instance s, denoted as P(c_1:k, s), is the best performance achieved among c_1, ..., c_k on s:

P(c_1:k, s) = min_{j ∈ {1,...,k}} m(c_j, s),    (1)

where m(c_j, s) is the performance of c_j on s according to a performance metric m (e.g., runtime or solution quality). Without loss of generality, we assume that a smaller value of m is better. Note that in practice, when an optimization problem is considered, the runtime metric might not be measurable, because we usually do not know whether the solutions found by the component solvers are of acceptable quality (and thus whether to terminate all component solvers), since the optimal solutions of the problem instances are unknown. However, this does not affect the above definition. The performance of c_1:k on an instance set I is an aggregate of the performances of c_1:k on all instances in I. Specifically, the following weighted average, which is widely used in the literature, is used for calculating the performance of c_1:k on I:

P(c_1:k, I) = (1/|I|) · Σ_{s ∈ I} w_s · P(c_1:k, s),    (2)

where |I| is the number of instances in I and the weight w_s is introduced to handle the different scales of the performances on different instances (typically used when m is related to solution quality).
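As a concrete illustration, the two definitions above can be sketched in a few lines of Python; the function names, the metric callable m and the weights are our stand-ins, not part of the paper:

```python
def portfolio_perf_on_instance(portfolio, s, m):
    """P(c_1:k, s): best (minimum) metric value among the component solvers."""
    return min(m(c, s) for c in portfolio)

def portfolio_perf_on_set(portfolio, instances, m, weights):
    """P(c_1:k, I): weighted average of the per-instance performances."""
    return sum(weights[s] * portfolio_perf_on_instance(portfolio, s, m)
               for s in instances) / len(instances)
```

For example, a portfolio of two solvers whose runtimes on an instance are 3 and 1 is credited with runtime 1 on that instance.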

B. The Problem of Parallel Portfolio Construction
When constructing a portfolio c_1:k with automatic algorithm configuration, each component solver of c_1:k is an individual configuration selected from a configuration space C, i.e., c_1, ..., c_k ∈ C. C is induced by a set of parameterized solvers B, called base solvers. As illustrated in Fig. 1, if there is only one base solver, the configuration space is exactly that solver's parameter space; otherwise the configuration space takes each base solver's parameter space as a subspace and includes an additional top-level parameter that decides which subspace (base solver) is used. The full configuration space thus comprises all configurations of all base solvers.
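To make the structure of such a space concrete, here is a minimal sketch; the two base solvers and their parameters are invented for illustration only:

```python
import random

# Hypothetical configuration space induced by two base solvers; the top-level
# "solver" choice decides which parameter subspace applies.
SPACE = {
    "solverA": {"alpha": (0.1, 1.0), "restarts": [10, 100, 1000]},
    "solverB": {"temperature": (0.5, 5.0)},
}

def sample_configuration(space, rng=random):
    solver = rng.choice(sorted(space))          # top-level parameter: base solver
    params = {}
    for name, domain in space[solver].items():  # parameters of the chosen subspace
        if isinstance(domain, tuple):           # continuous range (low, high)
            params[name] = rng.uniform(*domain)
        else:                                   # categorical list of choices
            params[name] = rng.choice(domain)
    return solver, params
```

An algorithm configurator searching this space simultaneously chooses a base solver and a complete setting of that solver's parameters.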

C. Generative Adversarial Solver Trainer (GAST)
Overall, there are two key design principles behind GAST. The first concerns generating useful training instances. A non-representative training set generally means that some target cases are not covered, so it is necessary to generate additional training instances. On the other hand, instances that are outside the training set but can already be solved well by the solvers being constructed are of no use for improving the generalization of those solvers. Hence, a desirable generated instance should not be present in the training set and should be hard for the solvers being constructed.
The second principle concerns the complementarity [14], [15], [17], [18], [62] among the component solvers, which is crucial for the effectiveness of any parallel portfolio. According to Eq. (1), the performance of a parallel portfolio on an instance is determined by the best-performing component solver on that instance. Since it is unlikely that a single component solver performs best on all instances, it is desirable that different component solvers are good at solving different problem instances. In other words, GAST should promote different component solvers to handle different instances.
The pseudo-code of GAST is given in Algorithm 1. Overall, GAST has an iterative structure, and each iteration consists of two subsequent phases: a configuration phase (lines 3-7) and an instance-generation phase (lines 9-23).

Algorithm 1 GAST
Input: base solvers B with configuration space C; number of component solvers k; instance set I; performance metric m; algorithm configurator AC; number of independent configurator runs n; time budgets t_C, t_V and t_I for configuration, validation and instance generation respectively
Output: parallel portfolio c_1:k
[The pseudo-code body is not fully recovered here; the line numbers cited in the text refer to it. Recovered fragments include line 12, "while the time spent in this phase does not exceed t_I do", and line 14, "for each s ∈ I do", both belonging to the instance-generation loop.]

The configuration phase is similar to PARHYDRA [17], in which the component solvers of c_1:k are configured iteratively. More specifically, in the i-th iteration, GAST uses an algorithm configurator (AC in Algorithm 1) with time budget t_C to configure c_i to add to the current portfolio c_1:i−1, i.e., (c_1, ..., c_i−1), such that the performance of the resulting portfolio c_1:i, i.e., (c_1, ..., c_i), on instance set I is optimized (line 4). During the configuration of c_i (line 4), GAST runs the entire portfolio on the considered instances while only c_i is available to be configured, leaving (c_1, ..., c_i−1) fixed. In other words, in each iteration GAST aims to find a configuration that maximizes the marginal performance contribution with respect to the configurations identified in previous iterations. Since generic algorithm configurators are usually randomized, to ensure the reliability of the outputs of the configurator AC, following established best practices [6], [11], GAST always performs n independent runs of AC when configuring c_i (line 3), thereby obtaining n different portfolios, i.e., c^1_1:i, ..., c^n_1:i. These portfolios are then tested on I with time budget t_V (line 6), and the one achieving the best validation performance is retained (line 7).
The instance-generation phase begins once the configuration phase finishes. Note that in the last (i.e., the k-th) iteration of GAST, instance generation is skipped (line 9), because there is no need to generate more instances once c_1:k has been completely constructed. In the instance-generation phase, GAST first creates a backup of the training set I (line 11), which will be restored at the end of this phase (line 23), and then enters an iterative process in which GAST repeatedly generates new instances based on the current training set I (lines 12-18), tests these new instances with the current portfolio c_1:i (line 19) and uses them to update the instance set I (lines 20-21), until the time spent on generating instances reaches the budget t_I (line 12). More specifically, to generate a new instance s_new, GAST uses an existing instance s in I as a base instance and randomly selects a set of instances from I, excluding s, as the reference instances (ref_set in line 15). s_new is then generated by modifying s through random perturbation and the insertion of structures/components extracted from the reference instances (the variation procedure in line 16). Taking each instance in I as the base instance (line 14), GAST eventually generates a set of new instances I_new. Instances generated in this way are expected to differ significantly from the existing instances in I while preserving some of their characteristics. This is desirable because instances too similar to existing ones are not useful for exploring the instance space, which is crucial for improving the generalization of the portfolio being constructed, while instances completely unrelated to existing ones could be of no interest, e.g., instances with no practical significance. Moreover, since each existing instance in I is used as a base instance to generate new ones, the diversity of I_new is expected to be enhanced. The precise definition of the modification procedure depends on the specific problem domain; it is therefore encapsulated as the variation procedure in Algorithm 1 (line 16). Many existing instance variation mechanisms, applicable to a wide range of problem domains (see Section II), can be used to instantiate variation when applying GAST to the corresponding domains (see Section IV for the instantiations for TSP and SAT).
Two other important aspects of the instance-generation phase are instance evaluation and instance selection. As mentioned above, only instances that cannot be solved well by the current portfolio c_1:i are valuable for improving the generalization of the portfolio; thus in the instance-generation phase each instance is assigned a quality score equal to the normalized performance of the current portfolio on it, i.e., w_s · P(c_1:i, s): the worse the performance, the higher the score (w_s is simply the normalization factor handling the different scales of the performances). For the instances already in I when entering the instance-generation phase, their quality scores can be obtained directly from the validation results cached in the configuration phase (line 10). For the newly generated instances, GAST tests them with c_1:i (line 19) to obtain their quality scores. After that, all newly generated instances, i.e., the instances in I_new, are included in the training set I (line 20), and then binary tournament selection [63], which repeatedly selects two instances from I at random and removes the one with the lower quality score, is used to remove |I_new| instances from I so as to keep its size unchanged.
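The removal step can be sketched as follows; this is a minimal reading of binary tournament selection as described above, with illustrative names:

```python
import random

def tournament_prune(instances, scores, n_remove, rng=random):
    """Repeated binary tournaments: draw two instances at random and discard
    the one with the LOWER quality score (i.e., the easier instance)."""
    pool = list(instances)
    for _ in range(n_remove):
        a, b = rng.sample(pool, 2)
        pool.remove(a if scores[a] < scores[b] else b)
    return pool
```

Because removal is probabilistic rather than a strict worst-first cut, some lower-scoring instances may survive, which preserves diversity in the training set.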
In general, GAST alternates between generating new training instances that are hard for the current portfolio and configuring a new component solver to solve these instances while leaving the existing component solvers fixed. In this way, GAST always pushes the component solver being configured in the current iteration to handle the newly generated instances, which differ from those considered in previous iterations, so that the complementarity among the component solvers of the constructed portfolio is enhanced.
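The alternation just described can be summarized in a compact sketch; configure, validate, generate_instances and tournament_select are placeholders for the configurator runs, validation, variation and selection steps of Algorithm 1, not the authors' actual interfaces:

```python
def gast(I, k, n, configure, validate, generate_instances, tournament_select):
    """High-level skeleton of Algorithm 1 (all details abstracted away)."""
    portfolio = []
    for i in range(1, k + 1):
        # configuration phase: n independent configurator runs, keep the best
        candidates = [configure(portfolio, I) for _ in range(n)]
        best = min(candidates, key=lambda c: validate(portfolio + [c], I))
        portfolio.append(best)
        # instance-generation phase (skipped in the last iteration)
        if i < k:
            I_new = generate_instances(I)
            I = tournament_select(I + I_new, len(I), portfolio)
    return portfolio
```

Each pass through the loop adds one component solver, then replaces part of the training set with freshly generated hard instances for the next pass.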

D. Discussions
Intuitively, if we consider an instance "covered" by a portfolio c_1:k if it can be solved well by c_1:k, then the target of the construction problem considered here is to find c_1:k = (c_1, ..., c_k) from configuration space C with maximum coverage of the target instance space I*. In general, this problem is NP-hard and can be approximated within 1 − 1/e + o(1) ≈ 0.632. This approximation ratio is achieved by the generic greedy method [64]: an iterative method that starts from an empty portfolio and at each iteration adds the configuration from C that covers the largest number of uncovered instances in I*. The iterative framework of GAST (the outermost loop in Algorithm 1) is exactly the same as the greedy method, except that GAST involves an additional instance-generation phase in each iteration. Recall that in the problem considered here we are only given a training set I that is non-representative of I*, and I* is impossible to enumerate in advance, e.g., because it is huge or changes over time. This means that during portfolio construction it is unclear which instances in I* are not covered by the current portfolio. It is thus necessary to first identify those uncovered instances in I* in order to enlarge the portfolio's coverage of I*, which is exactly what the adversarial instance generation does. In comparison, existing approaches do not involve such mechanisms and can therefore only optimize the portfolio's coverage of the training set I. For instances that are in I* but not in I, the portfolio's performance is not optimized and could be arbitrarily bad.
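For reference, the generic greedy method can be sketched as follows; the coverage sets are toy stand-ins for which instances of I* each configuration solves well:

```python
def greedy_portfolio(coverage, k):
    """Greedy maximum coverage: repeatedly add the configuration covering the
    most still-uncovered instances. `coverage` maps each candidate
    configuration to the set of instances it solves well."""
    portfolio, covered = [], set()
    for _ in range(k):
        best = max(coverage, key=lambda c: len(coverage[c] - covered))
        portfolio.append(best)
        covered |= coverage[best]
    return portfolio, covered
```

GAST follows the same outer loop but must approximate the "uncovered instances" via adversarially generated hard instances, since I* cannot be enumerated.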

E. Time Complexity and Computational Costs
The most time-consuming parts of GAST are the runs of the component solvers on the problem instances, and the computational costs they incur account for the vast majority of the total cost of GAST. Therefore, we analyze the time complexity of GAST in terms of the total number of solver runs. In Algorithm 1 the solvers are invoked in three places: configuration (line 4), validation (line 9) and instance generation (line 19). Recall that in the i-th iteration of GAST there are i component solvers in the portfolio and they are always executed in parallel. Let N_C, N_V and N_I denote the number of runs of each component solver in configuration (line 4), validation (line 9) and instance generation (line 19),

[Table I: CPU time required by GAST and by the existing construction approaches; table body not recovered.]
respectively. The total number of solver runs in the i-th iteration of GAST is then i · (n · (N_C + N_V) + N_I), where n is the number of independent configurator runs (line 3).
Considering that the instance-generation phase is skipped in the last iteration of GAST, the time complexity of GAST in terms of the number of solver runs is O(k²(n(N_C + N_V) + N_I)). Similarly, the time complexities of the existing parallel portfolio construction approaches, i.e., PARHYDRA, GLOBAL and PCIT, can be obtained; that of PCIT, for example, is O(kn(N_C + N_V)). For details of how these results are derived, we refer the reader to the original papers, [17] for PARHYDRA and GLOBAL and [18] for PCIT. Note that for a specific portfolio construction approach, the values of N_C, N_V and N_I depend on the predefined time budgets t_C, t_V and t_I, respectively, and different approaches may set t_C, t_V and t_I differently.
Given time budgets t_C, t_V and t_I, the total CPU time consumed by GAST follows accordingly: iteration i spends n · i · (t_C + t_V) CPU time on configuration and validation, plus i · t_I on instance generation (the latter skipped in the last iteration). The n independent runs of AC (line 4) and the validation processes (line 6) can be performed in parallel if n machines (each with k cores) are available, in which case GAST requires k · (t_C + t_V) + (k − 1) · t_I wall-clock time to complete. For completeness, Table I also lists the CPU time needed by PARHYDRA, GLOBAL and PCIT, which will be referenced in the experiments (see Section V-A4).

IV. INSTANTIATIONS OF GAST FOR TSP AND SAT
In this section the variation procedure in GAST is instantiated for TSP and SAT, respectively, resulting in two approaches: GAST-TSP and GAST-SAT.

A. GAST-TSP
Specifically, the symmetric TSP (i.e., the distance between two cities is the same in each direction) with distances in a two-dimensional Euclidean space is considered here. Each instance of such a TSP is represented by a list of (x, y) coordinates, with each coordinate representing a city. We extended the variation strategy used in [40], which requires the base instance and the reference instance to have the same size (i.e., number of cities), to allow the use of instances of different sizes. Specifically, given a base instance s and a reference instance s* (meaning GAST-TSP requires only one reference instance in ref_set; see lines 15-16 in Algorithm 1), the variation procedure in GAST-TSP applies a variable-length crossover and a uniform mutation to s and s* to generate a new instance. Let |s| and |s*| be the lengths of the coordinate lists of s and s*, respectively. The crossover first randomly selects min{|s|, |s*|} − 1 split points in both lists, and then constructs a new coordinate list (i.e., the new instance s_new) in a sequential manner by choosing each segment from either of the two lists with equal probability. The new list is then subject to the mutation operator, which replaces each coordinate in the list, with probability 1/|s_new|^{1/2}, with a coordinate chosen uniformly at random within the ranges bounded by the minimum and maximum coordinate values in the lists of s and s*.
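The crossover and mutation just described can be sketched as follows. This is a minimal illustration with our own function names; the exact segment-selection details of GAST-TSP may differ.

```python
import math
import random

def segments(coords, n_splits, rng):
    """Cut a coordinate list into n_splits + 1 consecutive segments."""
    pts = sorted(rng.sample(range(1, len(coords)), n_splits))
    return [coords[a:b] for a, b in zip([0] + pts, pts + [len(coords)])]

def vary_tsp(base, ref, rng=None):
    """Variable-length crossover + uniform mutation on two TSP instances,
    each a list of (x, y) city coordinates of possibly different lengths."""
    rng = rng or random.Random()
    n_splits = min(len(base), len(ref)) - 1
    segs_b = segments(base, n_splits, rng)
    segs_r = segments(ref, n_splits, rng)
    # Build the child by taking each segment slot from either parent with
    # equal probability; segments may differ in length between the parents,
    # hence the variable-length offspring.
    child = []
    for sb, sr in zip(segs_b, segs_r):
        child.extend(sb if rng.random() < 0.5 else sr)
    # Uniform mutation: each city is replaced with probability
    # 1/sqrt(|s_new|) by a point drawn inside the parents' bounding box.
    xs = [x for x, _ in base + ref]
    ys = [y for _, y in base + ref]
    p_mut = 1.0 / math.sqrt(len(child))
    return [(rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
            if rng.random() < p_mut else city
            for city in child]
```

Note that the offspring always stays inside the bounding box of the two parents, so repeated variation cannot drift arbitrarily far from the original instance distribution in coordinate space.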

B. GAST-SAT
The variation procedure in GAST-SAT utilizes the spig technique proposed in [46], which iteratively removes particular structures (bound together through a core variable) from the base instance s and then inserts such structures extracted from the reference instances into s. The generated instance is accepted by spig only if all of its features are within σ standard deviations of the mean values across all instances (including s and the reference instances). In [46], σ is set to a rather small value, i.e., 3, to generate instances similar enough to the existing ones, which is obviously not our goal here. We thus set σ as a random variable sampled uniformly from [3, 300] for each acceptance check, to introduce more randomness into the generated instances. To prevent the runtime of spig from becoming too long, the size of the reference instance set, i.e., |ref_set|, is set to |I|^{1/2}, where |I| is the size of the training set.
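The acceptance rule can be sketched as follows. This is our own minimal implementation of the feature check only; the actual spig structure extraction and insertion is far more involved and is not reproduced here.

```python
import random
import statistics

def accept(cand_feats, pool_feats, rng=None):
    """Accept a generated SAT instance only if every feature lies within
    sigma standard deviations of the pool mean, where sigma is re-sampled
    uniformly from [3, 300] for each check (loosening spig's fixed sigma=3).

    cand_feats: {feature_name: value} for the candidate instance.
    pool_feats: list of such dicts for the base + reference instances.
    """
    rng = rng or random.Random()
    sigma = rng.uniform(3.0, 300.0)
    for name, value in cand_feats.items():
        vals = [f[name] for f in pool_feats]
        mu = statistics.fmean(vals)
        sd = statistics.pstdev(vals)
        if abs(value - mu) > sigma * sd:
            return False
    return True
```

With a degenerate pool (zero standard deviation on a feature), only candidates matching the pool mean exactly pass, regardless of the sampled σ.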

V. EXPERIMENTS
We conducted experiments on SAT and TSP. Following the common scheme, we used GAST to build parallel portfolios based on a training set, and then compared them against the ones constructed by the existing approaches on an unseen test set.

A. Experimental Setup
1) Portfolio Size and Performance Metric:
We set the number of component solvers k to 4, since 4-core machines are widely available. The optimization goal considered here is the runtime needed by a solver to solve the problem instances (for SAT) or to find the optima of the problem instances (for TSP). In particular, we set m to Penalized Average Runtime-10 (PAR-10) [6], which is the average runtime over all the test runs, where unsuccessful runs (unable to solve the given instance within the cut-off time) are counted as 10 times the cut-off time. Note that for PAR-10 the weight w in Eq. (1) is set to 1. The optimal solutions for the TSP instances were obtained using Concorde [67], an exact TSP solver.
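As a concrete illustration, PAR-10 over a set of runs can be computed as below (a sketch with our own function name):

```python
def par10(runtimes, solved, cutoff):
    """Penalized Average Runtime-10: the average runtime over all runs,
    with unsuccessful runs counted as 10 times the cut-off time."""
    assert len(runtimes) == len(solved)
    penalized = [t if ok else 10.0 * cutoff
                 for t, ok in zip(runtimes, solved)]
    return sum(penalized) / len(penalized)
```

For example, one run solved in 10 s and one timeout under a 100 s cut-off give a PAR-10 score of (10 + 1000)/2 = 505.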
2) Instance Sets: Since we focus on scenarios where the available training instances are non-representative, it is very important to choose the instances in an appropriate way. We used two different ways to obtain the instances, thus dividing our experiments into two parts. In the first part we obtained instances through instance generators and used them to evaluate GAST-TSP, because for TSP there exist generators that can produce instances with diverse characteristics. Specifically, we used the portgen and portcgen generators from the 8th DIMACS Implementation Challenge [68] to generate 150 "uniform" instances (in which the cities are randomly distributed) and 150 "clustering" instances (in which the cities are distributed around different central points), forming a set of 300 instances denoted as TSP_whole. The problem sizes of all these generated instances are within [400, 600].
The generator-based approach has two potential issues. First, the generated instances might be far from real-world cases, making the evaluation on them of little practical significance. Second, since GAST also involves instance generation (see Algorithm 1), there is a possibility that the underlying generation model in GAST is similar to the instance generators used here. To avoid these issues, in the second part we obtained instances only from industrial benchmark suites and used them to evaluate GAST-SAT. Specifically, we obtained two industrial benchmarks, the IBM Hardware Verification (HV) benchmark and the Bounded Model Checking (BMC) benchmark, from the Algorithm Configuration Library (AClib) [7], and randomly selected 150 instances from each of the two sets to form a set of 300 instances, denoted as SAT_whole.
3) Experimental Scenarios: For brevity, we only describe how we split TSP_whole here; the same procedure was applied to SAT_whole. We split TSP_whole into training and test sets in two different ways, simulating two different cases. The first case, "SMALL", means the available training set contains only a small number of instances. In this case we randomly selected 1/6 of the instances (50 in total) from TSP_whole as training instances and used the remaining instances (250 in total) as test instances. The second case, "BIAS", means the training instances are biased toward narrowly defined cases. In this case we randomly selected 1/3 of the instances from one of the two instance types in TSP_whole (50 in total; recall that there are 150 "uniform" instances and 150 "clustering" instances in TSP_whole) as training instances, and used the remaining instances (250 in total) as test instances.
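The two split procedures can be sketched as follows (a minimal illustration with our own function names; instance objects are assumed hashable):

```python
import random

def split_small(whole, rng):
    """SMALL: randomly take 1/6 of the whole set as training instances,
    the remainder as test instances."""
    train = rng.sample(whole, len(whole) // 6)
    keep = set(train)
    return train, [s for s in whole if s not in keep]

def split_bias(uniform, clustering, rng):
    """BIAS: randomly take 1/3 of one instance type (chosen at random)
    as training instances, the remainder of both types as test instances."""
    src = rng.choice([uniform, clustering])
    train = rng.sample(src, len(src) // 3)
    keep = set(train)
    return train, [s for s in uniform + clustering if s not in keep]
```

For a 300-instance set split into two 150-instance types, both procedures yield 50 training and 250 test instances, matching the scenario sizes above.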
Since the above split procedure is randomized and the choice of training/test instances obviously affects the performances of portfolio construction approaches, to ensure the reliability of our experiments we repeated the split procedure four times for each of the "SMALL" and "BIAS" cases. This eventually gave us 8 different experimental scenarios, each with a unique pair of training set and test set, for each of the TSP and SAT domains. For convenience, we use TSP-SMALL/BIAS-1/2/3/4 and SAT-SMALL/BIAS-1/2/3/4 to denote these scenarios. Moreover, we use TSP-SMALL to denote the set of 4 scenarios {TSP-SMALL-1/2/3/4}, and the same rule applies to TSP-BIAS, SAT-SMALL and SAT-BIAS.
Table II summarizes the instance sets, the cut-off time and the base solvers used in the different scenarios. The base solver used in TSP-SMALL/BIAS was LKH version 2.0.7 [65] (with 23 parameters), one of the state-of-the-art inexact solvers for TSP. The base solver used in SAT-SMALL/BIAS was lingeling-ala [66] (with 118 parameters), the gold medal winning solver in the application track of the 2011 SAT Competition.

4) Competitors and Time Budgets:
We compared GAST against the state-of-the-art automatic construction approaches for parallel portfolios: GLOBAL, PARHYDRA [17] and PCIT [18]. For all approaches considered here, SMAC version 2.10.07 [11] was used as the algorithm configurator. Since the performance of SMAC can often be improved when used with instance features, we gave SMAC access to the 126 SAT features and the 114 TSP features used in [18]. The detailed setting of the time budget for each approach is given in Table III. Overall, in the experiments GAST consumed around 10% more CPU time than the other approaches for the construction of parallel portfolios.
All the experiments were conducted on a cluster of 3 Intel Xeon machines with 128 GB RAM and 24 cores each (2.20 GHz, 30 MB cache), running CentOS 7.5. The entire set of experiments took almost 2.5 months to complete.

B. Results and Analysis
In each experimental scenario we tested each obtained portfolio by running it on the test set 50 times (for TSP) or 5 times (for SAT). The mean ± stddev of the test performances (PAR-10 score over the test instances) across these runs, and the total number of timeouts (#TOs), are presented in the PAR-10† columns and the #TOs columns, respectively, of Table IV. The validation performances on the training sets of the portfolios constructed by PARHYDRA, GLOBAL and PCIT (but not GAST, since it keeps changing the training set) are also reported in the PAR-10 columns of Table IV. To determine whether the differences between the test performances (i.e., the PAR-10† columns) were significant, we performed a Wilcoxon signed-rank test (with significance level p = 0.05) on them in each scenario; a PAR-10 score is indicated in boldface if it was not significantly different from the best test PAR-10 score of the scenario.
Overall, GAST is the best-performing approach in Table IV, and in most cases it constructed significantly and substantially better portfolios than the other approaches. Since PARHYDRA can be seen as a variant of GAST without the instance generation mechanism (see Section III-C), the superior performance of GAST over PARHYDRA indicates the effectiveness of instance generation for improving the portfolio's generalization. Moreover, recall that we used generated instances for TSP and industrial instances for SAT; the consistently strong performances of GAST on both domains indicate that generating new instances through recombination of existing instances and random perturbation (as in GAST) is a robust and effective way of augmenting training data. Another important observation from Table IV is that for the existing approaches the gaps between the validation performances and the test performances are usually very large. This is conceivable since in the experiments the training set is expected to be non-representative, which in turn indicates the necessity of instance generation in this case.

C. Comparison against PARHYDRA when k is larger than 4
Both GAST and PARHYDRA are iterative approaches, adding one component solver to the portfolio per iteration. To investigate how they perform when the number of iterations (i.e., the portfolio size) gets larger, we ran GAST and PARHYDRA for 8 iterations (k = 8) in the 4 scenarios TSP/SAT-SMALL/BIAS-1. Let GAST_i and PARHYDRA_i denote the resultant portfolios at the end of the i-th iteration of GAST and PARHYDRA, respectively. In each scenario we tested the corresponding GAST_i and PARHYDRA_i with i = 1, ..., 8 on the test instances, and let P[portfolio, scenario] be the average test result in terms of PAR-10 scores. For example, P[GAST_2, TSP-SMALL-1] is the average PAR-10 score of GAST_2, i.e., the portfolio obtained at the end of the second iteration of GAST in TSP-SMALL-1, on the test instances of TSP-SMALL-1. The results are plotted against the number of iterations in Fig. 2, from which there are three observations. First, for both GAST and PARHYDRA, the test performance improves monotonically from one iteration to the next. This is reasonable because adding a component solver to an existing portfolio results in a new portfolio that is theoretically no worse (and mostly better) than the original one. Second, for both GAST and PARHYDRA, as the number of iterations increases, the benefit of adding new component solvers gradually decreases; in particular, once the number of iterations exceeds 5, the performance improvement is very small. This is conceivable because the performance of the portfolio keeps improving as the number of iterations increases, which in turn makes further improvement harder. Third, GAST usually achieves larger performance improvements than PARHYDRA. For example, in SAT-BIAS-1, in the earlier iterations GAST achieved remarkable performance improvements in comparison with PARHYDRA. This clearly shows that the performance of PARHYDRA is limited by the non-representative training data, while GAST can break this limitation with its instance generation mechanism.

D. Comparison against PARHYDRA with Augmented Training Sets
Since GAST generates instances for configuring the portfolios while the existing approaches do not involve any instance generation, GAST actually uses many more instances than the other approaches for construction. A natural question is: given enough generated instances, how would the existing approaches perform compared with GAST? If they could reach (or even exceed) the performance level of GAST, one could conclude that it is unnecessary to perform instance generation and portfolio construction simultaneously in an adversarial framework (as in GAST); instead, directly generating enough instances and then using existing portfolio construction approaches to build portfolios on them would already be good enough for handling data-scarce/biased scenarios.
To answer this question, in each of the 8 SAT scenarios, i.e., SAT-SMALL/BIAS-1/2/3/4, we used the same instance generation procedure as in GAST (lines 13-18 in Algorithm 1) to generate a large set of instances based on the training set. The size of the generated set is 5 times that of the training set. Since the training set contains 50 instances, we thus obtained an augmented training set of 300 instances in each SAT scenario. PARHYDRA was then used to construct a parallel portfolio on each augmented training set, and the obtained portfolio was tested on the test set. As before, each portfolio was tested by running it on the test set 5 times. The mean ± stddev of the test PAR-10 scores across the 5 runs are presented in the "PARHYDRA-A" column of Table V.
For the sake of comparison, the test performances of the portfolios constructed by GAST and PARHYDRA (without augmented training sets) in SAT-SMALL/BIAS-1/2/3/4, originally presented in Table IV, are also included in Table V. It can be seen from Table V that even with augmented training sets, PARHYDRA still could not reach the performance levels of GAST. Note that in SAT-SMALL-3 and SAT-BIAS-2/4, using generated instances even deteriorated the performance of PARHYDRA. The key to training set augmentation is which kinds of generated instances are used. GAST generates instances in an adversarial process in which only the instances that are hard for the current portfolio are selected, because on them there is a high opportunity for improvement. This can be seen as guided sampling in the instance space, which always seeks areas not yet covered by the portfolio. In contrast, treating data augmentation and portfolio construction as two sequential and independent phases, i.e., generating enough training instances and then using PARHYDRA to build portfolios on them, lacks such guidance and might produce a useless training set, which can even be harmful to portfolio construction (as in the cases of SAT-SMALL-3 and SAT-BIAS-2/4). Overall, GAST is more effective at data augmentation and thus performs better.
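The hard-instance retention step (line 21 of Algorithm 1) can be sketched as a binary tournament in which the easier of two randomly drawn instances, i.e., the one with the lower quality score, is removed. The tie-breaking rule below is our own assumption:

```python
import random

def tournament_remove(pool, quality, n_remove, rng=None):
    """Remove n_remove instances from pool via binary tournament selection.

    quality maps each instance to its score w_s * P(c_{1:i}, s); a HIGHER
    score means the instance is harder for the current portfolio and thus
    more valuable to keep. In each tournament, two instances are drawn at
    random and the one with the lower score loses and is removed
    (ties broken toward the first draw -- our assumption).
    """
    rng = rng or random.Random()
    pool = list(pool)
    for _ in range(n_remove):
        a, b = rng.sample(pool, 2)
        loser = a if quality[a] <= quality[b] else b
        pool.remove(loser)
    return pool
```

Because an instance is removed only when it loses a pairwise comparison, the hardest instance in the pool can never be eliminated, which matches the intent of biasing the training set toward instances the portfolio cannot yet solve well.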

E. Comparison against Hand-Designed Parallel Solvers
To further evaluate the portfolios constructed by GAST, we compared them against state-of-the-art manually designed parallel solvers. Specifically, we considered the ones constructed for SAT. We tested each of the 8 portfolios constructed by GAST in SAT-SMALL/BIAS-1/2/3/4 on the entire SAT instance set, i.e., SAT_whole, and reported the best, the worst and the median performance (in terms of PAR-10) achieved among these portfolios in Table VI. For manually designed solvers, we chose Plingeling-ala [66], which is the official parallel version of lingeling-ala (the base solver in all the SAT scenarios in our experiments), pfolioUZK [69], the gold medal winning solver of the parallel track of the SAT'12 Challenge, and Plingeling-bbc [70], the gold medal winning solver of the parallel track of the SAT'16 Competition. Note that all the manually designed solvers considered here implement far more advanced parallel solving strategies (e.g., clause sharing) than simply running component solvers independently in parallel. The default settings of these solvers were used and all of them were tested on SAT_whole. The test performances are presented in Table VI. As before, we performed a Wilcoxon signed-rank test (with significance level p = 0.05) on the test performances, and a PAR-10 score is indicated in boldface if it was not significantly different from the best test performance.

As shown in Table VI, the portfolios constructed by GAST always perform better than pfolioUZK and in most cases perform better than Plingeling-ala. It is impressive that in the best case, the portfolio constructed by GAST (despite its simple parallel-solving strategy) reached the performance level of the more recent Plingeling-bbc, with a statistically insignificant performance difference between them. Such results indicate that GAST can identify powerful parallel portfolios with little human effort involved. It is expected that, given more state-of-the-art base solvers, e.g., lingeling-bbc, GAST could deliver parallel portfolios with even better performance.

VI. CONCLUSIONS AND DIRECTIONS FOR FURTHER RESEARCH
This paper proposed a novel approach, dubbed GAST, for the automatic construction of parallel portfolios in environments where the training sets are non-representative. The most novel feature of GAST is that, unlike existing approaches, it considers instance generation and portfolio construction simultaneously in an adversarial process. Instantiations of GAST for TSP and SAT were also proposed. Experimental results showed that GAST could identify parallel portfolios with much better generalization than the ones generated by existing approaches when the training data was scarce and biased. Moreover, it was further demonstrated that the generated portfolios could reach the performance level of state-of-the-art parallel solvers designed by human experts. Further directions for investigation include:
• Further improvements to GAST. Diversity preservation schemes, such as speciation [71] or negatively correlated search [72], can be introduced into GAST to explicitly promote cooperation between different component solvers.
• Deeper understanding of the foundations of GAST. For example, GAST maintains two adversarial sets competing against one another, which is a typical scenario to which game theory can be applied.
• Other more general issues in training instance augmentation, e.g., similarity measures between problem instances and instance space characterization, are also worthy of exploration.

The full configuration space for c_{1:k} is C_k = ∏_{i=1}^{k} {c | c ∈ C}, where the product of two configuration spaces A and B is the Cartesian product of A and B, i.e., A × B = {(a, b) | a ∈ A and b ∈ B}. In other words, the size of the full configuration space for c_{1:k} is |C|^k. Given the above definitions, the parallel portfolio construction problem considered here can be stated as follows. Given a possibly non-representative training set I, a performance metric m, a set of parameterized base solvers B and the configuration space C induced by B, select configurations c_1, ..., c_k from C to form a parallel portfolio c_{1:k}, such that c_{1:k} generalizes well, i.e., achieves good P(c_{1:k}, I*) on the target set I*, which is impossible to enumerate in advance, e.g., because it is of huge size or changes over time.
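Under the PAR-10 metric used in the experiments (with unit weights w_s = 1), P(c_{1:k}, I) can be sketched as follows: the k components run independently in parallel, so an instance is solved as soon as the fastest component solves it. This is a minimal illustration with our own function name; the per-solver runtimes are assumed to be given.

```python
def portfolio_par10(runtime_matrix, cutoff):
    """P(c_{1:k}, I) under PAR-10 with w_s = 1 for all instances.

    runtime_matrix[j][i] is the runtime of component solver i on
    instance j (use float('inf') for a run that never finishes).
    Since the k components run independently in parallel, the
    portfolio's runtime on instance j is the minimum over its
    components; runs exceeding the cut-off count as 10x the cut-off.
    """
    scores = []
    for row in runtime_matrix:
        t = min(row)
        scores.append(t if t <= cutoff else 10.0 * cutoff)
    return sum(scores) / len(scores)
```

For instance, with two component solvers, an instance solved in 5 s by the faster component and an instance no component solves within a 10 s cut-off yield a score of (5 + 100)/2 = 52.5.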

Algorithm 1 (fragment):
3:  for j ← 1 : n do
4:      obtain a portfolio c^j_{1:i} by running AC on the configuration space {c_{1:i−1}} × {c | c ∈ C} using m for time t_C
5:  end for
6:  validate c^1_{1:i}, ..., c^n_{1:i} on I using m for time t_V
7:  let c_{1:i} ← argmin_{c^j_{1:i}, j ∈ {1,...,n}} P(c^j_{1:i}, I) be the portfolio with the best validation performance
8:  /* ---- instance-generation phase ---- */
9:  if i = k then break  // skip instance generation
10: according to the validation results, assign the quality score of each s ∈ I as w_s · P(c_{1:i}, s)
11: Ī ← I
...
15: ref_set ← randomly sample from I \ {s}
16: s_new ← variation(s, ref_set)
17: I_new ← I_new ∪ {s_new}
18: end for
19: test c_{1:i} with each s ∈ I_new and assign the quality score of s as w_s · P(c_{1:i}, s)
20: I ← I ∪ I_new
21: remove |I_new| instances from I with binary tournament selection
22: end while
23: I ← I ∪ Ī
24: end for
25: return c_{1:k}

Fig. 2. Test performance (in terms of average PAR-10 scores) progress for GAST and PARHYDRA when k = 8 in the 4 scenarios (TSP/SAT-SMALL/BIAS-1), plotted against the number of iterations. GAST_i and PARHYDRA_i are the resultant portfolios at the end of the i-th iteration of GAST and PARHYDRA, respectively. P[portfolio, scenario] is the average test result in terms of PAR-10 scores on the test set in the scenario.

TABLE I
COMPUTATIONAL COSTS FOR EXISTING PARALLEL PORTFOLIO CONSTRUCTION APPROACHES. t_C, t_V, t_I ARE TIME BUDGETS FOR CONFIGURATION, VALIDATION AND INSTANCE GENERATION, RESPECTIVELY. k IS THE PORTFOLIO SIZE. n IS THE NUMBER OF INDEPENDENT RUNS OF THE USED ALGORITHM CONFIGURATOR. NOTE THAT FOR DIFFERENT APPROACHES t_C, t_V COULD BE SET TO DIFFERENT VALUES.

TABLE II
SUMMARY OF THE INSTANCE SETS, THE CUT-OFF TIME AND THE BASE SOLVER IN EACH SCENARIO.

TABLE III
DETAILED TIME BUDGET IN TERMS OF HOURS OF CPU TIME FOR EACH APPROACH IN EACH SCENARIO. "TSP" REPRESENTS THE 8 SCENARIOS TSP-SMALL/BIAS-1/2/3/4 AND "SAT" REPRESENTS THE 8 SCENARIOS SAT-SMALL/BIAS-1/2/3/4. IN THE EXPERIMENTS THE NUMBER OF INDEPENDENT RUNS OF THE ALGORITHM CONFIGURATOR n FOR ALL APPROACHES WAS SET TO 10. t_C, t_V, t_I ARE TIME BUDGETS FOR CONFIGURATION, VALIDATION AND INSTANCE GENERATION, RESPECTIVELY. SEE TABLE I FOR HOW TO ESTIMATE THE NEEDED CPU TIME.

TABLE IV
RESULTS OF VALIDATION AND TESTING IN THE 16 EXPERIMENTAL SCENARIOS. VALIDATION PERFORMANCES IN TERMS OF PAR-10 SCORES OVER THE TRAINING SET ARE PRESENTED IN THE PAR-10 COLUMNS. TEST PERFORMANCES IN TERMS OF MEAN ± STDDEV OF THE PAR-10 SCORES ACROSS THE 50 RUNS (FOR TSP) AND THE 5 RUNS (FOR SAT) OVER THE TEST SET ARE PRESENTED IN THE PAR-10† COLUMNS. THE TOTAL NUMBER OF TIMEOUTS (#TOS) IN TESTING IS PRESENTED IN THE #TOS COLUMNS. THE NAME OF THE CONSTRUCTION APPROACH IS USED TO DENOTE THE PORTFOLIOS CONSTRUCTED BY IT. THE TEST PAR-10 SCORE OF A PORTFOLIO IS SHOWN IN BOLDFACE IF IT WAS NOT SIGNIFICANTLY DIFFERENT FROM THE BEST TEST PERFORMANCE IN THE SCENARIO (ACCORDING TO A WILCOXON SIGNED-RANK TEST WITH p = 0.05).

TABLE V
TEST PERFORMANCES IN TERMS OF MEAN ± STDDEV OF THE PAR-10 SCORES ACROSS THE 5 RUNS OVER THE TEST SET IN THE 8 SAT SCENARIOS, I.E., SAT-SMALL/BIAS-1/2/3/4. THE NAME OF THE CONSTRUCTION APPROACH IS USED TO DENOTE THE PORTFOLIOS CONSTRUCTED BY IT. "PARHYDRA-A" REFERS TO PARHYDRA CONFIGURED ON THE AUGMENTED TRAINING SETS. A PAR-10 SCORE IS SHOWN IN BOLDFACE IF IT WAS NOT SIGNIFICANTLY DIFFERENT FROM THE BEST TEST PERFORMANCE IN THE SCENARIO (ACCORDING TO A WILCOXON SIGNED-RANK TEST WITH p = 0.05).

TABLE VI
TEST PERFORMANCES IN TERMS OF MEAN ± STDDEV OF THE PAR-10 SCORES ACROSS THE 5 RUNS OVER SAT_WHOLE. "UZK" REFERS TO PFOLIOUZK, "ALA" TO PLINGELING-ALA, AND "BBC" TO PLINGELING-BBC. "GAST-B", "GAST-W" AND "GAST-M" REFER TO THE BEST, THE WORST AND THE MEDIAN PERFORMANCE ACHIEVED AMONG THE 8 PORTFOLIOS CONSTRUCTED BY GAST IN SAT-SMALL/BIAS-1/2/3/4. A PAR-10 SCORE IS SHOWN IN BOLDFACE IF IT WAS NOT SIGNIFICANTLY DIFFERENT FROM THE BEST TEST PERFORMANCE (ACCORDING TO A WILCOXON SIGNED-RANK TEST WITH p = 0.05).