HDConfigor: Automatically Tuning High Dimensional Configuration Parameters for Log Search Engines

Search engines are nowadays widely applied to store and analyze logs generated by large-scale distributed systems. To adapt to various workload scenarios, log search engines such as Elasticsearch usually expose a large number of performance-related configuration parameters. As manual configuring is time-consuming and labor-intensive, automatically tuning configuration parameters to optimize performance has become an urgent need. However, this is challenging because: 1) due to the complex implementation, the relationship between performance and configuration parameters is difficult to model, so the objective function is effectively a black box; 2) in addition to application parameters, JVM and kernel parameters are also closely related to performance, and together they construct a high dimensional configuration space; 3) to iteratively search for the best configuration, a tool is necessary to automatically deploy each newly generated configuration and launch tests to measure the corresponding performance. To address these challenges, this paper designs and implements HDConfigor, an automatic holistic configuration parameter tuning tool for log search engines. To solve the high dimensional optimization problem, we propose a modified Random EMbedding Bayesian Optimization algorithm (mREMBO) in HDConfigor, which is a black-box approach. Instead of directly applying a black-box optimization algorithm such as Bayesian optimization (BO) to the original space, mREMBO first generates a lower dimensional embedded space through a random embedding matrix and then performs BO in this embedded space. Therefore, HDConfigor is able to find a competitive configuration automatically and quickly. We evaluate HDConfigor on an Elasticsearch cluster with different workload scenarios. Experimental results show that, compared with the default configuration, the best relative median indexing throughput achieved by mREMBO reaches $2.07\times$.
In addition, under the same number of trials, mREMBO is able to find a configuration with at least a further 10.31% improvement in throughput compared to Random search, Simulated Annealing and BO.


I. INTRODUCTION
In the era of big data and artificial intelligence, large-scale distributed systems are generating tons of logs alongside their data processing. To operate systems efficiently, log-based applications such as anomaly detection [1]-[3] and workflow monitoring [4], [5] have become hot research topics in both academia and industry. (The associate editor coordinating the review of this manuscript and approving it for publication was Stavros Souravlas.) As the underlying infrastructure of log-based applications, log search engines like Elasticsearch [6], [7] and Solr [8], [9] are widely deployed to store and analyze these ''big'' logs. To adapt to varying workload scenarios, log search engines usually expose a considerably large number of configuration parameters to developers that can change nearly all runtime behaviors. Different settings of these configuration parameters can significantly affect the end-to-end performance of document indexing and query. Therefore, to pursue better performance for log search engines, it is necessary to delicately tune their configuration parameters.
In practice, manual configuring is time-consuming and labor-intensive. From our own practical experience when cooperating with one of the most famous banking companies in China, tuning parameters for their Elasticsearch cluster in the production environment usually took a few days to ten days. Hence, developers usually have to accept the default configurations [10]. In order to change this situation, many previous studies [11]-[22] have investigated how to automatically tune the configuration parameters for various applications. However, these studies usually only focus on the inherent configurations of target applications, while ignoring parameters from the runtime environment and operating system kernels. In fact, both runtime and kernel configurations are closely related to application performance. For example, according to the Elasticsearch official guides [23], in addition to Elasticsearch configuration parameters, runtime parameters such as the JVM heap size (-Xms and -Xmx) and kernel parameters such as swappiness should also be carefully tuned to improve indexing speed. Besides, in our own practical experiments, we find that the median indexing throughput of Elasticsearch sharply drops from 49261 docs/s to 7651 docs/s if we change the kernel parameter dirty_ratio to 0 while keeping all other configurations at their defaults (experiments are performed on a local cluster of 3 nodes with the percolator workload, and the default values of configuration parameters are shown in Table 3; more detailed information can be found in Section V). Therefore, the full stack configuration parameters from log search engines, the runtime environment and the operating system kernel should all be taken into consideration to achieve better performance.
In this paper, we focus on how to automatically tune the full stack configuration parameters for log search engines. Based on an in-depth analysis of previous studies as well as our experimental observations on a practical log search engine cluster, there are mainly three challenges in solving this problem: • Black-box objective function. A straightforward method is to first construct a performance prediction model and then utilize search-based algorithms to explore the optimal configuration [11]-[14], [24]. However, due to the complex implementation of log search engines, it is very difficult, if not impossible, to figure out the relationship between configuration parameters and performance, and building a useful prediction model usually requires a considerable number of high-quality observations. As a result, we treat the target performance as a black-box objective function of the configuration parameters and have to solve the formulated black-box optimization problem.
• High dimensional configuration space. The full stack configuration parameters from log search engines, JVM and kernel actually construct a high dimensional configuration space. Although traditional black-box optimization methods such as the Bayesian optimization algorithm (BO) [22], [25] have been successfully applied to find the best configuration for different application scenarios, these methods cannot be directly utilized to solve this high dimensional black-box optimization problem due to the curse of dimensionality. According to [26]-[28], BO does not scale beyond 10-20 parameters, which is far fewer than the total number of parameters of log search engines.
• Automatic configuration update. Once a new configuration is selected, we should deploy it to the cluster and then observe the performance. However, different parameters usually have quite different updating methods. In Elasticsearch, the parameter index.translog.durability can be updated online through REST APIs, while changing the JVM heap size needs a restart. In addition, a higher configuration dimensionality usually means more necessary observations and hence more frequent configuration updates. In order to free developers from these complex and frequent operations, an automatic configuration tool is necessary to solve this high dimensional black-box optimization problem.
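To make the update-method distinction concrete, the following sketch routes one parameter change to either an online REST update or a restart. The parameter grouping and the index name are illustrative assumptions, not an exhaustive mapping; only the general shape of the Elasticsearch settings API call is shown.

```python
# Illustrative sketch: routing a new parameter value to its update mechanism.
# The split between "online" and "restart-required" parameters below is a
# simplified assumption for illustration, not an exhaustive list.

ONLINE_INDEX_SETTINGS = {"index.translog.durability", "index.refresh_interval"}

def plan_update(param: str, value) -> dict:
    """Return a small 'plan' describing how to apply one parameter change."""
    if param in ONLINE_INDEX_SETTINGS:
        # Dynamic index settings can be changed via the REST API without
        # downtime, e.g. PUT /<index>/_settings with a JSON body.
        return {
            "method": "PUT",
            "path": "/my-index/_settings",   # hypothetical target index
            "body": {param: value},
            "restart": False,
        }
    # JVM and kernel parameters are conservatively treated as needing a
    # restart (or an out-of-band sysctl change) in this sketch.
    return {"method": None, "path": None, "body": {param: value}, "restart": True}

print(plan_update("index.translog.durability", "async")["restart"])  # False
print(plan_update("jvm.heap_size", "4g")["restart"])                 # True
```

A tuner like HDConfigor's Test Launcher would consult such a plan each iteration to decide whether a rolling restart is required before the next benchmark run.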
Unfortunately, according to our analysis, none of the previous studies can efficiently address all these challenges at the same time. We list a detailed comparison of previous studies in Table 1, and a more detailed introduction to the related work can be found in Section II. Therefore, we propose HDConfigor, an automatic configuration tuning system for log search engines. HDConfigor consists of three important modules: Configuration Generator, Data Aggregator and Test Launcher. Specifically, Configuration Generator is responsible for iteratively generating a new configuration according to the current observation set. At its core lies our modified Random EMbedding Bayesian Optimization algorithm (mREMBO), which is proposed to solve the high dimensional black-box optimization problem. Once a new configuration is generated, Test Launcher is responsible for automatically deploying this configuration and launching tests to measure the corresponding performance metrics. After that, Data Aggregator adds the configuration-performance pair to the observation set and then updates its persistent storage. Finally, as shown in Table 1, HDConfigor is able to simultaneously address all three challenges described above to automatically tune high dimensional configuration parameters for log search engines.
In order to evaluate the effectiveness of HDConfigor, we compare mREMBO with three other widely used black-box optimization algorithms [31]: Random search, Simulated Annealing and Bayesian optimization. Experimental results on a practical Elasticsearch cluster show that, compared with the default configuration, the best relative median indexing throughput achieved by mREMBO is 2.07×, 1.46× and 1.07× for three different types of workload scenarios respectively. Besides, under the same number of trials, mREMBO is able to find a configuration with at least a further 10.31% throughput improvement compared to the three baseline algorithms.
TABLE 1. Comparison of existing studies. We use the symbol ✓ to represent yes and × to represent no. It is worth noting that the symbol in the 4th column means the work only considers either JVM parameters or kernel parameters, and in the 5th column it means the work cannot effectively support a high dimensional configuration space.
In summary, our work makes the following key contributions: • As shown in Table 1, few previous studies consider JVM or kernel parameters. With practical experiments, we point out the necessity of considering the full stack configuration parameters of log search engines, which can further improve performance by 16.86% according to our experiments.
• We conclude that the full stack configuration space of Elasticsearch has a low effective dimension. Based on this observation, we propose the mREMBO algorithm to solve the high dimensional black-box optimization problem. Evaluation based on practical experiments shows the effectiveness of mREMBO.
• We design and implement HDConfigor, an automatic tool that tunes high dimensional configuration parameters for Elasticsearch. By simply re-configuring HDConfigor, it is easy to support other log search engines. We will open source the code of HDConfigor in the near future. The rest of this paper is organized as follows. Section II gives a detailed introduction of related work. Section III describes the motivation and formulates the optimization problem. Section IV introduces the design and implementation of HDConfigor. Section V describes our experimental settings. Section VI presents and discusses the experimental results. Finally, Section VII concludes the paper.

II. RELATED WORK
In this paper, we propose a high dimensional black-box optimization algorithm, mREMBO, and implement HDConfigor to automatically tune the full stack configuration parameters for Elasticsearch. There are already previous studies targeting configuration parameter optimization for various objectives. In the following, we give an introduction to and discussion of them.

A. PREDICTION MODEL-BASED METHODS
A straightforward method [12]-[14], [24], [32]-[34] to solve the configuration parameter optimization problem is to first construct an offline prediction model and then apply search algorithms to find the optimal configuration online based on this prediction model. For instance, Xiong et al. [24] utilize an ensemble learning algorithm to build the performance prediction model and leverage a genetic algorithm to search for the optimal configuration parameters for HBase. Similarly, for Spark clusters, Yu et al. [12] propose a hierarchical modeling method to build the prediction model and then employ a genetic algorithm to find the optimal configuration. However, these prediction-model-based methods usually require a considerable number of high-quality observations, which are hard to obtain, especially within a high dimensional configuration space. In addition, static prediction models are not able to adapt to workload or hardware changes.
Considering the high dimensional configuration space and changing workload patterns, Jamshidi et al. [11] study the feasibility of transferring knowledge about performance models across environments. Chen et al. [35] use the dependence between configuration parameters in one system to speed up searching for the optimal configuration of another system. Besides, Mahgoub et al. [36] combine workload prediction with a cost-benefit analyzer to compute the relative cost and benefit of each reconfiguration step for a future time window. However, due to the high dimensional full stack configuration parameters of Elasticsearch, directly building a useful prediction model through experiments on a practical Elasticsearch cluster remains quite challenging, especially under the specified observation times constraint.
B. COMPARISON OR RANK-BASED METHODS
In order to address the challenge of high dimensional configuration space, comparison or rank-based methods have been proposed in previous studies [15]-[17], [29], [37] to select the most important parameters and thereby shrink the original space. Cao et al. [37] show that some parameters have greater performance impact than others and that focusing on a smaller number of more important parameters can speed up autotuning systems. Accordingly, Nair et al. [29] propose a rank-based approach to reduce the number of training samples. Van Aken et al. [16] use a LASSO algorithm to select the most impactful knobs first and then recommend knob settings based on Gaussian Processes. Bao et al. [17] utilize a weighted Latin hypercube sampling approach to select the sample set and then build a comparison-based model to find the optimal configuration for Kafka. However, obtaining the correct importance of each parameter usually requires a considerable number of high-quality observations and, what is worse, the importance of parameters differs greatly across objectives and workloads.

C. BLACK-BOX METHODS
Instead of constructing a prediction model or ranking the importance of parameters, there exists a category of heuristic search algorithms that treat the objective function as a black box [18]-[22], [30], [31], [38], [39]; mREMBO in HDConfigor also falls into this category. Random search is now widely utilized in hyper-parameter optimization for machine learning models [40]. For configuration parameter tuning, Ye and Kalyanaraman [18] propose a recursive random search algorithm based on random sampling for large-scale network parameter optimization. Zhu et al. [30] first propose a divide-and-diverge sampling method and then design a recursive bound-and-search algorithm to tune system configurations within a resource limit. Heinze et al. [38] build a monetary cost function for each parameter configuration and then utilize recursive random search to find the optimal configuration. These random-based methods are simple to understand but cannot utilize any useful information generated from historical observations. Therefore, Cao et al. [19] develop two versions of an enhanced multi-objective simulated annealing approach to solve configuration optimization problems with multiple hard constraints, while Wang et al. [39] provide a control-theoretic approach to continuously tune a distributed application. Further, based on a reward function calculated from observations, an enhanced reinforcement learning approach is proposed in [20] to tune configurations online for web systems, while Zhang et al. [21] design an end-to-end automatic cloud database tuning system using deep reinforcement learning. The work most related to ours is [22]. It proposes an auto-tuning algorithm that leverages Bayesian Optimization to iteratively capture posterior distributions of the configuration spaces and sequentially drive the experimentation.
However, BO is restricted to problems of moderate dimension, typically only up to about 10-20 [26]-[28], and hence cannot be directly used to solve the high dimensional black-box optimization problem in this paper. Therefore, we propose the mREMBO algorithm, based on Bayesian optimization, to solve the high dimensional black-box optimization problem and automatically tune configuration parameters for Elasticsearch.

D. HARDWARE RESOURCE CONFIGURATION
In addition to system performance, configuration optimization is also utilized for other objectives such as resource management [25], [41]-[43]. For instance, BO [25] and an augmented BO [42] are utilized to find the best configurations of cloud VMs from a broad spectrum of candidates. Klimovic et al. [43] use latent factor collaborative filtering to predict how an application will perform across different configurations and then recommend configurations of cloud compute and storage resources. In this paper, we focus on how to tune the full stack parameters from Elasticsearch, runtime and kernel, and regard the hardware resource configuration as a given constant. These studies can provide a strong reinforcement and complement for our work.

III. MOTIVATION AND PROBLEM FORMULATION
In this paper, we study how to automatically tune the full stack configuration parameters of Elasticsearch to improve its median indexing throughput. Although we choose Elasticsearch as the target application, it is easy to support other log search engines in HDConfigor. A detailed description of how to add support for other log search engines in HDConfigor can be found in Section IV. Besides, we choose median indexing throughput as the optimization objective because it tells how many documents Elasticsearch can index per second, which characterizes one of the most important capabilities when Elasticsearch is deployed as a log search engine. In particular, when the volume of log data is huge, throughput becomes especially important.

A. ELASTICSEARCH ARCHITECTURE
Elasticsearch is an open source Java-based search engine built on top of Apache Lucene [44]. It is able to index and search documents with diverse formats in near real-time. Currently, Elasticsearch is widely applied for storing and analyzing system logs. Figure 1(a) shows the architecture of an Elasticsearch cluster in detail. This cluster consists of three data nodes and one dedicated master node. Each Elasticsearch index has one primary shard, and each shard has one replica for data reliability. Shards in the same color belong to the same Elasticsearch index. As Elasticsearch is built on top of Lucene, each shard is actually a Lucene index which stores the data of indexed documents.
In order to tune the full stack configuration parameters related to indexing throughput, we illustrate the detailed indexing data flow in Figure 1(b). Once a new indexing request arrives, the coordinating node decides its target primary shard according to a certain routing policy. After successfully indexing the corresponding new documents, the primary shard node sends this request to all its replicas. Finally, it sends a response back to the coordinating node once it has received responses from all its replicas.
Next, we focus on the data flow inside each shard. When the indexing request arrives at the shard, the indexed data is first written into the memory buffer. For reliability, Elasticsearch also writes the indexed data as well as operation logs into a transaction log file. After the specified refresh interval, all newly indexed data in the memory buffer is refreshed into a new segment file and then becomes searchable. It is worth noting that these newly generated segment files are still stored in memory; they are flushed to disk for persistent storage when the total size or age of the transaction log file exceeds a specified threshold. Besides, for the sake of high reliability, the transaction log file also needs to be fsynced to disk when the fsync interval exceeds the specified maximum.
From the architecture and indexing data flow of Elasticsearch we can see that, in order to improve the median indexing throughput, we should pay special attention to the configuration parameters related to the write, refresh, flush and fsync operations.

B. FULL STACK CONFIGURATION PARAMETERS
According to the above description of the indexing data flow in Elasticsearch, configuration parameters related to indexing throughput actually exist in the runtime, the kernel and Elasticsearch itself. For example, Figure 2 shows the median indexing throughput of Elasticsearch when the parameter threadpool.write.size varies from 1 to 4 and the JVM heap size varies from 2 GB to 8 GB, while all other parameters are set to their defaults. A more detailed description of our experimental settings can be found in Section V. As shown in Figure 2, the median indexing throughput changes from 41940 docs/s to 51861 docs/s by tuning only these two parameters. Specifically, for the same value of threadpool.write.size, tuning the JVM heap size is able to improve the throughput by as much as 15.6%. On the other hand, in practical experiments we also find that the indexing throughput sharply drops from 49261 docs/s to 7651 docs/s if we change the kernel parameter dirty_ratio to 0 while keeping the other parameters at their defaults. Therefore, all of these full stack configuration parameters from Elasticsearch, runtime and kernel should be taken into consideration when tuning for a better median indexing throughput.
It is worth noting that the JVM heap size is not 8 GB in the best configuration. In detail, from Figure 2 we can see that for the same threadpool.write.size, a larger JVM heap size does not lead to a better throughput. Besides, the best JVM heap size is quite different across different values of threadpool.write.size. In fact, due to the complex implementation of Elasticsearch, the relationship between performance and configurations is quite difficult to predict. Hence, instead of trying to construct an explicit prediction model, we formulate the full stack configuration parameter optimization problem as a high dimensional black-box optimization problem.

C. PROBLEM FORMULATION
In this paper, we focus on how to automatically tune the full stack configuration parameters of Elasticsearch to improve its median indexing throughput. To achieve this target, we first formulate it as a high dimensional ($D$ dimensions in total) black-box optimization problem. Let $x_i$ indicate the $i$-th ($i = 1, \ldots, D$) configuration parameter, where $x_i$ takes values from a finite domain $\mathrm{Dom}(x_i)$. In our case, $x_i$ is either a numerical or a categorical variable. For categorical parameters, we use an index to represent the corresponding category and let $x_i$ equal that index. Thus, the full stack configuration parameter space can be denoted as $\mathcal{X} = \prod_{i=1}^{D} \mathrm{Dom}(x_i)$. We denote the median indexing throughput under configuration $\vec{x} \in \mathcal{X}$ as $f(\vec{x})$; our target is to find the optimal configuration $\vec{x}^*$ that maximizes the median indexing throughput of Elasticsearch: $\vec{x}^* = \arg\max_{\vec{x} \in \mathcal{X}} f(\vec{x})$. Due to the complex implementation of Elasticsearch, we treat $f(\cdot)$ as a black-box function that only allows us to observe its value $f(\vec{x})$ under a certain configuration $\vec{x} \in \mathcal{X}$ through practical experiments. Since evaluation is expensive, the total number of observations is often restricted in practice. We define this restriction on the observation times as a constraint, denoted as $OT$. Now we obtain the following $D$-dimensional black-box optimization problem (Problem 1): maximize $f(\vec{x})$ subject to $\vec{x} \in \mathcal{X}$, with at most $OT$ observations in total.

IV. HDCONFIGOR ARCHITECTURE

A. DESIGN OVERVIEW
In order to automatically tune the full stack configuration parameters of Elasticsearch to optimize the median indexing throughput, we design and implement HDConfigor, an automatic High Dimensional Configuration parameter tuning tool. The high-level architecture of HDConfigor is shown in Figure 3. Once the configuration optimization process needs to be launched, developers specify the configuration parameter space and the observation times constraint. After that, HDConfigor is able to automatically generate new configurations and deploy them to the Elasticsearch cluster.
To obtain the median indexing throughput under newly generated configurations, HDConfigor is also able to automatically launch performance tests utilizing benchmark tools. When the observation times constraint is reached, HDConfigor reports the best configuration to developers. In detail, HDConfigor mainly consists of three modules, namely Configuration Generator, Data Aggregator and Test Launcher, and works iteratively. In each iteration: Configuration Generator is responsible for generating a new configuration according to the current observation set stored in Data Aggregator. In order to achieve a tradeoff between exploitation and exploration, the new configuration should either be better than the currently best configuration or help observe more information about the black-box objective function. At the core of Configuration Generator is our modified REMBO (mREMBO) algorithm, which builds on the Random EMbedding Bayesian Optimization (REMBO) algorithm [26], [27]. Instead of directly solving the high dimensional black-box optimization problem described as Problem 1, mREMBO first generates a low dimensional embedded space and then performs Bayesian optimization in this embedded space to generate new configurations.
Data Aggregator is responsible for collecting newly generated configurations from Configuration Generator and calculating the median indexing throughput utilizing performance metrics from Test Launcher. For the sake of system stability, all configuration changes should be able to roll back to the latest version. To this end, the configuration-performance pairs are stored in a persistent database such as HBase or Elasticsearch.
Test Launcher is responsible for automatically deploying the new configuration and running tests on the Elasticsearch cluster to measure the corresponding performance metrics. It is worth noting that a practical testbed may contain noise due to performance interference from other co-located applications and network jitter. In order to mitigate the impact of performance variability [45], Test Launcher repeats each iteration a certain number of times and sends the averages of the target metrics back to Data Aggregator.
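The repeat-and-aggregate step above can be sketched as a tiny helper; the measurement callable here is a hypothetical stand-in for a real benchmark run, and the repeat count is an illustrative choice.

```python
import statistics

# Sketch of Test Launcher's noise mitigation: run the same configuration's
# benchmark several times and report an aggregate instead of one sample.

def measure(run_test, repeats: int = 3) -> float:
    """Run one configuration's benchmark `repeats` times and average."""
    samples = [run_test() for _ in range(repeats)]
    # The paper reports averages of the target metrics; a median would be a
    # common robust alternative when occasional outliers are expected.
    return statistics.mean(samples)

# Hypothetical noisy benchmark: three runs of median indexing throughput.
fake_runs = iter([49100.0, 49500.0, 49000.0])
print(measure(lambda: next(fake_runs)))  # 49200.0
```

Each call to `measure` corresponds to one observation of the black-box objective, so the repeat count multiplies the wall-clock cost of every iteration and must be balanced against the observation times constraint.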

B. MREMBO IN HDCONFIGOR

1) BAYESIAN OPTIMIZATION
Bayesian optimization (BO) [46]-[48] is a framework for solving optimization problems like Problem 1 where the objective function $f(\cdot)$ is unknown but can be observed through experiments. Two key parts of the BO framework are: a prior distribution which captures the beliefs about $f(\cdot)$, and an acquisition function which quantifies the expected value of evaluating each candidate point. BO has been demonstrated to outperform other state-of-the-art black-box optimization algorithms when observations are expensive and the observation times constraint is small. However, BO is restricted to optimization problems with up to 10-20 dimensions [26]-[28] for the following two reasons. First, the number of required observations grows exponentially with the dimensionality. Second, the global optimum of the high dimensional acquisition function is difficult to obtain. Considering the full stack configuration parameters of Elasticsearch, we therefore need an effective high dimensional black-box optimization algorithm.

2) LOW EFFECTIVE DIMENSION
The full stack configuration parameters of Elasticsearch construct a high dimensional configuration space. However, through detailed analysis of these parameters and extensive experiments on a practical Elasticsearch cluster, we conclude that there are mainly three categories of configuration parameters that actually do not change the median indexing throughput. In addition, all parameters related to the CMS garbage collector lose effect if developers choose G1 as the JVM garbage collector via −XX:+UseG1GC. • Conflicting configurations. These configurations serve the same function but are mutually exclusive. For example, the kernel parameters dirty_ratio and dirty_bytes are both used to set the maximum amount of system memory that can be filled with dirty pages before flushing to disk. In general, dirty_bytes works as the counterpart of dirty_ratio, and only one of them can be active at a time. When one is written, it immediately becomes active for evaluating the dirty memory limits, and the other is automatically set to 0.
That is to say, the high dimensional black-box objective function $f(\cdot)$ in Problem 1 has the feature of ''low effective dimension'', and we denote the effective dimension of $f(\cdot)$ as $d_e$. Note that due to the complex relationship between performance and configurations, it is impossible to directly identify these most effective configuration parameters for Elasticsearch.

3) MREMBO
Based on the REMBO algorithm recently proposed in [26], [27], we propose a modified REMBO (mREMBO) algorithm to automatically tune high dimensional configuration parameters for Elasticsearch to improve its median indexing throughput. The key idea behind mREMBO is the following theorem: Theorem 1: For a given objective function $f(\cdot)$ with effective dimension $d_e$ and a random matrix $A \in \mathbb{R}^{D \times d}$ with independent entries sampled according to $N(0, 1)$ and $d \ge d_e$, with probability 1, for any $\vec{x} \in \mathcal{X}$ there exists a $\vec{y} \in \mathcal{Y}$ ($\mathcal{Y} = \prod_{j=1}^{d} \mathrm{Dom}(y_j)$) such that $f(\vec{x}) = f(A\vec{y})$. Proof: The proof of Theorem 1 can be found in [26], [27] and is omitted here for brevity.
Theorem 1 implies that for any optimal configuration $\vec{x}^* \in \mathcal{X}$, there exists a corresponding point $\vec{y}^* \in \mathcal{Y}$ such that $f(\vec{x}^*) = f(A\vec{y}^*)$. Therefore, for the high dimensional black-box optimization problem described as Problem 1, instead of directly optimizing $f(\cdot)$ in the high dimensional space $\mathcal{X}$, mREMBO first introduces a random embedding matrix $A \in \mathbb{R}^{D \times d}$ ($d \ge d_e$) and then performs BO to optimize the function $g(\vec{y}) = f(A\vec{y})$ in the low dimensional embedded space $\mathcal{Y}$.
The detailed process of mREMBO is described in Algorithm 1. Lines 1-3 are responsible for generating the embedded space $\mathcal{Y}$ and specifying its bounded region set. Then, in this low dimensional space $\mathcal{Y}$, mREMBO performs the BO algorithm as well as data transformation to iteratively generate the next best configuration for Elasticsearch (lines 4-11). When the total observation times reach the specified constraint $OT$, mREMBO terminates and reports the best configuration to developers. Next, we give a detailed discussion of the hyper-parameter settings in mREMBO:

Algorithm 1 mREMBO
Input: Configuration space $\mathcal{X}$, observation times constraint $OT$, embedded dimensionality $d$
Output: Next best configuration $\vec{x}_{nb}$.
1: Transform all configurations in $\mathcal{X}$ to be numerical and rescale $\mathcal{X}$ to the space $\mathcal{X}_{res} = [-1, 1]^D$.
2: Generate a random matrix $A \in \mathbb{R}^{D \times d}$ with independent standard Gaussian entries.
3: Set the bounded region set of $\mathcal{Y}$ as $[-\sqrt{d}, \sqrt{d}]^d$.
4: Initialize with random points in $\mathcal{Y}$ and fit the initial GP model as the prior function of $g(\cdot)$.
5: for $t = 1, 2, \ldots, OT$ do
6:   Find the next best point $\vec{y}_{nb} \in \mathcal{Y}$ according to the current GP model and the EI acquisition function.
7:   Project the product $A\vec{y}_{nb}$ into $\mathcal{X}_{res}$ and then rescale to obtain the next best configuration $\vec{x}_{nb} \in \mathcal{X}$.
8:   Send out $\vec{x}_{nb}$ for deployment and wait for the median indexing throughput $f(\vec{x}_{nb})$; use it as the observation result of $g(\vec{y}_{nb})$.
9:   Augment the observation set $OB_{t+1} = \{OB_t, (\vec{y}_{nb}, g(\vec{y}_{nb}))\}$.
10:  Update the GP model with $OB_{t+1}$.
11: end for
12: Report the best configuration.
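As a rough illustration of the embedding steps of Algorithm 1 (lines 1-3 and the projection in line 7), the sketch below builds a random Gaussian matrix, takes a point from the bounded low dimensional region, and clips the product back into the rescaled box $[-1, 1]^D$. The GP/EI machinery that actually chooses $\vec{y}$ is omitted, and the dimensions are illustrative.

```python
import numpy as np

# Random-embedding sketch: A has i.i.d. standard Gaussian entries, y lives in
# the bounded region [-sqrt(d), sqrt(d)]^d, and A @ y is projected onto the
# rescaled configuration box X_res = [-1, 1]^D by clipping.

def make_embedding(D: int, d: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    return rng.standard_normal((D, d))          # A in R^{D x d}

def project(A: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Map an embedded point y into the rescaled space X_res = [-1, 1]^D."""
    x = A @ y
    return np.clip(x, -1.0, 1.0)                # convex projection onto the box

D, d = 50, 4                                    # e.g. 50 parameters, embedded dim 4
A = make_embedding(D, d)
y = np.full(d, 0.5)                             # a point inside [-sqrt(d), sqrt(d)]^d
x_res = project(A, y)
print(x_res.shape, float(x_res.min()) >= -1.0, float(x_res.max()) <= 1.0)
# (50,) True True
```

The final rescaling from $[-1, 1]^D$ back to the original parameter domains (line 7 of Algorithm 1) is a per-parameter affine map and is left out here.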

a: PRIOR FUNCTION
As most BO studies do [22], [25], [42], [49], we also choose a Gaussian Process (GP) as the prior for $g(\vec{y})$. Specifically, a GP is a distribution over functions and is defined by its mean function $\mu$ and its covariance function (also called the kernel) $\kappa$. When the observation set is augmented from $OB_t$ to $OB_{t+1}$, we can easily update the GP model and then obtain the posterior distribution of the function $g(\vec{y})$ according to Bayes' theorem.
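The Bayes update from $OB_t$ to $OB_{t+1}$ can be illustrated with a tiny zero-mean GP: adding a new observation reshapes the posterior and shrinks the predictive variance near the new point. This is a sketch under an assumed RBF kernel, not the exact model used in the paper.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # Squared-exponential (RBF) kernel between two point sets.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def posterior(Y_obs, g_obs, y_query, noise=1e-6):
    """Posterior mean/variance of a zero-mean GP at y_query, given OB_t."""
    K = rbf(Y_obs, Y_obs) + noise * np.eye(len(Y_obs))
    k_star = rbf(y_query[None, :], Y_obs)          # cross-covariances, shape (1, t)
    alpha = np.linalg.solve(K, g_obs)
    mu = float(k_star @ alpha)
    var = float(1.0 - k_star @ np.linalg.solve(K, k_star.T))
    return mu, var
```

Augmenting the observation set with the queried point itself drives the posterior variance at that point down toward the noise level, which is exactly the update mREMBO performs after each new throughput measurement.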

b: ACQUISITION FUNCTION
The acquisition function is utilized to search for the next best point $\vec{y}_{nb} \in \mathcal{Y}$. Typically, acquisition functions are optimized by choosing points where the predictive mean is high (exploitation) and where the variance is large (exploration). Therefore, acquisition functions need to be carefully designed to trade off between exploration and exploitation. There are three main strategies to design an acquisition function [50]: Probability of Improvement (PI), Expected Improvement (EI) and Gaussian Process Upper Confidence Bound (GP-UCB). In mREMBO we choose EI as the acquisition function, as most BO applications do:
$$EI(\vec{y}) = \mathbb{E}\left[\max\left(g(\vec{y}) - g(\vec{y}^*_{cur}), 0\right)\right],$$
where $\vec{y}^*_{cur}$ is the best point in the current observation set $OB_t$, and the next best point is calculated by:
$$\vec{y}_{nb} = \arg\max_{\vec{y} \in \mathcal{Y}} EI(\vec{y}).$$
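Under a GP posterior $\mathcal{N}(\mu(\vec{y}), \sigma^2(\vec{y}))$, EI has the standard closed form $(\mu - g^*)\,\Phi(z) + \sigma\,\phi(z)$ with $z = (\mu - g^*)/\sigma$, where $g^*$ is the current best observation. A few lines of Python evaluate it:

```python
import math

def expected_improvement(mu, sigma, g_best):
    """Closed-form EI for maximization under a GP posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - g_best, 0.0)                # no uncertainty: pure exploitation
    z = (mu - g_best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - g_best) * cdf + sigma * pdf
```

Note how the two terms realize the trade-off described above: a high predictive mean raises the first (exploitation) term, while a large predictive variance raises the second (exploration) term.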

c: INITIAL POINTS
As described by Line 4 in Algorithm 1, in order to build the initial GP model for the function $g(\cdot)$, mREMBO generates a few random points (usually 3-5 in our experiments) in the space $\mathcal{Y}$ and then observes the median indexing throughput under the corresponding configurations in $\mathcal{X}$. It is worth noting that these initial operations only need to be executed once for the same objective function and can be done offline before the execution of mREMBO. Hence, the observations generated by this step are not counted toward the total observation times.

d: EMBEDDED DIMENSION AND BOUNDED REGION SET
The embedded dimension $d$ and the bounded region set of the embedded space $\mathcal{Y}$ are two of the most important hyper-parameters of mREMBO. Specifically, if we ignore the observation times constraint, we can obtain the following theorem:

Theorem 2: For the given function $f_{res}(\cdot)$ with effective dimension $d_e$ subject to the box constraint $\mathcal{X}_{res} = [-1, 1]^D$, let $\vec{x}^*_{res} \in \mathcal{X}_{res}$ be an optimal point of $f_{res}(\cdot)$. If the embedding matrix $A$ is a $D \times d$ ($d \geq d_e$) random matrix with independent standard Gaussian entries, then with probability at least $1 - \epsilon$, there exists an optimizer $\vec{y}^* \in \mathcal{Y}$ such that $f(A\vec{y}^*) = f_{res}(\vec{x}^*_{res})$ and $\|\vec{y}^*\|_2 \leq \sqrt{d_e}/\epsilon$.

Proof: The proof of Theorem 2 follows [26], [27] and is omitted here for brevity.

Therefore, as in REMBO, we set the bounded region set of the embedded space $\mathcal{Y}$ to $[-\sqrt{d}, \sqrt{d}]^d$. Specifically, according to Theorem 2, in order to increase the probability that $\mathcal{Y}$ contains the optimum (i.e., the probability of finding the optimal configuration for Elasticsearch), we should increase the embedded dimension $d$. However, the number of observations required by the prior distribution in mREMBO grows exponentially with the dimension, and the global optimum of the high dimensional acquisition function is also difficult to obtain. Therefore, considering the specified observation times constraint, we must carefully choose the value of $d$. The impact of different settings of the embedded dimension in mREMBO is discussed in Section VI-A.

C. IMPLEMENTATION
We implement HDConfigor for Elasticsearch in Python 3.7 [51], considering its impressive simplicity and strong compatibility. As shown in Figure 3, in addition to iteratively generating new configurations with the mREMBO algorithm, HDConfigor must also be able to automatically execute configuration updates and launch performance tests on the Elasticsearch cluster. Generally speaking, configuration parameters from Elasticsearch, the runtime and the kernel have quite different updating methods. For instance, changing the Java runtime configuration requires modifying the file ../elasticsearch-*/config/jvm.options and then restarting the whole cluster, while updating most kernel configurations only requires a sysctl -p command. As a result, we use the IT automation tool Ansible [52] to automate this complex configuration updating procedure by writing dedicated playbooks. On the other hand, the Rally benchmark tool [53] is utilized in HDConfigor to measure the median indexing throughput of Elasticsearch under each newly generated configuration. In order to automate these performance tests, we also write dedicated Ansible playbooks to automatically launch Rally and collect the resulting performance metrics.
The total number of lines of code of HDConfigor is about 1500, mainly comprising the mREMBO algorithm and the Ansible playbooks for automatic execution. We use Python to implement the major part of HDConfigor and utilize the interface provided by Ansible to achieve the automatic configuration updating function. Specifically, with dedicated playbooks written in YAML for deploying Elasticsearch to the local cluster, updating configuration parameters of the kernel, JVM and Elasticsearch, and running benchmarks with the Rally tool, HDConfigor is able to automatically generate the next best configuration according to the mREMBO algorithm and obtain the corresponding performance metrics after testing. With these Python codes and YAML scripts, HDConfigor outputs the current best configuration for Elasticsearch when the number of observations meets the specified constraint.
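As a concrete illustration of this automation layer, the snippet below builds an ansible-playbook invocation for a configuration update and extracts the median indexing throughput from a race summary. The playbook name, variable names and summary format are hypothetical; only the ansible-playbook --extra-vars interface is real.

```python
import json

def build_update_cmd(playbook, extra_vars):
    # ansible-playbook accepts JSON via --extra-vars; the playbook name
    # ("update_kernel.yml" etc.) is illustrative, not HDConfigor's layout.
    return ["ansible-playbook", playbook, "--extra-vars", json.dumps(extra_vars)]

def median_throughput(race_summary):
    # Hypothetical summary dict collected after a Rally race.
    return float(race_summary["median_indexing_throughput"])
```

A driver in the style of HDConfigor would pass the built command to subprocess.run() and feed the parsed throughput back to mREMBO as the observation $g(\vec{y}_{nb})$.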
Besides, HDConfigor can be easily extended to support automatic configuration parameter optimization for other log search engines. Taking Solr as an example, in order to add support for Solr in HDConfigor, we only need to add Solr's configuration files and modify the Ansible playbooks according to Solr's deployment and testing procedures. Other components such as mREMBO and performance metric collection can be adopted directly. We will open source the whole project after adding support for other log search engines such as Solr.

V. EXPERIMENTAL SETUPS
A. HARDWARE PLATFORM
HDConfigor is evaluated on a cluster of three nodes connected via LAN; each node has 16 GB of memory and an Intel(R) Core(TM) i7-7700 processor with 4 physical cores. In our experiments, the source code of HDConfigor is deployed on one dedicated node while Elasticsearch and Rally are separately deployed on the other two nodes, in order to avoid performance interference as much as possible. We will extend HDConfigor to support large scale Elasticsearch clusters in future work. In order to mitigate the impact of performance variability [54], [55], [45] resulting from workloads, networking and other hardware, we use the median throughput as the objective performance metric and repeat all the experiments 3 times to improve the confidence of our results and the correctness of our conclusions. A few previous works [22], [56], [57] have studied how to further reduce performance variability; we leave these improvements as future work.

B. BENCHMARKS
We employ Rally [53] to benchmark the Elasticsearch cluster under different configurations. Race, track and challenge are three important concepts in Rally. Specifically, a race is the execution of a benchmarking experiment. We can choose different benchmarking scenarios (called tracks) for a race, and each track may have several different challenges, which record a list of operations to run such as checking cluster health, indexing, forcing merges and so on. Table 2 lists the detailed information of representative publicly available tracks in Rally in ascending order of uncompressed data size. In our experiments, according to the hardware resources of the local cluster, we choose percolator, geonames and eventdata to simulate 3 different workload scenarios. Considering the potential performance fluctuation of the Internet, we download all the relevant data of these tracks to the Rally node before the experiments. In addition, we design our own challenges for each track in order to obtain the median indexing throughput with only the necessary operations. To mitigate the impact of noise, we execute each race 3 times and use the average as the observation result.
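The aggregation described above (a median metric per race, averaged over 3 repeated races) can be written down directly. The per-race throughput samples are illustrative; Rally's own reporting pipeline is more involved.

```python
import statistics

def race_metric(throughput_samples):
    # Each race reports a median indexing throughput to damp variability.
    return statistics.median(throughput_samples)

def observation(races):
    # Each configuration is raced 3 times; the mean of the per-race
    # medians is used as the observation fed back to the optimizer.
    medians = [race_metric(r) for r in races]
    return sum(medians) / len(medians)
```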

C. CONFIGURATION PARAMETERS
We choose 41 configuration parameters in total, including numerical and categorical parameters, according to the official guides [23], [58] as well as empirical information from our practical experiments and other developers. Instead of considering configuration parameters only from Elasticsearch itself, these parameters are chosen from the full stack configuration space, namely: kernel, JVM and Elasticsearch. It is worth noting that we have already omitted many evidently low-impact configuration parameters, and the whole configuration space of Elasticsearch is actually of much higher dimension. We list all used parameters with short descriptions in Table 3. Besides, according to our testbed and benchmark requirements, we also give the default value and value range of each parameter. These parameters can be divided into 3 categories, described in detail as follows.
• Kernel configuration parameters. Kernel configuration parameters can be modified by changing the content of files located under /proc/sys/ and activated online with a sysctl -p command. The Linux kernel contains a wide range of parameters, covering networking, disk, virtual memory and so on. As described in Section III-A, the indexing operation of Elasticsearch involves extensive memory and disk writing. Therefore, we finally select 18 parameters related to networking, disk I/O and virtual memory usage. The default values are those of a clean system, except that we modify the default value of /proc/sys/vm/max_map_count from 65530 to 262144, according to Elasticsearch's minimum requirement.
• JVM configuration parameters. As the runtime environment of Elasticsearch, the configuration parameters of the JVM are highly related to the performance of Elasticsearch. These parameters can be modified offline in the jvm.options configuration file before Elasticsearch is started as a Java process. We choose to tune 6 JVM parameters in our experiments, which are related to the configuration of the Java heap and the garbage collector. Generally speaking, the initial heap size and the maximum heap size are configured to the same value, and we count them as one parameter. Besides, we choose the G1 garbage collector in our experiments due to its soft real-time behavior and high throughput.
• Elasticsearch configuration parameters. Elasticsearch itself exposes quite a large number of configuration parameters to developers. In general, index-related settings can be divided into static and dynamic [23]. Static parameters need to be modified offline in the elasticsearch.yml configuration file, while dynamic parameters can be changed online through REST APIs. As described in Section III-A, indexing mainly involves write, refresh, flush and fsync operations, leading to massive memory and disk writing. Therefore, we select 17 configuration parameters related to the settings of the index, threadpool and translog.
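The three update mechanisms described above can be sketched as a single dispatch function. The example parameter names are real (vm.swappiness, -Xms, refresh_interval), but the rendering itself is a simplified illustration, not HDConfigor's playbook logic.

```python
def render_update(layer, name, value):
    """Render one configuration change as the action its layer requires."""
    if layer == "kernel":
        # A /proc/sys/ entry, activated online with `sysctl -p`.
        return f"{name} = {value}"
    if layer == "jvm":
        # A line placed in config/jvm.options before a cluster restart,
        # e.g. "-Xms" + "4g" -> "-Xms4g".
        return f"{name}{value}"
    if layer == "elasticsearch":
        # Request body for a PUT to the /_settings REST API (dynamic settings).
        return {"index": {name: value}}
    raise ValueError(f"unknown layer: {layer}")
```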

D. BASELINE ALGORITHMS
HDConfigor utilizes mREMBO to solve the high dimensional black-box configuration parameter optimization problem described as Problem 1. In order to evaluate the effectiveness of mREMBO, we compare it with three other black-box optimization algorithms already widely used to tune configuration parameters for different software systems, namely Random search [18], [40], Simulated Annealing [59], [60] and Bayesian optimization [22], [61]. These algorithms are also widely used to tune hyper-parameters for machine learning models, and have served as baseline black-box optimization algorithms in previous work such as [17], [21], [30]. In the following, we provide a brief introduction of each algorithm and, where necessary, a short description of its hyper-parameter settings in our experiments.

1) RANDOM SEARCH
Random search explores each dimension of the configuration parameter space uniformly at random. Under the same observation times constraint, it is more efficient than Grid search, especially in a high dimensional configuration parameter space; however, it is not guaranteed to find a better configuration than Grid search. In each iteration, in order to generate a new random configuration, we use the randint() function in Python to generate a random value for each numerical parameter and a random index for each categorical parameter.
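A minimal sketch of this sampling step, assuming a configuration space given as integer ranges for numerical parameters and choice lists for categorical ones (this schema is illustrative):

```python
import random

def random_config(params, seed=None):
    """Draw one configuration uniformly at random, as Random search does."""
    rng = random.Random(seed)
    cfg = {}
    for name, spec in params.items():
        if isinstance(spec, tuple):
            cfg[name] = rng.randint(*spec)                  # numerical: random value
        else:
            cfg[name] = spec[rng.randrange(len(spec))]      # categorical: random index
    return cfg
```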
2) SIMULATED ANNEALING
SA explores new configurations in an iterative way controlled by a temperature. For each temperature, it first generates a new configuration by randomly altering the old one, then observes the performance under this configuration, and finally, according to a probability calculated from the observation and the current temperature, decides whether to accept this new configuration. Utilizing this acceptance probability, SA is able to avoid getting trapped in locally optimal configurations early in the process. In our experiments, we use the open source implementation from [60] and set the number of iterations to the specified observation times constraint.
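The acceptance rule can be sketched as follows for a maximization objective. This is an illustrative Metropolis-style rule, not the exact implementation from [60]:

```python
import math
import random

def accept(old_perf, new_perf, temperature, rng=random.random):
    """Accept better configurations always; accept worse ones with
    probability exp((new - old) / T), which shrinks as T cools."""
    if new_perf >= old_perf:
        return True
    return rng() < math.exp((new_perf - old_perf) / temperature)
```

Early on (high T) almost any move is accepted, which lets SA escape local optima; as T cools, only improving configurations survive.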

3) BAYESIAN OPTIMIZATION
BO also works in an iterative way; the main steps during each iteration are to update the probabilistic model of the objective black-box function according to the current observation set, and then to utilize this model and the acquisition function to determine the next configuration to be observed. A detailed description of BO can be found in Section IV-B. In our experiments, we use the open source implementation of BO from [61] with a GP as its prior function, EI as its acquisition function and 3 initial points to build the initial GP model.

4) mREMBO
mREMBO, proposed in this paper, solves the high dimensional black-box configuration parameter optimization problem for Elasticsearch. It first introduces a random embedding matrix to generate a low dimensional embedded space and then performs BO in this embedded space to iteratively search for the optimal configuration. A detailed description of mREMBO can be found in Section IV-B. In our experiments, mREMBO uses the same hyper-parameter settings as BO. In addition, unless otherwise stated, the embedded dimensionality is set to 10 for Elasticsearch.

VI. EXPERIMENTAL RESULTS
In this section, we first evaluate the effectiveness of mREMBO by comparing it with the three other black-box optimization algorithms. Then we show the performance of mREMBO when applied to different workload scenarios. Next, we discuss the impact of different settings of the embedded dimensionality in mREMBO, and finally, we verify the necessity of full stack configuration parameter optimization in Elasticsearch.

Q1: Do previous black-box optimization algorithms already perform well in the high dimensional scenario? If not, how does mREMBO work?
To answer this question, we use the percolator track in Rally to evaluate the effectiveness of mREMBO and the three baseline black-box optimization algorithms in optimizing full stack configuration parameters for Elasticsearch. The defaults of all 41 configuration parameters are set as in Table 3 and the observation times constraint is 30. Experimental results are shown in Figure 5.
There are two key factors in measuring the effectiveness of these algorithms: the best relative median indexing throughput over the default, and the number of observations necessary to find this best configuration. Figure 5(a) illustrates the best relative throughput achieved along with the observation times. We can see that mREMBO is able to find the best configuration with the fewest observations. In detail, the best relative throughput achieved by mREMBO, Random, SA and BO is 2.07×, 1.91×, 1.97× and 1.87× with 3, 14, 18 and 4 observations, respectively. That is to say, under the same observation times constraint, mREMBO is able to improve the median indexing throughput of Elasticsearch by up to 107.26% over the default configuration, and to achieve at least a further 10.31% improvement over the default compared with the other 3 baseline algorithms. Besides, with the smallest necessary number of observations, mREMBO is more suitable for production systems where each observation is considerably expensive. This impressive improvement is achieved because, by introducing a random embedding matrix, mREMBO can take full advantage of the low effective dimension feature of Elasticsearch and actually performs BO in a low dimensional embedded space to tune the full stack configuration parameters. Figure 5(b)-5(d) shows the relative throughput over the default in each iteration of these 4 algorithms in detail. As can be seen in Figure 5(d), mREMBO and BO indeed work in a quite similar way, trading off between exploitation and exploration in each iteration. However, due to the curse of dimensionality, BO is restricted to only 10-20 dimensions; hence, it achieves the lowest median indexing throughput. On the other hand, although Random search and SA perform better than BO, the number of observations they require may prevent them from being directly applied to optimizing full stack configuration parameters for Elasticsearch.
As shown in Figure 5(b) and Figure 5(c), both algorithms need 10+ more observations than mREMBO and BO to obtain their best configuration. What is worse, since Random search does not utilize any historical information and generates new configurations in a totally random way, it actually works without any reliable guarantees.
It is worth noting that both BO and mREMBO utilize 3 offline observations to build the initial GP model in our experiments. However, even taking these initial points into account, the total number of observations spent by mREMBO on finding the best configuration is still much smaller than that of Random search and SA.
Q2: What is the impact of the embedded dimension in mREMBO on its effectiveness? How should an appropriate embedded dimension be chosen for mREMBO?
The embedded dimension d is one of the most important hyper-parameters in mREMBO. According to Theorem 2, it can directly impact the probability of finding the optimal configuration for Elasticsearch. In this experiment, we use the percolator track and set d to 1, 5, 10, 15, respectively. The observation times constraints are all set to be 30.
As shown in Figure 6, d = 10 indeed performs better than all other settings under the percolator track. In detail, the best relative median indexing throughput of d = 1, d = 5, d = 10 and d = 15 over the default is 1.44×, 1.78×, 2.07× and 1.85×, requiring at least 6, 10, 3 and 19 observations, respectively. That is to say, when the embedded dimension is not appropriately set, mREMBO will fail to find a satisfactory configuration for Elasticsearch while consuming more observations. On the one hand, when d is too large, the embedded space itself becomes high dimensional and the number of observations necessary to find the best configuration may exceed the specified constraint. On the other hand, when d is too small, the embedded space is not able to cover all the effective dimensions of Elasticsearch and, as a result, mREMBO will probably miss the optimal configuration. Generally, the setting of the embedded dimension is closely related to the number of effective configuration parameters of the target application and its hardware platform. For the Elasticsearch testbed used in our experiments, we believe that d = 10 is an appropriate choice according to extensive experiments under different Rally tracks.
Summaries: Under the same observation times constraint, mREMBO performs much better than Random, SA and BO. Compared with these baseline black-box optimization algorithms, mREMBO is able to achieve at least a further 10.31% improvement in throughput.

Q3: Is it necessary to consider the JVM and kernel parameters?
One of the most important contributions of this paper is that, instead of only tuning Elasticsearch parameters, we also take runtime and kernel parameters into consideration and propose mREMBO to explore the best configuration in the resulting high dimensional full stack configuration parameter space. In order to verify the necessity of this choice, we evaluate the performance of mREMBO working in the full stack parameter space and in the Elasticsearch-only parameter space, respectively. In this experiment, we use the percolator track and set the observation times constraint to 30. Experimental results are shown in Figure 7.
First of all, we find that d = 5 is a better setting of the embedded dimensionality when mREMBO is applied to tune only Elasticsearch parameters. As described in Section V-C, the total number of Elasticsearch parameters used in our experiments is 17; therefore, d = 10 is certainly too large and we instead set d = 5 for this scenario. As can be seen in Figure 7, compared with the default configuration, mREMBO with d = 5 is able to improve the median indexing throughput by 90.41% by tuning Elasticsearch parameters only, while tuning the full stack configuration parameters improves this throughput by as much as 107.26%. That is to say, taking runtime and kernel parameters into consideration achieves a further 16.86% improvement over the default compared with tuning Elasticsearch parameters alone.
Summaries: These results help us draw two conclusions. First, developers usually have to tune the Elasticsearch parameters to fit their own workload scenarios and cluster resources. Second, it is necessary to jointly optimize parameters from the runtime environment, the kernel and Elasticsearch, which further improves the performance of Elasticsearch by 16.86% in our experiments.

Q4: What is the impact of varying workloads on configuration parameter tuning? How does mREMBO work under different workload scenarios?
To answer this question, we choose 3 tracks in Rally to simulate different workload scenarios: percolator, geonames and eventdata. These tracks have quite different document numbers as well as uncompressed sizes; a detailed description of these tracks can be found in Table 2. The observation times constraint for these tracks is set to 30. Figure 8 illustrates the best relative indexing throughput achieved by mREMBO over the default configuration. Specifically, under the percolator, geonames and eventdata tracks, mREMBO is able to improve the median indexing throughput by 107.26%, 46.5% and 6.75%, respectively. Therefore, mREMBO is able to adapt to different workload scenarios of Elasticsearch. In addition, we also present the improvement of the minimum and maximum indexing throughput over the default in Figure 8, and we can see that mREMBO is also highly effective in optimizing these two objectives. Note that for the eventdata track, the best maximum indexing throughput is only 3.39% higher than the default. In fact, due to the huge number of documents (20 million) as well as the large uncompressed size (15.3 GB), the computing resources of the local single-node Elasticsearch cluster become the performance bottleneck. As a result, solely tuning software configurations can no longer obtain a remarkable performance improvement. However, mREMBO can still improve the minimum indexing throughput by 13.15% under the eventdata track.
In Table 4, we present the detailed changes of some parameters recommended for indexing optimization by the official guide [23]. These parameters are mainly related to memory and disk writing, which is consistent with the analysis of the indexing data flow in Elasticsearch in Section III-A. As can be seen from this table, the relationship between configurations and performance is very complex and actually varies across workload scenarios. For example, in order to improve the indexing throughput, the kernel parameter swappiness is modified to 6 under the eventdata track, while it remains almost the same as the default under the percolator track. Therefore, although Elasticsearch has the feature of low effective dimension, it is still impossible to directly identify its most effective configuration parameters. Instead, utilizing this feature, we propose mREMBO to solve the full stack configuration parameter optimization problem for Elasticsearch.
Summaries: Workload changes will lead to new optimal configurations because the effective configuration parameters for different workload scenarios are usually quite different. According to our experiments, mREMBO is always able to find an outstanding configuration under different workload scenarios.

VII. CONCLUSION
In this paper, we design and implement HDConfigor, an automatic full stack configuration parameter tuning tool for log search engines. HDConfigor solves the high dimensional black-box optimization problem through the proposed mREMBO algorithm. Utilizing the low effective dimensionality of Elasticsearch, mREMBO first introduces a random embedding matrix to generate an embedded space and then performs BO in this low dimensional embedded space. Therefore, HDConfigor is able to find a competitive configuration automatically and quickly. We evaluate HDConfigor on a local Elasticsearch cluster with 3 different tracks from the Rally benchmark tool. Experimental results show that mREMBO can improve the median indexing throughput over the default configuration by up to 107.26%, and that mREMBO is able to find a configuration with at least a further 10.31% throughput improvement over the default compared to Random search, Simulated Annealing and BO under the same observation times constraint. Besides, we also verify the necessity of taking runtime and kernel configuration parameters into consideration to further improve performance.
In the future, we will add support for other log search engines such as Solr to HDConfigor and open source our code. Besides, how to trade off between different objectives such as indexing and searching during the configuration tuning process is also an interesting open problem. In addition, it is also necessary to study how to mitigate the performance variability resulting from workloads, configurations and networking, for the sake of experimental confidence and correctness.