A Log Parsing Framework for ALICE O2 Facilities

The ALICE (A Large Ion Collider Experiment) detector at the European Organization for Nuclear Research (CERN) generates a substantial volume of experimental data, demanding efficient online and offline processing. To enhance the stability and reliability of the ALICE computing system, this study introduces an Artificial Intelligence-based logging system designed to detect, identify, and resolve issues through the analysis of system runtime information contained in logs. Existing online log parsing methods, however, often lack full automation and generality, relying instead on manual parameter definition and regular expressions that are better suited for static logs. In this study, we propose a novel and fully automated online log parsing framework for ALICE O2 (Online-Offline). To overcome key challenges, we employ the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm to create the ground truth, use genetic programming to generate regular expressions, utilize the Artificial Bee Colony (ABC) algorithm for hyperparameter optimization, and implement a log template reduction algorithm to reduce similarity among log templates. Our framework's effectiveness is validated through experiments on 5 benchmark log datasets and ALICE application logs, comparing its performance with the state-of-the-art online log parsing framework, Drain. The empirical results demonstrate the automated nature of our approach and its ability to achieve highly accurate parsing (i.e., 99.89% on the ALICE application log).


I. INTRODUCTION
The European Organization for Nuclear Research, or CERN, is the largest particle physics laboratory in the world. At CERN, the fundamental constituents of the universe and their interactions are studied in order to understand the fundamental rules of nature. The Large Hadron Collider (LHC) was built for this purpose, and four large particle detectors operate at the LHC to analyze particle collisions from various perspectives and with various technologies. Among them, ALICE (A Large Ion Collider Experiment) [1] is the one devoted to the study of ion collisions and optimized to investigate the physics of strongly interacting matter at high energy densities, where quark-gluon plasma is generated. ALICE also has a significant physics program focused on proton-proton and proton-ion collisions.

The associate editor coordinating the review of this manuscript and approving it for publication was Ikramullah Lali.
To handle the significantly increased data rate from the detectors, which is approximately two orders of magnitude larger than before LS2, ALICE implemented the O2 (Online-Offline) [2] computing system. O2 is a combined online and offline system specifically developed for the upgraded ALICE detector. ALICE O2 can read and analyze around 27 Tb/s of raw data from the detectors. Given this tremendous amount of data, the system must be closely monitored to ensure high availability and a low failure rate. To this end, ALICE O2 requires not only a monitoring system based on performance measurements but also a real-time monitoring system based on the system log, as the system log messages describe the events in the system while it is in operation.
In the upgraded ALICE O2, the AI-Based Logging System has been introduced to achieve a more effective and intelligent logging system. The BELK stack, a collection of four open-source tools consisting of Beats, Logstash, Elasticsearch, and Kibana, is utilized to underpin the entire system's architecture. The BELK stack enables users to explore, analyze, and visualize data in real time, regardless of its source or format.
As indicated in Fig. 1, the system architecture comprises four modules that constitute an intelligent logging system equipped with the BELK stack. Firstly, each log shipper transfers log files from the FLP node to the log preprocessing module. Each Logstash node then preprocesses the data by converting it into a specified format and forwarding log messages to the log parser. Secondly, the log parser module parses the logs of several machines with varying operating systems, converts each log message into a unique event template suitable for the machine learning module, and sends it to storage. Thirdly, Elasticsearch indexes and stores the templates. The machine learning module can be subdivided into the anomaly detection module, the failure prediction module, and the survival analysis module. Lastly, Kibana assists users in creating visualizations and alerts. If the system detects an anomaly, an alert is generated and presented to the administrator in Kibana.
In this work, we focus on the log parser module in the AI-based logging system for ALICE O2 facilities. The log parser module extracts events by automatically separating the constant part from the variable part of raw log messages from different machines, and then transforming each log message into a specific event template. Various data mining techniques are applied to the templates to build automated log analysis, such as anomaly detection [3], [4], network security [5], and fault diagnosis [6].
Numerous studies have been conducted on log parsing approaches that can be implemented across a wide range of system types, operating systems, and technological capabilities. Many traditional log parsers can simply rely on regular expressions, since the pattern of log statements does not change too often. However, with the growing advancement of software systems and services, the generation of log statements becomes more complex and dynamic; thus, an automated log parser is required.
To be more specific, we list the necessities of an automated log parser as follows. First, the volume of logs grows rapidly due to the uninterrupted operation of modern software systems [7]. Second, the structure of modern services and applications has become more complex because certain source codes are developed by hundreds of developers worldwide. In particular, with the prevalence of open-source platforms such as GitHub and modern Web services, developers frequently incorporate third-party components into their systems, which makes maintaining the parsing rules challenging for individuals who may not be familiar with those components' logging behavior. Third, log statements are now updated frequently, with approximately thousands of log statements being modified each month [8]. Due to the above reasons, the automated log parser is superior to the conventional one because it employs data-driven methods to automatically extract log templates from raw log messages.
In this paper, we present a solution for the log parsing module in the AI-based logging system for ALICE O2 facilities that automatically interprets the unstructured logs of various log files. Most existing log parsers work offline, requiring logs collected over a period of time to train the model. Traditional online log parsing methods are not fully automated: some require humans to supply regular expressions to modify raw logs, and others require predetermined parameters to reach the highest accuracy. Such log parsers are suitable only for static logs in stable systems.
However, the ALICE O2 system is currently under development, so its log messages change frequently, and it therefore needs an online log parser that parses logs in a streaming manner. Some existing online log parsing techniques achieve high accuracy but are not fully automated and must be modified manually [9]. Our log parsing framework requires neither predefined parameters nor manually labeled ground truth: it extracts log templates from raw log messages fully automatically. After the logs have been parsed, we perform template reduction to boost the model's accuracy. We then comprehensively assess our approach on 6 distinct log datasets (i.e., 5 benchmark datasets and 1 dataset from the CERN application) and evaluate the findings in terms of accuracy, robustness, and efficiency. Our framework consistently outperforms the other algorithms and achieves a parsing accuracy of 99.89% on ALICE application logs.
In summary, our fully automated log parsing framework for ALICE O2 facilities offers the following contributions:
• Automatic optimization of the log parsing model's parameters for each log dataset through the application of an Artificial Bee Colony (ABC) algorithm, which serves as an automatic hyperparameter tuning technique.
• Effective extraction of variable parts and templates from log messages using the Term Frequency-Inverse Document Frequency (TF-IDF) method, enabling automatic identification of crucial information within the logs.
• Automated construction of refined regular expressions derived from the extracted variables in the log messages, facilitated by genetic programming techniques.
• Enhanced accuracy achieved by merging similar log templates using a template reduction algorithm, which leads to improved log parsing and anomaly detection capabilities.
The remaining sections are organized as follows. In Section II, we summarize related work on log parsing. Section III explains our proposed fully automated log parsing framework in detail. In Section IV, we describe our experiments, analytical results, and discussion. Section V concludes the paper.

II. LITERATURE REVIEW
In this section, we describe concepts and approaches related to log parsing. Firstly, we present an overview of log analysis. Then, offline log parsing is introduced, followed by online log parsing. Thirdly, techniques and frameworks for fully automated log parsing are presented. Finally, at the end of this section, we provide a comprehensive summary of the literature review and list the limitations of existing approaches.

A. LOG ANALYSIS
Logs are essential for the development and maintenance of software systems and devices. The majority of logs contain system runtime information, allowing the developer and support staff to comprehend the system information and track down any potential issues. With the advent of data mining, automated log analysis has been developed to enhance system administration and diagnostic operations. The log analysis process consists of four steps: log collection, log parsing, matrix building, and log mining.
Raw log messages are often unstructured, as developers can put free-text log messages in the system's source code. Given the nature of log messages, parsing becomes a crucial step to transform them into structured data. The output after parsing must be precise enough; otherwise, subsequent steps such as log mining could fail [10]. For instance, a four-percent error rate might result in a tenfold decrease in the performance of anomaly detection. The majority of these log mining strategies require structured data as model input (e.g., a matrix or a log event list).
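For instance, the structured input expected by log mining models can be a per-session event count matrix. The sketch below is purely illustrative; the session and event IDs are toy values, not from the ALICE datasets:

```python
from collections import Counter

# Sketch of the matrix-building step: turn parsed event sequences
# (one per session) into a count matrix usable by log mining models.
sessions = {
    "blk_1": ["E1", "E2", "E2", "E3"],
    "blk_2": ["E1", "E3"],
}

# Column order: sorted set of all event IDs seen across sessions.
events = sorted({e for seq in sessions.values() for e in seq})

# Each row counts how often each event occurs in one session.
matrix = [[Counter(seq)[e] for e in events] for seq in sessions.values()]

print(events)  # ['E1', 'E2', 'E3']
print(matrix)  # [[1, 2, 1], [1, 0, 1]]
```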
The conventional method of log parsing heavily relies on the developer, who is responsible for designing and maintaining the regular expressions used to extract the log events and structured logs [11]. Unlike conventional log parsers, which heavily rely on regular expressions, automated log parsers are better suited to logs created by contemporary software systems and services. Automated log parsers utilize data-driven technologies that automatically extract log templates from raw log messages. Figure 2 depicts simplified Apache Mesos raw log messages gathered on the slave node of the CERN FLP. It can be noticed that a log parser transforms each raw log message into a log template and structured logs throughout the parsing process. The log template comprises: i) the constant component of each log message, which is comparable to the template of the log group and is the same for every occurrence (e.g., "New master discovered at *"), and ii) the variable part in the structured logs (e.g., "master@188.184.86.204:5080"). We note that "*" represents the location of each variable.
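The separation illustrated in Fig. 2 can be sketched in a few lines; note that the address regex below is an assumption chosen to match the example variable, not the framework's actual rule:

```python
import re

# Turn one raw Mesos-style log message into a template plus its
# variable parts, replacing the variable with the wildcard token "*".
raw = "New master discovered at master@188.184.86.204:5080"

# Illustrative pattern for host@ip:port-style variables.
var_pattern = re.compile(r"\S+@\d{1,3}(?:\.\d{1,3}){3}:\d+")

variables = var_pattern.findall(raw)   # the variable part
template = var_pattern.sub("*", raw)   # the constant part

print(template)   # New master discovered at *
print(variables)  # ['master@188.184.86.204:5080']
```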
According to the study by Zhu et al. [12], the automated log parser can be classified into two primary modes: offline and online, depending upon how the processing of log messages is conducted. Related works about offline and online log parsing are discussed in the following subsections (II-B, II-C).

B. OFFLINE LOG PARSING
In offline mode, developers or systems must collect all log data and transfer it to a single computer (e.g., a centralized server) prior to parsing and generating parsing rules. However, offline log parsers must be updated every time a new log statement is generated, as the parsing rules may not recognize the new log statements.
Numerous offline log parsing methods adopt clustering concepts to perform their task, since there exist similarity patterns among system log messages. For example, Fu et al. [13] grouped log entries by calculating their similarities, such as weighted edit distance, and then extracted the log event or template. However, its accuracy was not high, and its running time was longer than that of other methods. Likewise, Tang et al. [14] proposed a technique that leverages message signatures to cluster log messages into a predetermined number of groups. Yet, the results reveal that the method is not suitable for fully automated systems.
Rather than focusing solely on the similarity among messages, Hamooni et al. [15] further take into account the hierarchy of log messages to generate event templates. To achieve this, they apply hierarchical clustering to group log messages in a bottom-up fashion. These aforementioned approaches, however, yielded low accuracy with notable variation (a high standard deviation) in parsing accuracy, and demonstrated limited performance when dealing with diverse log types. The inability of these approaches to effectively handle the wide range of log variations highlights the need for a more robust and adaptable log parsing framework.
Due to the shortcomings of clustering-based approaches, new approaches that employ heuristics or optimization techniques have been developed. In particular, Makanju et al. [16] proposed IPLoM (Iterative Partitioning Log Mining), a lightweight offline log parser that extracts message types from event logs using heuristics constructed in accordance with the characteristics of the raw log messages. On the other hand, a recent study by Messaoudi et al. [17] formulated log parsing as a multi-objective optimization problem and solved it with an evolutionary algorithm (i.e., the Non-dominated Sorting Genetic Algorithm II, or NSGA-II for short). Nevertheless, both studies [16], [17] present drawbacks due to the limited computational capability of a single machine, as shown by parser runtimes that increase as the log size grows.

C. ONLINE LOG PARSING
In contrast to offline log parsers, online log parsers are useful for distributed software systems that require on-the-fly monitoring and repair, since they allow log messages to be parsed in a streaming manner. Various methods that process incoming streams of messages include [9], [18], [19].
Specifically, in the study by Mizutani et al. [18], the proposed method computes the similarity between newly arriving log messages and existing templates. If a log message matches a template, it is added to the existing cluster; otherwise, a new cluster is formed and the event template is modified accordingly. A similar approach was taken by Du et al. [19]; however, the longest common subsequence (LCS) technique was used to parse the logs. Although existing online parsers achieved high parsing precision on particular datasets, their parsing accuracy varied across datasets, indicating that they were not robust. For instance, the results of Spell [19] demonstrate an average accuracy of less than 80%. Besides, the study did not restrict the depth of its prefix tree and calculated the longest common subsequence between two log messages, which was time-consuming.
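As a minimal illustration of the LCS idea behind Spell [19] (the merge rule "LCS covering at least half the tokens" is a simplification of the actual algorithm, used here only for demonstration):

```python
# Two log messages are treated as instances of the same event when the
# longest common subsequence of their tokens is long enough.
def lcs(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming LCS over token lists."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]

log1 = "Took 10 seconds to build instance".split()
log2 = "Took 20 seconds to build instance".split()

# Merge into one event template when the LCS covers at least half the tokens.
common = lcs(log1, log2)
same_event = common >= len(log1) / 2
print(common, same_event)  # 5 True
```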
He et al. [9] presented an online log parsing approach called Drain that is capable of parsing logs in a streaming way. Drain employs a parse tree with a predefined depth to expedite the parsing process. By adopting a fixed-depth parse tree, Drain is able to direct log messages to the appropriate log group and prevent an excessively deep or imbalanced tree. The method achieved high accuracy, and its efficiency was reasonable. Moreover, the benchmark study by Zhu et al. [12] highlighted the significance of log parsing research and the scarcity of publicly available tools and benchmark datasets. They presented an implementation of 13 log parsing methods and evaluated them using 16 log datasets from diverse software systems. The results of their study indicate that Drain [9] emerges as the top-performing method, excelling in terms of parsing accuracy, robustness, and efficiency.
Yet, the disadvantage of this method is that the hyperparameters must be tuned to achieve high accuracy, which incurs high computing time. Therefore, Drain is not entirely automated, since it requires hyperparameter tuning to achieve high accuracy, ground truth to correctly interpret unfamiliar logs, and regular expressions that must be introduced manually at the initial step. Further, it is important to note that ALICE O2 is still under development, which inevitably results in dynamic changes to the system logs. For these reasons, the standard Drain approach is not practical for ALICE O2.

D. TECHNIQUES FOR FULLY AUTOMATED LOG PARSING
To fully automate the log parsing process, it is essential to accurately extract the ground truth from each log, which includes identifying the log template and variables such as numbers, time-stamps, or IP addresses. The correctness of ground truth extraction plays a crucial role in the effectiveness of an automated log parser. The ground truth serves as a reference for training and evaluating the log parser's performance, allowing for iterative improvements and fine-tuning of the parsing algorithm.
We observed that Term Frequency-Inverse Document Frequency (TF-IDF), a popular approach in natural language processing, could differentiate between the variables and the log template. TF-IDF has been applied to detect unusual behavior in network access logs [20]; it was computed after mapping the term and document concepts to the port number and daily access history, respectively. Moreover, TF-IDF can be applied together with other methods such as Byte Pair Encoding (BPE); this combination has proven useful in various downstream machine learning algorithms.
It is worth noting that, using Separate-and-Conquer genetic programming, Bartoli et al. [21], [22] achieved significant accuracy in the automatic generation of the regular expressions needed for our proposed automated log parsing framework. The works by Bartoli showcase the effectiveness of genetic programming in learning regular expressions from examples. Drawing upon these insights, our online log parsing framework can leverage this approach to automatically generate regular expressions tailored to specific log types, eliminating the need for manual intervention.
This capability allows our framework to adapt to dynamic logs, enhancing both parsing accuracy and efficiency. As a result, our proposed log parsing framework can seamlessly process logs in a streaming manner, without relying on predefined hyperparameters or fixed regular expressions. This flexibility is particularly well-suited for ALICE O2 facilities, where applications and components undergo continuous changes.

E. LIMITATIONS OF EXISTING LOG PARSING METHODS
While previous studies have made significant contributions to log parsing techniques, some limitations can be identified. These limitations include:
• Lack of focus on dynamic log parsing: Many existing log parsing approaches are designed for static logs in stable systems. They may not adequately address the challenges posed by dynamically changing log messages, such as those encountered in the ALICE O2 facilities.
• Limited scalability: Some log parsing methods may struggle to scale effectively when faced with large-scale log datasets or high data rates. This can impact the efficiency and real-time processing capabilities of log parsing algorithms.
• Manual intervention and parameter tuning: Several log parsing techniques rely on manual intervention, such as the definition of regular expressions or the setting of predefined parameters. This manual effort can be time-consuming, less flexible, and may not adapt well to evolving log structures and patterns.
• Lack of comprehensive evaluation: Some previous studies do not provide a comprehensive evaluation of the proposed log parsing techniques, which limits our understanding of their performance, robustness, and applicability in real-world scenarios.
Our framework in this study aims to build a completely automated online log parser on top of Drain [9]. It includes the following tasks: optimizing parsing-tree parameters, generating ground truth, building regular expressions, and improving accuracy by lowering the number of template groups, in order to ensure that the aforementioned limitations are addressed.

III. OUR PROPOSED LOG PARSING FRAMEWORK
Considering the limitations identified in the literature review, we propose a novel log parsing framework. This framework incorporates several components: automatic parameter optimization using the Artificial Bee Colony algorithm as a hyperparameter tuning technique, automatic extraction of variable parts and templates using TF-IDF (Term Frequency-Inverse Document Frequency), automatic construction of regular expressions through genetic programming, and enhanced accuracy achieved through a template reduction algorithm. With these advancements, we put forth the following hypotheses:
• Hypothesis 1: The automatic optimization of the log parsing model's parameters for each log using the Artificial Bee Colony algorithm will result in improved performance compared to other tuning models.
• Hypothesis 2: The automatic extraction of variable parts and templates from log messages using TF-IDF will lead to efficient and accurate log parsing, capturing relevant information while disregarding noise and irrelevant data.
• Hypothesis 3: The automatic construction of regular expressions distilled from the variables in the log messages using genetic programming will enable effective pattern recognition and enhance the log parsing process.
• Hypothesis 4: The merging of similar log templates using the template reduction algorithm will contribute to improved accuracy by reducing redundancy, enhancing efficiency, and providing a more concise representation of log information.
Overall, the proposed fully automated log parsing framework aims to optimize the accuracy and efficiency of log parsing in ALICE O2 facilities, and we hypothesize that it will significantly improve the log parsing process compared to traditional manual approaches.
The ALICE computing system receives a major upgrade every few years, while minor upgrades occur throughout the detector's operation. The AI-based logging system is part of the monitoring system for ALICE O2 facilities, which provides comprehensive functionality for metric collection, processing, storage, and anomaly detection. We propose an effective framework for online log parsing, particularly within the AI-based logging system. Whether logs are changing or new application logs are being added to the system, the framework must be compatible with the development cycles. We deployed FileBeat, a lightweight shipper for forwarding and centralizing log data, on each node of an ALICE O2 FLP cluster. As an agent on the cluster, FileBeat monitors the operating system and service logs. Fig. 3 illustrates the overall model of the proposed log parsing framework, which is divided into 2 parts: constructing the log parsing model and applying the log parsing model.
The model construction phase contains 5 main steps: data preparation, word embedding, modeling, evaluation, and output, as illustrated in Fig. 3. In Step 1 (Data Preparation), the log collection of the log parsing framework is retrieved by Logstash, as represented in Fig. 1, via a REST API. Each log dataset is then split into a training set and a test set. Step 2 (Word Embedding) involves feeding the collected logs into TF-IDF by embedding raw log messages as tokens. This requires tokenization of the log lines to divide them into discrete tokens, after which the log's ground truth and list of variables are constructed and sent to the log parsing model and the auto regular expression model, respectively.
In Step 3 (Modeling), the log parsing model based on the concept of [9] is trained. Its hyperparameters are optimized through a hyperparameter tuning algorithm, and the number of templates is reduced before being tested in the evaluation step (Step 4). In Step 5 (Output Processing), matrices of log templates are generated for the machine learning module described in Fig. 1.
The right side of Fig. 3 depicts the production stage of our log parser. Each type of log has its own pipeline, which Logstash uses to feed into the framework. The log parsing model is fed by three components: the preprocessed log, regular expressions from the auto regex generator, and an optimal set of hyperparameters from hyperparameter tuning optimization. The model's log template is then reassembled by the template reduction algorithm and returned to log storage. The details of constructing a log parsing model are described in the next subsections.

A. DATA PREPARATION
To ensure accurate log parsing, the first step is to prepare the data. The raw log messages are typically in an unstructured format. Using predefined regular expressions, we extract the log content of interest while excluding header information such as log level, username, and role name. The header information follows a consistent format within the same system, allowing for direct extraction.
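As a minimal sketch of this step (the header layout and field names below are assumed for illustration, not the actual ALICE O2 format), a predefined regular expression can split a line into header fields and free-text content:

```python
import re

# Strip the consistent header (timestamp, log level, component) from a raw
# log line and keep only the free-text content for parsing.
line = ("2023-05-17 12:01:02 INFO dataflow: "
        "New master discovered at master@188.184.86.204:5080")

# Assumed header layout: "<date> <time> <LEVEL> <component>: <content>".
header = re.compile(r"^\S+ \S+ (?P<level>\w+) (?P<component>\S+): (?P<content>.*)$")

m = header.match(line)
print(m.group("level"))    # INFO
print(m.group("content"))  # New master discovered at master@188.184.86.204:5080
```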
For our study, we utilize a combination of benchmark datasets from LogHub [23] and ALICE O2 application logs. The LogHub datasets encompass various systems, including distributed systems, operating systems, mobile systems, server applications, and standalone software. We select five log datasets from LogHub together with the ALICE O2 application logs, and split each of them into an 80% training set and a 20% test set. Previous log parsing studies [9], [10], [12], [19] have utilized these datasets.
The benchmark datasets used are as follows:
• Hadoop Distributed File System (HDFS) log messages were gathered from over 200 Amazon EC2 nodes and processed into log templates.
• ZooKeeper is another distributed system log given by The Chinese University of Hong Kong.
• Linux is a type of operating system log that was gathered on a Linux server from the Public Security Log Sharing Site initiative.
• Android framework is a mobile system log.
• Health App is an application log on Android devices.
• CERN App is an application log of the ALICE O2 facilities, streamed by Logstash from the database. It is the log that CERN's developers produce to monitor the detector while it is in operation.
Table 1 provides an overview of the datasets used, including the number of log messages and file sizes. The datasets vary in the number of templates, ranging from a minimum of 30 templates in HDFS to a maximum of 76,923 templates in the Android dataset. The length of a log message is determined by the number of tokens it contains. Table 1 also provides the maximum and average number of tokens in different log types, as indicated by the Max Length and Average Length columns, respectively. Log message lengths typically range from 4 to 14 words, although certain datasets may have log messages spanning hundreds of words. The log content primarily consists of legible free text, with minimal variation across datasets.
To assess the accuracy and efficiency of our log parser, LogHub selected a subset of 2,000 log messages from each dataset and manually labeled the corresponding log templates as the ground truth. Table 1 includes the number of log templates in these labeled subsets as No. Template (2k). We utilize these labeled data for evaluating the performance and accuracy of our log parser. The data is divided into an 80% training set for model training and a 20% test set for performance evaluation, providing a consistent benchmark environment.

B. WORD EMBEDDINGS
As indicated in the previous section, our architecture is online and capable of processing logs in real time. To create a fully automated log parsing framework, our model (described in detail in the modeling subsection) requires automated labeling of the log template. This automated labeling provides a ground truth for tuning the model's hyperparameters and for an automated regular expression generator, which constructs the regular expressions used to parse the logs. This section explains the details of constructing the ground truth and the regular expressions. The result of this step is shown in Fig. 4.

1) TF-IDF FOR EXTRACTING VARIABLES AND GROUND TRUTH
Without labeling the raw log messages, we apply the method outlined in Algorithm 1 to extract the variables that will be used to automatically generate the regular expression and log template in the following subsection. The ground truth of each log consists of a log template and variables, which are crucial for evaluating hyperparameter tuning.
TF-IDF, a statistical metric used in text mining, assigns a weight proportional to the number of times a word appears in a document, offset by how frequently the word appears in the whole corpus. As seen in Fig. 4, the log template has the same pattern in every log; therefore, we can easily identify it using TF-IDF. More specifically, the Term Frequency (TF) of a token in a log is normalized by the number of tokens, as in (1). The Inverse Document Frequency (IDF) evaluates how frequently tokens appear across the whole set of logs, scaling down tokens that occur in every log and scaling up tokens that are unique to a few logs, as in (2). Finally, TF-IDF is the product of the two.

TF(t) = (Number of times token t appears in a log) / (Total number of tokens in the log) (1)

IDF(t) = log(Total number of logs / Number of logs with token t in them) (2)

As aforementioned, although a certain token t may have a high TF value due to frequent appearance in log messages, its TF-IDF value is weighed down if it is also present in every log message, making it unable to distinguish the logs' content. Hence, such a token t belongs to the log template. In contrast, a token t with a high TF-IDF value in each log message has higher priority: it occurs frequently within a log, yet does not appear frequently across log messages, and is therefore part of the variable part. Algorithm 1 summarizes this procedure: for each token, the TF-IDF score is computed, and if the score is greater than or equal to the threshold c, the token is appended to the variable list. For example, in Fig. 4, at Log content 1 of the preprocessed log, the TF value of each token is the same because each token appears only once in Log content 1. However, the IDF value of "61" is the highest, while the others are low because they appear in many logs. Therefore, the token "61" is the variable part.
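Equations (1) and (2) and the thresholding step of Algorithm 1 can be sketched as follows; the toy logs and the cutoff value c are illustrative assumptions, not taken from the paper's datasets:

```python
import math
from collections import Counter

# Toy corpus: the unique token differs per log, the rest is the template.
logs = [
    "Receive block 61 from node".split(),
    "Receive block 62 from node".split(),
    "Receive block 63 from node".split(),
]

def tf(token, log):
    # Equation (1): occurrences of the token normalized by log length.
    return Counter(log)[token] / len(log)

def idf(token, logs):
    # Equation (2): log of total logs over logs containing the token.
    df = sum(1 for log in logs if token in log)
    return math.log(len(logs) / df)

c = 0.1  # toy TF-IDF cutoff separating variables from template tokens
log = logs[0]
variables = [t for t in log if tf(t, log) * idf(t, logs) >= c]
print(variables)  # only the unique token '61' exceeds the cutoff
```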
We extract the variable component from the log content in Algorithm 1 by computing the TF-IDF of each token. To distinguish between variables and log templates, the determination of the TF-IDF threshold (c) is significant, as it affects parsing accuracy. If the threshold is too low, every token, including the log template, will be identified as a variable. On the other hand, if the threshold is set too high, the algorithm will assign the majority of tokens, including variables, to the log template. The threshold is chosen based on parsing accuracy in comparison with the manually labeled ground truth of the 6 datasets described in the data preparation step. Fig. 5 demonstrates the average parsing accuracy for various TF-IDF score cutoffs.
It can be observed that, when the cutoff is low, the majority of tokens are assumed to be variables, resulting in low parsing accuracy. The parsing accuracy increases as the TF-IDF cutoff rises, reaching a maximum of 0.73 when the TF-IDF value is greater than or equal to 0.6. As the cutoff approaches the upper end of the range, the accuracy gradually declines to 0.51. Consequently, a TF-IDF score of 0.6 is the optimal threshold to separate a log template from a variable.
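The TF-IDF split above can be sketched in a few lines of Python. This is a minimal illustration, not the production pipeline: we assume whitespace tokenization and a natural-log IDF, and we lower the cutoff c for this tiny three-line corpus, since the 0.6 threshold was tuned on the full datasets.

```python
import math
from collections import Counter

def extract_variables(logs, c):
    """Return, per log, the tokens whose TF-IDF score is >= c (the test in
    Algorithm 1). Assumes whitespace tokenization and a natural-log IDF."""
    tokenized = [log.split() for log in logs]
    n_logs = len(tokenized)
    # Document frequency: in how many logs does each token occur?
    df = Counter(tok for toks in tokenized for tok in set(toks))
    variables = []
    for toks in tokenized:
        tf = Counter(toks)
        found = [tok for tok in toks
                 if (tf[tok] / len(toks)) * math.log(n_logs / df[tok]) >= c]
        variables.append(found)
    return variables

logs = ["Publishing 61 MonitorObjects",
        "Publishing 17 MonitorObjects",
        "Publishing 42 MonitorObjects"]
print(extract_variables(logs, c=0.3))
# [['61'], ['17'], ['42']]: only the numbers are rare across the corpus
```

The constant tokens ''Publishing'' and ''MonitorObjects'' appear in every log, so their IDF (and hence TF-IDF) is zero and they are kept as the template.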

2) AUTO REGULAR EXPRESSION GENERATOR
The regular expression is an important feature of the log parsing mechanism described later in our model. It can improve parsing accuracy by providing regular expressions that represent commonly used variables, such as ids, usernames, or timestamps, to the second layer of modeling. Normally, these regular expressions must be provided through the users' domain expertise, which is not practical for the ALICE O2 system because it changes constantly. As a result, we developed an automated regular expression generator that produces regular expressions from highlighted variables. The variable components of the log messages that were split automatically in the preceding stage are combined to form a parameter list that is sent to the auto regular expression generator without the need for any domain expertise. Throughout the procedure, the parameter list in each log template is utilized to build the regular expression. The method of the regular expression generator is illustrated in Algorithm 2.
The generator takes as input:
• The log template, which consists of template ID, template, and occurrences.
• The log structure that was modified in the preceding section.
• The tuned parameters, which consist of the population size (n_pop), the number of iterations to perform (n_gen), and the number of iterations for which an individual may remain unchanged (n_stop).
Furthermore, Fig. 4 provides an example of log content that has been automatically transformed into JSON format, accompanied by the corresponding regular expression for the highlighted variable part in the log content.
The regular expression generator function is derived from the Genetic Programming-based method (GP) [22]. Unlike optimization methods that require a mathematical model of the problem, GP can optimize many objectives simultaneously. Inferring regular expressions for text extraction from samples is a sophisticated problem that involves both continuous and discrete decisions as well as multiple objectives. Furthermore, GP is useful because the GP tree's search space is very large and there are a large number of factors involved. Thus, GP is appropriate for evolving regular expressions in many populations, followed by a composition module that composes two provided regular expressions in different presets and chooses the composition that performs best on a validation set. The criteria for selecting the two particular expressions to be fed to the composition module from the final population are not provided.

Algorithm 2 Generate the Regular Expressions
Input: Dataframe of the log template df_template and dataframe of the log structure df_structure; n_pop = 500; n_gen = 1000; n_stop = 200
Output: List of regular expressions regex_list
1: Initialize empty list regex_list to store regex patterns
2: for each templateID in df_template do
3: Select the first 50 rows of df_structure
4: if the parameterList of the selected log is empty then
5: Skip to the next templateID in df_template
6: end if
7: for each index in the parameterList of df_structure do
8: Create a new column in the selected df_structure using the index as the column name and fill it with the corresponding parameterList token
9: if any of the tokens in the new column match any of the regex patterns in regex_list then
10: Continue searching the next index
11: else if no match is found in regex_list then
12: Highlight the token by finding its start and end position in the selected df_structure content
13: Convert to JSON file
14: A tree is generated from the highlighted token
15: Build the initial population based on the training set τ
16: until n_gen iterations have been performed or the fitness tuple of the best individual has remained unchanged for n_stop consecutive iterations do
17: if the number of individuals generated from the training set τ < n_pop then
18: Compute the fitness function f(r)
19: Pick two snippets, tree 1 and tree 2, to evolve
20: Generate new individuals: 80% by crossover of tree 1 & tree 2,
21: 10% by mutation of individuals,
22: 10% randomly with the Ramped half-and-half method

Individual candidate solutions for a problem, known as individuals, are each associated with a problem-dependent numeric function known as fitness. Typically, this function is computed as an individual's performance index on a specified collection of problem cases known as the learning set. The fitness of an individual r is evaluated over the training set τ as a fitness tuple.
The fitness function consists of three evaluation metrics: precision (Pre(r, τ)), accuracy (Acc(r, τ)), which is an average of the True Positive Character Rate and the True Negative Character Rate, and the length of the regular expression (l(r)). A GP execution consists of a heuristic and stochastic search in the solution space for the solution with the highest fitness. Between two individuals, the one with the higher precision is considered the better; if they have the same precision, the one with the higher accuracy is better; and if they have the same precision and accuracy, the one with the lower l(r) is better. This process consists of 4 steps: encoding the regular expression as a tree, constructing the initial population, letting the population evolve to explore the solution space, and employing the Separate-and-Conquer strategy, as shown in Algorithm 2 from line 14 to line 33. The GP performs an iterative procedure consisting of 3 steps:
• creating new individuals from existing ones using genetic operators, typically crossover and mutation;
• adding the new individuals to the population;
• removing the worst individuals.
The crossover takes two individuals and generates two new ones that are identical to the input individuals except for the swapping of two randomly selected subtrees. The mutation takes one individual and generates a new individual identical to the input except for the replacement of a randomly picked subtree with a newly created subtree. These steps are repeated until specific goals are reached or the problem is solved, for example, when the fitness scores of the individuals in the population no longer improve.
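The survivor-selection rule above can be sketched as follows. This is a toy illustration only: the individuals and their stub precision/accuracy values are ours, standing in for the full GP with real Pre(r, τ) and Acc(r, τ) character-rate metrics.

```python
# Toy individuals: candidate regexes with stub precision/accuracy scores
# standing in for the real Pre(r, tau) and Acc(r, tau) character-rate metrics
population = [
    {"regex": r"\d+",         "pre": 1.0, "acc": 0.95},
    {"regex": r"[0-9][0-9]*", "pre": 1.0, "acc": 0.95},  # same scores, longer
    {"regex": r".*",          "pre": 0.4, "acc": 0.60},
]

def select_best(population, n_keep):
    """Keep the n_keep best individuals under the lexicographic order used by
    the GP: highest precision, then highest accuracy, then shortest l(r)."""
    ranked = sorted(
        population,
        key=lambda r: (-r["pre"], -r["acc"], len(r["regex"])),
    )
    return ranked[:n_keep]

print(select_best(population, 1)[0]["regex"])
# \d+  (wins the tie on precision and accuracy by being shorter)
```

Sorting on the tuple (-precision, -accuracy, length) implements exactly the tie-breaking order described in the text.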
As depicted in the last box of Fig. 4, we extract the optimal regular expression for the variable part of each log template after determining the optimal expression. The regular expression generated by the automatic regular expression generator is used as input to exclude frequently used variables in the log content, which can considerably increase parsing accuracy, as discussed in detail later in the regular expression layer of the model.

C. MODELING
In this section, we briefly describe the construction of our automated log parsing framework's model. Our log parsing model is separated into 6 layers: hyperparameter tuning, regular expression, length, token, output, and template reduction, as demonstrated in Fig. 6. When the training set arrives, our model determines the three optimal hyperparameters for the log parsing model (i.e., max child, max depth, and similarity threshold) using the hyperparameter tuning algorithm. The log messages are then preprocessed using the regular expressions defined in the previous section. Next, our method searches for the log group that best fits the raw log message by comparing the message with the log template saved in each log group; if no suitable group exists, it builds a new log group. After the log messages reach the output layer, we apply a template reduction technique to minimize the number of log templates in order to improve parsing precision. Finally, the structured log is forwarded to the evaluation stage. The model is trained with 80% of the data described in the data preparation. The details of each layer are as follows:

1) HYPERPARAMETER TUNING
Once the ground truth and regular expressions have been created for each type of log dataset, the log parsing model is defined using a model [24] which is suitable for online log parsing. This model utilizes a parsing tree optimized with hyperparameter optimization. Three important parameters are considered: max child, max depth, and similarity threshold (st), with their respective ranges listed in Table 2. The log message is assigned to the log group with the highest similarity score, determined by the st value. The max depth represents the maximum depth of the parsing tree, while the max child indicates the maximum number of children in each parsing tree node.
To ensure optimal performance, different services may require distinct hyperparameter values. Therefore, an Artificial Bee Colony (ABC) approach is applied to automatically optimize these parameters for each type of log. The overall procedure is illustrated in Fig. 7. The parameters are encoded as a set of target values to be optimized, represented as u = {u1, u2, u3}, where u1, u2, and u3 indicate max child, max depth, and st, respectively. The fitness value of a solution u is determined by the parsing accuracy (PA), as in (4), where PA is a performance evaluation metric that will be further explained in the evaluation section. During each phase, the bees evaluate the fitness value of the food source using equation (4) by running the model and comparing its output with the ground truth generated using TF-IDF. This eliminates the need for manual labeling of the ground truth for each log, as TF-IDF automates the process. In the employed bee phase, each employed bee generates a new solution by perturbing the j-th parameter Xj of the current solution, as in (5), and determines the new solution by greedy selection. If a solution cannot be further improved after the 7-th iteration, it is considered to have reached its maximum potential, and the employed bee associated with that solution becomes a scout bee, searching for a new solution. The employed bee and scout bee phases continue until a specified number of iterations is reached or termination conditions are met, ensuring the optimization process is thorough and exhaustive.
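The ABC loop can be sketched as below. This is a simplified illustration under our own assumptions: the parameter bounds are placeholders for the ranges in Table 2, the onlooker phase is omitted for brevity, and `parsing_accuracy` is a stub where the real framework runs the parser and scores it against the TF-IDF ground truth.

```python
import random

random.seed(0)  # reproducibility of this sketch only

# Illustrative parameter bounds; the actual ranges appear in Table 2
BOUNDS = {"max_child": (10, 200), "max_depth": (3, 10), "st": (0.1, 0.9)}
LIMIT = 7  # a source unimproved for 7 iterations is abandoned (scout phase)

def clamp(name, value):
    lo, hi = BOUNDS[name]
    return max(lo, min(hi, value))

def neighbor(u, pop):
    # Employed-bee move: perturb one parameter toward/away from a random peer
    # (the canonical ABC update, standing in for the paper's equation (5))
    j = random.choice(list(BOUNDS))
    k = random.choice(pop)
    v = dict(u)
    v[j] = clamp(j, u[j] + random.uniform(-1, 1) * (u[j] - k[j]))
    return v

def abc_optimize(parsing_accuracy, n_bees=10, n_iter=50):
    """Maximize parsing_accuracy(u) over u = {max_child, max_depth, st}.
    max_child and max_depth are treated as continuous here and would be
    rounded before use in the parsing tree."""
    pop = [{n: random.uniform(*BOUNDS[n]) for n in BOUNDS} for _ in range(n_bees)]
    trials = [0] * n_bees
    best = max(pop, key=parsing_accuracy)
    for _ in range(n_iter):
        for i in range(n_bees):
            v = neighbor(pop[i], pop)
            if parsing_accuracy(v) > parsing_accuracy(pop[i]):  # greedy selection
                pop[i], trials[i] = v, 0
            else:
                trials[i] += 1
            if trials[i] >= LIMIT:  # scout bee: restart from a random solution
                pop[i] = {n: random.uniform(*BOUNDS[n]) for n in BOUNDS}
                trials[i] = 0
        best = max(pop + [best], key=parsing_accuracy)
    return best

# Toy objective peaked at st = 0.6, standing in for real parsing accuracy
best = abc_optimize(lambda u: -abs(u["st"] - 0.6))
```

With the toy objective, the returned `best["st"]` lands close to the peak at 0.6, mirroring how ABC converges on the st values reported in Table 4.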

2) REGULAR EXPRESSION
The raw log messages are processed in this step by using the regular expressions. Each kind of log has its own regular expressions, which replace variables with asterisks to mark them as variables. In this phase, instead of providing a basic regular expression based on our own knowledge to replace frequent variables with asterisks, we use the regular expressions produced in the previous section (e.g., the timestamp or memory size is replaced). This stage is completed automatically, with no human intervention. Therefore, the developers in charge of designing the logs do not need to manually update this layer.
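In Python, this preprocessing amounts to a few `re.sub` calls. The patterns below are illustrative stand-ins written by hand; in our framework they come from the auto regular expression generator.

```python
import re

# Illustrative patterns only; in the framework they are produced by the
# auto regular expression generator rather than written by hand
VARIABLE_REGEXES = [
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}",  # timestamp
    r"\b\d+\s?(?:KB|MB|GB)\b",               # memory size
    r"\b\d+\b",                              # bare number
]

def mask_variables(line, regexes=VARIABLE_REGEXES):
    """Replace every substring matched by a variable pattern with <*>."""
    for pattern in regexes:
        line = re.sub(pattern, "<*>", line)
    return line

print(mask_variables("2023-05-01 12:30:45 Publishing 61 MonitorObjects"))
# <*> Publishing <*> MonitorObjects
```

Pattern order matters: the more specific timestamp pattern runs before the bare-number pattern, so the timestamp is masked as one token rather than six.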

3) LENGTH
This layer's principal function is to segregate raw log messages based on their length. The number of tokens in a log message is referred to as its length. Based on the results of the experiment, we assume that log messages with the same log template are likely to have the same log message length. For example, in Fig. 4, Log content 1 will traverse the internal node ''Length: 3''.

4) TOKEN
Once the log messages are categorized based on their length, they undergo an additional separation based on the preceding tokens in this stage. To illustrate, consider Log content 1 in Fig. 4, which reads ''Publishing 61 MonitorObjects''. In this case, the traversal occurs at the token layer node ''Publishing'', which is the initial token in the log message. Depending on the maximum depth defined by the developer, there may exist multiple preceding token layers.

5) OUTPUT
The output layer is the parsing tree's leaf node, and it includes a list of log groups. The log messages in the same log template are grouped in the same log group in this layer. Then, from the log group list, it will select the most appropriate log group by applying the similarity sequence [24] to determine the similarity between the log message and the template of each log group with the ground truth that was created by using TF-IDF.
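A minimal sketch of this lookup follows. It is our own simplification of the similarity sequence in [24]: the real traversal first narrows the candidate groups through the length and token layers before this comparison runs.

```python
def seq_similarity(template, tokens):
    """Token-wise similarity between a stored template and a new log message
    of the same length; <*> positions are treated as matching wildcards."""
    same = sum(1 for a, b in zip(template, tokens) if a == b or a == "<*>")
    return same / len(tokens)

def match_group(groups, tokens, st):
    """Return the group whose template is most similar to tokens, provided its
    similarity reaches the threshold st; otherwise None (a new group is made)."""
    best, best_sim = None, 0.0
    for g in groups:
        s = seq_similarity(g["template"], tokens)
        if s > best_sim:
            best, best_sim = g, s
    return best if best_sim >= st else None

groups = [{"template": ["Publishing", "<*>", "MonitorObjects"]},
          {"template": ["Opening", "file", "<*>"]}]
msg = "Publishing 77 MonitorObjects".split()
print(match_group(groups, msg, st=0.6) is groups[0])
```

When no group reaches the st threshold, the caller creates a new log group with the message itself as the initial template.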

6) TEMPLATE REDUCTION
According to the results of the experiment, the parsing accuracy of some log datasets is decreased because log messages that should have the same template were assigned to a new group. Table 3 shows an example of log templates that should be merged. It can be noticed that the only difference among the three log templates is the number of items to delete (i.e., one, four, and five), while the main message is the same. Thereby, in this situation, these templates should be in the same group. The pseudocode of the template reduction that solves this problem is shown in Algorithm 3. First, in line 3, we sort the log templates by the number of occurrences in descending order. Then, we substitute <*> with white space and remove the white space from the log template. Next, in lines 7-11, we determine whether a duplicate exists by comparing each log template with the others. If a log template is duplicated, we group the duplicates together as a single template, assign a new template ID, and remove the old templates.
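The merge step of Algorithm 3 can be sketched as follows, assuming templates are represented as (id, text, occurrences) tuples; the toy templates are ours.

```python
def reduce_templates(templates):
    """Merge log templates that become identical once <*> placeholders and
    surrounding white space are removed (the idea of Algorithm 3).
    templates: (template_id, template_text, occurrences) tuples.
    Returns a mapping old_id -> surviving_id for rewriting df_structure."""
    merged = {}
    seen = {}  # normalized text -> id of the most frequent surviving template
    # Line 3 of Algorithm 3: most frequent templates survive
    for tid, text, _occ in sorted(templates, key=lambda t: t[2], reverse=True):
        normalized = " ".join(text.replace("<*>", " ").split())
        if normalized in seen:
            merged[tid] = seen[normalized]  # duplicate: fold into survivor
        else:
            seen[normalized] = tid
    return merged

templates = [
    (1, "Deleting <*> items", 120),
    (2, "Deleting <*> <*> items", 45),  # same message once variables removed
    (3, "Opening file <*>", 9),
]
print(reduce_templates(templates))
# {2: 1}: template 2 is folded into the more frequent template 1
```

The returned mapping is then applied to the structured log so that every message of a merged template receives the surviving template ID.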
Note that each log dataset has its own pipeline for entering our model. The scheduler has configured the log parsing model of each log dataset to re-train the model on a regular basis with the default setting of 80% and 20% of the data as training set and test set, respectively.

D. EVALUATION
To demonstrate the effectiveness of our model's performance, we employ a set of five performance metrics, namely precision, recall, f-measure, parsing accuracy, and running time. Precision, computed from TP (true positives) and FP (false positives) as shown in equation (6), measures the accuracy of assigning log messages with the same log template to the same log group. An FP occurs when log messages with different log templates are incorrectly assigned to the same log group, while an FN (false negative) arises when log messages with the same log template are erroneously assigned to separate log groups. Recall, as depicted in equation (7), quantifies the ability of our model to properly assign log messages with the same log template, relative to the ground truth.
To assess the accuracy of the log parsing algorithm, we utilize the f-measure, which combines the precision and recall measures as shown in equation (8). This evaluation metric has been widely adopted in previous studies [10], [24]. Additionally, we measure the parsing accuracy (PA), defined as the ratio of correctly parsed log messages to the total number of log messages [12].

Algorithm 3 Our Framework's Post-Processing
Input: Dataframe of the log template df_template and dataframe of the log structure df_structure
Output: Dataframe of the new log template new_template
1: Initialize dataframe new_template as a copy of df_template
2: for each log template in new_template do
3: Sort new_template by the value in column ''occurrences'' in descending order
4: Replace the <*> with white space
5: Remove white space at the beginning and the end of the log template
6: end for
7: if a log template of new_template is duplicated then
8: Set a flag as duplicate
9: Group the duplicated log templates together and assign a new templateID in new_template
10: Assign the new templateID to df_structure
11: Delete the log templates that have been marked as duplicate
12: end if
13: return new_template
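These grouping metrics can be computed as below. This is our own pairwise formulation of TP/FP/FN over message pairs, consistent with the definitions above; the exact-group criterion used for PA is an assumption following the definition in [12].

```python
from itertools import combinations

def pairwise_metrics(predicted, truth):
    """Pairwise grouping metrics plus parsing accuracy (PA).
    predicted/truth: one group label per log message, in the same order.
    TP counts message pairs grouped together in both partitions; PA is the
    share of messages whose whole predicted group equals a truth group."""
    def pair_set(labels):
        groups = {}
        for i, g in enumerate(labels):
            groups.setdefault(g, []).append(i)
        return {p for ms in groups.values() for p in combinations(ms, 2)}
    def group_sets(labels):
        groups = {}
        for i, g in enumerate(labels):
            groups.setdefault(g, set()).add(i)
        return list(groups.values())
    p_pairs, t_pairs = pair_set(predicted), pair_set(truth)
    tp = len(p_pairs & t_pairs)
    fp = len(p_pairs - t_pairs)
    fn = len(t_pairs - p_pairs)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    t_groups = group_sets(truth)
    correct = sum(len(g) for g in group_sets(predicted) if g in t_groups)
    return precision, recall, f_measure, correct / len(predicted)

print(pairwise_metrics([0, 0, 1, 1, 2], [0, 0, 1, 1, 2]))
```

In the toy call above, prediction and truth agree, so all four metrics are 1.0; splitting a truth group lowers recall and PA, while merging two truth groups lowers precision.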
Furthermore, we compare the performance of our proposed technique with existing log parsers [9], [24], thereby enabling a comprehensive evaluation of our model's capabilities.

E. OUTPUT PROCESSING
This phase prepares the log parsing model's outputs so that they may be utilized in the log mining and anomaly detection models. The log templates are sorted based on timestamps. The sessions are structured using a predetermined field, such as the block ID in HDFS or the combination of ID, username, and facility in the application log of ALICE O2 facilities. This field is utilized to determine the start and end of each session. A matrix is then created, with each line containing the series of log template IDs of one session (e.g., 1 15 15 4 3 4 3 36 35 14).
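A minimal sketch of this session construction follows; the field names `timestamp`, `template_id`, and `block` are illustrative, not the framework's actual schema.

```python
from collections import defaultdict

def build_sessions(parsed_logs, session_key):
    """Group template IDs into per-session sequences, ordered by timestamp.
    parsed_logs: dicts with 'timestamp', 'template_id', and session fields;
    session_key: function extracting the session identifier (e.g. the HDFS
    block ID, or (id, username, facility) for ALICE O2 application logs)."""
    sessions = defaultdict(list)
    for entry in sorted(parsed_logs, key=lambda e: e["timestamp"]):
        sessions[session_key(entry)].append(entry["template_id"])
    return dict(sessions)

logs = [
    {"timestamp": 2, "template_id": 15, "block": "blk_1"},
    {"timestamp": 1, "template_id": 1,  "block": "blk_1"},
    {"timestamp": 3, "template_id": 4,  "block": "blk_2"},
]
print(build_sessions(logs, lambda e: e["block"]))
# {'blk_1': [1, 15], 'blk_2': [4]}
```

Each value in the returned dict corresponds to one line of the session matrix consumed by the downstream anomaly detection model.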

IV. RESULT AND DISCUSSION
This section presents our framework's performance in parsing the logs of ALICE's applications and the other datasets indicated in data preparation. An example of the result from the log parsing model is shown in Fig. 8. The findings illustrate how different logs are transformed into structured logs consisting of log templates and parameter lists for each line. First, we describe our experimental setup, then show the experiment results of our automated online log parser using 5 metrics. We evaluate the effect of our template reduction algorithm and compare our approach with the state-of-the-art online log parser.

A. EXPERIMENTAL SETUP
The reference computer in our experiments had an 8-core Intel(R) Core(TM) i5-6400 CPU, 32 GB of RAM, and 2 TB of storage, and it ran the CentOS 7.8.2003 operating system. Additionally, the results are averaged over 20 runs. To achieve the best performance of the parsing model, we applied the proposed technique described in the previous section. The optimal hyperparameters of each dataset after using ABC are shown in Table 4. The similarity threshold st determines if a raw log message matches the log templates. The maximum depth avoids branch explosion in the parsing tree which considerably improves its performance. The maximum child limits the number of children in each node of the parsing tree. If a node has reached its maximum number of children, the remaining unmatched tokens in the token layer will be gathered in the special node. In every experiment, we followed the setup outlined in Table 4. In addition, the studies are conducted on 6 different datasets, including Hadoop's distributed file system (HDFS), ZooKeeper, Linux, Android, the Health App, and the CERN App.

B. EXPERIMENTAL RESULT
Our experiment results are divided into two sections.
In section IV-B1, we examine the accuracy and efficiency of our proposed framework with and without the template reduction algorithm. In section IV-B2, we compare our performance with Drain, which is a famous online log parser.

1) THE EFFECT OF TEMPLATE REDUCTION
The log parsing performance of our proposed framework on the different datasets is shown in Table 5. The performance of the log parsing approach was measured in terms of precision, recall, f-measure, accuracy, and computation time (in seconds). We separate our model into 2 variants, one without and one with template reduction, to evaluate the effect of template reduction. The proposed framework with template reduction has slightly higher precision, recall, f-measure, and parsing accuracy. Our log parsing framework with template reduction reduces the number of templates by 2-9 templates, with parsing accuracy increasing by 1% to 4%. However, the total running time of the process increases by approximately 1.55 seconds, as shown in Fig. 9, which does not significantly impair the process. Template reduction can boost the parsing accuracy because, as explained earlier, some log templates are duplicates; reducing the log templates and regrouping them corrects the grouping. Consequently, implementing our framework with template reduction enhances accuracy at the expense of slightly increased computing time.

2) COMPARING WITH THE STATE-OF-THE-ART ONLINE LOG PARSER
We compare our proposed framework to Drain [9], a state-of-the-art online log parser with predefined hyperparameters. Drain has the best performance among the other online log parsing frameworks [12]. However, Drain has several limitations in terms of hyperparameter tuning in the parsing tree as well as regular expression generation. As a result, Drain is not yet entirely automated. The solution from [24] addresses only the problem of Drain's hyperparameter tuning; it still requires manually labeled ground truth to perform the parameter tuning and lacks an automated regular expression generator. We solved both the hyperparameter and regular expression issues in our proposed framework, resulting in a completely automated online log parser. We also divided our framework into two log parsers: one without and one with template reduction. From Table 7, our proposed framework with the template reduction generally had the highest accuracy values among the different approaches on the various datasets. The performance of our framework with the template reduction surpasses both the Drain with hyperparameter tuning [24] and our framework without the template reduction because it can decrease the misgrouping of log templates after the output stage. Moreover, compared to Drain [9], the proposed log parsing framework has the highest accuracy on 4 datasets: HDFS, Linux, Health App, and CERN App. For HDFS, both Drain and our approach achieve 100% parsing accuracy because the dataset has a relatively basic log template that is straightforward to recognize. In addition, our approach is suited for frequently-changing applications since our automated hyperparameter tuning optimizes the hyperparameters based on the scheduler. Nonetheless, Drain outperformed our framework on 2 datasets, ZooKeeper and Android, since Drain employed the manually labeled ground truth in Table 6 and predefined hyperparameters in the parsing tree.
While our online log parsing framework is fully automated and does not need any predefined hyperparameters or regular expressions, some of the automatically generated ground truth may lead to inaccurate evaluation during hyperparameter tuning, resulting in reduced parsing accuracy. Table 6 displays the percentage of manually labeled and automatically generated ground truth that corresponds to the correctness of our constructed ground truth. A reduction of 1% in the proposed framework's ground truth can result in an overall reduction of parsing accuracy of 1.35%. As a result, further improvements should be made toward enhanced ground truth creation for such log data. Table 8 shows the accuracy distribution of each online log parser across the 6 log datasets. Drain has the highest accuracy on average and shows a small variance of 0.017. On one hand, our proposed framework attains an average accuracy of 0.891, lower than Drain's by only 0.004. On the other hand, our framework exhibits a slightly higher variance across the different datasets compared to the findings presented in [9] and [24]. This variation can be attributed to our log parsing model, which relies on the hyperparameters determined through the Artificial Bee Colony (ABC) algorithm. As demonstrated in Table 4, the similarity threshold (st) obtained by ABC for the Android log is only 0.18. When the st is too low, the log grouping of the log template is inaccurate, which reduces the parsing accuracy. We also observed that ABC might become trapped in a local optimum and disregard the global optimum. This is one of the limitations that requires additional development.
In addition, we evaluated the processing speed of each log parser by recording the time it takes to train the model. Drain is not considered here since it uses predefined parameters and ground truth, which do not need to be trained. Fig. 9 depicts the runtime results. The average running times of the Drain with hyperparameter tuning, the proposed online log parsing framework without template reduction, and the proposed online log parsing framework with template reduction are 419.46, 1,060.09, and 1,061.63 seconds, respectively. The Drain with hyperparameter tuning uses the least amount of time to train the model because it does not need ground truth generation. The computation time of each dataset varies based on the length of the log messages and the number of log templates. The bar chart demonstrates that both the hyperparameter tuning process for the log parser and our automated online log parser have reasonably low computation times. Specifically, in the case of the Health App dataset, the computation time is approximately 356.27 seconds. The average length of a log message in this dataset is 4.93 tokens, and there are a total of 45 log templates.
In contrast, the Android dataset, with log messages averaging 13.31 words and 166 templates, requires the greatest running time of around 1,194.41 seconds. Therefore, the computation time is proportional to both the length of log messages and the number of log templates. The average time required to generate the ground truth for hyperparameter tuning is 640.33 seconds. However, template reduction adds just 1.54 seconds to the formation of the ground truth.
To compare the results statistically, we employed the ANOVA test and the Holm-Bonferroni method to verify statistical significance. ANOVA, which stands for Analysis of Variance, is a statistical test used to analyze the differences between the means of two or more groups. It determines whether the variations observed between the group means are statistically significant or can be attributed to random chance. The null hypothesis is that all log parsing methods perform similarly. In this study, a significance level of 0.05 was employed. The results of the ANOVA test, presented in Table 9, revealed the rejection of the null hypothesis for all log parsing techniques in terms of mean rankings of parsing accuracy.
Consequently, a post-hoc test was conducted for further examination. The null hypothesis in each pairwise comparison is that our proposed log parsing framework with the template reduction does not perform better than the other method. Table 10 displays the outcomes of the Holm-Bonferroni test, with adjustments made to the p-values. The comparison involved assessing the adjusted p-value against the predetermined significance level to establish the results of the hypothesis testing. Regarding parsing accuracy, our proposed method demonstrated superiority over the hyperparameter tuning classifier for the log parser [24]. However, when compared to Drain [9] on three datasets (HDFS, ZooKeeper, and Android), our approach did not lead to the rejection of the null hypothesis. Conversely, our approach successfully rejected the null hypothesis on three datasets, namely Linux, Health App, and CERN App, compared to Drain.

C. DISCUSSION
In this section, we discuss the advantages and limitations of our proposed log parsing framework. After conducting the experiments described in the previous section, our results validate the efficacy of our proposed framework and support the acceptance of the formulated hypotheses. The limitations identified in the literature review regarding existing log parsing techniques were the driving force behind the development of our novel log parsing framework, which integrates several crucial components. These components include automatic parameter optimization using the Artificial Bee Colony algorithm, automatic extraction of variable parts and templates through TF-IDF, automatic construction of regular expressions via genetic programming, and the utilization of a template reduction algorithm to merge similar log templates.
Firstly, our findings confirm Hypothesis 1, which posits that the automatic optimization of the log parsing model's parameters using the Artificial Bee Colony algorithm leads to improved performance compared to other tuning models. Table 11 provides valuable insights, revealing that while grid search achieves the highest accuracy on six datasets, it also exhibits the longest computational time of approximately 3,591.07 seconds. Such prolonged execution time is not practical for the dynamic ALICE O2 system, which operates online and undergoes frequent changes. On the other hand, ABC demonstrates comparable parsing accuracy while requiring significantly less running time, approximately 419.33 seconds. Although ABC's average parsing accuracy is only 2.9% lower than that of grid search, its ability to complete the parsing process nine times faster makes it a more efficient choice. Thus, our hypothesis is accepted.
Secondly, Hypothesis 2 is supported by the results presented in Table 6, indicating that the automatic extraction of variable parts and templates using TF-IDF contributes to efficient and accurate log parsing. By leveraging the relevance of terms in log messages, TF-IDF effectively captures essential information while filtering out noise and irrelevant data.
Furthermore, our experimental results validate Hypothesis 3, which suggests that the automatic construction of regular expressions through genetic programming enables effective pattern recognition and enhances the log parsing process. This validation is supported by the comparison of our proposed model with the state-of-the-art online log parsing technique (IV-B2). Genetic programming allows the generation of regular expressions specific to each log type, resulting in improved accuracy and adaptability.
Lastly, Hypothesis 4 is confirmed by our findings in IV-B1, emphasizing the significance of the template reduction algorithm in improving accuracy, reducing redundancy, enhancing efficiency, and providing a more concise representation of log information. The merging of similar log templates optimizes the parsing process by eliminating duplication and streamlining log analysis.
Overall, the successful validation of these hypotheses demonstrates the effectiveness and superiority of our proposed fully automated log parsing framework within the context of ALICE O2 facilities. By addressing the limitations of existing log parsing techniques and incorporating advanced methodologies, our framework offers enhanced accuracy, efficiency, and adaptability compared to traditional manual approaches.
Although our method achieves slightly lower average parsing accuracy across the 6 log datasets than Drain, we attain the highest accuracy in four out of the six datasets: HDFS, Linux, Health App, and CERN App. Note that our proposed framework is fully automated, while Drain still needs predefined hyperparameters and regular expressions to preprocess logs. Drain achieves the highest accuracy on only three of the six datasets, namely HDFS, ZooKeeper, and Android. With 98.3% and 99.89% accuracy, respectively, our technique works well with the application logs of the Health App and the CERN App. Moreover, the Health App's parsing accuracy is approximately 20% better than Drain's. This results from the fact that these application log messages are short and the structure of the logs is dynamic: they contain a higher number of variables compared to the other datasets. In this case, our method handles them better than Drain thanks to its extraction of variables.
One limitation of our framework is that it currently parses log messages independently, without considering the potential relationships between consecutive logs. In some cases, logs may need to be considered as a group, where one log might be related to the previous log. Future enhancements could be made to incorporate this contextual information into the parsing process, which may improve the accuracy and effectiveness of the framework.
Additionally, there are several factors that can cause variations in parsing accuracy and computational time.
For instance, an excessively long log message increases the computation time required for parsing. To address this, an alert system could flag significantly longer log messages for inspection before they are parsed by the model. While our framework demonstrates high accuracy in parsing ALICE O2 application logs, accuracy may decrease as log complexity increases. To mitigate this, stop words could be removed from log messages before they are fed into the log parsing model, reducing their length and complexity and potentially improving parsing accuracy.
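The stop-word filtering suggested above could be realized with a small preprocessing step along the following lines (a sketch; the stop-word list here is illustrative, and a production system would tune it to the vocabulary of its own logs):

```python
# Illustrative stop-word list; not the framework's actual configuration.
STOP_WORDS = {"the", "a", "an", "is", "are", "was",
              "to", "of", "for", "in", "on"}

def strip_stop_words(message: str) -> str:
    """Drop stop words to shorten a log message before parsing."""
    return " ".join(
        tok for tok in message.split() if tok.lower() not in STOP_WORDS
    )

print(strip_stop_words("Failed to open the configuration file for reading"))
# → "Failed open configuration file reading"
```

Shorter messages mean fewer tokens for the parser to compare, which is where the expected gains in speed and accuracy would come from.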
Another limitation relates to the creation of ground truth for evaluating the hyperparameters. In some cases, the automatically created ground truth may lead to incorrect evaluations and ineffective sets of hyperparameters. Increasing the number of samples in the training set, especially for datasets with substantial disparities between our parser's log templates and the actual templates (such as the Android logs), can help address this limitation by providing more diverse samples from which the algorithm can learn to better separate variables from the log template.
These limitations provide valuable insights for future improvements and considerations in the development of our framework. By addressing them, we can enhance the accuracy, efficiency, and robustness of the log parsing process. Lastly, the proposed framework could open up opportunities for improved log analysis in various domains, particularly in information technology (IT) operations and infrastructure management, telecommunications, and cybersecurity.

V. CONCLUSION
In conclusion, our fully automated log parsing framework for ALICE O2 facilities offers significant contributions to the field. The framework addresses the challenges of dynamically changing log messages by employing a range of techniques. Firstly, it leverages the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm to automatically extract variable parts and templates from log messages. Secondly, it utilizes genetic programming to construct regular expressions tailored to specific logs, facilitating automatic log construction. Thirdly, the framework optimizes hyperparameters using the Artificial Bee Colony (ABC) algorithm, ensuring efficient log parsing model configurations. Lastly, a template reduction algorithm is employed to merge similar log templates, thereby enhancing parsing accuracy.
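The TF-IDF intuition behind the first step can be sketched as follows (a simplified illustration, not the framework's implementation): tokens that recur across many messages score low and are treated as template words, while rare tokens score high and are treated as variables.

```python
import math
from collections import Counter

def tfidf_scores(messages):
    """Per-token TF-IDF for each message in a batch of log messages."""
    docs = [msg.split() for msg in messages]
    n = len(docs)
    # Document frequency: in how many messages each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            tok: (tf[tok] / len(doc)) * math.log(n / df[tok])
            for tok in tf
        })
    return scores

def extract_template(message, scores, threshold):
    """Replace likely-variable tokens (high TF-IDF) with <*>."""
    return " ".join(
        "<*>" if scores[tok] > threshold else tok
        for tok in message.split()
    )

msgs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 10.0.0.3 closed",
]
batch = tfidf_scores(msgs)
print(extract_template(msgs[0], batch[0], threshold=0.1))
# → "Connection from <*> closed"
```

Here the IP addresses are unique to each message, so their TF-IDF scores are high and they are generalized to `<*>`, while the shared tokens survive as the template.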
Through extensive experimentation on benchmark log datasets and ALICE application logs, we have demonstrated the effectiveness of our proposed framework. Compared with the state-of-the-art online log parser Drain, our fully automated approach delivers highly accurate parsing results without any manual configuration. Notably, on the ALICE application log dataset, our framework achieved an average parsing accuracy of approximately 99.89% with an acceptable tradeoff in computing time. These results underscore the efficiency and efficacy of our framework, particularly in handling application logs, dynamic logs, and short-length logs.
Overall, our research presents a comprehensive and automated solution for log parsing in the ALICE O2 system. By overcoming the challenges associated with dynamically changing logs, our framework provides a reliable and efficient approach for transforming unstructured logs into structured ones. This contributes to enhancing the stability, reliability, and monitoring capabilities of the computing system. We believe that our work significantly advances the field of log parsing and holds great potential for practical applications in various domains.