Network Security Situation Assessment Based on Improved WOA-SVM

Network security situation assessment is an important means of understanding the current network security situation to provide a basis for taking security measures. To address the problem that the accuracy of existing network security situation assessment methods needs to be improved, this paper proposes a network security situation assessment method based on support vector machine (SVM) optimized by whale optimization algorithm (WOA) that is improved by adaptive weight (AW) combined with simulated annealing algorithm (SA). In this method, the SVM is embedded into the fitness function calculation of the improved WOA, and the global optimization characteristics of WOA are used to determine the optimal penalty parameter $c$ and kernel function parameter $g$ of the SVM. To solve the problem of the WOA being prone to falling into local extremum and slow convergence when solving large and complex data problems, an adaptive weight is used to adjust the whale position update coefficient, and a simulated annealing algorithm (SA) is used to increase random search factors to avoid falling into local extremum, so as to improve the global optimization ability. The experimental results show that this method is feasible, can assess the network security situation more accurately, and has better convergence than other assessment algorithms based on an improved SVM.

Network security situation assessment expresses its results as a security situation value that reflects the current network security situation, based on the established network security situation indicator system and certain prior knowledge. At present, network attacks are becoming increasingly diversified, complicated and random, and the network security situation is a complex and constantly changing nonlinear process. Therefore, the use of artificial intelligence technologies such as machine learning and deep learning is an inevitable development trend for assessing the network security situation. Building a highly accurate, scientific and objective network security situation assessment model is a research focus of network security situation assessment. Many artificial intelligence-based assessment methods for network security have been proposed. They are roughly classified into three types based on their theories: those based on mathematical models, those based on knowledge reasoning, and those based on pattern recognition.

The methods based on mathematical models assess the situation primarily by constructing an assessment function, and the key is the construction of the function. The most common method is the analytic hierarchy process (AHP). Reference [4] proposed a network security situation assessment model that uses an alarm verification algorithm in conjunction with a fuzzy inference algorithm to improve the analytic hierarchy process, effectively eliminating the impact of false alarm information and intuitively reflecting the network security situation; however, this method relies on a single data source.
Reference [5] provided an assessment approach that integrates the AHP with the hierarchical model of context assessment, which simplifies the situation assessment problem, can reflect the entire security condition of the network, and better serves high-level decision-making. The disadvantage of methods based on a mathematical model is that there is currently no objective and unified standard for function construction, and the construction is prone to being influenced by subjective human factors, resulting in inaccurate evaluation results.

The methods based on knowledge reasoning primarily construct models based on specific criteria and empirical knowledge, and apply logical reasoning theory to evaluate the situation. Evidence theory and graph models are two of the most representative examples. Based on evidence theory, for example, [6] proposed a network security threat situation assessment method based on unsupervised generation reasoning, which overcomes the high computational cost, long running time and low efficiency of supervised assessment methods, and can more intuitively assess the overall situation of network threats; [7] studied a network security situation assessment model based on DS evidence theory, which used principal component analysis (PCA) to preprocess the alarm data, adopted the improved DS evidence theory and combined the credibility of multi-source attack data to improve the alarm recognition rate.
Based on the graph model, for example, [8] proposed a situation assessment method using the Seeker Optimization Algorithm to improve the hidden Markov model, which can more accurately assess the situation of network security, but there were irrelevant and false positive data in the situation elements, which calls for further research on the observation sequence; [9] proposed a network security situation assessment method with the Markov game model as the core, combined with four-level data fusion, which considered the interaction between attackers and defenders, so it was closer to reality.

The methods based on pattern recognition mostly rely on machine learning theory for assessment, which is also the approach studied in this paper.

Aiming at the problems of low accuracy and slow convergence in the above assessment methods, this paper proposes a network security situation assessment method that introduces adaptive weight (AW) and simulated annealing algorithm (SA) into the WOA-optimized SVM. AW is used to adjust the whale position update coefficient, and SA is used to increase the random search factor to improve its global optimization ability. This assessment model can be divided into three stages, which are shown in FIGURE 1.

The experimental sample data are extracted from the network, the situational indicators are extracted according to the situational indicator extraction principles, and the situational indicator system is constructed. Then the extracted sample data are preprocessed. Finally, the obtained data set is divided into two parts: a training set and a test set.

The AW and SA algorithms are used to improve the WOA, and the improved WOA (AWSA-WOA) is used to search for the optimal combination of the penalty parameter $c$ and the kernel function parameter $g$ of the SVM. The optimal combination of parameters is then assigned to the model.

The preprocessed test sample data are input to the final SVM assessment model, and the situation assessment results of the test samples are output.

The extraction of situation indicators is a prerequisite for situation assessment, and the construction of the situation index system serves as a foundation for the extraction of situation elements. We must follow specific principles while extracting situation indicators so that we can scientifically and rationally construct a multi-directional and multi-angle reflection of the network's security condition.

The construction of the index system is a very complicated process, but it is also a key link in situation assessment and prediction. Therefore, it is vital to construct a scientific and logical index system. If the number of indicators selected is large, the complexity and workload of the situation assessment system will grow, resulting in a decline in the system's efficiency and speed. Conversely, if the number of selected indicators is small, the system cannot fully reflect the security status of the whole network.

The whale optimization algorithm (WOA) searches for the best solution by mimicking the "spiral bubble-net" search strategy of humpback whales. The algorithm has the advantages of few adjustment parameters, simple operation and easy understanding [13]. The WOA mainly involves the following optimization steps: surround prey, spiral predation, and search for prey [14].

Humpback whales surround their prey when hunting. After the humpback whale has selected the optimal position, other whales will approach this position, and the position update in the iterative optimization process is represented by the following formulas (1) and (2).

$$D = |C \cdot M^*(t) - M(t)| \tag{1}$$

$$M(t+1) = M^*(t) - A \cdot D \tag{2}$$
where $M^*(t)$ is the position vector of the optimal solution at the $t$-th iteration, and $M^*(t)$ is updated whenever a better solution appears in the iteration process; $M(t)$ is the position vector of the solution at the $t$-th iteration; $D$ is the distance between the optimal solution position and the current solution in the $t$-th iteration; the coefficient vectors $A$ and $C$ are determined by formulas (3) and (4).

$$A = 2a \cdot r_1 - a \tag{3}$$

$$C = 2 \cdot r_2 \tag{4}$$
Among them, $a$ is a control parameter that decreases linearly from 2 to 0 over the iterative process, which can be expressed as $a = 2 - 2t/maxgen$, where maxgen is the maximum number of iterations; $r_1$ and $r_2$ are random vectors in [0,1].
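The encircling update can be illustrated with a minimal sketch. It assumes the standard WOA forms $D = |C \cdot M^*(t) - M(t)|$, $M(t+1) = M^*(t) - A \cdot D$, $A = 2a \cdot r_1 - a$ and $C = 2 \cdot r_2$, and it draws $r_1$ and $r_2$ independently for each dimension. The function names `control_parameter` and `encircle_step` are hypothetical and are not taken from the paper.

```python
import random


def control_parameter(t, maxgen):
    """Control parameter a: 2 at t = 0, falling linearly to 0 at t = maxgen."""
    return 2.0 - 2.0 * t / maxgen


def encircle_step(position, best, a, rng=random):
    """One encircling update, applied dimension by dimension.

    position and best are lists of floats of equal length.
    Assumes the standard WOA form M(t+1) = M* - A * |C * M* - M|.
    """
    new_position = []
    for m, m_best in zip(position, best):
        r1, r2 = rng.random(), rng.random()  # random numbers in [0, 1)
        A = 2.0 * a * r1 - a                 # lies in [-a, a)
        C = 2.0 * r2                         # lies in [0, 2)
        D = abs(C * m_best - m)              # distance term
        new_position.append(m_best - A * D)
    return new_position
```

When $a$ reaches 0 at the last iteration, $A$ is 0 and the whale lands exactly on the best position, which matches the shrinking behavior the text describes.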
When the whale (searcher) gradually approaches the prey (the optimal solution), the variable $a$ decreases accordingly, and the magnitude of the coefficient $A$ decreases with $a$ according to formula (3). When $A$ lies in [−1, 1], the next position of the whale (searcher) can be any position between the current position and the optimal position (optimal solution). The whale (searcher) attacks the prey (optimal solution) in a spiral way, updates its position according to formula (5), and gradually approaches the position of the prey.

$$M(t+1) = D \cdot e^{bl} \cdot \cos(2\pi l) + M^*(t) \tag{5}$$
where $D = |M^*(t) - M(t)|$ represents the position distance between the $i$-th searcher and the current optimal solution; $b$ is a constant that defines the shape of the logarithmic spiral, and $l$ is a random number within [−1,1].
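As a minimal sketch of the spiral update, assuming the standard WOA form $M(t+1) = D \cdot e^{bl} \cdot \cos(2\pi l) + M^*(t)$ with $D = |M^*(t) - M(t)|$ and one random $l$ shared by all dimensions of a whale. The function name `spiral_step` and the default `b=1.0` are assumptions made for illustration.

```python
import math
import random


def spiral_step(position, best, b=1.0, rng=random):
    """One spiral update toward the best position.

    Assumes M(t+1) = |M* - M| * exp(b * l) * cos(2 * pi * l) + M*,
    with l drawn uniformly from [-1, 1] once per whale.
    """
    l = rng.uniform(-1.0, 1.0)
    factor = math.exp(b * l) * math.cos(2.0 * math.pi * l)
    return [abs(m_best - m) * factor + m_best
            for m, m_best in zip(position, best)]
```

A whale that already sits on the best position has $D = 0$ and therefore does not move, whatever value of $l$ is drawn.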

Humpback whales not only move in a spiral manner, but also constantly narrow the search range. Therefore, assuming a 50% probability of switching between the shrinking encircling mechanism and the spiral model, the whale position is updated according to formulas (6) and (7).

In the search-for-prey phase, the whale position is updated according to formula (8), where $M_{rand}$ is a position vector randomly selected from the current population (representing a random whale).

The adaptive weight coefficient $k(t)$ is shown in (9), where $t$ is the number of iterations, and the maximum number of iterations is maxgen = 100.

The adaptive weight coefficient $k(t)$ is substituted into formulas (6), (7) and (8), and the position update formulas of the improved WOA are shown in (10), (11) and (12). According to these formulas, AW is used to change the update speed of the algorithm. Here, AW adopts a trigonometric function, so it is periodic. When WOA is in the initial stage of iterative optimization, the weight coefficient $k(t)$ is at its minimum value. At this time, the position update speed increases slowly, which improves the local search ability of the algorithm. As the number of iterations increases, the value of $k(t)$ gradually increases, and the update speed of the algorithm gradually increases, which enhances the global search ability of the algorithm to a certain extent. Therefore, adjusting the position update speed through the weight coefficient can balance the global and local search ability of the algorithm well.

The simulated annealing algorithm (SA) is a global search algorithm extended from the local search algorithm [18], which has the innate advantage of global search. To further strengthen the global search capability of WOA, SA is introduced into WOA to keep it from falling into the trap of a local optimum.
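The acceptance rule that SA contributes can be sketched as follows. The sketch assumes that `df` is the fitness of a candidate minus the fitness of the current solution, with smaller fitness being better, and that a worse candidate is accepted with probability $\exp(-df/T)$. The geometric cooling in `cool` and its default rate are purely illustrative assumptions, and both function names are hypothetical.

```python
import math
import random


def accept_candidate(df, temperature, rng=random):
    """Metropolis-style acceptance for a minimization problem.

    df < 0 means the candidate is better and is always accepted.
    Otherwise it is accepted with probability exp(-df / temperature).
    """
    if df < 0:
        return True
    return rng.random() < math.exp(-df / temperature)


def cool(temperature, cooling_rate=0.95):
    """Illustrative geometric cooling (an assumption, not the paper's schedule)."""
    return temperature * cooling_rate
```

At a high temperature almost any worse candidate is accepted, which favors exploration; as the temperature falls, the rule approaches a greedy search.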

In the WOA iterative optimization, WOA is used to determine the individual optimal solution and the global optimal solution, but if the optimal position of the population is at a local extremum, the obtained optimal fitness value will also tend to the local minimum, which degrades the global search performance of the algorithm. Therefore, to avoid falling into the local extremum, the principle of SA is introduced: a sudden jump probability is adopted, and a worse solution is accepted with the probability $P_i$ to help WOA jump out of the local optimum. The probability $P_i$ is determined according to formulas (13) and (14). The temperature $T_t$ follows formulas (15) and (16).

In this article, the radial basis function (also known as the Gaussian kernel function) is selected as the SVM kernel.

At the moment, the most popular SVM optimal parameter selection methods include experience, experimental comparison, grid search and large-scale search. Each of these methods has its own benefits and drawbacks. In this paper, the WOA optimized by adaptive weight and simulated annealing is used to determine the optimal parameters of the SVM for network security situation assessment. The main steps of the assessment algorithm based on the WOA-SVM optimized by adaptive weight and simulated annealing (AWSA-WOA-SVM for short) are as follows.

Step1. Collect existing network security data as sample data to establish a sample set, then normalize and preprocess the sample data in the sample set and divide it into a training set and a test set.

Step2. Initialize the population size of the WOA algorithm, the upper and lower limits of the whale position and its initial position, and initialize the annealing temperature, cooling rate and sudden jump probability of the SA algorithm. Initialize the iteration number t=1 and the maximum iteration number maxgen.

Step3. Calculate the fitness value. Calculate the individual and group extreme positions and the corresponding best fitness value of the whale. This algorithm takes the mean square error (MSE) between the assessment value obtained by the SVM model and the true value as the fitness function, so the smaller the fitness value, the higher the accuracy.

Step4. When t ≤ maxgen, update the values of parameters $a$, $r_1$, $r_2$, $A$, $C$, $b$, $l$, $p$ and $k$.

Step5. Iterative optimization. When p < 0.5, if |A| ≥ 1, the individual position in the current whale population is updated according to formula (12), and $M_{rand}$ is randomly selected from the current whale population; if |A| < 1, the spatial position of the individual in the current whale population is updated according to formula (10). When p ≥ 0.5, the spatial position of the current whale individual is updated according to formula (11). Finally, the optimal position and global optimal position of the whale and their fitness values are updated.

Step6. Introduction of the SA algorithm. Select a searcher in the neighborhood of the global optimal fitness value and calculate the difference value df according to formula (14). If df < 0, the new whale position replaces the original position; if df ≥ 0, use the probability $\exp(-df/T_t)$ to determine whether to accept the position of the inferior solution. Then update and save the whale optimal position gbest and the global optimal position zbest.

Step8. Determine whether the termination condition of the loop is met, that is, whether the maximum number of iterations has been reached or the error requirement is satisfied. If so, obtain the optimal individual zbest and assign it

where $x, y \in R^n$, $x_{min}$ is the minimum data in the sample set,

The difference between the true value and the situation assessment value is measured using three performance indicators: mean square error (MSE), mean absolute percentage error (MAPE) and mean absolute error (MAE).

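In their standard forms, the three indicators are:

MSE indicators:
$$MSE = \frac{1}{n} \sum_{t=1}^{n} (x_t - x_t^*)^2$$

MAPE indicators:
$$MAPE = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{x_t - x_t^*}{x_t} \right| \times 100\%$$

MAE indicators:
$$MAE = \frac{1}{n} \sum_{t=1}^{n} |x_t - x_t^*|$$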
where $x_t$ and $x_t^*$ represent the true value and the situation assessment value respectively, and $n$ represents the number of test data sets.

58th, and 65th times. In the end, the optimal fitness value of the algorithm converges to 0.990142. Since this paper uses the mean square error (MSE) between the security situation assessment value and the true value as the fitness function, the smaller the individual fitness value, the higher the accuracy. Although the WOA-SVM algorithm converges to a local optimal fitness value at the 9th iteration, and the AW-WOA-SVM and SA-WOA-SVM algorithms converge to a local optimal fitness value at the 3rd iteration, their fitness values are clearly not the smallest. To sum up, the AWSA-WOA-SVM algorithm reaches smaller individual fitness values faster than the other algorithms, which shows that it has higher accuracy.

The complexity of an algorithm is an important indicator for evaluating its performance, which includes time complexity and space complexity. The time complexity of an algorithm represents the total time required to complete it.

VOLUME 10, 2022

The comparative experimental results show that the assessment result of the AWSA-WOA-SVM algorithm can more accurately reflect the current network security situation, and has better stability and convergence. In the future, other intelligent assessment algorithms will be studied to determine more accurate and efficient network security situation assessment methods.