Predicting CVSS Metric Via Description Interpretation

Cybercrime affects companies worldwide, costing millions of dollars annually. The constant increase of threats and vulnerabilities raises the need to handle vulnerabilities in a prioritized manner. This prioritization can be achieved through the Common Vulnerability Scoring System (CVSS), typically used to assign a score to a vulnerability. However, there is a temporal mismatch between the vulnerability finding and score assignment, which motivates the development of approaches to aid in this aspect. We explore the use of Natural Language Processing (NLP) models in CVSS score prediction given vulnerability descriptions. We start by creating a vulnerability dataset from the National Vulnerability Database (NVD). Then, we combine text pre-processing and vocabulary addition to improve the model accuracy and interpret its prediction reasoning by assessing word importance, via Shapley values. Experiments show that the combination of Lemmatization and 5,000-word addition is optimal for DistilBERT, the best-performing NLP model in our experiments, achieving state-of-the-art results. Furthermore, specific events (such as an attack on a known software) tend to influence model prediction, which may hinder CVSS prediction. Combining Lemmatization with vocabulary addition mitigates this effect, contributing to increased accuracy. Finally, binary classes benefit the most from pre-processing techniques, particularly when one class is much more prominent than the other. Our work shows that DistilBERT is a state-of-the-art model for CVSS prediction, demonstrating the applicability of deep learning approaches to aid in vulnerability handling. The code and data are available at https://github.com/Joana-Cabral/.


I. INTRODUCTION
CYBER threats force companies to increase their investments in security, which resulted in a $170 billion security-related market in 2015 [1]. These threats impact 556 million people annually, costing $3 trillion worldwide, with an expected increase to $10.5 trillion by 2025 [2]. Additionally, the number of vulnerability entries in VulDB increased from 41 new daily entries in 2016 to 61 in 2021 [3]. This tendency provides a clear picture of the increased risk of threats and cybercrime, raising concern among Information Technology (IT) administrators, who often lack the resources to handle all incoming threats [4]. Given this context, there is an inherent need to define which vulnerabilities should be tackled first.
To aid in the prioritization of vulnerability handling, experts typically use the Common Vulnerability Scoring System (CVSS) [5], a de facto standard, to accurately assign a score to a vulnerability. New vulnerability entries are enumerated via Common Vulnerabilities and Exposures (CVE) [6], with a unique identifier, description, and CVSS Base score metrics, the latter specified by the National Vulnerability Database (NVD).
The score metric assignment is performed manually from vulnerability description analysis, for which vendors do not always provide enough detail [7] for experts to accurately create these scores. Furthermore, some CVSS metrics are subjective [8], relying heavily on the expert's previous experience in assigning CVSS metrics. The inherent problems of this process are exacerbated by the temporal mismatch between CVSS metric assignment and vulnerability finding: 19 days to populate a vulnerability with the respective CVSS and six days to find a new one [9]. Therefore, to reduce the time/cost spent while also mitigating the subjective aspect of score assignment, we explore the use of a deep learning approach to predict the CVSS metrics based on the vulnerability description.
We start by obtaining the vulnerability descriptions and respective CVSS metrics using the NVD Application Programming Interface (API). The collected data is processed for the most recent version of CVSS (version 3) and serves as input for the deep learning approach. We select DistilBERT for sequence classification because, on the created dataset, it outperforms other state-of-the-art Natural Language Processing (NLP) models. Since the vulnerability descriptions contain technical expressions and are short, we assess the effect of text pre-processing techniques and vocabulary addition. Our results show that text pre-processing improves the baseline model accuracy, with further gains from vocabulary addition.
One drawback of using a deep learning approach is that the reasoning behind its outputs is not easily disclosed. To overcome this limitation, we use the Shapley value [10], a game-theoretic approach to explain machine learning outputs, to perceive the correlation between description words and the predicted CVSS metric. This process allows us to understand the importance of each word towards the CVSS metric prediction, and to assess how that importance varies with text pre-processing and vocabulary addition.
The main contributions of our work are summarized as follows:
• We present a vulnerability dataset, derived from NVD data, with vulnerability descriptions and CVSS (version 3) metrics;
• We demonstrate the applicability of deep learning approaches to predict CVSS metrics, in combination with text pre-processing and vocabulary addition, achieving state-of-the-art results;
• We confer interpretability to model prediction by analyzing the importance of description words, via Shapley value.
The remainder of this paper is organized as follows: Section II summarizes the most relevant CVSS-based works; Section III describes the methodologies used; Section IV describes the vulnerability dataset; and Section V discusses the results obtained. Finally, the main conclusions and future work are presented in Section VI.

A. CVSS APPLICABILITY
CVSS has been extensively analyzed and applied to multiple domains to prioritize or estimate security risks. Younis and Malaiya [11] compared the CVSS base metrics and the Microsoft rating system, declaring that both measures have a very high false-positive rate, with CVSS significantly affected by the software type. By analyzing the CVSS base scores for vulnerabilities of currently supported Windows operating systems, Joh [12] concluded that most vulnerabilities are compromised in systems that require no authentication, suggesting the addition of an authentication process in every system. CVSS base metrics have been used to assess cybersecurity risks in IT systems [4], using the risk formula, and calculating risk probability and impact. The same study reported that identifying security properties in the early stages of development positively impacts the security of the systems. In the same context, Wirtz and Heisel [13] proposed a semi-automatic method to estimate security risks in the early stages of software development, using CVSS formulas to assess the threat severity. Since CVSS has already demonstrated its validity in typical IT systems, it was also adapted to accurately calculate vulnerabilities regarding hybrid IT and IoT systems [14], [15]. Following this idea, Mishra and Singh [16] proposed a taxonomy for Cloud-specific vulnerabilities, using the CVSS score to represent each major Cloud vulnerability severity. Finally, a guide for applying CVSS to medical devices was also proposed [17], consisting of questions that identify a value for a specific CVSS metric.

B. CVSS AND ARTIFICIAL INTELLIGENCE
The combination of Artificial Intelligence techniques and CVSS scores of individual vulnerabilities has also been reported. Sheehan et al. [18] proposed using Bayesian Networks to identify connected and autonomous vehicle cyber risks, using CVSS scores to predict knowledge gaps or potential new cyber vulnerabilities. Furthermore, Frigault et al. [19] employed Bayesian Networks and Attack Graphs to measure network security, using the CVSS scores as probabilities and considering metric values of each vulnerability to be independent. However, applying Bayesian Networks to assess CVSS scores has limitations [20], leading to the proposal of an approach that considers the dependency relationships between the CVSS base metrics, combining scores into three aspects: probability, effort, and skill. Allouzi and Khan [21] proposed using a Markov Chain to compute the probability distribution of Internet of Medical Things security threats, using CVSS scores to assign severity to the acknowledged vulnerabilities. A first attempt to predict CVSS final scores was made through the employment of fuzzy systems [22], outperforming Support Vector Machine (SVM) and Random Forest. In this context, fuzzy CVSS [23] was used to calculate the final severity score for vulnerabilities, employing fuzzy theory to reduce the error rate. To predict CVSS values for base metrics, Elbaz et al. [24] proposed a linear regression model, using a bag of words approach, with the removal of irrelevant words.

C. CVSS AND DEEP LEARNING
Deep learning is also known for its effectiveness in solving complex problems, with the drawback of time-costly training. Therefore, to resemble security experts' decision-making [25], the usage of Neural Networks was proposed, automatically providing a vulnerability report through CVSS metrics. Deep reinforcement learning was also used to assess the cyber-physical security of electric power systems [26], which adapted CVSS to estimate the complexity of attack path. As a result, CVSS base metrics have been adopted as the guide for identifying and prioritizing threats among multiple systems. This indicates that correctly and swiftly predicting the metrics for CVSS is a valuable effort. Sahin and Tosun [27] concluded that Long Short Term Memory (LSTM) was the most accurate model to predict CVSS final scores, when compared with Convolutional Neural Networks (CNN) and XGBoost. The two previously presented approaches gathered data from Open Source Vulnerability Database (OSVDB) and NVD, respectively, to train their models. Alternatively, Twitter discussions [28], with NVD as ground truth for CVSS scores, were fed to a Graph Convolutional Network with Attention-based input Embedding to predict the CVE severity scores. However, predicting CVSS final scores does not provide any insight to the experts about the values for the CVSS metrics.

FIGURE 1. Overview of the methodology used to assess DistilBERT performance in vulnerability detection, using CVSS data descriptions and categories. We evaluate the model performance by varying two key aspects: 1) text pre-processing approaches; and 2) vocabulary addition. Furthermore, we evaluate the correlation of tokens and category, via Shapley value, to assess the tokens more influential towards each category prediction.

D. VULNERABILITY INTERPRETABILITY
The analysis and interpretation of vulnerability descriptions is also reported in the literature. An empirical study based on the NVD vulnerability descriptions [29] concluded that information about the asset, attack, and vulnerability type is relevant to increase vulnerability scoring accuracy. Another work used the Local Interpretable Model-agnostic Explanations (LIME) framework to explain the vulnerability descriptions [30], providing relevant words for a small number of vulnerabilities.
To the best of our knowledge, the work presented herein is the first to combine Deep Learning and NLP approaches to extract information from vulnerability descriptions and output CVSS metrics, while using interpretability to assess model predictions.

III. METHODOLOGY
The methodology used in our experiments is displayed in Fig. 1. We start by creating a CVSS dataset, using information from the NVD. Then, we vary two major performance-related aspects: 1) text pre-processing; and 2) vocabulary addition. Finally, we evaluate model accuracy and assess token correlation with category prediction, using Shapley value.

A. MODEL DETAILS AND EVALUATION METRICS
We used the following models in our experiments: BERT [31], DistilBERT [32], RoBERTa [33], ALBERT [34], and DeBERTa [35]. Our reasoning for model choice is linked to the importance of BERT for the NLP area. It is one of the most used models in NLP, in a variety of tasks, with proven quality. We then chose other variations of BERT to assess which model is best for CVSS metric prediction. Specifically, we chose ALBERT and DistilBERT for having fewer parameters than BERT, and RoBERTa and DeBERTa for having more parameters than BERT. The chosen models belong to the BERT family while having specific characteristics, such as the number of parameters. As such, our work focused on finding the best performing state-of-the-art NLP models for CVSS metrics prediction. We fine-tune each model following the authors' methodology: regarding the learning rate, RoBERTa was set to 1.5 × 10⁻⁵, DistilBERT was set to 5 × 10⁻⁵, while BERT, ALBERT, and DeBERTa have all been set to 3 × 10⁻⁵; for the number of training epochs, RoBERTa was trained for 2 epochs, DeBERTa for 10, and BERT, ALBERT, and DistilBERT for 3; regarding batch size, we used 8 for ALBERT and DistilBERT, and 4 for BERT, RoBERTa, and DeBERTa; finally, RoBERTa has a weight decay of 0.01, while the remaining models have the default value (0). We use the default losses and architectures of each model, from Hugging Face [36]. To obtain category classification, we use a PyTorch Softmax layer [37] on the model output.
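The last step above, turning the model output into a category class, can be illustrated with a minimal sketch. It uses a plain-Python softmax as a stand-in for the PyTorch Softmax layer; the logits and the class order in `ATTACK_VECTOR_CLASSES` are hypothetical and only serve the illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical class order for the Attack Vector category (an assumption).
ATTACK_VECTOR_CLASSES = ["Network", "Adjacent", "Local", "Physical"]

def predict_class(logits, classes):
    """Return the most probable class and the full probability list."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs

# Made-up logits standing in for the output of a fine-tuned classifier.
label, probs = predict_class([2.0, 0.5, 1.0, -1.0], ATTACK_VECTOR_CLASSES)
print(label, [round(p, 3) for p in probs])
```

The probabilities sum to one, and the predicted class is simply the index with the highest probability.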
To compare the performance of each model, we use the accuracy, F1 score, and balanced accuracy from the scikit-learn library [38]. To compare our results with state-of-the-art for CVSS metric inference, we use the accuracy metric.
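To make the difference between these metrics concrete, the following is a minimal plain-Python sketch (not the scikit-learn implementation). Balanced accuracy is computed as the mean of per-class recall; macro averaging for the F1 score is an assumption, since the averaging mode is not stated here. The labels are made up and mimic an imbalanced binary category.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of instances whose prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_counts(y_true, y_pred):
    """True positives, false positives, and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    return tp, fp, fn

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    tp, _, fn = per_class_counts(y_true, y_pred)
    classes = sorted(set(y_true))
    return sum(tp[c] / (tp[c] + fn[c]) for c in classes) / len(classes)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Made-up imbalanced data: a model that always outputs the majority class.
y_true = ["Unchanged"] * 9 + ["Changed"]
y_pred = ["Unchanged"] * 10
print(accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

In this toy case, a constant majority-class predictor reaches high accuracy while its balanced accuracy stays at chance level, which is why accuracy alone can be misleading on imbalanced categories.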

B. TEXT PROCESSING AND VOCABULARY SELECTION
To assess the contribution of each word to the classification of the considered categories (discussed in Section IV), we start by processing vulnerability descriptions. We use two pre-processing methods, namely, Lemmatization and Stemming. Finally, we tokenize the text as input to the model, evaluating its accuracy based on the pre-processing approach. Both text pre-processing approaches use Natural Language Toolkit (NLTK) methods [39], while tokenization is achieved using the Transformers library, from Hugging Face [36]. We choose Lemmatization and Stemming, given their wide use as text pre-processing approaches in the NLP area. By using Lemmatization and Stemming, we intend to process text to maintain as much relevant data as possible while ignoring noisy data. This is achieved by ignoring variants of words that have the same "base". In the case of Stemming, this base is the stem, while in Lemmatization it is the lemma.
In our experiments, we also evaluate the effect of vocabulary addition, both alone and in conjunction with the best performing text pre-processing approach. We evaluate the accuracy of the model when adding 5,000, 10,000, and 25,000 words to the default vocabulary of the tokenizer. To select the added words, we order them by frequency of appearance in the descriptions, choosing the top n words. To avoid redundancy, we only consider words that appear in the descriptions and not in the default vocabulary.
Given the existence of software versions and code snippets in some data descriptions, we use regular expressions to filter digits and special characters. This approach reduces the "noise" of vocabulary addition, since this filtered data is not relevant to category classification and could potentially dissipate the importance of relevant added words.
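The selection procedure described in this subsection can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the regular expression, the punctuation stripping, the function name `select_new_words`, and the tiny `default_vocab` set are all hypothetical.

```python
import re
from collections import Counter

# Assumed filter: keep only tokens made of letters, optionally joined by
# hyphens, so version numbers and code-like strings are dropped.
TOKEN_RE = re.compile(r"^[a-z]+(?:-[a-z]+)*$")

def select_new_words(descriptions, default_vocab, top_n):
    """Return the top_n most frequent words absent from default_vocab."""
    counts = Counter()
    for text in descriptions:
        for raw in text.lower().split():
            word = raw.strip(".,;:()[]\"'")
            if TOKEN_RE.match(word) and word not in default_vocab:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]

# Made-up descriptions and a made-up default tokenizer vocabulary.
descriptions = [
    "Mattermost server 5.2.1 allows a man-in-the-middle attack.",
    "In libxaac, there is an out-of-bounds write; Mattermost is not affected.",
    "libxaac before v1.0 allows remote attackers to crash the server.",
]
default_vocab = {
    "server", "allows", "a", "attack", "in", "there", "is", "an", "write",
    "not", "affected", "before", "remote", "attackers", "to", "crash", "the",
}
new_words = select_new_words(descriptions, default_vocab, top_n=2)
print(new_words)
```

With the Hugging Face Transformers library, such a list would typically be passed to `tokenizer.add_tokens`, followed by `model.resize_token_embeddings(len(tokenizer))`; whether the experiments used exactly these calls is not stated in the text.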

C. SHAPLEY VALUE
Deep learning models have shown high performance in multiple tasks while providing little to no explanation for the reasoning for model prediction. To tackle this issue, we use Shapley value, an interpretability technique that allows us to interpret the reasoning of the model when providing predictions. The Shapley value, coined by Shapley in 1953 [40], is a cooperative game theory-based method used for assigning payouts to players, depending on their contribution towards the total payout. In the machine-learning context, the Shapley value is used to evaluate how each feature (player) of a given instance contributed (assigning payout) towards the model prediction of the instance (total payout).
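As an illustration of the payout analogy, the sketch below computes exact Shapley values for a toy game by averaging each player's marginal contribution over all orderings. The scoring function `toy_score` and its numbers are hypothetical; practical frameworks approximate this computation, since enumerating all orderings is only feasible for a handful of features.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            with_player = coalition | {player}
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return {p: t / len(orderings) for p, t in totals.items()}

def toy_score(tokens):
    """Hypothetical model score for a class, given the tokens present."""
    score = 0.0
    if "remote" in tokens:
        score += 0.3
    if "local" in tokens:
        score -= 0.4
    if "remote" in tokens and "protocols" in tokens:
        score += 0.2  # interaction shared between the two tokens
    return score

tokens = ["remote", "protocols", "local"]
phi = shapley_values(tokens, toy_score)
print({t: round(v, 3) for t, v in phi.items()})
```

The individual values sum to the difference between the score with all tokens and the score with none, which is the property that lets each token's value be read as its share of the prediction.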
The use of Shapley value in our experiments is linked to our interest in analyzing how each word contributed to category classification. For categories with n classes, where n is higher than 2, we perform n Shapley value analyses, each considering one class versus the remaining classes of the category. The considered class is given the value 1, with the remaining receiving the value 0. If a word contributes positively, it means that it influences the prediction towards the considered class. The higher the absolute Shapley value is, the higher the feature influence. We use the SHapley Additive exPlanations (SHAP) framework [41] and the Explainer model, from a publicly available implementation in [42].
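The class-versus-rest encoding described above can be sketched in a few lines. The function names and the example labels are hypothetical and only illustrate how n binary target vectors arise from one multi-class category.

```python
def one_vs_rest_targets(labels, positive_class):
    """Encode the considered class as 1 and every other class as 0."""
    return [1 if label == positive_class else 0 for label in labels]

def all_one_vs_rest(labels):
    """Build one binary target vector per class found in the labels."""
    return {c: one_vs_rest_targets(labels, c) for c in sorted(set(labels))}

# Made-up Attack Vector labels for five descriptions.
labels = ["Network", "Local", "Network", "Physical", "Adjacent"]
targets = all_one_vs_rest(labels)
print(targets["Network"])
```

Each of the resulting target vectors would then be the subject of one Shapley value analysis.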

IV. VULNERABILITY DATASET
The vulnerability dataset is based on NVD information, a United States government repository of standards-based vulnerability management data. We obtain the information through their API, starting from index 0 to 152,000, representing data collected until April 2021. Finally, we process the collected data to retrieve vulnerability descriptions and the classes for each of the eight categories analyzed: Attack Vector, Attack Complexity, Privileges Required, User Interaction, Scope, Confidentiality, Integrity, and Availability. A visual representation of class proportions, for each category, of our dataset is displayed in Fig. 2.
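As a sketch of how the eight category classes could be derived for one entry, the code below parses a CVSS version 3 vector string into category names and class labels. Using the vector string as the source is an assumption (the text does not say which fields of the API response were used), and the function name `parse_cvss3_vector` is hypothetical; the metric abbreviations follow the CVSS v3 vector format.

```python
# Abbreviations used in CVSS v3 vector strings, mapped to category and class names.
HLN = {"H": "High", "L": "Low", "N": "None"}
METRICS = {
    "AV": ("Attack Vector", {"N": "Network", "A": "Adjacent", "L": "Local", "P": "Physical"}),
    "AC": ("Attack Complexity", {"L": "Low", "H": "High"}),
    "PR": ("Privileges Required", {"N": "None", "L": "Low", "H": "High"}),
    "UI": ("User Interaction", {"N": "None", "R": "Required"}),
    "S": ("Scope", {"U": "Unchanged", "C": "Changed"}),
    "C": ("Confidentiality", HLN),
    "I": ("Integrity", HLN),
    "A": ("Availability", HLN),
}

def parse_cvss3_vector(vector):
    """Map a CVSS v3 base vector string to {category: class label}."""
    parts = vector.split("/")
    if not parts[0].startswith("CVSS:3"):
        raise ValueError("not a CVSS version 3 vector: " + vector)
    labels = {}
    for part in parts[1:]:
        key, _, code = part.partition(":")
        if key in METRICS:  # ignore any non-base metrics
            name, values = METRICS[key]
            labels[name] = values[code]
    if len(labels) != len(METRICS):
        raise ValueError("missing base metrics in: " + vector)
    return labels

example = parse_cvss3_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:L/A:N")
print(example["Attack Vector"], example["User Interaction"])
```

Entries that only carry a version 2 vector are rejected by the prefix check, which mirrors the restriction to version 3 data.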
Though the collected data corresponds to 152,000 vulnerability descriptions and categories, we only consider descriptions related to version 3 of CVSS in this work. For this reason, the total number of instances in our dataset is

A. MODEL COMPARISON
We start by comparing the performance of five different NLP methods in the proposed dataset. The accuracy, F1 score, and balanced accuracy for each of the eight categories are presented in Table 2. The results suggest that DistilBERT is the best-performing model for all the categories, in all the considered metrics. The method with the worst performance is ALBERT, which has the fewest parameters (11M), while DeBERTa, BERT, and RoBERTa, with over 100M parameters, also have worse performance than DistilBERT (65M). Since we intend to assess the class inference, given a vulnerability description, the number of parameters may be linked to the performance variance. In this case, too few parameters (ALBERT) are insufficient for the model to learn, and too many lead to poorer fine-tuning. The similarity of various accuracy values between BERT, ALBERT, and DeBERTa, for different categories, can be explained by dataset imbalance. In these cases, the values displayed represent a scenario where the models opted to achieve higher accuracy by outputting the same value in every instance. Thus, in cases of dataset imbalance, the use of accuracy can be deceptive, justifying the use of other metrics such as balanced accuracy.
In this experiment, we use the default pretraining weights (provided by HuggingFace [36]) and training parameters of every model. The models used are typically applied/evaluated in tasks where the association of two sentences is analyzed (e.g., GLUE [43]) or the aim is finding answers in a text, given a question (e.g., SQuAD [44]). These types of tasks differ from predicting a category given a vulnerability description (the aim of this work), which may justify the underperformance of state-of-the-art methods in our experiments. Based on the obtained results, we selected DistilBERT for continuing the experiments involving the usage of Deep Learning.

B. TEXT PRE-PROCESSING
We assess the performance of DistilBERT, for all eight considered categories, regarding different text pre-processing approaches. We present our results, using balanced accuracy, in Table 3, with Baseline referring to the condition where no pre-processing approach is used.
When comparing category-related performance variance, we observe that all categories benefit from pre-processing. Regarding processing-related performance variance, Lemmatization promotes better results than Stemming, for all categories. Stemming truncates words by chopping off letters from the end until the stem is reached. This is a cruder approach than Lemmatization, which may explain its lower performance. Given the superiority displayed by Lemmatization over Stemming, this is the chosen pre-processing approach to use in the remaining experiments.

C. VOCABULARY ADDITION
We also evaluate the effect of vocabulary addition on prediction accuracy. Furthermore, we compare the vocabulary addition with its combination with a pre-processing approach. We display our results in Table 4.
Relative to the baseline, most variations of vocabulary addition translate into performance increase, for all categories. Regarding the vocabulary variations, 5,000-word addition was the condition with better results overall. This suggests that adding more words is beneficial to model accuracy improvement. However, subsequent vocabulary addition (10,000 and 25,000-word addition) does not promote incremental performance increase. Given that vocabulary addition is linked to word frequency in the description, adding more words may disperse the model attention towards less relevant words, hindering its performance. This aspect is more noticeable when 25,000-word addition has worse performance
VOLUME 4, 2016
Regarding the combination of vocabulary addition with Lemmatization, we observe that this approach generally improves the balanced accuracy, relative to vocabulary addition alone, for most vocabulary variations. This suggests that word importance may vary with processing approaches, which corroborates the importance of text pre-processing, even in the context of vocabulary addition. The results suggest that 5,000-word addition with Lemmatization is the best approach for overall category prediction, exhibiting the importance of text processing and pertinent word addition in description-based classification.

D. STATE-OF-THE-ART COMPARISON
We compare DistilBERT, and its combination with pre-processing and vocabulary addition, with the state-of-the-art. To the best of our knowledge, only Elbaz et al. [24] evaluate class prediction accuracy in version 3 of CVSS. To compare our results with theirs, we also display the accuracy of Baseline and 5,000-word addition with Lemmatization, whose balanced accuracy is presented in Table 4. Since the authors presented their results in a bar plot, not displaying the numerical values, we register the rounded values observed in said plot. We display the state-of-the-art comparison in Table 5. Elbaz et al. use a bag of words approach, with the removal of irrelevant words, as input to a regression model. Using DistilBERT, a deep learning approach, in conjunction with text pre-processing and vocabulary addition, we obtain substantial accuracy improvements in the majority of categories. The categories where the approach of Elbaz et al. was closer to ours were Attack Complexity, User Interaction, and Scope, which could be linked to these categories being two-classed. In these cases, the regression model used by Elbaz et al. can compete with deep learning approaches. However, for the remaining categories, with over two classes, the performance disparity is substantially larger, with up to a 28% accuracy increase. Furthermore, using the text pre-processing approach and adding vocabulary promotes an accuracy increase of DistilBERT, further enhancing its performance. The results suggest that DistilBERT is a state-of-the-art approach for vulnerability category prediction, particularly for multi-class categories.

E. INTERPRETING CATEGORY CLASSIFICATION
We assess word importance in two distinct scenarios: 1) comparing the most relevant words, using different processing techniques, for a given category; and 2) assessing the variance of word importance towards/against binary and multi-class category prediction, given different processing techniques. In the first scenario, we compare word importance variance with text pre-processing and vocabulary addition in DistilBERT. Given the overall superiority of Lemmatization and 5,000-word addition (Table 4), these are the chosen approaches. We consider the four stages for comparison: 1) Baseline; 2) Lemmatization; 3) 5,000-word addition; and 4) 5,000-word addition with Lemmatization. For the remaining experiments, we will refer to each word of a description as a token to accurately represent the word translated into the tokenizer vocabulary. We evaluate token importance for the category Attack Vector, regarding the Network class. In this case, Network has a value of 1, and the remaining three classes have the value 0. Tokens with positive Shapley value influence Network classification, while negative ones are more relevant to the other three classes.

TABLE 4. Category balanced accuracy of DistilBERT for baseline conditions (Tokenization), and with different vocabulary addition, assessing the effect of Lemmatization. Base, in each vocabulary column, refers to the vocabulary addition with Tokenization, without text pre-processing. The expression w/ Lemm refers to Lemmatization combination with vocabulary addition. The outperforming approach for each category is shown in bold.
Category    Balanced Accuracy (%)
The results show a variance in token importance with text pre-processing and vocabulary addition. Starting in the Baseline, with no processing or vocabulary addition, protocols, Matter, and remote are tokens that, when in a description, influence the classification of the category towards Network. There is some logic behind said importance, given that remote and protocols are linked to network-related activities. The influence of Matter is linked to Mattermost, an open-source chat service, which was the target of multiple attacks. This shows that token importance might be influenced by specific network-related events. When we analyze tokens more associated with other classes (negative Shapley value), we observe that these are closely related to class definition (Local, Physical, and Adjacent) or associated with it (infrastructure). Adding Lemmatization, we observe the same tendency for tokens influential towards other classes but with increased importance. Furthermore, tokens linked to Network classification lose importance, aside from the specific network-related event of baseline (Matter). This suggests that token descriptions are more interpretably linked in not classifying Network than towards it, which could be due to class imbalance. Network is over 70% of Attack Vector classes, making it harder to distinguish tokens clearly associated with it, thus justifying the Lemmatization results. The addition of vocabulary (5k Vocabulary) heavily influences category classification, with new tokens being associated with Network classification: libxaac, Mattermost, and man-in-the-middle. Libxaac is an Android library with reported out-of-bound reading/writing errors, while man-in-the-middle is a type of network attack. The importance of these tokens is linked to specific network-related events (attacks, errors), which was also observed in the baseline.
Protocols also increases in importance towards Network classification, which could be linked to its association with the added vocabulary. This shows that vocabulary addition shifts the focus of token importance heavily towards specific events, for Network classification. Network-adjacent (added by vocabulary addition) also gains importance in classifying other classes, given its relevance to dissociate Network from Adjacent. Complementing vocabulary addition with Lemmatization (5k Vocabulary & Lemmatization) diminishes the importance of tokens closely linked to Network (positive Shapley value), restoring the tendency observed with Lemmatization alone. The reduced importance of specific network-related events also greatly decreased the importance of tokens associated with them (protocols). Furthermore, the influence of added vocabulary was enhanced in logon (closely related to classes other than Network) and network-adjacent, while keeping high importance of tokens associated with the definition of other classes (physical and local). This result suggests that Lemmatization is necessary to obtain more coherent/explainable token importance, which ultimately translates into better model performance (as shown in Table 4).
The second considered scenario relates token importance when considering binary (Attack Complexity, User Interaction, and Scope) and multi-class categories, for the same processing approaches of the first scenario. For all categories, the highest proportion class per category was associated with the value 1, with the remaining being associated with 0. Fig. 4 displays the boxplots for the two cases considered, showing the data distribution (ignoring wildcard cases).
The analysis of binary boxplots indicates that using Lemmatization and vocabulary addition promotes a decrease in token importance variance in both towards (positive Shapley value) and against (negative Shapley value) the highest class. However, combining vocabulary addition with Lemmatization increases token importance variance, particularly for negative values. This translates into increased importance of tokens to categorize the least represented class. If almost all descriptions relate to a specific class, it may be more beneficial/discriminative to focus on tokens linked to the underrepresented class, which is the approach of the model in this case.
Analyzing multi-class boxplots shows that the variance of negative Shapley value remains nearly constant throughout the various text pre-processing methods. In contrast to the binary classes, negative Shapley value refers to various classes and not simply one, which justifies the (low) variance observed for these cases. Relative to positive Shapley value, using vocabulary addition and its combination with Lemmatization tends to reduce the variance of token importance, achieving a similar variance to negative Shapley value tokens. In multi-class prediction, even when one class is more prevalent than others, the existence of tokens closely linked to specific categories is not as likely as in binary class prediction. For this reason, reducing the overall importance assigned to specific tokens translates into better results.

VI. CONCLUSIONS
The increasing number of threats and vulnerabilities in IT systems surpasses the capability of professionals to handle them, potentially leading to harm to companies. This raises the need to prioritize vulnerabilities, typically achieved through CVSS metrics, via manual vulnerability description analysis. In this paper, we present a vulnerability dataset, from NVD data, and analyze the applicability of deep learning approaches, namely NLP methods, to aid in CVSS metric prediction via description interpretation. In our experiments, we also assess the importance of text processing and vocabulary addition in metric prediction while interpreting it via Shapley value. Our results show that DistilBERT is a state-of-the-art model for CVSS metric prediction, with increased performance when combined with Lemmatization (text pre-processing) and 5,000-word addition. Furthermore, this combination mitigates the effect of specific events in category prediction and leads to weighted word importance, particularly for binary categories, contributing to increased model accuracy. The presented dataset and model experiments serve as a comparable basis for future works in CVSS metric prediction, applicable for vulnerability handling/prioritization, which leads to increased usefulness and accuracy of the metric, benefiting system security and operational effectiveness.

ACKNOWLEDGMENT
This work was performed under the scope of Project SECURIoTESIGN with funding from FCT/COMPETE/FEDER with reference number POCI-01-0145-FEDER-030657. This work is funded by Portuguese FCT/MCTES through national funds and, when applicable, co-funded by EU funds under the project UIDB/50008/2020 and FCT doctoral grants SFRH/BD/133838/2017, 2020.09847.BD, and 2021.04905.BD. It is also supported by project CENTRO-01-0145-FEDER-000019 - C4 - Competence Center in Cloud Computing, co-financed by the European Regional Development Fund (ERDF) through the Programa Operacional Regional do Centro (Centro 2020), in the scope of the Sistema de Apoio à Investigação Científica e Tecnológica - Programas Integrados de IC&DT.