Detection Strategies for Microservice Security Tactics

Microservice architectures are widely used today to implement distributed systems. Securing microservice architectures is challenging because of their polyglot nature, continuous evolution, and the various security concerns relevant to such architectures. This article proposes a novel, model-based approach that provides detection strategies for the automated detection of security tactics (or patterns and best practices) in a given microservice architecture decomposition model. Our novel detection strategies are metrics-based rules that decide conformance to a security recommendation based on a statistical predictor. The proposed approach models each recommendation using Architectural Design Decisions (ADDs). We apply our approach to four different security-related ADDs on access management, traffic control, and avoiding plaintext sensitive data in the context of microservice systems. We then apply our approach to a model data set of 10 open-source microservice systems and 20 variants of those systems. The results are detection strategies showing a very low bias, a very high correlation, and a low prediction error on our model data set.

Digital Object Identifier 10.1109/TDSC.2023.3276487
In this context, manually validating whether numerous required security features are used as intended throughout the system is a time-consuming and error-prone task. Architectural abstraction can help focus only on the relevant aspects of architecturally significant security features. However, substantial effort is still required, e.g., to check a large-scale system's architecture for conformance to security recommendations. This article presents an approach for checking the conformance to recommendations on security-related Architectural Design Decisions (ADDs) via detection strategies. The conformance relation is generally defined as the consistency between models [8]. This concerns the relation between a software system's architecture and its intended architecture [9].
A Detection Strategy is defined as "the quantifiable expression of a rule by which design fragments that conform to that rule can be detected" [10]. To enable the formulation of concrete detection strategies for conformance relations, we defined four exemplary ADDs with security tactics as decision options. Further, we specified metrics representing the different options of the ADDs. We define these for several security aspects not yet modeled by ADDs or metrics in the literature, namely access management, traffic control, and avoidance of sensitive plaintext data. Then we define our detection strategies as rules on top of the metrics. In contrast to prior work on detection strategies [10], we do not base our approach on simple data filters only. Instead, we use a statistical analysis based on ordinal logistic regression to derive prediction models, which we then use to construct our detection strategies. This article aims to study the following research questions:
- RQ1. How can we automatically detect conformance to recommendations on ADDs and tactics on security in microservice architecture models?
- RQ2. How well does this detection perform?
This work is based on a dataset of 10 open-source microservice systems that we have manually modeled in our previous work [11], [12] and that are partially (i.e., 3 of the 10 systems) automatically extracted from the source code. We have added 20 variants in which possible violations of ADD options or refactorings for improvement are introduced based on the discussions in the relevant literature. In addition to the cases, our prior work provides (1) a method for automatically extracting decomposition models for polyglot microservice systems from the source code [12] and (2) an approach for metrics detection in such models [11]. These building blocks of our approach are only briefly introduced in this article. Our novel contributions, which we will focus on, are (1) a new detection strategy approach, (2) a novel set of ADDs and security tactics used to validate our approach, and (3) a substantially extended formalization for the models and metrics. We have established a ground truth based on a manual assessment by five industrial experts. We compare the detection strategies statistically to the ground truth to evaluate our approach. Our results show that for each of the four ADDs, we found at least one detection strategy that uses a regression model with very low bias, has low or very low prediction error, and a very high prediction correlation with the ground truth data. Our approach requires manual assessment and modeling for creating the dataset and a regression model. Still, once a fitting regression model has been established, the approach can be applied automatically, e.g., as a component in an analysis tool (see Section VI) or as an automated step in a continuous delivery pipeline.
As an additional contribution, this article provides a validation of the study results provided in [11]. That is, the conformance detection approach introduced in [11], which is used by our novel detection strategies, is replicated here based on an entirely different set of ADDs, a new formalization approach, an entirely new metrics set, a new recommendation/ground truth analysis study, and new security extensions in our model data set.
This article is structured as follows: First, in Section II, we overview our approach and discuss the research methods used. Then, Section III presents the ADDs for microservice security tactics considered in this article and the ground truth derived from them. Section IV formally specifies the metrics and detection strategies. Next, Section V presents the analysis of the regression models and the derived detection strategies. Section VI describes an industrial resilience assessment tool in which we have applied our approach and the lessons learned. Section VII discusses our findings and potential threats to validity. Then we compare to related work in Section VIII, and in Section IX, we conclude.

II. OVERVIEW AND RESEARCH METHODS
Fig. 1 illustrates our approach in use. A user either models a microservice system as a software decomposition (or component & connector) model with security annotations (as specified in Section IV) or automatically extracts such a model (e.g., using the automatic code extraction approach from our prior work [12]). The automatic extraction can usually be run repeatedly, e.g., in the context of a continuous delivery pipeline, without the need for additional manual work. Once such a model is in place, our prototype can automatically detect all relevant metrics values. These provide the necessary data for running the detection strategies to detect conformance to ADD-based recommendations. The ordinal logistic regression models are part of the detection strategies and require the metrics as input. They are derived from a representative model data set like the one contributed in this paper and can be fine-tuned, e.g., by adding more or different models to the data set, if our model data is inappropriate for a given use of our approach (e.g., because the system under investigation's domain, size, or technology concepts are substantially different from the systems in our model data set). Our research started with a data collection and analysis in which we studied existing microservice-specific recommendations by industry organizations such as NIST [5], OWASP [6], or the Cloud Security Alliance [7]. Before they got involved in this article, a team of industrial security experts, including the last three authors of this study, independently analyzed these recommendations. In addition, within the AssureMOSS EU project,1 the author team conducted a multi-vocal literature study (i.e., scientific and grey literature) to confirm the findings.
The authors derived a catalog of ADDs with security tactics as decision options from this data. We selected 4 of these ADDs that the industrial security experts and the industry recommendations judge as highly relevant for microservice systems, covering different aspects. We purposely selected an entirely different set of ADDs than used in our prior work [11]; this way, this article provides another validation of the approach in [11].
The authors then studied 10 open source microservice systems as case studies line-by-line and manually annotated each security feature in their source code (all are published on GitHub, see Table I). To enable us to study the continuous evolution of these systems with a focus on the security ADDs, we developed 20 variants of the systems in which possible violations of ADD options or refactorings for improvement are introduced, based on the discussions in the relevant literature. Apart from the specific variations described for the variants in Column "Main Security Features/Issues" of Table I, all other system aspects remained stable. This is shown in an excerpt of Model PM0 in Figs. 2 and 3. A key feature of our dataset to reflect current microservice practices is that the systems are highly polyglot, using many different programming languages and technologies. As is common in real-life systems, in the dataset, the required information is located in many different kinds of files, such as programs and scripts in various programming and scripting languages and configuration files of many different technologies (see Column "Programming Languages and Technologies Used" in Table I).
We used our existing CodeableModels tool,2 a Python implementation for precisely specifying meta-models, models, and model instances in code. Based on CodeableModels, we have automated code generators to generate graphical visualizations of all meta-models and models in PlantUML. We have also realized detectors to find all relevant aspects of the metrics in the models and to automatically calculate the metrics.
This modeling step was done manually in our work for many of the case study systems. It is also possible to automate this step: Our existing static code analysis approach for architecture reconstruction of polyglot microservice systems [12] can be applied here. Due to the polyglot nature of microservice systems, this requires a modest initial specification effort, though. Models for the systems RS0 and ES0 have been automatically reconstructed using this approach in our prior work (see [12]). PM0 has been automatically reconstructed, too (not published in prior work).

Fig. 2. Excerpt of the Model PM0 (7 out of 16 components and their connectors) showing service interactions, API Gateway, and OAuth2 server. Only the excerpt of the necessary stereotypes is shown for clarity.

Fig. 3. Excerpt of the Model PM1 as an example to show changes in a variant. Changes highlighted in red: The variant introduces a security flaw as it only uses limited plaintext authorization but fixes some issues regarding encrypted communication by using HTTPS on several connections. The same traffic control and sensitive data issues are present in the excerpt as in the PM0 excerpt.
We then performed a systematic assessment on support or violation of the collected security tactics. The three industrial security experts in the author team plus two additional industrial security experts (from the company SEARCH-LAB) independently derived a recommendation based on the results of our tactics study. The result provides informal guidance for security experts to judge systems such as those in our models manually. Next, the other authors applied this recommendation as an ordinal rating scheme to each model variant summarized in Table I to create a ground truth for our study. Then the five industrial security experts reviewed the rating scheme and the ratings in the ground truth. In case of inconsistent votes, we performed a discussion among the involved experts to resolve the conflict. We would have applied a majority vote if the discussion had not yielded consistent votes, but the experts reached a consensus after the discussion in all cases.
Independently of the work on the ground truth, we developed our detection strategies on simple example cases. To this end, we first developed a set of metrics intended to automatically decide each decision point in our ADDs. These metrics are formally defined in Section IV. Next, our statistical analysis assessed how well the hypothesized metrics could predict the ground truth data by performing an ordinal regression analysis. Ordinal regression is widely used for modeling an ordinal response's dependence on independent predictors and is applicable in various domains. For the ordinal regression analysis, we used the lrm function from the rms package in R [13], [14].
The authors then used the ordinal regression models to construct two possible detection strategies for each recommendation on the ADDs provided by the industrial experts. The detection strategies use the regression model's means and fitted prediction methods as their basis [14]. We compare and evaluate the resulting detection strategies' performances for our model data set using the Mean Square Error (MSE) and Spearman correlation. The code and model data set are provided as an open access artifact on Zenodo to enable reproducibility of our study.3
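As a concrete illustration of this evaluation step, the following sketch (ours, not the paper's artifact code) computes MSE and Spearman correlation between ground truth ratings and a detection strategy's predictions; mapping the 5-point scale to the integers 0..4 is our assumption.

```python
# Sketch: comparing ordinal predictions against ground truth with MSE and
# Spearman rank correlation. The 5-point scale --/-/~/+/++ is mapped to 0..4.

SCALE = {'--': 0, '-': 1, '~': 2, '+': 3, '++': 4}

def mse(truth, predicted):
    """Mean Square Error over the numeric ordinal levels."""
    return sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth)

def spearman(truth, predicted):
    """Spearman rank correlation, using average ranks for tie groups."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0.0] * len(xs)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average rank for the tie group
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    rt, rp = ranks(truth), ranks(predicted)
    n = len(truth)
    mt, mp = sum(rt) / n, sum(rp) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    var_t = sum((a - mt) ** 2 for a in rt)
    var_p = sum((b - mp) ** 2 for b in rp)
    return cov / (var_t * var_p) ** 0.5

truth = [SCALE[s] for s in ['++', '+', '~', '--', '+']]
pred = [SCALE[s] for s in ['++', '+', '-', '--', '+']]
print(mse(truth, pred), spearman(truth, pred))
```

A low MSE indicates small prediction error on the ordinal scale, while a Spearman correlation near 1 indicates that the strategy preserves the ground truth's ranking of the systems.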

III. ADDS FOR MICROSERVICE SECURITY TACTICS AND GROUND TRUTH ASSESSMENT
In this section, we describe the four ADDs containing microservice security tactics that are studied in this article. Next, we describe the manual analysis of these ADDs to establish a ground truth for our study.

A. ADD: Access Management/Backend Authorization (BE_AU)
When considering access management in the context of microservice architecture decomposition models, we mainly found various authorization-related tactics. It is important to note that authorization is crucial for all parts of a microservice architecture, but especially for services reachable directly or indirectly from the clients. Thus, we treat the two decision scopes, backend authorization and authorization from the clients/UIs, in two separate ADDs that offer the same decision options.
The following decision options (security tactics) can be chosen for Backend Authorization (BE_AU):
- Token-based Authorization: Authorization is performed using a cryptographic access token issued by a central access management server, such as an OAuth 2.0 token.
- Encrypted Authorization Information: Some other kind of encrypted authorization scheme is used, but not with a standardized central access management server.
- No Authorization: No authorization method is used, but authorization is required. That is, the fact that authorization is not provided is a security flaw in the system.
- Authorization Not Required: The connector does not need any form of authorization, and the fact that it is missing is not a security flaw. This is, for instance, the case if access is allowed for each identified client or in public APIs with no access restrictions.
These options can be decided for each occurrence of the following decision context: Each connector between two components in the system (such as system services, databases, infrastructure components, discovery services, or access management servers), but not connections to clients and UIs (or between them) and/or external services. To be decided for each connector.
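To illustrate how this per-connector decision context can be operationalized, here is a minimal sketch; the data representation and type names are our assumptions for illustration, not the paper's tooling.

```python
# Sketch: choosing the BE_AU decision option for one connector from its
# security annotations (connector types), in the precedence order implied
# by the ADD options above.

from dataclasses import dataclass, field

@dataclass
class Connector:
    source: str
    target: str
    types: set = field(default_factory=set)

def be_au_option(cn: Connector) -> str:
    """Return the chosen BE_AU decision option for one connector."""
    if 'AuthorizationNotRequired' in cn.types:
        return 'Authorization Not Required'
    if 'TokenBasedAuthorization' in cn.types:
        return 'Token-based Authorization'
    if 'EncryptedAuthorizationInformation' in cn.types:
        return 'Encrypted Authorization Information'
    return 'No Authorization'  # authorization required but absent: a flaw

# A database connection with no authorization annotation at all
cn = Connector('order_service', 'user_db', {'JDBC'})
print(be_au_option(cn))
```

In the full approach, this decision is taken for every connector in the model, and the per-connector results feed the ratio metrics defined in Section IV.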

B. ADD: Authorization on Paths From Clients or UIs to System Services (CP_AU)
For CP_AU, the same decision options (security tactics) can be chosen as for Backend Authorization (BE_AU), but in a different decision context. While BE_AU is concerned with each backend connection in the microservice architecture (e.g., service to service or service to database), CP_AU is concerned with the paths between clients/UIs and system services (direct connections or propagated connections along the paths between them). That is, CP_AU can be decided for each occurrence of the following decision context: Each direct or transitive connector between a client or UI to a system service. In this context, transitive means that the connector can cross API Gateways and similar frontend components first, and then other system services, but no other kinds of components (such as infrastructure services, databases, and so on). To be decided for each connector.

C. ADD: Traffic Control (TC)
When considering Traffic Control in our scope of architecture decomposition models, mainly different kinds of Facades [15] that shield system services from direct access are discussed as solutions in the security recommendations [5], [7]. Such a Facade acts as a reverse proxy and routes requests from clients to backend services. It also realizes cross-cutting concerns such as authentication, authorization, SSL termination, and monitoring to support security tasks [5]. The most common pattern for this is the API Gateway [3], but there are also variants and homegrown solutions.
The following decision options (security tactics) can be chosen for Traffic Control (TC):
- API Gateway [3] provides a single endpoint for the clients and internally maps the requests to backend microservices.
- Backends for Frontends [3] is a variation of API Gateway that defines a separate gateway for each kind of client, e.g., a Web app, a mobile app, and a public API gateway.
- Frontend Service: While the prior options usually use dedicated technologies for establishing traffic control, some systems use a homegrown frontend service that acts as a Facade [15] for other services in the system. It can only offer the traffic control features built into that service.
- Direct Access from Clients to Services: Clients access system services directly, and thus no traffic control is provided.
These options can be decided for each occurrence of the following decision context: Each possible path from a client or a UI to a system service. To be decided for each such path.

D. ADD: Avoiding Plaintext Sensitive Data (SD)
Sensitive data in plaintext should not be used anywhere in a system [5], [7]. Instead, encrypted solutions and keys should be used. In architecture decomposition models, components can contain plaintext sensitive data, such as a service or database storing user passwords. The interactions (or connectors) between components can use plaintext sensitive data, e.g., to transfer unencrypted credentials over the wire.
The following decision options (security tactics) can be chosen for Avoiding Plaintext Sensitive Data (SD):
- Avoiding Plaintext Sensitive Data in Components means storing no secrets in components or using encryption methods to secure them properly. This tactic requires a systematic investigation of the data that is classified as sensitive.
- Avoiding Plaintext Sensitive Data in Connectors means not using plaintext secrets in the interactions realized by connectors, mainly distributed ones. Again, encryption, e.g., of the connection and maybe local storage of the secret, is needed to implement this tactic. This tactic requires a systematic investigation of the data that is classified as sensitive.
- Plaintext Sensitive Data in Components means a specific component stores sensitive data in plaintext form. This should usually be avoided.
- Plaintext Sensitive Data in Connectors means a specific connector uses or stores sensitive data in plaintext form. This should usually be avoided.
These options can be decided for each occurrence of the following decision context: Each component and connector in the model. To be decided for each such model element.

E. Recommendations and Ground Truth Assessment
To establish a ground truth for evaluating conformance to the ADDs described in the previous sections, the three industrial security experts on the author team first worked with other experts in their organizations to create recommendations based on the results of our tactics study (i.e., from security guidelines, gray literature, and scientific literature studies). The other authors then analyzed these recommendations, compared them to actual implementations in the case study systems, and selected the recommendations that were the focus of our ADDs. The results are the recommendations per ADD below, where more or less preferred ADD options (tactics) are mapped on a 5-point ordinal scale: ++: very well supported; +: well supported, but aspects of the solution could be improved; ∼: serious flaws in security design, but significant support is already found in the system; −: serious flaws in security design, but initial support can already be found in the system; −−: no support for the security tactic can be found in the system. The authors then discussed this evaluation scheme again with the three industrial security experts until a consensus was reached. The other authors then evaluated the 30 cases for conformance to each of the ADDs. The ratings were again reviewed by the three security assessment experts on the author team. In addition, two industrial security experts from another company reviewed our models, metrics, and code. Some parts of the recommendations below result in a unique score; especially the extreme cases (++, −−) often refer to unique quantities such as all connectors or no connectors. However, some other values contain fuzzy statements such as the vast majority of connectors, where human judgment is required, depending on the system model, to decide how large the set must be to be acceptable in that particular model. For example, system context, system size, and the system's domain can lead to individually different judgments for different models.
The resulting recommendation scheme for Backend Authorization (BE_AU) is: [...] If the Authorization Not Required option is selected, the connector should not be further analyzed with regard to access management aspects.
The recommendation scheme for Authorization on Paths from Clients/UIs to Services (CP_AU) is exactly the same as the Backend Authorization scheme, but applied not to distributed backend connectors but to paths from clients or UIs to system services.
The recommendation scheme for Traffic Control (TC) is:
- ++: All possible paths from a client/UI to a system service pass through a dedicated gateway solution such as an API Gateway or Backends for Frontends.
- +: All possible paths from a client/UI to a system service pass through a dedicated API Gateway or Backends for Frontends, or through some kind of Frontend Service.

TABLE II GROUND TRUTH ASSESSMENT FOR THE CASE STUDY SYSTEMS
- ∼: The large majority of the possible paths from a client/UI to a system service pass through a dedicated API Gateway or Backends for Frontends, or through some kind of Frontend Service.
- −−: Less than a large majority of components and connectors contain, use, and store no plaintext sensitive data.
The "++" recommendation is not used in SD, as no well-supported but not optimal option exists.
Based on the recommendations, the ground truth assessment in Table II was derived. That is, Table II lists the ground truth assessments for each decision and for each of the case study systems from Table I.

IV. DETECTION STRATEGIES, MODELS, AND METRICS SPECIFICATION
This section describes metrics for measuring conformance to the common microservice security tactics described as decision options in Section III. Our metrics are based on a microservices-based architecture decomposition model. For a complete formal definition of this model, see [11], [16]. We only present the necessary model elements and extensions used in this article.

A. Basic Architecture Decomposition Model
Formally, an architecture decomposition model M is a tuple (CP_M, CN_M, CPT_M, CNT_M, cn_source, cn_target, cp_type, cn_type), where CP_M is the set of components, CN_M the set of connectors, CPT_M the set of component types, CNT_M the set of connector types, and cn_source and cn_target map each connector to its source and target component, respectively. Further:
- cp_type : CP_M → P(CPT_M) is a function that maps each component to its set of direct and transitive component types (for a formal definition of component types and type hierarchies see [11], [16]).
- cn_type : CN_M → P(CNT_M) is a function that maps each connector to its set of direct and transitive connector types (for a formal definition of connector types and type hierarchies see [11], [16]).
Below, to simplify the metrics definition texts, when we simply say Component cp or Connector cn is of type t, we refer to the use of the function call cp_type(cp) or cn_type(cn), respectively. Figs. 2 and 3 show example models from the PM case modeled using the UML.
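The typing functions can be sketched as follows; the concrete supertype hierarchy shown is hypothetical (the full formalization is in [11], [16]), and the closure over supertypes illustrates what "direct and transitive types" means.

```python
# Sketch: cp_type resolves a component's direct types plus all transitive
# supertypes from an (assumed) type hierarchy.

def transitive_types(direct, supertypes):
    """Close a set of direct types under the supertype hierarchy."""
    result, todo = set(), list(direct)
    while todo:
        t = todo.pop()
        if t not in result:
            result.add(t)
            todo.extend(supertypes.get(t, ()))
    return result

# Hypothetical fragment of the type hierarchy
SUPERTYPES = {'API_Gateway': ['Facade'], 'Facade': ['Component'],
              'Service': ['Component']}

cp_direct_type = {'gateway': {'API_Gateway'}, 'orders': {'Service'}}

def cp_type(cp):
    return transitive_types(cp_direct_type[cp], SUPERTYPES)

print(cp_type('gateway'))
```

With this closure, a component annotated only as API_Gateway is also "of type" Facade, which is exactly what the type-selection functions below rely on.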

B. Component and Connector Types
We distinguish various component and connector types, introduced in the text below where they are needed. The full type hierarchies are modeled in the CodeableModels distribution.4 It uses microservice architecture component types such as Service, API_Gateway, Database, Monitoring, etc. Microservice decompositions have many different kinds of connector types between these components, such as RESTful HTTP, HTTPS, JDBC, etc., to denote the kind of interactions between the components.
Based on this, we define in this article some security-specific extensions such as TokenBasedAuthorization, EncryptedAuthorizationInformation, AuthorizationWithPlaintextInformation, and so on. They represent security-specific information extracted from the code and used for model annotation. Figs. 2 and 3 show example models from the PM case with the component and connector types rendered as stereotypes.
Below we use these types to define type-selection functions. For instance, in the backend authorization metrics below, the function distributed_backend_connectors_requiring_authorization : P(CN_M) → P(CN_M) is used as the basis for calculation and as the common divisor. The function is essentially a cascade of type selections. distributed_backend_connectors first selects the connectors which connect neither to Client nor to UI components, to get all backend connectors. The distributed backend connectors are then the subset of these connectors which are not of type InMemConnector (i.e., in-memory connectors). The function connectors_that_require_authorization : P(CN_M) → P(CN_M) selects the connectors that are not of the type AuthorizationNotRequired. This type indicates connections that explicitly should not get authorized, e.g., because they are offered in a public API.
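A minimal sketch of this cascade follows, with an assumed dictionary-based model encoding (not the CodeableModels implementation):

```python
# Sketch: cascade of type selections yielding the distributed backend
# connectors that require authorization.

def requiring_authorization(connectors, cp_types):
    """connectors: list of dicts with 'source', 'target', 'types' keys.
    cp_types: maps component name -> set of component types."""
    # 1. Backend connectors: neither endpoint is a Client or UI
    backend = [c for c in connectors
               if not ({'Client', 'UI'} & cp_types[c['source']])
               and not ({'Client', 'UI'} & cp_types[c['target']])]
    # 2. Distributed: exclude in-memory connectors
    distributed = [c for c in backend if 'InMemConnector' not in c['types']]
    # 3. Exclude connectors explicitly marked as not requiring authorization
    return [c for c in distributed
            if 'AuthorizationNotRequired' not in c['types']]

cp_types = {'web': {'Client'}, 'gw': {'API_Gateway'},
            'orders': {'Service'}, 'db': {'Database'}}
cns = [{'source': 'web', 'target': 'gw', 'types': {'HTTPS'}},
       {'source': 'gw', 'target': 'orders', 'types': {'RESTful HTTP'}},
       {'source': 'orders', 'target': 'db',
        'types': {'JDBC', 'AuthorizationNotRequired'}}]
print([(c['source'], c['target']) for c in requiring_authorization(cns, cp_types)])
```

Only the gateway-to-service connector survives the cascade: the client connection is not a backend connector, and the database connection is explicitly exempted.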

C. Paths
A basic notion in a number of the metrics below is a path: The function all_paths_from_clients_or_uis_to_system_services : P(CP_M) → P(P_M) selects all paths from clients or UIs to system services. The function first selects all components of type Client or UI as clients and all components of type Service as system services. A service is a system service if it is not of the type ExternalComponent (i.e., no external services are system services), MiddlewareService (i.e., no middleware infrastructure services such as a Discovery Service), or Facade (i.e., no frontend services or gateways with the sole purpose of shielding the system from clients). Then the function uses a simple Depth-First Search algorithm to calculate all paths from clients to services. From those paths, we select only the ones that are well-formed in the sense that first Clients or UIs are on the paths, then zero, one, or more Facades (e.g., APIGateways or frontend services are of type Facade), and finally one or more system services (as defined above). Paths going across other components such as Databases or MiddlewareServices are excluded; paths going into the system, then out of the system, and back into the system are excluded, too.
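The path enumeration can be sketched roughly as follows; this simplified version (our assumptions: graph encoding, helper names) checks component admissibility during the DFS rather than filtering all well-formedness conditions, such as the out-and-back-in exclusion, afterwards.

```python
# Sketch: DFS for paths from Clients/UIs through optional Facades to
# system services, excluding Databases, middleware, and external components.

def all_client_service_paths(edges, cp_types):
    starts = [c for c in cp_types if cp_types[c] & {'Client', 'UI'}]

    def is_system_service(c):
        return ('Service' in cp_types[c]
                and not cp_types[c] & {'ExternalComponent',
                                       'MiddlewareService', 'Facade'})

    def allowed(c):  # components a well-formed path may traverse
        return ('Facade' in cp_types[c] or is_system_service(c)
                or bool(cp_types[c] & {'Client', 'UI'}))

    paths = []

    def dfs(node, path):
        for nxt in edges.get(node, ()):
            if nxt in path or not allowed(nxt):
                continue  # no cycles; skip Databases, middleware, externals
            if is_system_service(nxt):
                paths.append(path + [nxt])
            dfs(nxt, path + [nxt])

    for s in starts:
        dfs(s, [s])
    return paths

cp_types = {'web': {'Client'}, 'gw': {'Facade', 'API_Gateway'},
            'orders': {'Service'}, 'db': {'Database'}}
edges = {'web': ['gw'], 'gw': ['orders'], 'orders': ['db']}
print(all_client_service_paths(edges, cp_types))
```

In this tiny model, the only well-formed path is web → gw → orders; the database is never part of a client/service path.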
The function client_service_path_connectors_requiring_authorization : P(CP_M) → P(CN_M) is based on this function. It first selects the connectors from the result of all_paths_from_clients_or_uis_to_system_services using another function connectors_on_client_service_paths : P(P_M) → P(CN_M). This function returns the set of all connectors on a set of paths (without connectors having Facades or ExternalComponents as targets or that are of type InMemoryConnector). Then, it selects the connectors that require authorization using the function connectors_that_require_authorization. In a set of model elements and paths ME_P_M, let the function components : P(ME_P_M) → P(CP_M) select the components in the set, connectors : P(ME_P_M) → P(CN_M) the connectors in the set, and paths : P(ME_P_M) → P(P_M) the paths in the set.

E. Metrics Definitions
The formal metrics definitions are provided in Table III. All metrics have values ranging from 0 to 1, with 1 indicating full support and 0 indicating no support.
The first compartment in the table provides metrics on backend authorization. They are all based on the distributed_backend_connectors_requiring_authorization function explained above. The first metric, AUB, detects general support for authorization in the backend without considering the type of authorization used. The following metrics AUB_T, AUB_E, AUB_P, and AUB_C calculate a similar ratio, but only for specific types of authorization, or for any authorization over an encrypted, secure connection. Finally, AUB_A combines the authorization methods that are considered to be secure enough, i.e., AUB_T, AUB_E, and AUB_C, in one metric.
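Our reading of such a ratio metric can be sketched as follows; treating an empty divisor set as full support is our assumption, and the metric names follow the paper while the computation is illustrative.

```python
# Sketch: AUB-style ratio metrics -- the share of connectors requiring
# authorization that carry (any of) the given authorization type(s).

def aub(connectors, auth_types):
    """Fraction of connectors typed with any of auth_types; 1.0 = full support."""
    if not connectors:
        return 1.0  # nothing requires authorization -> vacuously supported
    hit = sum(1 for c in connectors if c['types'] & auth_types)
    return hit / len(connectors)

cns = [{'types': {'TokenBasedAuthorization'}},
       {'types': {'PlaintextAuthorization'}},
       {'types': {'EncryptedAuthorizationInformation'}},
       {'types': set()}]
aub_t = aub(cns, {'TokenBasedAuthorization'})
aub_a = aub(cns, {'TokenBasedAuthorization',
                  'EncryptedAuthorizationInformation'})
print(aub_t, aub_a)
```

Here AUB_T counts only token-based connectors (1 of 4), while the combined AUB_A-style metric accepts any sufficiently secure method (2 of 4).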
The second compartment in the table provides metrics on authorization on client/service paths. They are all based on the client_service_path_connectors_requiring_authorization function, which uses components as input and delivers a connector set. Apart from that, the AUC metrics are constructed in the same fashion as the AUB metrics.
In the third compartment of the table, we see metrics on traffic control. First, the superior method of using gateways is measured in the GWP metric, and then FEP measures the acceptable practice of realizing traffic control with frontend services. Finally, GFP measures the ratio of both practices in the paths, leading to different results than just looking at the two individual metrics GWP and FEP, e.g., as both practices could be applied on a path. GFP, however, can be biased, as an implicit weight is introduced (the number of gateway versus frontend services).
Finally, in the fourth compartment, the plaintext sensitive data use is measured, first for the components in CMP, then for the connectors in CNP, and finally for both in CCP. It makes sense to have CMP and CNP in addition to CCP, as the number of components and connectors introduces an implicit weight in the metric, which can potentially bias the results of CCP.

TABLE III METRIC DEFINITIONS
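A small invented numeric example makes the implicit-weight effect concrete: when connectors outnumber components, the pooled CCP ratio is pulled toward CNP.

```python
# Illustration (invented numbers): CCP pools components and connectors,
# so the larger group dominates the combined ratio.

components = {'plaintext_free': 9, 'total': 10}   # CMP = 0.9
connectors = {'plaintext_free': 10, 'total': 40}  # CNP = 0.25

cmp_ = components['plaintext_free'] / components['total']
cnp = connectors['plaintext_free'] / connectors['total']
ccp = ((components['plaintext_free'] + connectors['plaintext_free'])
       / (components['total'] + connectors['total']))
print(cmp_, cnp, ccp)
```

With 40 connectors against 10 components, CCP lands at 0.38, far below the unweighted average of CMP and CNP (0.575), which is why CMP and CNP are reported separately.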

F. Detection Strategies
Based on our metrics, we define two alternative detection strategies that use the mean and fitted prediction methods for ordinal logistic regression, as provided, e.g., in R's rms package [14]. First, we define the Means Predictor Detection Strategy. Let OR_add denote an ordinal regression model for the ADD add (with add ∈ {BE_AU, CP_AU, TC, SD}, i.e., it is in the set of all ADDs), predict_mean the prediction function using the means prediction method, and m ∈ M the input model for the prediction. The mean prediction computes the estimated mean of Y by summing the values of Y multiplied by the estimated probabilities P(Y = j) [14]. Then, means_predictor_detection_strategy : OR_add × M → {'++', '+', '∼', '−', '−−'} is a detection strategy function computing the levels ++, +, ∼, −, and −− of our ground truth scheme from Section III-E. The Fitted Predictor Detection Strategy function fitted_predictor_detection_strategy : OR_add × M → {'++', '+', '∼', '−', '−−'} is based on the fitted prediction method, which gets all the individual probabilities P(Y = j) [14]. We define level_of_max as a function that first selects the maximum probability in the vector returned by predict_fitted. Then it gets the ordinal level (++, +, ∼, −, or −−) of the vector value with the maximum probability.
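The two predictors can be sketched from a vector of fitted probabilities P(Y = j); rounding the estimated mean to the nearest level is our assumption about how the mean is mapped back onto the ordinal scale, and the probabilities shown are hypothetical model output.

```python
# Sketch: means predictor vs. fitted (level_of_max) predictor, given the
# per-level probabilities of an ordinal regression model.

LEVELS = ['--', '-', '~', '+', '++']  # ordinal levels, worst to best (0..4)

def means_predictor(probs):
    """Estimated mean of Y, mapped (here: rounded) to the nearest level."""
    mean_y = sum(j * p for j, p in enumerate(probs))
    return LEVELS[round(mean_y)]

def fitted_predictor(probs):
    """Level with the maximum fitted probability (level_of_max)."""
    return LEVELS[max(range(len(probs)), key=lambda j: probs[j])]

probs = [0.05, 0.05, 0.1, 0.5, 0.3]  # hypothetical P(Y = j) vector
print(means_predictor(probs), fitted_predictor(probs))
```

The two strategies can disagree on skewed probability vectors, which is why both are constructed and compared per ADD in Section V.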

V. ANALYSIS OF REGRESSION MODELS AND DETECTION STRATEGIES
In this section, we first describe and analyze the ordinal regression models for our ADDs and then compare the detection strategies applied using these models.

A. Ordinal Regression Models
The final element required in our approach is the ordinal regression model for each of our ADDs. As explained, these are computed using R's lrm function from the rms package. As described in Section III-E, the dependent outcome variables are the ground truth assessments for each ADD. The metrics defined in Table III are used as the independent predictor variables.
The actual values, automatically computed with our detectors from the models of our data set, are reported in Table IV. The objective of the regression analysis is to predict the likelihood of the dependent outcome variable per ADD.
In Table V, we show the best three ordinal regression models we have found for each of the four ADDs. The p-value assesses the statistical significance of each regression model; the smaller the p-value, the stronger the model. A p-value smaller than 0.05 is generally considered statistically significant. The C-index (also called the concordance index, and equivalent to the area under the Receiver Operating Characteristic (ROC) curve) is frequently reported in the statistical literature to measure the predictive power of ordinal regression models [17]. A C-index of 0.5 indicates random splitting, whereas a C-index of 1 indicates perfect prediction.
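The concordance notion behind the C-index can be sketched as follows (a simplified, tie-aware pair count over made-up values; the rms implementation computes this far more efficiently and handles further cases):

```python
from itertools import combinations

def c_index(truth, pred):
    """Fraction of comparable pairs (different true levels) that the
    predictions rank in the same order; prediction ties count half."""
    concordant = ties = comparable = 0
    for (t1, p1), (t2, p2) in combinations(zip(truth, pred), 2):
        if t1 == t2:
            continue  # pairs with equal true levels are not comparable
        comparable += 1
        if (p1 - p2) * (t1 - t2) > 0:
            concordant += 1
        elif p1 == p2:
            ties += 1
    return (concordant + 0.5 * ties) / comparable

truth = [0, 1, 2, 3, 4]  # ordinal ground truth levels for five systems
print(c_index(truth, [0.1, 0.4, 0.5, 0.9, 1.0]))  # prints: 1.0 (perfect ordering)
print(c_index(truth, [1.0, 0.9, 0.5, 0.4, 0.1]))  # prints: 0.0 (reversed ordering)
```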
Harrell [13] suggests bootstrapping to obtain nearly unbiased estimates of a model's future performance based on re-sampling. A simple technique to adjust for optimism or overfitting is data splitting, but it is inefficient since the model is only fitted to a subset of the available data. Bootstrapping is thus the better and recommended method to adjust for optimism or overfitting [13]. We used lrm's validate function to perform bootstrapping and calculated the bias-corrected C-index in addition to the original C-index. The C-indexes, reported in Table V, are all larger than 0.9. For each ADD, we have found at least two models with a bias-corrected C-index above or almost at 0.9, which indicates that the models are good enough for predicting the outcomes of individuals.
We used lrm's function pentrace to assist in the selection of penalty factors for fitting regression models using penalized maximum likelihood estimation (see [13]). In the reported models, we generally used a simple penalty of 1 and a non-linear penalty of 5. The penalized regression models offer slightly improved performance compared to non-penalized models.
The reported models in Table V do not always use the complete set of our metrics. As recommended, we applied data reduction [13] (i.e., eliminating variables from the models) to find models that performed well. We only report the three with the highest bias-corrected C-indexes among the numerous models that we tested. As a consequence, all suggested metrics are relevant in our prediction models, but no single decision needs all of them.

B. Detection Strategy Analysis
To analyze and compare the use of our models in the two alternative detection strategies defined in Section IV-F, we calculate the Mean Square Error (MSE) and the Spearman correlation. MSE is commonly used as an evaluation measure for ordinal regression predictions [18], [19]. The Spearman correlation can additionally be utilized for error-sensitive evaluation of ordinal target variables [19].
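As an illustration of the two measures on ordinal predictions coded as 0..4 (a minimal sketch with made-up values; the tie-free rank formula is used for Spearman, whereas statistics packages also handle ties):

```python
def mse(truth, pred):
    """Mean Square Error over numerically coded ordinal levels."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)

def spearman(x, y):
    """Spearman correlation via the classic rank-difference formula
    (valid when there are no ties in either sequence)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

truth = [0, 1, 2, 3, 4]  # ground truth levels for five systems
pred = [0, 2, 1, 3, 4]   # predicted levels with one swapped pair
print(mse(truth, pred))       # prints: 0.4
print(spearman(truth, pred))  # prints: 0.9
```

A single swapped adjacent pair already brings the correlation down to 0.9, which illustrates how strict the > 0.9 threshold used below is.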
The results are reported in Table VI. First of all, it can be seen that all strategy/model combinations have very high positive correlation values for all ADDs (all > 0.9). The MSE values overall mostly show low errors in the prediction. The direct comparison of the values, together with the bootstrapped C-Index, allows selecting the best strategy per ADD.
• For SD, both strategies perform very well with Model 3. As this has a bootstrapped C-Index of 0.9, selecting one of them is a good option. If a bootstrapped C-Index of > 0.9 is needed, the Means Predictor Detection Strategy with Model 2 performs best.

VI. AUTOMATED APPLICATION OF DETECTION STRATEGIES FOR RESILIENCE ASSESSMENT
To further evaluate our approach, we have applied it in an industrial resilience assessment tool currently being developed by EU-VRi. Resilience assessment refers to evaluating a system's ability to withstand and recover from adverse conditions or attacks. Our detection strategies approach can be utilized in the resilience assessment of a system as an automated component (once the architecture model extraction has been specified, see Section II). The tool aims to assess, monitor, and optimize the resilience of a system based on existing industry guidelines, such as the ones studied in our data collection and analysis step (described in Section II).

TABLE VI COMPARISON OF DETECTION STRATEGIES
The tool provides an API to integrate building blocks, such as the metrics provided by our approach. It provides an indicator-based approach to calculate a resilience level index for a system as a composite, multi-level indicator. The tool supports before/after analysis, multi-assessment monitoring over time, and decision support based on sensitivity analysis. As we expect the tool to perform automated assessments, we have designed our metrics and models to be integrated within this tool as an automated component.
Our lessons learned are that the approach can be integrated into existing tools and processes for resilience assessment. By integrating the approach into a resilience assessment tool, developers, architects, or assessors can automatically calculate metrics for the software-architectural parts of a composite resilience level index. This can assist in identifying areas where improvements can be made and in evaluating the overall resilience of a system.
The approach is scalable and can be applied to systems of varying sizes. As microservice-based systems are highly modular, each service can be analyzed separately for the design options of the ADDs. Thus, in our experience, the scale of the system under investigation plays only a minor role, and the analysis results also tend to be well-fitting for larger applications. However, the application's scale does play a role in understanding the automated assessment results. For large-scale systems, the metrics generated by the approach are better suited for understanding the assessment results, while for small-scale systems, inspecting the models can be enough. In general, the metrics generated by the approach are easy to understand and can help identify areas for improvement. The models are better suited for inspecting specific service issues or issues of a few interacting services.
The approach has the limitation that it relies on ADDs to model security recommendations, which may not encompass all pertinent security concerns. However, in the resilience assessment tool, these are supplied by other components. Furthermore, our approach mandates manual model creation or calibration for constructing a regression model. Therefore, if additional systems with distinct practices are not well reflected in our open-source systems dataset, they must be manually included before our approach can be employed for such cases. Nevertheless, the modularity of microservice systems permits adding a few mid-sized systems (e.g., open-source) to our data set to incorporate these practices. Thus, there is no need to manually analyze large-scale systems to apply our approach at scale, which is a positive aspect.

VII. DISCUSSION AND THREATS TO VALIDITY
In this section, we first discuss our lessons learned and then potential threats to validity.

A. Discussion of Research Questions
In RQ1, we have aimed to investigate how to detect conformance to ADDs for microservice system security automatically. Our proposal is based on an analysis of experts' security recommendations, modeling them as ADDs, and then letting the experts judge these recommendations using an ordinal scheme typically used in such human assessments. All further steps can then be automated based on a suitable model data set, like ours. Our open-source model data set is thus also a major contribution of this article, as such datasets are needed to calibrate statistical models like ours.
For this first step of our approach, this article has essentially provided validation and extension of the methods introduced in our prior work [11]. Based on a different set of ADDs and metrics and a substantially extended formalization, we again achieved excellent regression results, as reported in Section V-A. Thus, we are confident that these models are a good basis for our new approach and core contribution, the detection strategies, analyzed in Section V-B.
Overall, we have proposed 24 detection strategies (two kinds of strategies applied for each of the three regression models found for each of the four ADDs). The novel detection strategies are metrics-based rules. In contrast to earlier detection strategy proposals [10], our approach is based on a statistical model's predictions to avoid the problem of having to decide on metric value thresholds for the rules manually. The original detection strategy proposal [10] used simple data filters like HigherThan(value) or Between(value1, value2), which inherently leads to the issue of how to select appropriate threshold values for the rules.
In contrast, our novel approach uses two kinds of statistical predictors and tests each for multiple models to develop the best-performing models.
The comparison of the 24 detection strategies then provided the answer to RQ2. In particular, our results show that all 24 detection strategies have a very high Spearman correlation (> 0.9) to the ground truth of our model data set. Overall, all 24 reported strategy/model combinations perform reasonably well. In Table V, we only report the three best regression models found. Many models were tested that did not perform well or showed significant bias (e.g., due to optimism or overfitting). Essentially, by data reduction (eliminating variables from the models), we found models that performed well in all considered measures. Using detailed analysis, we further were able in Section V to select the few best-performing models: for all but the ADD TC, we found at least one strategy with an MSE value < 0.2, for most even < 0.15. The lowest value for TC is also low (0.23). In the regression analysis, we used bootstrapping as the recommended technique to avoid bias due to optimism or overfitting [13], and for each ADD one detection strategy with a low MSE value, a very high Spearman correlation (> 0.9), and a very low estimated bias according to the bootstrapped C-index was found.
Our approach uses the paths described in Section IV-C. These are used for analyzing specific propagations of security flaws in a system, e.g., whether a service can be transitively reached from a client without proper authorization measures. The paths enable many other analysis options regarding the potential propagation of security flaws in a system. For example, in our prior work, we have developed a method for avoiding excessive data exposure in microservice APIs [20]. Our paths would enable checking for this flaw in the whole microservice backend. Many other such analyses are possible for future work.
As shown by our analysis of 10 mid-size open-source systems created by practitioners and our experience in applying our approach in the context of the resilience assessment tool, summarized in Section VI, we can assess that the proposed approach can be used in practice to assess architectural security and resilience aspects of a system. As discussed in Section VI, our experience shows that applying our approach as an automated component is possible for systems using similar techniques as those used in the open-source systems dataset for creating the regression model. Otherwise, extending the dataset with additional systems is necessary to reflect the missing techniques (which can be done as an extension to our dataset provided in our replication package). If other ADDs are to be analyzed than the ones modeled in our approach, manual dataset creation, annotation, and initial analysis are needed. If the systems to be analyzed follow a modular distributed systems approach, such as with microservices, our experience is that it is possible to scale the approach to larger systems based on a dataset of mid-sized systems. This is positive as it reduces the upfront manual work required. For this reason, we have limited our study to microservice systems.
While we do not claim that our approach can detect something not identifiable with other techniques, the use of ADDs and detection strategies provides a unique way to assess architectural security concerns in microservice-based systems. The analysis of the introduced ADDs can require substantial effort even for the mid-size systems in our dataset (e.g., inspecting each direct or transitive client-service path) and is required after each system modification. Our detection strategies can be run as part of a more extensive automated analysis (e.g., as discussed in Section VI) or as an automated check in a continuous delivery pipeline. This way, e.g., accidentally introduced security issues can be spotted during a resilience assessment or before a system gets deployed, without repeated manual effort.

B. Threats to Validity
We mainly relied on third-party systems as the basis for our study to increase internal validity and thus avoid bias in system composition and structure. It is possible that our search procedures resulted in some unconscious exclusion of specific sources; we mitigated this by assembling a team of authors with many years of experience in the field (including three industry experts) and conducting a very general and broad search. Our search was not exhaustive, and practitioners created the systems we found for demonstration purposes, i.e., they were relatively modest in size. This means that some potential architectural elements were not included in our metamodel. Furthermore, this poses a potential threat to the external validity of generalization to other, more complex systems. However, we are confident that the documented systems are a representative cross-section of current practice in this area. Another potential risk is the fact that the author team developed the system variants. However, we did this following best practices documented in the literature and with reviews from industrial experts. We were careful to change only certain aspects in a variant and keep all other elements stable.
Another possible source of internal validity impairment is the modeling process. The author team has considerable experience with similar methods, and the systems' models have been repeatedly and independently cross-checked, but the possibility of some interpretive bias remains. Other researchers may have coded or modeled differently, resulting in different models. Because our goal was only to find a model that could describe all observed phenomena, and we achieved this, we do not consider this risk to be particularly problematic for our study. The individual metrics used to assess the presence of each pattern were deliberately kept as simple as possible to avoid false positives and allow for a technology-independent assessment.
However, it might be the case that the expert judgment for the ground truth would differ for substantially different kinds of systems, e.g., systems from other domains or substantially larger systems. Then it would be necessary to re-run our statistical analysis with data from a few such systems to calibrate the prediction models to the changed circumstances. Since our approach is fully automated once suitable models are created (modeled or reconstructed), the analysis steps require no further manual effort.
To avoid threats concerning the generalizability of our approach, we limited our scope to microservice-based systems, even though some aspects of our approach are likely applicable to other kinds of distributed systems.
We do not claim completeness of the detection strategies or the metrics we present in this article.They are only complete in the sense that they cover all options of the ADDs they address.

VIII. RELATED WORK
This section compares related works on tactics, best practices and patterns, detection strategies, and conformance checking in general, and then specialized metrics-based approaches for security and microservices.

A. Related Works on Tactics, Best Practices, and Patterns
The collection and systematization of microservice patterns have been the subject of much research. Richardson [3] collected microservice patterns related to key design and architectural practices. Zimmermann et al. [21] presented microservice API-related patterns. Skowronski [22] collected best practices for event-driven microservice architectures. Microservice fundamentals and best practices are also discussed by Fowler and Lewis [2] and summarized in a mapping study by Pahl and Jamshidi [23]. Taibi and Lenarduzzi [24] examined microservice bad smells, i.e., practices that developers should avoid, which in our work would correspond to ADD violations. This article uses such guidance as a basis for modeling microservice architectures.
Similarly, attempts have been made to define security patterns [25], [26]. Industry organizations have proposed microservice-specific recommendations [5], [6], [7] that represent broad-level summaries of existing industry best practices. We used such guidelines to guide our selection of security practices for study in our work.

B. Related Works on Detection Strategies and Conformance Checking
Detection strategies [10] are metrics-based rules for detecting design flaws. Whereas our work focuses on architectural design flaws for security and microservice best practices, the original detection strategies approach by Marinescu addresses generic object-oriented design smells. Whereas our approach leverages statistical predictors in constructing the metrics-based rules, Marinescu uses simple data filters such as HigherThan(value) or Between(value1, value2). Thus, in such approaches, the issue of defining threshold values for parameterizing the detection strategies can introduce substantial bias. In our approach, this can be fully automated, and developers can measure the accuracy of the result with established measures such as the bootstrapped C-index, MSE, and Spearman correlation. The following two sections discuss specialized detection approaches based on security and microservice metrics.
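For contrast, a sketch of such classic threshold-based data filters (the filter names follow Marinescu's approach as described above; the composite rule, the metric names, and the threshold values are made up for illustration):

```python
# Classic detection-strategy data filters, parameterized by thresholds
# that must be chosen manually -- the source of bias discussed above.
def higher_than(threshold):
    return lambda value: value > threshold

def between(lo, hi):
    return lambda value: lo <= value <= hi

# A hypothetical composite rule ANDing two metric filters:
def flaw_rule(coupling_metric, cohesion_metric):
    return higher_than(0.8)(coupling_metric) and between(0.0, 0.2)(cohesion_metric)

print(flaw_rule(0.9, 0.1))  # prints: True  (both filters match)
print(flaw_rule(0.7, 0.1))  # prints: False (coupling below the manual threshold)
```

Moving a threshold from 0.8 to 0.7 silently flips the second verdict, which is exactly the parameterization problem the statistical predictors avoid.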
Conformance assessment has been applied in various areas of software engineering, such as service composition [8] and traceability to guidelines [27]. In general, the conformance relation is defined as the consistency between models [8]. In software architecture, conformance assessment addresses the relation between a software system's architecture and its intended architecture [9]. Our approach shares with those works the general notion of architectural conformance assessment.
Since our metrics can be checked automatically, our approach can be classified as a metrics-based, microservice-specific approach for software architecture conformance checking. In general, approaches for architecture conformance checking are often based on automated extraction techniques [28], [29]. Conformance to architecture patterns [28], [30] or other architectural rules [29] can usually be checked by such approaches. Techniques based on a broad set of microservice-related metrics to cover multiple microservice tenets and security do not yet exist.

C. Related Works on Security Metrics-Based Approaches
Security experts can use security metrics to understand the current security state and potentially improve it [31]. While several organizations such as Microsoft [32] and OWASP [33] propose processes and checklists for building secure architectures, very few tools can automate these processes for tailored solutions [34] due to the dynamic and polyglot nature of these systems.
Ramos et al. [35] conducted a detailed review of the main existing model-based quantitative security metrics, focusing on network security metrics. Model-based security metrics use techniques to describe a system using an abstract model that captures the necessary attributes based on attacker assumptions and system behavior. Noel et al. [36] describe a set of metrics for measuring network-wide cybersecurity risk based on a vulnerability model of multi-stage attacks. Their system for calculating security metrics from vulnerability-based network attack graphs uses data imported from sources commonly used in enterprise networks, such as vulnerability scanners and firewall configuration files. Attack graphs are a model widely used to quantify network security. They use the causal relationships between vulnerabilities, quantifying the likelihood of potential multi-step attacks that combine multiple vulnerabilities [37].
Such general security metrics and indicators are a foundation for our work. However, since none of them considers the specifics of microservice architectures, they cannot be applied to our research problems, or only with significant adaptations. For this reason, we decided to develop metrics based on existing recommendations (such as NIST [5], OWASP [6], or Cloud Security Alliance [7]) specifically for microservice-based systems.

D. Related Works on Microservice Metrics-Based Approaches
Several studies have used metrics to assess microservice-based software architectures. Pautasso and Wilde [38] propose a composite, facet-based metric for assessing loose coupling in service-oriented systems. Zdun et al. [16] study the independent deployment of microservices based on metrics for evaluating architecture conformance to microservice patterns. Bogner et al. [39] propose a maintainability quality model which combines eleven easily extracted code metrics into a broader quality assessment. Engel et al. [40] present a method using real-time system communication traces to compute metrics on conformance to recommended microservice design principles such as loose coupling and small service size. Each of these approaches is focused on a narrow set of architecture-relevant tenets (e.g., loose coupling). Still, no general approach for an assessment across different aspects, such as the security-related ADDs in our work, exists.
While a couple of works specifically focus on microservice security metrics, many of these focus on runtime-related or non-architectural aspects [41], [42], [43], [44]. Chondamrongkul et al. [45] present an early approach to automatically investigate specific security vulnerabilities in a decomposition architecture, such as man-in-the-middle or denial-of-service attacks. In contrast, our work is based on ADD models, suggests detection strategies, and is based on empirical data, whereas the work of Chondamrongkul et al. uses only modeling examples.

IX. CONCLUSION
In this article, a novel approach for automated conformance checking of ADD-based security recommendations for polyglot microservice systems based on detection strategies was developed. The detection strategies and novel metrics for microservice security were formally defined and implemented in a model-based tool. They were applied to an open-source model dataset for security in software decomposition models of 10 open-source systems and 20 variants of these systems (modeling possible conformance violations and/or improvements to test the suitability of our method for continuous evolution) and in an industrial resilience assessment tool. The novel detection strategies are metrics-based rules. Unlike previous proposals for detection strategies [10], our approach is based on the predictions of statistical models to avoid the problem of having to set thresholds for metric values for the rules manually. We proposed a total of 24 detection strategies (two types of strategies for each of the three regression models found for each of the four ADDs). The strategies and their metrics were statistically evaluated and compared. We found at least one strategy per ADD with a very high correlation and a low MSE value that also had a very low estimated bias (e.g., due to overfitting) according to the bootstrapped C-index value.
In the future, we plan to develop detection strategies for metrics that we have developed in previous work and to develop novel traceability strategies that allow root cause analysis based on detection strategies.

Manuscript received 26 September 2022; revised 16 March 2023; accepted 10 May 2023. Date of publication 16 May 2023; date of current version 16 May 2024. This work was supported in part by the European Union's Horizon 2020 research and innovation programme under Grant 952647 (AssureMOSS project), in part by the FWF (Austrian Science Fund) project under Grant API-ACE: I 4268, and in part by the FWF (Austrian Science Fund) project under Grant IAC²: I 4731-N. Recommended for acceptance by C. Basile. (Corresponding author: Uwe Zdun.)

Fig. 3 highlights the changes compared to PM0 in red, as described in the figure caption. We assume that our evaluation systems are, or reflect, real-world practical examples of microservice architectures. Many of them are open-source systems realized by practitioners to demonstrate practices or technologies, and thus they are at most of medium size and complexity. An essential

• Plaintext Authorization Information: Authorization information is transferred as plaintext.
• Plaintext-based Authorization Information over an Encrypted Protocol: Authorization information is transferred as plaintext over a secure (i.e., encrypted) communication protocol such as TLS/SSL.

3 https://doi.org/10.5281/zenodo.7929313

• ++: All distributed backend connectors are authorized with Token-based Authorization provided by a central access management server.
• +: All distributed backend connectors are authorized with Token-based Authorization, some other kind of Encrypted Authorization Information, or Plaintext-based Authorization Information over an Encrypted Protocol, and not all are authorized with Token-based Authorization.
• ∼: Either the large majority of distributed backend connectors is authorized with Token-based Authorization, Encrypted Authorization Information, or Plaintext-based Authorization Information over an Encrypted Protocol; or all distributed backend connectors are authorized, but some or all of those are authorized using Plaintext Authorization Information.
• −: At least some distributed backend connectors are authorized, but either Plaintext Authorization Information is used and not all connectors are authorized; or, if no Plaintext-based Authorization is used, less than the large majority of distributed backend connectors is authorized with Token-based Authorization, Encrypted Authorization Information, or Plaintext-based Authorization Information over an Encrypted Protocol.
• −−: No distributed backend connectors are authorized.
• −: At least some possible paths from a client/UI to a system service pass through a dedicated API Gateway or Backends for Frontends, or through some kind of Frontend Service.
• −−: No API Gateways, Backends for Frontends, or Frontend Services are found on the possible paths from a client/UI to a system service.
The recommendation scheme for Avoiding Plaintext Sensitive Data (SD) is:
• +: No component and no connector contains, uses, or stores plaintext sensitive data.
• ∼: Almost all of the components and connectors contain, use, and store no plaintext sensitive data.
• −: The large majority of components and connectors contain, use, and store no plaintext sensitive data.

• cn_source : CN_M → CP_M is a function returning the component that is the source of a link between two components.
• cn_target : CN_M → CP_M is a function returning the component that is the target of a link between two components.
Detectors are functions that calculate Detector Results DR, such as detector : P(ME_P_M) → P(DR). ME_M are the model elements of a model M, with: ∀CN_M, CP_M ∈ M : ME_M ⊃ CP_M ∧ ME_M ⊃ CN_M. The detectors either work on such model elements or on paths: ME_P_M = ME_M ∪ P_M. DR is a tuple (mp, res) with mp ∈ ME_P_M and res ∈ {successful, undefined, failed}. The function d_success : P(DR) → P(DR) selects only the successful detection results in a detector result set, d_fail : P(DR) → P(DR) the failed ones, and d_undefined : P(DR) → P(DR) the undefined ones. The function d_elements : P(DR) → ME_M returns the model elements contained in a detector result set.
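A minimal sketch of these selector functions over detector result sets (the model-element names are illustrative; results are modeled as (mp, res) tuples as in the formalization above):

```python
SUCCESSFUL, UNDEFINED, FAILED = "successful", "undefined", "failed"

def d_success(results):
    """Select only the successful detection results."""
    return {(mp, res) for mp, res in results if res == SUCCESSFUL}

def d_fail(results):
    """Select only the failed detection results."""
    return {(mp, res) for mp, res in results if res == FAILED}

def d_undefined(results):
    """Select only the undefined detection results."""
    return {(mp, res) for mp, res in results if res == UNDEFINED}

def d_elements(results):
    """Return the model elements contained in a detector result set."""
    return {mp for mp, _ in results}

# Hypothetical detector result set over three model elements:
dr = {("api-gateway", SUCCESSFUL), ("svc-orders", FAILED), ("svc-users", UNDEFINED)}
print(d_success(dr))   # the single successful result
print(d_elements(dr))  # all three model elements
```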

• For TC, only one MSE value < 0.25 can be observed, in the Fitted Predictor Detection Strategy of Model 3. But this model has the worst bootstrapped C-Index of 0.76, which indicates slight overfitting. Thus, combining the Means Predictor Detection Strategy with Model 1, offering an MSE value of 0.25, a very high bootstrapped C-Index, and a very high correlation, seems to be the strategy with the best performance.

TABLE I OVERVIEW OF MODELED SYSTEMS (SIZE, DETAILS, AND SOURCES)