Applying and Researching DevOps: A Tertiary Study

DevOps is an emerging software development methodology that differs from more traditional approaches due to the closer involvement of the customer and the adoption of “continuous-*” (e.g., integration, deployment, delivery, etc.) practices. The vast research on DevOps (including numerous secondary studies) published in a short timeframe, and the diversity of the authors’ research backgrounds (e.g., from a Dev or an Ops perspective), have inevitably produced a long list of investigated topics that use inconsistent terminology. The goal of this study is to analyze literature reviews on DevOps with respect to: (a) the research topics in DevOps; (b) the terms that are mapped to each topic; and (c) the consistency of terminology. To achieve this goal, we have performed a tertiary study, i.e., a systematic mapping study that uses as primary studies “Systematic Literature Reviews” and “Mapping Studies”. For Data Extraction, Analysis, and Synthesis (DEAS) we propose a novel approach relying on thematic analysis, statistical analysis, and meta-analysis. The results unveiled 7 core topics on DevOps research, out of which DevOps features and DevOps practices are the dominant ones. Additionally, as expected, various terminology ambiguities have been identified, mostly between features and practices, as well as between challenges faced before adopting DevOps and while applying DevOps. The main contribution of this study is the disambiguation of the mapping of terms to topics. Along this process we highlight both inconsistencies, attempting to resolve ambiguities, and topics and terms with high levels of consistency, aiding researchers and practitioners.


I. INTRODUCTION
In recent years, DevOps has emerged as one of the most "modern" and widely discussed terms in the field of software engineering [13] [20]. DevOps aims to bridge the gap between development and operations, in the sense that it promotes the collaboration between the development teams (i.e., designers, developers, testers, etc.) with businesses, creating additional teams that are responsible for developing, managing and supporting the performance of customer-side systems [3]. By definition the DevOps methodology builds a living bridge between development teams (DEVelopment) and system users (OPeration), enabling them to collaborate efficiently and seamlessly.
However, due to the emerging and dual nature of DevOps, several challenges appear both in industry and academia.
First, the literature contains a vast number of proposed approaches for DevOps (see list of primary studies), attempting to cover the complete range of the software lifecycle from requirements to development, deployment, and maintenance. Such approaches are introduced by researchers from different backgrounds, rendering a comprehensive knowledge and understanding of the literature unrealistic [3]. Thus, there is a need for a detailed panorama of the DevOps research area, in the form of commonly studied topics (need-1).
Second, there are potential misunderstandings between the "Dev" and the "Ops" teams, originating mostly from their business goals (e.g., understand requirements, develop software, sell software, maintain software, compile bug reports), the artifacts they use (e.g., source code, designs, working software, customer complaints), as well as the different backgrounds of their members (e.g., architects, testers, customers, customer support). As a consequence, different stakeholders might use the same term to refer to different concepts, and vice versa [7] [13]. For instance, the term "complexity" for a "Dev" team would probably refer to the complexity of the code, whereas for the "Ops" team it would refer to the complexity of using the software. For this reason, there is a need to propose a consolidated terminology for DevOps, which would be understandable by all stakeholders (need-2).
To alleviate the aforementioned problems, we have performed a tertiary study on DevOps that aims at: (a) providing a unified research landscape for DevOps (related to need-1), which will include a list of topics and terms that have emerged from the literature and will act as a dictionary of terms that can be used in future research and in practice; and (b) exploring the consistency of terminology within the specific topics (related to need-2).
To achieve this goal, we will contrast the use of terms within and between topics, so as to identify terms that are used in a different manner across studies. Identifying and consolidating confusing terms can lead to more accurate presentation of future research ideas, as well as reduce the misunderstandings in an industrial setting.
The rest of the paper is organized as follows: In Section II, we discuss related work that is relevant to the DevOps area, whereas in Section III we present the adopted systematic mapping protocol. Next, in Section IV we present the results of our study, in Section V we present the discussion and the threats to validity, and in Section VI we conclude the paper.

II. BACKGROUND INFORMATION & RELATED WORK
In this section we present background information and related work for this paper. In particular, in Section II.A, we present background information on DevOps, so that the non-expert reader can become acquainted with the basics of DevOps. In the software engineering literature, there are no tertiary studies on DevOps; thus, in Section II.B we expand our related work discussion to tertiary studies on software processes. Nevertheless, we need to note that the findings of the papers presented in Section II.B are not directly comparable to ours, since their scope and context are different.

A. BACKGROUND INFORMATION
The term DevOps was introduced in the 2007-2009 period and it represents the combination of Development (Dev) and Operations (Ops) [6]. More specifically, the DevOps methodology [24] is an extension of agile processes (such as Scrum, XP, etc.), having as a primary goal the promotion of good collaboration between development and operation teams throughout the software development lifecycle, from design until the delivery of the product to the customer. The success of the DevOps methodology in an organization is monitored by considering three key areas of evaluation: the culture of collaboration, processes, and tools [18]. People are at the heart of the methodology, and in many cases they are the main obstacle to cultivating the culture of the DevOps methodology in an organization. Adopting the methodology can lead to a complete transformation of product development processes. In addition, monitoring the effectiveness of the interconnected DevOps processes throughout the pipeline of product manufacturing and delivery is considered particularly relevant. The selection of tools can also play an important role in the correct and effective implementation of the DevOps methodology. Typical factors to be monitored are operating time and capacity, among others. Furthermore, based on the literature, Culture, Automation, Measurement, Sharing, and Services have been identified as the main dimensions of DevOps [8]. DevOps encourages Continuous Integration (CI) and Continuous Delivery (CD). It aims at shortening the product delivery cycle, enabling enterprises to launch software products and services in a timely manner without compromising their quality.

B. RELATED WORK
Hoda et al. [12] performed a tertiary study in agile software development (ASD). More specifically, the study focused on identifying: (a) the number of SLRs that have been published; (b) the research areas and topics of interest; (c) the venues that are most active in publishing SLRs in the domain of ASD; (d) the quality of the SLRs in ASD; and (e) the progress that has been achieved in ASD research. The search process covers the time frame from 1990 to December 2015 in five digital libraries (IEEE, ACM, Springer, ScienceDirect, and ISI Web of Science). After applying the selection criteria, 28 studies were identified. As a result, the study revealed ten different ASD research areas: adoption, methods, practices, human and social aspects, CMMI, usability, global software engineering, organizational agility, embedded systems, and software product line engineering.
Hanssen et al. [11] conducted a tertiary study on global software engineering. More specifically, the aim of this study is to identify the current trends and the role of agile topics in the global software engineering literature. The search process is applied in ISI Web of Science and Google Scholar and returned 21 studies. The results of the study suggest that agility is one of the topics that are attracting attention in the global software engineering area. Additionally, Curcio et al. [5] performed a tertiary study on the usability of agile software development. In particular, the goal of this study is to categorize secondary studies related to the co-existence of usability and agile software development, and discuss the quality of the selected studies. The search process is conducted between 2001 and February 2018 in four digital libraries (IEEE, ACM, Springer, and ScienceDirect). After applying the selection criteria, 14 studies were identified. The results identify six main categories for representing ways of integrating usability into agile development: processes, techniques, practices, recommendations, principles and different approaches. Additionally, regarding the challenges for the integration, the authors identified seven main categories: issues related to tests, time, work balance, modularization, feedback, prioritization, and documentation.
Nurdiani et al. [19] conducted a tertiary study of Agile and Lean practices. More specifically, the goal of this study is to identify the impact of Agile and Lean practices on project constraints. The search process is performed in five digital libraries (Compendex & Scopus, Inspec, IEEE, ACM, and ISI Web of Knowledge) between 1990 and 2014. In total, 41 secondary studies were retrieved and analyzed. The results of this study highlighted 13 Agile and Lean practices as the most prominent ones, with Test-Driven Development (TDD) being studied in ten secondary studies, which conclude that it has a positive impact on external quality. Finally, Khan et al. [14] performed a tertiary study on software process improvement. In particular, the aim of this study is to identify the software process improvement topics that have been discussed in the secondary studies, and the quality of these articles. The search strategy identified papers between 2004 and October 2015, and was conducted on five databases (IEEE, Scopus, Google Scholar, ScienceDirect and ISI Web of Science). At the end of the selection process, 24 secondary studies were retained. The results suggest that the secondary studies focus on five topics, i.e., factors, small and medium-sized enterprises (SMEs), process models, software quality, and testing. Factors and process models were the most common topics in software process improvement.

III. TERTIARY STUDY DESIGN
The first tertiary study on software engineering was published by Kitchenham et al. [17] in 2010. In that work, Kitchenham et al. [17] adopted the guidelines for performing systematic literature reviews. Nevertheless, given the goals of this work, we preferred to perform our tertiary study by applying the Systematic Mapping process. In this section, we present the protocol of the systematic mapping study, based on the guidelines described by Petersen et al. [21]. A protocol constitutes a plan that describes the investigated research questions and how the mapping study has been conducted. More specifically, the protocol involves three activities, namely: (a) defining research objectives and questions (see Section III.A); (b) defining the search and article selection process (see Section III.B); and (c) defining the data extraction, analysis, and synthesis strategy (see Section III.C).

A. RESEARCH OBJECTIVES AND QUESTIONS
The goal of this study, expressed in the Goal-Question-Metrics (GQM) format [2], is to analyze existing literature reviews on DevOps for the purpose of characterization and evaluation with respect to: (a) the research topics in the DevOps area; (b) the terms that are mapped to each topic; and (c) the consistency of terms within and between topics, from the point of view of researchers and practitioners. Based on this goal, we define the following research questions:
RQ1: What are the most common research topics in the DevOps domain?
RQ2: What terms can be mapped to each research topic?
RQ3: Is the terminology used in the DevOps research consistent?
To answer RQ1, we have identified the topics that have been most frequently studied. The topics have been retrieved from the research questions of each secondary study (e.g., benefits for adopting DevOps, used practices, definition of DevOps). Next, for each topic, (to answer RQ2) we recorded all the answers (terms) for each research question, for each secondary study. Finally, to answer RQ3, we compared all the terms of each topic, to identify the consistency of the terminology within each topic. Inconsistency in terminology exists in two forms: (a) a term defining two meanings; or (b) a meaning represented by more than one term. By answering these research questions, the industrial and academic stakeholders could easily identify the most interesting and active topics in the area of DevOps, as well as a consolidated terminology that can contribute towards the avoidance of possible misunderstandings.
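The two forms of inconsistency described above can be illustrated with a minimal sketch. The (term, meaning) pairs below are hypothetical and only echo the "complexity" and deployment examples used in this paper; the function name and data layout are assumptions made for illustration, not part of the study's tooling.

```python
from collections import defaultdict

# Hypothetical (term, meaning) pairs, as a reviewer might record them.
usages = [
    ("complexity", "complexity of the code"),
    ("complexity", "complexity of using the software"),
    ("continuous deployment", "automatic release of each change"),
    ("automated deployment", "automatic release of each change"),
    ("monitoring", "runtime observation"),
]


def find_inconsistencies(pairs):
    """Return (terms with several meanings, meanings with several terms)."""
    meanings_of = defaultdict(set)  # term -> meanings it is used for
    terms_for = defaultdict(set)    # meaning -> terms used to express it
    for term, meaning in pairs:
        meanings_of[term].add(meaning)
        terms_for[meaning].add(term)
    # Form (a): one term carrying two or more meanings.
    form_a = {t: m for t, m in meanings_of.items() if len(m) > 1}
    # Form (b): one meaning expressed by two or more terms.
    form_b = {m: t for m, t in terms_for.items() if len(t) > 1}
    return form_a, form_b


form_a, form_b = find_inconsistencies(usages)
print("Form (a):", form_a)
print("Form (b):", form_b)
```

In this toy input, "complexity" falls under form (a) and the two deployment terms fall under form (b), while "monitoring" is used consistently.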

B. SEARCH AND ARTICLE SELECTION PROCESS
The search and article selection strategy were defined by considering the goal and research questions of the study. In Figure 1 we provide an overview of the process along with the number of studies retrieved at each phase.

FIGURE 1. Overview of Search and Article Selection Process
Search Process: More specifically, we chose to perform an automated search of the complete content of two well-known indexing engines, i.e., Google Scholar and Scopus, rather than of specific venues, so as not to exclude studies that are relevant to our work. First, we developed a search string (see box below) to identify studies relevant to DevOps and applied this search to the title and abstract of the papers. We note that for the first part of the search string we have not employed any synonyms, since the term is very distinct and researchers referring to DevOps do not use alternatives. This search returned 101 candidate studies.
("devops") AND ("literature review" OR "mapping study" OR "literature survey")
Article Selection Process: Next, we removed duplicate papers (78 articles remained). As a last step of this process, we identified all the primary studies satisfying the Inclusion Criteria (IC). First, it was mandatory to assess the study as relevant to the DevOps domain and then as a secondary study. The Inclusion Criteria of our tertiary study are:
IC1: The study deals with the DevOps domain; AND
IC2: The study is a secondary study (i.e., literature review, mapping study, or literature survey).
The Exclusion Criteria of our tertiary study are:
EC1: The study is written in a language other than English;
EC2: The study is an editorial, keynote, biography, opinion, tutorial, workshop summary report, progress report, poster, or panel.
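The boolean search string used in the search process can be read as a predicate over the title and abstract of a paper. The sketch below is only an illustration under stated assumptions: it uses plain case-insensitive substring matching, whereas Google Scholar and Scopus apply their own matching rules, and the function name is hypothetical.

```python
# Terms taken from the search string; the matching logic is an assumption.
DOMAIN_TERMS = ["devops"]
REVIEW_TERMS = ["literature review", "mapping study", "literature survey"]


def matches_search_string(title: str, abstract: str) -> bool:
    """True if the text satisfies ("devops") AND (review term 1 OR 2 OR 3)."""
    text = f"{title} {abstract}".lower()
    has_domain = any(term in text for term in DOMAIN_TERMS)
    has_review = any(term in text for term in REVIEW_TERMS)
    return has_domain and has_review


print(matches_search_string("DevOps: A Systematic Mapping Study", ""))
print(matches_search_string("A DevOps maturity model", "We propose a model."))
```

The inclusion and exclusion criteria are not encoded here, since they require a human judgment on the content of each study.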
In the final dataset, we kept studies that satisfied both IC1 and IC2 and did not satisfy any Exclusion Criteria (EC). The article selection process was handled by the first three authors of this study, using a simplified version of the voting method described by Farhoodi et al. [9]. In particular, the first three authors inspected the publication's full text and assigned a binary vote (include / exclude). Studies with 3 include votes were included in the study, whereas studies with 3 exclude votes were automatically excluded. The inclusion of the remaining primary studies was discussed in plenary. In total, since the level of clarity for the inclusion/exclusion was high, only 4 articles were discussed. At the end of this process, the final set of primary studies consisted of 41 secondary studies.
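The simplified voting rule described above (unanimous inclusion, unanimous exclusion, plenary discussion otherwise) can be sketched as a small function. The function name and return labels are hypothetical and chosen for illustration only.

```python
def decide(votes):
    """Map three binary votes (True = include) to a selection outcome."""
    if len(votes) != 3:
        raise ValueError("exactly three votes are expected")
    includes = sum(bool(v) for v in votes)
    if includes == 3:
        return "include"   # unanimous inclusion
    if includes == 0:
        return "exclude"   # unanimous exclusion
    return "discuss"       # split vote: resolved in plenary


print(decide([True, True, True]))
print(decide([True, False, True]))
```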

C. DATA EXTRACTION, ANALYSIS, AND SYNTHESIS
In this section we present the data extraction, analysis, and synthesis process that we have used for answering the RQs. The proposed process relies heavily on synthesis and meta-analysis methods that are applied in the software engineering domain, as presented by Cruzes and Dyba [4], dos Santos and Travassos [22], and Kitchenham et al. [16]. In Figure 2, we provide an overview of the proposed Data Extraction, Analysis, and Synthesis (DEAS) process, which can be applied to any tertiary study that aims at building a dictionary of a field of research, comprised of topics and associated terms, and at exploring their consistent usage among secondary studies.
As a 1st step, we extract all the research questions that are answered in secondary studies, compiling a list of research questions that are of interest in the field (e.g., "What are the main concepts related to DevOps?", "What are the main expected benefits and challenges of adopting DevOps?"). The research questions are noted exactly as they appear in the study, without any intervention by the researchers. In the rare case that a study does not have explicitly stated research questions, we use its goal. If no goal is stated, we infer the goal from the organization of the results section. We note that since in this work we rely on the systematic mapping study process, no quality appraisal has been performed. For instance, using DARE would have excluded studies without RQs [15].

FIGURE 2. Data Extraction, Analysis, and Synthesis Process
As a 2nd step, we perform thematic analysis so as to consolidate a list of topics of research interest. To achieve this, we first extract a topic for each RQ (e.g., features, and benefits and challenges, respectively, for the previous examples); and subsequently we merge similar topics together (e.g., we merge areas and features under the same topic, named features). To extract topics, we used open coding [10]. In particular, we examined the text of the RQs, subdivided them into words, and labeled the important ones with codes. When possible, codes are generated as words, "as-is" in the RQ. Otherwise, "synthetic" codes representing the semantic meaning of the research topic were created by the researchers. Next, topics were clustered into fundamental categories, which guided the subsequent data collection.
In the 3rd step, we build a collection of 2D arrays, in which for every study we note a tuple of terms and topics. The terms are recorded as presented by the authors of the secondary study in tables or figures that answer the corresponding research question (e.g., Jira, Jenkins, Chef for the topic tool), without any intervention by the researchers. In case an original RQ (in the secondary study) is not answered in a compact form (quite infrequent: only 5 out of the 41 examined studies), terms are extracted from the corresponding text. The thematic analysis has been performed by using the Open Card Sorting method, introduced by Spencer [23]. In particular, we: (a) identified "Consolidated Terms" (i.e., super-categories), e.g., we developed a term "Deployment"; (b) mapped "Original Terms" to the consolidated ones, e.g., we mapped "Continuous Deployment" and "Automated Deployment" to "Deployment"; and (c) defined the final names of the Consolidated Terms, after we mapped all Original Terms. The first and the second author performed the process of identifying the terms, and the third and fourth authors validated the results. During the process of consolidating terms, along with their naming, there were some disagreements (approximately 2%), which have been resolved by a discussion among the authors. The low rate of disagreement was reached by applying a process to guarantee the common understanding of the researchers. In particular, first a thorough discussion among the authors was performed. Next, we piloted the first 10 papers, which have been assessed in pairs by the three authors, so as to have an open discussion on the recording of the variables' scores. All authors explained their scores, until a consensus was reached.
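The mapping of "Original Terms" to "Consolidated Terms" can be pictured as a lookup table. The sketch below uses only the "Deployment" example given in the text; the function name, and the choice to let unmapped terms keep their own name, are assumptions made for illustration.

```python
# Mapping from original terms to consolidated terms (example from the text).
CONSOLIDATION = {
    "Continuous Deployment": "Deployment",
    "Automated Deployment": "Deployment",
}


def consolidate(original_terms, mapping=CONSOLIDATION):
    """Group original terms under their consolidated term."""
    consolidated = {}
    for term in original_terms:
        # Assumption: a term without a mapping stands for itself.
        target = mapping.get(term, term)
        consolidated.setdefault(target, []).append(term)
    return consolidated


groups = consolidate(["Continuous Deployment", "Automated Deployment", "Monitoring"])
print(groups)
```

In the study itself the mapping was produced manually through open card sorting; the dictionary only records the outcome of that process.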
Upon the completion of these steps, the following data are extracted for every secondary study:
[V1] Title: title of the paper.
[V2] Author: list of authors of the paper.
[V3] Year: publication year of the paper.
[V4] Type of Paper: conference or journal.
[V5] Publication Venue: name of the journal or conference.
[V7] Topics: the topics studied in each secondary study.
[V7] is an array of the topics that are studied in the secondary study (outcome of step 2 of DEAS), and [V8] is a 2D array, for which the rows are the topics of [V7] and the columns are the identified terms (outcome of step 3 of DEAS). To answer RQ1 and RQ2, we have produced basic descriptive statistics (i.e., frequencies) and visualizations (i.e., word clouds) on [V7] and [V8]. The complete dataset is available online (see footnote 2).
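One possible in-memory representation of [V7] and [V8], together with the frequency computation used for RQ1, is sketched below. The study identifiers and terms are hypothetical toy data, and the nested-dictionary layout is an assumption; the published dataset is a spreadsheet.

```python
from collections import Counter

# Hypothetical extraction result: study id -> {topic: [terms]}.
# The keys of each inner dict play the role of [V7]; the dict itself of [V8].
dataset = {
    "S1": {"Practices": ["Deployment", "Testing"], "Tools": ["Jenkins"]},
    "S2": {"Practices": ["Monitoring"]},
    "S3": {"Tools": ["Jira", "Chef"], "Features": ["Automation"]},
    "S4": {"Practices": ["Testing"], "Features": ["Sharing"]},
}


def topic_frequencies(data):
    """Percentage of studies (rounded) that address each topic."""
    counts = Counter(topic for topics in data.values() for topic in topics)
    total = len(data)
    return {topic: round(100 * n / total) for topic, n in counts.items()}


frequencies = topic_frequencies(dataset)
print(frequencies)
```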
As a 4th step of DEAS, we perform statistical analysis to answer RQ3. The statistical analysis explores the usage of the terms for pairs of topics, defined in different studies, calculating a Consistency Factor (CF). Given the sets of terms mapped to two topics in two studies, CF is calculated as the number of shared terms divided by the size of the larger set. The calculations have been automated with an open-source tool developed by the authors for this purpose. The tool was extensively tested before performing the data analysis process, following software engineering testing principles. The tool receives as input text files containing the terms and topics explored in each study, and calculates the IntraCF and InterCF metrics.

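Formally, for a topic $topic_i$ addressed by two studies $S_x$ and $S_y$:
\[
CF(topic_i, S_x, S_y) = \frac{\left| terms_{S_x}(topic_i) \cap terms_{S_y}(topic_i) \right|}{\max\left( \left| terms_{S_x}(topic_i) \right|, \left| terms_{S_y}(topic_i) \right| \right)}
\]
where:
• $terms_{S}(topic_i)$ denotes the terms that are used for $topic_i$ in study $S$
• $S_x$, $S_y$ are two studies for which the consistency of terms on $topic_i$ is calculated.
In a domain with consistent and well-established terminology, it is expected that the same topics (referred to in different studies) have a high CF (i.e., use the same terms), whereas different topics have a limited overlap in terms (i.e., a low CF).
2 https://users.uom.gr/~a.ampatzoglou/aux_material/TS_DevOps.xlsx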
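The verbal definition of CF (shared terms divided by the size of the larger set) translates into a few lines of code. This is a minimal sketch, not the authors' open-source tool; the function name and the handling of empty sets are assumptions.

```python
def consistency_factor(terms_x, terms_y):
    """Shared terms divided by the size of the larger of the two term sets."""
    a, b = set(terms_x), set(terms_y)
    if not a or not b:
        return 0.0  # assumption: an empty term set shares nothing
    return len(a & b) / max(len(a), len(b))


# Hypothetical term sets reported by two studies for the same topic.
study_x = ["Deployment", "Testing", "Monitoring"]
study_y = ["Testing", "Monitoring", "Planning", "Delivery"]
print(consistency_factor(study_x, study_y))
```

Dividing by the larger set means that a short list fully contained in a long one still yields a CF below 1, so CF reaches 1 only when the two studies report exactly the same terms.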
As part of the 5th step, we have performed meta-analysis for interpretation purposes. For this step, we synthesize CF values per topic, calculating IntraCF and InterCF. InterCF is calculated as the average of the CF of one topic against all others. IntraCF for topic_i is the consistency of terms on topic_i between all pairs of studies that are related to topic_i. An example of the calculations is presented below: Suppose that studies S1-S3 explore the following topics [S1]
To present the outcomes of RQ3, we have used: (a) frequencies of studies in which a term is used in conflicting topics; and (b) Venn diagrams for visualizing the "grey zones" among topics, i.e., terms used inconsistently.
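The aggregation of CF values into IntraCF and InterCF can be sketched as follows. The data for S1-S3 are hypothetical and are not the authors' worked example. The sketch also assumes that IntraCF averages CF over all pairs of studies addressing the topic, and that InterCF averages CF between the topic's term sets and the term sets of every other topic in every study; the text does not fix these details.

```python
from itertools import combinations
from statistics import mean


def cf(a, b):
    """Shared terms divided by the size of the larger set."""
    a, b = set(a), set(b)
    return len(a & b) / max(len(a), len(b)) if a and b else 0.0


# Hypothetical data: study id -> {topic: set of terms}.
studies = {
    "S1": {"Features": {"Automation", "Sharing"}, "Practices": {"Automation", "Testing"}},
    "S2": {"Features": {"Automation", "Culture"}, "Practices": {"Testing", "Deployment"}},
    "S3": {"Practices": {"Testing", "Monitoring"}},
}


def intra_cf(topic, data):
    """Average CF of one topic over all pairs of studies that address it."""
    term_sets = [topics[topic] for topics in data.values() if topic in topics]
    pairs = list(combinations(term_sets, 2))
    return mean(cf(a, b) for a, b in pairs) if pairs else None


def inter_cf(topic, data):
    """Average CF of one topic against every other topic in every study."""
    values = [
        cf(topics_x[topic], other_terms)
        for topics_x in data.values() if topic in topics_x
        for topics_y in data.values()
        for other_topic, other_terms in topics_y.items() if other_topic != topic
    ]
    return mean(values) if values else None


print("IntraCF(Features):", intra_cf("Features", studies))
print("InterCF(Features):", inter_cf("Features", studies))
```

In this toy data, "Automation" appears under both Features and Practices, which is what raises InterCF(Features) above zero.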

IV. RESULTS
In this section we present the results of the performed tertiary study, organized by research question. More specifically, in Section IV.A we present the results on the most commonly researched topics in the DevOps area (RQ1). In Section IV.B, we focus on the terms that are mapped to each topic (RQ2). Finally, in Section IV.C we present our findings regarding the consistency of the DevOps terminology (RQ3). We note that in this section we mostly provide raw results, as well as their interpretations, since implications to researchers and practitioners are discussed in Section V.

A. DevOps Research Topics (RQ1)
To identify the most commonly studied DevOps topics in the secondary literature, we have analyzed the research questions answered by each secondary study, following the DEAS approach (see Section III.C). Upon synthesizing the topics of focus for each research question, we have identified 8 DevOps research topics as the most prominent ones (see Table I). We note that we have excluded from the table research questions targeting demographic analysis, e.g., volume of research per year, publication venues, top authors, application domains, etc.
S3, S4, S8, S10, S14, S21, S24, S34, S41

Benefits
RQs that deal with the benefits that can be identified after applying DevOps. The benefits can be related to all viewpoints of DevOps: software product, customer relationship, process improvement, etc.

Challenges
RQs that focus on required activities, problems, etc. identified before the adoption of DevOps. Apart from the term "Challenges", in this topic we also classified RQs mentioning "Readiness Models", "Problems in Adoption", and "Adoption Success Factors".

Features
RQs that describe the features that differentiate DevOps from other development methodologies. In this topic we have also classified RQs mentioning "Area", since, after exploring the answers to such questions, we concluded that the terms are used for the same purpose.
Studies: S1, S3, S5, S10, S11, S15, S18, S21, S22, S24, S29, S35, S37, S38

Practices
RQs that deal with practices (i.e., specific processes or methods) that can be used while applying DevOps. Practices can be related to software development (e.g., patterns), processes (e.g., monitoring), or any other viewpoint of the DevOps methodology. In this topic we have also classified RQs mentioning "Methods" and "Capabilities".

Problems
RQs that focus on problems, challenges etc. identified while applying the DevOps development methodology (i.e., after adoption).

Quality Characteristics
RQs identifying the quality characteristics that improve upon the application of DevOps, or are of interest while monitoring DevOps. In this topic we have also classified RQs mentioning "Measurement" and "Metrics".
Studies: S7, S9, S13, S17, S18, S25, S32

Tools
RQs cataloguing and discussing the available tools that can be used when applying the DevOps development methodology.
Studies: S4, S6, S7, S8, S12, S19, S22, S34, S39
The most commonly studied topic is the identification of the Practices used while applying DevOps, which is researched in 63% of the secondary studies. Such practices can be software engineering ones (e.g., patterns, traceability, etc.) or operations-related (e.g., improved customer satisfaction). The popularity of Practices as a research topic is intuitive, in the sense that understanding the practices that need to be considered for applying DevOps can lead to the development of practical guides on needed skills, tools, and potential training targets for DevOps industries. DevOps Features rank second, being discussed in 34% of the secondary studies. We need to note that the term DevOps features appears to correspond to a list of foundations of applying DevOps (e.g., Culture, Communication, Sharing, etc.). The popularity of DevOps features can be explained by the fact that various studies aimed at understanding the essentials of DevOps, as well as its differences from other development methodologies, especially since the DevOps methodology is still not widely established.
The next group of topics relates to problems or benefits while applying DevOps. In particular, the Challenges topic (studied in 31% of the secondary studies) corresponds to problems that might be faced while migrating the development methodology from a traditional one to DevOps, as well as the readiness to perform such a migration. Within the topic Problems, we have classified studies (29%) that pose research questions on the problems that industries encounter after adopting DevOps, whereas questions on the obtained Benefits are addressed by 22% of the studies. This group of topics aims to answer basic questions of DevOps adopters in terms of what to expect from DevOps, and whether DevOps adoption fits their organization. Additionally, 22% of the secondary studies attempt to catalogue the Tools that are used while applying DevOps in industry, as well as Definitions of DevOps. Finally, 17% of the secondary studies have dealt with specialized gains in terms of the Qualities that DevOps application can improve.
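The reported percentages can be cross-checked against the study lists that survive in Table I, given the 41 secondary studies in the final dataset. In the sketch below, the attribution of each list to a topic is an assumption based on where the list appears next to the topic descriptions; the arithmetic itself is a plain rounded share.

```python
TOTAL_STUDIES = 41  # size of the final dataset

# Study lists as they appear in Table I; topic attribution is assumed.
studies_per_topic = {
    "Features": ["S1", "S3", "S5", "S10", "S11", "S15", "S18", "S21",
                 "S22", "S24", "S29", "S35", "S37", "S38"],
    "Quality Characteristics": ["S7", "S9", "S13", "S17", "S18", "S25", "S32"],
    "Tools": ["S4", "S6", "S7", "S8", "S12", "S19", "S22", "S34", "S39"],
}


def share(study_ids, total=TOTAL_STUDIES):
    """Rounded percentage of the dataset covered by a list of studies."""
    return round(100 * len(study_ids) / total)


shares = {topic: share(ids) for topic, ids in studies_per_topic.items()}
print(shares)
```

Under this attribution the computed shares agree with the percentages quoted in the text for Features, Quality Characteristics, and Tools.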

B. DevOps Terminology (RQ2)
In this section, for every topic identified in RQ1 we present the terms it comprises. We note that we have excluded "Definitions" from this analysis, since only one secondary study [S10] presented a definition of DevOps synthesized from the primary studies; the rest of the studies presented definitions from their primary studies, but did not synthesize them. Therefore, no further synthesis from our side was possible. Regarding "Features", "Practices", "Tools", and "Quality Characteristics" we present the results in Tables II-V, whereas regarding "Benefits", "Challenges" and "Adoption" we present them as word clouds.
For tabular data, in the first column we list the (consolidated) terms identified for the topic, in the second column the percentage of secondary studies in which the term is reported as part of this topic, and in the third column the (original) terms as identified in the secondary studies. As part of the interpretation, we note that consolidated terms with a high percentage refer to terms that many studies acknowledge as important. Within those, there are terms that are mentioned with many possible synonyms, a fact that indicates that the term is not uniformly used in the literature and that attention from the community is required.
In Table II, we present the terminology under the topic DevOps Features. Based on our findings, when researchers refer to DevOps features, they refer (with some certainty: Freq. > 50% and a low number of synonyms) to the need for Automation, Sharing, Measurement, DevOps Culture, and Collaboration. Quality Assessment is referred to as a Feature in 50% of the secondary studies, both as a process (QA) and as specific quality characteristics, e.g., Resilience, Complexity, Cost, Scalability. We remind the reader that, despite the presence of quality characteristics in the list, we needed to report them as features, since they are classified as such in the secondary studies.
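The three-column layout described for the tables (consolidated term, percentage of studies, original terms) can be derived from per-study term lists. The sketch below uses hypothetical reports for a single topic; the function name, the toy data, and the sort order are assumptions made for illustration.

```python
def term_table(reports, mapping):
    """Build rows of (consolidated term, % of studies, original terms)."""
    total = len(reports)
    studies_by_term, originals_by_term = {}, {}
    for study, terms in reports.items():
        for original in terms:
            consolidated = mapping.get(original, original)
            studies_by_term.setdefault(consolidated, set()).add(study)
            originals_by_term.setdefault(consolidated, set()).add(original)
    rows = [
        (term,
         round(100 * len(studies_by_term[term]) / total),
         sorted(originals_by_term[term]))
        for term in studies_by_term
    ]
    # Most frequently reported terms first, ties broken alphabetically.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Hypothetical reports: study id -> original terms listed under one topic.
reports = {
    "S1": ["Continuous Deployment", "Testing"],
    "S2": ["Automated Deployment"],
    "S3": ["Testing", "Monitoring"],
    "S4": ["Continuous Deployment"],
}
mapping = {"Continuous Deployment": "Deployment", "Automated Deployment": "Deployment"}

table = term_table(reports, mapping)
for row in table:
    print(row)
```

A row with a high percentage and many entries in the third column corresponds to the situation flagged in the text: a term widely acknowledged as important but named in several different ways.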
Lower in the list, we can identify terms that are related to more specific software development activities, such as Project Management, Service-Based Development, Deployment, Design/Architecture, etc., which however are not comprehensively classified as DevOps features (Freq. < 40%), and some of which are identified with various naming alternatives.
Regarding DevOps Practices (see Table III), a long list of non-consolidated practices has been identified. However, with a limited level of synthesis (e.g., "Continuous Deployment", "Automated Deployment", etc. are consolidated under the term "Deployment"), 4 practices have occurred in more than 50% of the studies, namely: Deployment, Testing, Monitoring, and Quality Assessment. Among these practices, the first 2 are solely "Dev"-oriented, whereas Monitoring and Quality Assessment can be applied to either the "Dev" or the "Ops" branch of DevOps. The application of these practices seems to be non-negotiable for applying the DevOps process. Next, we can identify practices that appear in more than 30% of the studies, namely: Delivery, Continuous Integration, Design / Architecture, Continuous Improvement, Planning, Version Control, and Infrastructure. Additionally, an interesting observation is that after these DevOps practices, we have spotted Project Management, Automation, and Collaboration, which are defined as DevOps Features as well, denoting a possible confusion or overlap between the two topics. This also leads us to the conclusion that a DevOps feature (i.e., Collaboration) is very important and needs support by the associated practices (i.e., face-to-face communication, shared responsibility, etc.).
Finally, by inspecting Table III, we can conclude that the terminology used under the topic DevOps Practices is much more diverse than the terminology under DevOps Features. This is expected, since Practices pre-existed the DevOps initiative, and therefore their terminology follows the more general terminology of Software Engineering. The last observation denotes that there is a common understanding among stakeholders of terms like Deployment, Testing, and Planning, which are commonly used by Software Engineers, unlike more general terms like Collaboration.
With respect to tool support while applying DevOps, in Table IV we have mapped the identified tools to the DevOps practice (as mentioned in the secondary study) that they can support. The table is ordered alphabetically, by DevOps practice. By inspecting the results, we can observe that the majority of the tools are related to Monitoring, Security, Project Management, Infrastructure Management, and Continuous Integration and Continuous Deployment (CI/CD). By contrasting the results of Tables III and IV, we need to note that despite the prevalence of Testing as a DevOps practice, the tool support for it is limited compared to other practices. This finding can be interpreted either as: (a) a lack of tool support; or (b) the existence of tools for this DevOps practice that are so well-established that they cover the current needs, leading to no requirement for introducing additional ones. We believe that the same arguments also apply to Build and Version Control tools.
Finally, regarding the Quality Characteristics (see Table V) of interest when applying DevOps, Testability stands out as the most important one, followed by Maintainability, Performance, and Security. These results comply with the ones on DevOps practices, as well as tool support.
The additional evidence on the importance of Testability probably signifies that there is no lack of tool support, but that, due to their significance, some well-established solutions monopolize the market and hinder future research. On the contrary, the importance of security, combined with the number of existing tools, suggests that this is probably an open research field and that practitioners have not yet settled on the tool support in this direction.
Regarding the rest of the topics (Benefits, DevOps Adoption, and Challenges), we have visualized the main terms in the form of word clouds. In the word clouds, the larger the font of a term, the higher the number of papers in which it appears. In terms of "Benefits" from adopting DevOps, most of the studies focus on the "Ops" branch (see Figure 3). In particular, as the main benefit they highlight the improved customer satisfaction, which is an intuitive outcome in the sense that the customer is actively involved in the "Dev" branch, mostly by validating requirements in almost real-time. Additional benefits are related to improved security control, since the system is operational from early stages, enabling run-time security assessment for a longer period. Finally, the agile principles that govern DevOps development enable continuous planning, which can be updated based on customer requests as well as the development resources, leading to a better application.

FIGURE 4. Perceived Challenges before Adopting DevOps
On the other hand, with respect to problems / challenges, in Figure 4 we present the most common challenges in "Adopting" DevOps, whereas in Figure 5 we present the "Challenges" faced when applying DevOps. The main challenge before adopting DevOps is the selection of and familiarization with the tools that will be used while applying DevOps. As can be observed from Table IV, despite the fact that some of the required tools are generic (not necessarily DevOps-specific), others (e.g., related to Monitoring) might be unfamiliar to non-DevOps organizations. Additionally, another aspect that concerns companies is the fitness of the software (e.g., software functionality) and the customer (e.g., line of business) for applying DevOps. From this point of view, we can deduce that not all application domains are a fit for DevOps, e.g., continuous development / deployment / delivery might not be applicable, or the customer might not have the ability to be as active and knowledgeable as required. Furthermore, the application of DevOps dictates changes in the "Ops" branch of the company, since the operations department needs to adapt its processes to the more central role of the customer, giving them the ability to provide continuous feedback. Finally, it seems that setting mean cycle times (and more generally time management) is a challenge for companies that are not experienced in DevOps, which consider it a main challenge before migrating their process to DevOps.
Finally, with respect to the "Problems" that are faced after the adoption of DevOps, from Figure 5 we can observe that problems are related to various aspects of DevOps that the company might be lacking. On the one hand, in terms of processes, problems might be related to lack of automation, lack of management, lack of flexibility, frequency of delivery, etc. On the other hand, from a more technical perspective, the company might face challenges from the complexity of deployment, immature automated deployment, etc. Additionally, a problem that might arise along development is the need for security in modern applications, which is considered a problem while applying DevOps.

C. DevOps Terminology Consistency (RQ3)
In this section we present the results of the consistency analysis, following the DEAS approach. For each identified topic, we explore the consistency of the terminology used, i.e., whether the same terms are consistently and orthogonally mapped to research topics.
In Figure 6, we present the IntraCF (orange bar) and InterCF (blue bar) scores for each topic. The results suggest that the topic with the maximum consistency is "DevOps Features", followed by "DevOps Practices". As expected, more generic topics, such as "Challenges", "Benefits", and "Problems", are less consistent. With respect to "Tools", we can observe that although tool names cannot be confused with one another, the number of tools that are mentioned across all secondary studies is rather limited. This finding can be explained by the different focus of the secondary studies, which can potentially lead to the mention of different tools (used for different purposes). Another interesting observation is that "Quality Characteristics" is a topic that has limited overlap with others (low InterCF).

FIGURE 6. Intra- and Inter-Topic Consistency
Finally, the very similar values of InterCF between "Practices" and "Features", as well as between "Challenges" and "Problems", hint at a possible confusion within each of the two pairs of topics by DevOps researchers, which needs further consideration. First, to explore the confusion between "Practices" and "Features", in Figure 7 we present the 16 terms (37% of all terms used) that are used interchangeably in the secondary studies (intersection of the Venn diagram). As a second step in this analysis, in Figure 8 we present the number of studies in which each one of these "grey zone" terms is classified as either a "DevOps Practice" or a "DevOps Feature". Based on this analysis, we can classify each term to a single topic for cases with a clear difference (e.g., Version Control can be more safely classified as a term for DevOps Practices, and Automation as a term for DevOps Features). Nevertheless, even after this analysis, some terms are safer to be considered as both Features and Practices (e.g., Communication and Trust). The last approach may help the community towards associating the desired feature (Communication) with the associated practices (open-channel communication with continuous feedback). Regarding the second most usual confusion (i.e., "Challenges" and "Problems"), we can observe a logical continuation: challenges before adoption that are not resolved before starting the project are later identified as problems along the application of the DevOps methodology. For example, when the challenges (a) adopting automatic testing techniques and (b) setting up an automated DevOps pipeline are not resolved before the start of the project, they lead to the problems of (a) lack of automation and (b) lack or immaturity of automated deployment.
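The two-step treatment of the "grey zone" terms can be illustrated with a minimal sketch. It assumes that, for each term, we know how many studies classify it as a Practice and how many as a Feature. The counts below are invented and do not reproduce Figure 8, and the `margin` parameter is our own assumption for what counts as a "clear difference"; the study itself does not state a numeric threshold.

```python
from __future__ import annotations


def classify_grey_zone(
    practice_votes: dict[str, int],
    feature_votes: dict[str, int],
    margin: int = 2,  # assumed threshold for a "clear difference" (not from the study)
) -> dict[str, str]:
    """Classify each term that appears under both topics.

    A term is in the grey zone when at least one study lists it as a Practice
    and at least one study lists it as a Feature (the Venn intersection).
    """
    grey = practice_votes.keys() & feature_votes.keys()
    result = {}
    for term in sorted(grey):
        diff = practice_votes[term] - feature_votes[term]
        if diff >= margin:
            result[term] = "Practice"
        elif -diff >= margin:
            result[term] = "Feature"
        else:
            result[term] = "Both"  # no clear majority: keep under both topics
    return result


# Invented counts: number of studies classifying each term under each topic.
practice_votes = {"Version Control": 9, "Automation": 3, "Communication": 4, "Testing": 12}
feature_votes = {"Version Control": 1, "Automation": 14, "Communication": 4, "Sharing": 10}

labels = classify_grey_zone(practice_votes, feature_votes)
```

With these toy counts, Testing and Sharing fall outside the intersection and are not reclassified, while Communication stays under both topics because neither side has a clear majority.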

A. Recap and Synthesis of Research Questions
In this section we synthesize the findings presented in Section IV, based on the answers to the research questions. The main findings are summarized below:
F1. Secondary studies on DevOps appear to investigate two main lines of research: (a) understanding the main DevOps features, as well as the practices and tools that are used for DevOps application; and (b) cataloguing the problems that are faced when applying and before adopting DevOps, as well as the benefits from applying DevOps.
F2. Automation, Sharing, Measurement, DevOps Culture, and Collaboration are indisputable features of DevOps that need to be considered for the successful application of DevOps.
F3. Continuous Deployment, Testing, Continuous Monitoring, and Quality Assessment are practices that need to be employed for a successful application of DevOps.
F4. A variety of tools exist for most of the practices.
F5. The majority of DevOps benefits are related to customer involvement, whereas the majority of the problems are more related to the Dev branch of the methodology.
F6. The terminology of DevOps is ambiguous, especially for practices, since various terms are used for referring to the same practice.
F7. The terminology in terms of practices and features is mixed: many practices are also listed as features, and vice versa.
F8. A significant number of problems before adopting DevOps continue to be problems during the application of the DevOps methodology.
Driven by F1, we perform two synthesis actions: one for the first line of research (features, practices, tools) and another for the second (problems and benefits). Considering F4, F6, and F7, in Figure 9 we present a synthesized classification of terms to features, practices, and tools.
The main contribution of this classification is the disambiguation of the "grey-zone" features and practices presented in Figure 8. For the "grey-zone" terms that remained ambiguous even after the analysis of Figure 8, we performed the classification based on the definitions of Table I. Therefore, Trust and Communication have been classified as "Feature", since they are more conceptual terms, whereas Behavior-Driven SE has been classified as a "Practice", since it is an activity that can be performed along DevOps application. Additionally, we have mapped the tools presented in Table IV to the final set of DevOps Practices, so as to provide a comprehensive panorama of how certain DevOps activities can be automated, or at least be tool-supported. Next, driven by F8, in Figure 10 we present a mapping of challenges faced before the adoption of DevOps to problems that have been identified along the application of the DevOps methodology. The rationale for this synthesis process was the identification of a common practice or feature in the raw data of Figure 4 and Figure 5. The column on the left side of the figure corresponds to challenges that a team can face before DevOps adoption, whereas the right side of Figure 10 corresponds to problems that the secondary studies have identified as important along the application of DevOps. The majority of the "unresolved" problems from adoption to application are related to practices (rather than features) and are mostly linked to the "Dev" branch of DevOps (e.g., security, testing, development productivity, etc.).
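The rationale of linking a pre-adoption challenge to a post-adoption problem through a common practice or feature can be sketched as below. This is only an illustration: the tags attached to each challenge and problem are our own assumptions for the sake of the example, the entries are a small subset loosely based on the examples given in the text, and the function name is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def link_by_shared_tag(
    challenges: dict[str, str], problems: dict[str, str]
) -> dict[str, list[str]]:
    """Link each challenge to every problem tagged with the same practice/feature."""
    problems_by_tag = defaultdict(list)
    for problem, tag in problems.items():
        problems_by_tag[tag].append(problem)
    # .get() avoids creating empty entries for tags that no problem carries.
    return {c: sorted(problems_by_tag.get(tag, [])) for c, tag in challenges.items()}


# Challenge / problem -> assumed practice or feature tag (illustrative only).
challenges = {
    "adopting automatic testing techniques": "Automation",
    "setting up an automated DevOps pipeline": "Deployment",
    "selecting and learning tools": "Tooling",
}
problems = {
    "lack of automation": "Automation",
    "immature automated deployment": "Deployment",
    "complexity of deployment": "Deployment",
}

links = link_by_shared_tag(challenges, problems)
```

A challenge with an empty list (here, tool selection) has no counterpart among the listed problems, i.e., it is not carried over from adoption to application in this toy example.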

B. Implications to Researchers and Practitioners
Based on the findings of this study, several implications to researchers and practitioners can be highlighted. On the one hand, regarding researchers, we propose the use of the synthesized terminology presented in Figure 9 as a unified vocabulary, so as to reduce ambiguity in the DevOps terminology and enhance communication among stakeholders and researchers. Also, we could stress the need for identifying specific security tools as state-of-the-art ones, and propose their consistent use in practice. Finally, it seems important for researchers to consider the quality characteristics that are important to DevOps, such as testability and maintainability, and to focus their future research endeavors on proposing methods for safeguarding them. For instance (given the problem highlighted in Figure 10), with respect to testability it seems important to propose methods that enhance automated deployment and testing through a pipeline that will enable continuous delivery. For maintainability, in turn, it might be interesting to explore the technical debt metaphor to speed up development, achieving a higher pace.

FIGURE 10. Transformation of Challenges to Problems
On the other hand, regarding practitioners, the following implications can be highlighted: (a) attempt to seek automation solutions (or at least tool support) for their DevOps activities. Guidance is provided by Figure 9. Emphasis shall be placed on automating the most prolific practices: Continuous Deployment, Testing, Continuous Monitoring, and Quality Assessment; (b) attempt to promote a DevOps Culture among the employees of the company, so as to secure the adoption of the main DevOps features: Sharing, Measurement, and Collaboration; (c) take full advantage of the main advances that DevOps brings, such as the involvement of the customer, leading to a more relevant product, and at the same time try to prevent the problems that are faced along DevOps application. To this end, during DevOps planning, special emphasis shall be given to the adoption challenges that lead to the most common problems (see Figure 10).

VI. Threats to Validity
In this section we present the threats to validity of the current study based on guidelines for identifying, reporting, and mitigating threats to validity, specialized for secondary research studies in software engineering, as they are suggested by Ampatzoglou et al. [1].

A. Study Selection Process
Study selection validity concerns the early phases of the research, i.e., the search process and the filtering of studies.
To guarantee that our search process adequately identified all relevant studies (from the studied top-quality venues) we used a well-defined process, based on strict guidelines [21]. The search string was systematically constructed (see Section III.B), in the sense that we have used the term DevOps combined with well-established terminology for secondary studies. However, it is possible that we have missed studies that use different terminology from the more established ones, i.e., studies that do not refer to themselves as a "Mapping Study" or a "Literature Review". Regarding the first part of the search string, we have preferred to use only the broader term (i.e., DevOps), instead of DevOps alternatives, such as DevSecOps, BizDevOps, DataOps, etc. This decision was made to protect our dataset from being biased due to the specifics of the alternatives, e.g., DevSecOps boosting the Security Quality Characteristic. Furthermore, in the inclusion / exclusion phase, it is possible that relevant articles have been excluded. To mitigate this threat, three authors were involved in the selection process, discussing any potential conflicts and following a systematic voting procedure. Also, the inclusion / exclusion criteria have been extensively discussed by the authors to ensure their clarity and to avoid misinterpretations (see Section III.C). Moreover, we have excluded grey literature from our process, since the study focuses on systematic secondary studies, which are almost never published in grey literature. Our study suffers from missing non-English papers; however, most top venues (journals and conferences) in software engineering publish only in English. Finally, we were able to access all publications because our institutions provide access to the digital libraries (DLs).

B. Data Validity
The main threats to data validity are related to data extraction bias and the selection of publications. First, all relevant data were extracted and recorded manually by the first and the second author. Due to the potential for subjectivity in this process (e.g., regarding the classification of each term), two other authors reviewed and further refined the collected data, re-validating them. After this process, the results were discussed among all authors, who resolved any conflicts (see Section III.D). Additionally, there is no publication bias in the selected studies, in the sense that the secondary studies have been retrieved from various venues. Thus, the aforementioned studies are not affected by a closed and small circle of researchers. Our tertiary study is not affected by the following threats: (a) small sample size, as we retrieved all possible secondary studies that focus on DevOps; (b) lack of relationships, as the study did not aim to identify relationships between data, but only to classify; and (c) the selection of variables to be extracted, as the research questions of this study did not create disagreements between authors about the variables to be extracted. Moreover, we did not identify issues with the use of statistical analysis, in the sense that the nature of our research questions did not require hypothesis testing, but only basic statistical analysis (descriptive statistics). Finally, to mitigate the researchers' bias in data interpretation and analysis, the authors discussed the data clustering based on the topics on which the research questions of each secondary study focus and the terminology that they have used. Nevertheless, we need to note that some explanations express the viewpoints and personal opinions of the authors, based on their understanding of the results.

C. Research Validity
In terms of research validity, threats are related to research method bias and repeatability. Regarding the first one, the majority of the authors are very familiar with the process of conducting secondary studies, as they have participated in a large number of secondary studies as co-authors and reviewers. Regarding repeatability, it could be argued that the process we followed supports the reliability and replication of this study: all important decisions for the review process have been thoroughly documented in this manuscript and can be easily reproduced by other researchers. Second, the fact that the extraction of data is based on the opinion of four authors can to some extent reduce potential bias. Finally, all extracted data have been made public so that the results can be compared and validated. Additionally, through discussion among the authors, we have defined three main research questions, which accurately map to the study goal. This is clearly illustrated by the mapping of each research question to the research objectives / goals. Furthermore, in the literature we have been able to identify a substantial amount of related work that can be used for comparison to our results. Finally, the selection of the research method is adequate for the goal of this study and no deviations from the guidelines have been made.

VII. Conclusion
This tertiary study provides a structured understanding of the state-of-research on the DevOps development methodology. For this purpose, 41 secondary studies focusing on DevOps have been identified and analyzed with respect to: (a) the topics that the authors address; (b) the mapping of terms to the different topics; and (c) the consistency in the use of terms across different studies. Based on the findings, it seems that at the moment there is ambiguity regarding the terminology used among DevOps stakeholders. This fact makes the retrieval of information relevant to DevOps practices, features, and supporting tools a difficult process. We believe that despite the research potential of the particular topic, the impact of the studies and their theoretical development will be limited unless: (a) the DevOps community adopts a common terminology; and (b) the researchers and practitioners focus on the cumulative building of knowledge.
Future research can focus on ways in which the various features of the DevOps methodology can be integrated into practices and automation tools, enabling the smooth collaboration between the Dev and the Ops teams and supporting the whole process of deploying software, collecting, and communicating real-time data for achieving measurable goals. Finally, we believe that an additional study that would explore possibly confusing aspects of DevOps terminology with practitioners, and validate the proposed consolidated terms, would be highly valuable for both researchers and industrial stakeholders.