Decomposition of Monolith Applications Into Microservices Architectures: A Systematic Review

Microservices architecture has gained significant traction, in part owing to its potential to deliver scalable, robust, agile, and failure-resilient software products. Consequently, many companies that use large and complex software systems are actively looking for automated solutions to decompose their monolith applications into microservices. This paper rigorously examines 35 research papers selected from well-known databases using a Systematic Literature Review (SLR) protocol and snowballing method, extracting data to answer the research questions, and presents the following four contributions. First, we introduce the Monolith to Microservices Decomposition Framework (M2MDF), which identifies the major phases and key elements of decomposition. Second, we provide a detailed analysis of existing decomposition approaches, tools and methods. Third, we identify the metrics and datasets used to evaluate and validate monolith to microservice decomposition processes. Fourth, we propose areas for future research. Overall, the findings suggest that monolith decomposition into microservices remains at an early stage and there is an absence of methods for combining static, dynamic, and evolutionary data. Insufficient tool support is also in evidence. Furthermore, standardised metrics, datasets, and baselines have yet to be established. These findings can assist practitioners seeking to understand the various dimensions of monolith decomposition and the community's current capabilities in that endeavour. The findings are also of value to researchers looking to identify areas to further extend research in the monolith decomposition space.


I. INTRODUCTION
WITH the passage of time, successful software systems grow large and become complex, due to the addition of a plethora of functionalities resulting in highly coupled but less cohesive components [1], [2]. In these large and complex systems, monolith architectures [3] embody the centralisation of functionality in large individual components, giving rise to inherent limitations in terms of scalability, maintenance and deployment performance [1], [4], [5], [6], [7], [8]. In contrast, microservice architectures are distributed, favouring the decomposition of systems into various relatively small and independent components that may be invoked as required [9], [10], delivering benefits in areas such as increased scalability and improved deployment frequency [2], [4], [11], [12].
Prior to the introduction of microservice architectures, monolith architectures were commonly adopted. However, as software systems continued to grow in size and with demand growing for ever-faster release cycles, the need for partitioning systems into separately compilable services came to the fore. Also, with the advent of cloud-based infrastructure innovations such as Software-as-a-Service [13] and Function-as-a-Service [14], the relative benefits of monolith architectures have been reduced [15]. As a result, there is growing interest in microservice architectures [16], [17].
While a great deal of important work has been conducted in the decomposition space to date, the existing published material tends to focus on addressing a specific scenario, domain, or programming language. We present a consolidation of the innovations to date, and we introduce a framework as a means to organise and classify the key methods and approaches that may be adopted when undertaking decomposition projects. To do so, this research adopts a Systematic Literature Review (SLR) [28], [29], [30] and snowballing to address the following research objective: to systematically identify and organise existing research on the decomposition of monolith applications into microservices. To this end, this research initially identifies 5022 search results from major computer science literature databases and applies four rigorous refinement steps to identify 33 research papers. After applying a snowballing method, this research finally examines 35 research articles.
Central to our findings is the observation that monolith decomposition into microservices is a complicated and expensive task. Many different approaches have been proposed, each of which may be used in isolation or combined for greater effect, and as demanded by the context. This suggests that automated or semi-automated tool support is advantageous and therefore, various existing tooling is also identified and analysed in this review. More importantly, it suggests the need for an organising framework for the area and consequently, the Monolith To Microservices Decomposition Framework (M2MDF) introduced in this research provides a comprehensive map for researchers and practitioners to navigate this complicated landscape.
This research concludes that although monolith decomposition has gained more attention in recent years, it remains at a preliminary stage for the following reasons: (i) the lack of integrated, comprehensive data collection and analysis methods with regard to crucial aspects of the monolith application, (ii) the lack of comparison between various decomposition methods and algorithms, (iii) the lack of validated and widely accepted metrics and benchmarks to measure the quality of resulting microservice candidates, and (iv) the shortage of integrated tools to support the various stages of microservices extraction pipelines.
The remaining sections of this paper are organised as follows. The background and related work section (Section II) introduces monolith to microservices decomposition, its challenges, and related literature. Section III presents the Systematic Literature Review (SLR) methodology employed along with a description of the analyses performed. In Section IV, the M2MDF is presented along with the detailed results of the SLR. Section V identifies existing gaps in the field, with Section VI discussing acknowledged threats to validity. Section VII presents a summary of the work in the context of specific implications for practitioners and researchers, and includes a post-review reflection that evaluates the utility of the proposed M2MDF by examining literature published following the primary review timeframe. Section VIII presents concluding remarks and proposed future research directions.

II. BACKGROUND AND RELATED WORK
A. Monolith versus Microservices Architectures
An application based on a monolith architecture is recognised as one which combines all (or many of) its modules into large components that are self-contained and independent from other applications [1]. Monolith architecture brings the benefit of localising significant portions of application functionality in a single manageable space but, as it becomes overly large and complex, it can inhibit maintenance and deployment flexibility [44]. Whenever a module in a monolith changes, the entire application may need to be reintegrated, rebuilt, retested, and restarted upon deployment [1]. Moreover, scaling monolith applications may pose significant challenges as application usage grows, owing to application size and load distribution [45].
Microservices architectures are presented as an alternative to monolith architectures. They are organised around business functions that perform a single task and maintain their data in a decentralised manner. They are built on the principles of single responsibility, high cohesion, low coupling, scalability, deployability, and low perturbation (meaning, minimal disturbance to other microservices that comprise the software system) [46], [47], [48], based on components that can be evolved and deployed independently [12], [49].

B. Monolith to Microservices Decomposition
Many companies have chosen to continue with monolith implementations, as they have large existing monolith investments and limited knowledge of the complex process required to decompose their monoliths into microservices [50].
However, even for the large proportion of organisations embracing microservices, manual decomposition of the monolith application is a challenging task [2] and may require existing application experts to devote considerable time to the tedious task of profiling and fully understanding the intricacies of the monolith code. As a complementary or alternative approach, automated code analysis tools have been used to identify service boundaries within a monolith. However, as we shall outline in Section IV, research dealing with automated system analysis and microservices identification is still somewhat in its infancy (although growing steadily).

C. Existing Literature Reviews on the Decomposition of Monolith Applications to Microservices
Very recently, the amount of secondary research on the decomposition of monolith applications to microservices has been growing. The studies investigate a wide range of research questions related to the rationale of the migration [11], the problems and challenges of the migration process [32], microservices analysis [40], [41], and strategies for supporting microservices extraction. Other secondary studies investigate the broader aspects of microservices architecture including the use of microservices architecture in DevOps [51], tools and techniques to support microservices development [52], microservices granularity [53], and cloud migration processes [35].
Abdellatif et al. [40] conducted a review of 41 studies (2004-2019) to identify the inputs, processes, outputs and usability of service identification approaches used for the modernisation of legacy software systems. The aim of this work was to assist practitioners in the selection of a suitable approach for identifying services. The authors presented a multi-layer taxonomy of existing approaches used in the broader context of service identification and web services, whereas we focus on the approaches used to decompose monolith applications, particularly into microservice architectures. Another distinction is that Abdellatif's study focuses on service identification without a particular focus on microservices, resulting in only three overlapping studies.
Velepucha and Flores [32] conducted an SLR to identify the problems and challenges associated with microservices migrations. To answer their research question, the authors reviewed 37 studies and identified major problems concerning suitable tool selection, team reorganisation, the complete or partial migration decision, microservices identification, and multiple database consistency. The identified problems cover technical, functional, and behavioural aspects of the migration process. Although this study provides practitioners with crucial information as to what problems and challenges they should expect in the migration process, it does not provide a detailed overview of the availability and usability of decomposition approaches and tools that are in use.
The question of why and how monolith legacy systems are migrated to microservices is investigated by Wolfart et al. [11]. The authors identified 11 migration driving forces: optimised scalability, independent and automated deployment, easier maintenance and evolution, independence of teams, loosely coupled services, cohesive services, technology flexibility, infrastructure facilities, agility enabling, easier reuse, and reduced time. The authors furthermore categorised the modernisation process into eight activities organised into four phases: initiation, planning, execution, and monitoring. Bushong et al. [41] also investigated the practice of analysing microservices architecture focusing on analysis and debugging methods used for microservice systems. The most relevant aspect of this research is the investigation of the practice of migration of monolith applications into microservices, but with less emphasis on the decomposition techniques.
A rapid review of the migration from a monolith architecture to microservices was conducted by Ponce et al. [33] with the aim of gathering, organising, and analysing migration techniques. The review, which comprised 20 research articles, investigated available migration techniques, the types of systems the techniques are applied to, the validation methods used, and the challenges faced. Three migration techniques were identified: Model-Driven (MD), Static Analysis (SA), and Dynamic Analysis (DA), with 70% of the applications being web-based and more than 90% of the applications being Java-based. As part of the review, the paper also discovered that case studies are the most widely used validation method, followed by experiments and examples. This review is informative and related to the focus of our research; however, the paper acknowledges that while rapid reviewing provides a lightweight methodology to analyse the migration process [54], it can result in reduced coverage.
Complementary to the above rapid review [33], a systematic review was conducted to identify and classify existing refactoring approaches in the context of monolith application decomposition, by comparing 10 existing studies [44]. The paper classifies the studies using static code analysis aided (SCA), meta-data aided (MDA), workload-data aided (WDA), and dynamic microservices composition (DMC) approaches. The authors of the review evaluated the selected methods using parameters including granularity, input and output types, result evaluation, tool support, and validation. The review outlines existing approaches using a decision guide but does not cover the microservices decomposition methods. In addition, the review has a broad scope since it includes greenfield microservices development as well as monolith application decomposition. A re-examination of the findings of this review is crucial because of its scale and because a considerable number of new studies on the topic have been published recently: significant growing interest has been observed over the past three years with an influx of research studies examining the use of novel tools and techniques to assist with the migration problem.
A systematic mapping study of microservices construction was conducted in [55], where 103 primary studies were examined to identify, classify, and evaluate the state-of-the-art microservices architecture solutions. Although related to our research, this particular mapping study does not focus directly on decomposition projects. Kazanavičius and Mažeika [56] also conducted a literature review on the migration of legacy software to microservices architecture to understand the techniques employed and the challenges faced. The paper studied the benefits and drawbacks of certain earlier studies and presented some interesting comparisons between refactoring and rebuilding decisions. However, it is not directed at decomposition specifically, is limited to only six earlier studies and the SLR methodology used to select the studies is not discussed in detail.
An overview of the lessons learned and the associated difficulties/challenges of the migration process is investigated in [42]. The review focuses on papers discussing migration difficulties. However, only five papers emerge from the SLR, with a further seven papers based on the suggestion of the authors. Although this study reports the challenges, it does not cover aspects of the migration process such as data collection, analysis, decomposition, and evaluation.
Additional studies covered related aspects of the migration process. An earlier study investigated the importance of variability (the adaptability of a system for a particular context) for supporting the extraction of microservices from monolith legacy systems in industry [37]. Cojocaru et al. [57] further studied the attributes used to assess the quality of microservices that are derived from monolith applications, proposing minimal indicators for use in evaluating the quality of microservices and Service-Oriented Architecture (SOA).
In summary, we extracted the research questions and the broader topics of the existing review papers (in reverse chronological order) in Table I. This is helpful, as it reifies the focus of these related earlier works. It also helps to focus our research. Extending earlier works, this paper conducts a detailed review that includes the different phases of the decomposition of monolith applications into microservices. It extends the findings of the existing reviews and draws a comprehensive and interrelated picture of the various decomposition techniques, benchmarks, and metrics, identifying gaps and outlining future directions. Although several SLRs are presented on the migration of monolith applications in general and the decomposition to microservices in particular, there is no single SLR that fully addresses our research objective (and consequently our research questions).

III. METHODOLOGY
This paper follows the three-phase literature survey process (planning, reviewing & reporting) proposed in [28], [29], [30]. The planning phase includes identifying the need for a review and the development of a review protocol. It is described in Section III-A. The review phase includes selection of the primary studies, assessment of the studies, data extraction, and data synthesis. It is detailed in Sections III-B and III-C. Finally, the reporting phase focuses on documenting the review, which includes document observation, and the reporting of results. It is described in Sections IV and V.

A. Planning the Survey
The planning phase examined the research motivation, ultimately leading to the development of research questions.
1) Identifying the Need for an SLR: Of itself, the absence of comprehensive and up-to-date secondary research examining monolith-to-microservices decomposition highlights the need for a comprehensive SLR. Although there are rapid and partial reviews of the area, their focus is limited to specific aspects of the migration and they lack an in-depth analysis and organisation of the methods and algorithms. The majority tend not to address benchmarking or report on the benchmarking employed. The searches used in this study have not discovered any comprehensive secondary research that addresses all these aspects and, consequently the research questions in Section III-A2 were derived. We believe that answering these questions will significantly contribute towards consolidating and supplementing existing research with rich analysis.
Furthermore, evidence from earlier research studies indicates a demand for research of this type, that the topic is of interest to academic and industrial practitioners, and that the body of related literature is growing [55]. Our own initial searches confirmed this to be the case and established that despite growth in the volume of publications year-on-year, there was no systematic, comprehensive, and robust evaluation of the state of the art focusing on a holistic view of monolith to microservices decomposition in recent years. It is this position that fundamentally motivates the SLR presented in this paper.
2) Specifying the Research Questions: When conducting an SLR, it is crucial to identify relevant research questions with the capacity to deliver unambiguous answers [28]. We identified four such research questions in this area:
RQ1: What are the primary phases of monolith-to-microservices decomposition and the major constituent elements of those phases?
RQ2: What are the existing approaches, tools and methods observed in the decomposition of monolith applications into microservices?
RQ3: What are the metrics, datasets, and benchmarks used for evaluating and validating monolith decomposition into microservices?
RQ4: What research gaps can be identified in the current literature?
3) Defining and Evaluating the Review Protocol: This work has been conducted in the context of the Future Software Systems Architecture (FSSA) project based at Dublin City University and Lero, the Science Foundation Ireland Research Centre for Software. Formal industrial collaborators include the FINEOS Corporation and fourTheorem Limited. The review process was led by the first author who prepared the SLR protocol by selecting the topics and the search strings. The protocol, as presented below, was internally evaluated by a team of seven researchers who are members of the FSSA project working in the area of monolith code migration as well as two expert practitioners in the microservices industry. The protocol was applied iteratively and, at each iteration, the scope, inclusion/exclusion criteria, and the search terms were revised as appropriate to address the research questions.

B. Selection of Primary Studies
Guided by the research questions, initial terms representing the research topic were extracted. We identified three main topics and built the search terms around these topics: Monolith, Microservices, and Decomposition. To formulate the search keywords under each topic, we further considered the previous literature reviews in the area. Using synonyms and related terms, each of the topics was expanded to include additional search terms as follows: (i) monolith, existing, and legacy, (ii) microservice and micro-service, and (iii) decomposition, migration, identification, extraction, refactoring, modularisation, transformation, transition, and conversion. The keywords under the three main topics were combined using the Boolean 'OR' and 'AND' operators along with a wildcard(*) representation of the keywords to ensure a higher recall. The topics and search strings were reviewed by the third and last authors.
The final search string is represented as: (monolith* OR exist* OR legacy) AND (microservice* OR micro-service*) AND (decompos* OR migrat* OR identif* OR extract* OR refactor* OR modular* OR transform* OR transit* OR conver*). Major platforms such as IEEE Xplore and ACM Digital Library only support a fixed number of Boolean operators and wildcards, effectively forcing splitting the search string into three queries: (i) (monolith* OR existing OR legacy) AND (microservice* OR micro-service*) AND (decompos* OR migrat* OR identif*), (ii) (monolith* OR existing OR legacy) AND (microservice* OR micro-service*) AND (extract* OR refactor* OR modular*), and (iii) (monolith* OR existing OR legacy) AND (microservice* OR micro-service*) AND (transform* OR transit* OR conver*). After removing duplicate studies, the results from the three queries were combined.
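As an illustration, the splitting of the search string into three platform-friendly queries can be sketched as follows. This is a minimal sketch and not the tooling used in the review: the helper names (`or_group`, `chunked`, `build_queries`) and the chunk size of three are assumptions made for the example, and the first keyword group uses the `existing` form that appears in the three sub-queries.

```python
# Keyword groups as listed in the text; all helper names are hypothetical.
MONOLITH = ["monolith*", "existing", "legacy"]
MICROSERVICE = ["microservice*", "micro-service*"]
DECOMPOSITION = [
    "decompos*", "migrat*", "identif*",
    "extract*", "refactor*", "modular*",
    "transform*", "transit*", "conver*",
]


def or_group(terms):
    """Join terms with OR and wrap the result in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def chunked(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_queries(chunk_size=3):
    """Build one query per chunk of decomposition terms (assumed chunk size: 3)."""
    prefix = f"{or_group(MONOLITH)} AND {or_group(MICROSERVICE)}"
    return [
        f"{prefix} AND {or_group(chunk)}"
        for chunk in chunked(DECOMPOSITION, chunk_size)
    ]


queries = build_queries()
for query in queries:
    print(query)
```

Keeping the first two groups fixed and only partitioning the third means that the union of the three result sets matches what the single combined string would retrieve, which is why duplicates must be removed after merging.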
1) Initial Search: We searched on the established platforms for the publication of robust peer-reviewed computer software engineering research, including IEEE Xplore, ACM Digital Library, Science Direct, SpringerLink, Wiley Online, and Scopus. Where possible, we conducted an advanced search on all metadata available on these platforms. Google Scholar has been used to promote the retrieval of recent papers that may not yet be indexed on the established publication platforms. The search was conducted on all platforms by the first author in collaboration with the fifth and sixth authors.
We also adopted Publish or Perish to harvest and organise the metadata, which also served as an additional sanity check to ensure the discovery of relevant papers. The last search on the platforms was conducted on the 28th of October, 2021. We restricted the search to peer-reviewed scientific publications found in journals, conferences, and workshops between 2015 and 2021 inclusive, as microservices literature only started to gain significant momentum from 2015 on [52], [59]. A total of 5022 studies were initially retrieved.
2) Refinement: The screening of the studies was conducted by applying general criteria for the exclusion of studies on the search results. To refine the studies, we applied filtering based on: (i) the title, (ii) the title, abstract and conclusion, and (iii) full-text analysis, by applying the inclusion and exclusion criteria. Studies were screened by applying the following exclusion criteria (EC): (EC1) Duplicate Study, (EC2) Books and Patents, (EC3) Non-peer-reviewed Study, (EC4) Secondary Study, (EC5) Study written in languages other than English, and (EC6) Study published before 2015. Note that there is an additional step at the end of the process to protect against oversight of significant pre-2015 published material (details of which are provided at the end of this subsection). Further studies that did not satisfy the inclusion criteria were removed. The three inclusion criteria (IC) are: (IC1) the primary objective of the study should be the decomposition of monolith applications into microservices, (IC2) the study should include structured and preferably automatic or semi-automatic decomposition approaches, and (IC3) the study should sufficiently describe the decomposition method, code, algorithm, and its evaluation (i.e., abstracts and extended abstracts are not included).
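The exclusion criteria lend themselves to a mechanical check over harvested metadata. The sketch below is illustrative only and assumes a simplified record format: the `Record` fields and both function names are hypothetical, and duplicates are detected by normalised title, which is an assumption rather than the procedure reported here. The inclusion criteria (IC1 to IC3) require human judgement and are deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical metadata record for one search result."""
    title: str
    year: int
    kind: str            # e.g. "journal", "conference", "workshop", "book", "patent"
    peer_reviewed: bool
    secondary: bool      # True for reviews, surveys and mapping studies
    language: str


def exclusion_reasons(record, seen_titles):
    """Return the exclusion criteria (EC1 to EC6) that a record triggers."""
    reasons = []
    if record.title.strip().lower() in seen_titles:
        reasons.append("EC1")  # duplicate study (assumed: same normalised title)
    if record.kind in {"book", "patent"}:
        reasons.append("EC2")  # books and patents
    if not record.peer_reviewed:
        reasons.append("EC3")  # non-peer-reviewed study
    if record.secondary:
        reasons.append("EC4")  # secondary study
    if record.language.lower() != "english":
        reasons.append("EC5")  # not written in English
    if record.year < 2015:
        reasons.append("EC6")  # published before 2015
    return reasons


def screen(records):
    """Keep only the records that trigger no exclusion criterion."""
    kept, seen = [], set()
    for record in records:
        if not exclusion_reasons(record, seen):
            kept.append(record)
        seen.add(record.title.strip().lower())
    return kept
```

Returning the list of triggered criteria, rather than a single boolean, makes it possible to report why each study was removed at a given refinement step.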

Refinement Step 1: By merging the results obtained from the search platforms, we automatically removed duplicate studies, incomplete data, books, websites, and reports, which resulted in 968 studies.
Refinement Step 2: Refinement step 2 was conducted by inspecting the title of the studies. The main focus in this step is to further refine studies that do not satisfy the inclusion and exclusion criteria. This step resulted in 126 studies. In circumstances where it was not possible to decide based only on the title, the studies were promoted to the next refinement step.
Refinement Step 3: The studies that passed refinement step 2 were further refined by carefully inspecting the abstract and the conclusion sections. After the application of the criteria, 92 studies passed to the full-text review. Refinement step 2 and Refinement step 3 were conducted by the first author as they involve applying objective exclusion criteria using the collected metadata and because this was an initial filtering only, to be refined by further group-wise filtering later on.

Refinement Step 4: Next, a full-text review of the studies was conducted against the inclusion criteria, resulting in 33 selected papers. Refinement step 4, which is based on the full-text analysis, was conducted by the wider team, who strictly applied the inclusion and exclusion criteria through a series of weekly literature review presentations with the third, fifth, sixth, and seventh authors. Specifically, the first and the fifth authors jointly participated in the full-text screening by focusing particularly on the methodology, experiment and evaluation sections. Their deliberations were presented at a series of literature review meetings conducted jointly with the first, third, sixth and seventh authors to decide the final selection, based on the full consensus reached after these review meetings. These review presentations were venues for critically reflecting on the reviewed studies, filtering studies, identifying gaps, assessing the validity of proposed methods, fostering theoretical debates and even triggering replication of selected proposed methods. Additional high-level review meetings, involving the first, second, third, fourth, and last authors, were conducted to further scrutinise the M2MDF and the associated analysis results. The details of the studies following each refinement step are included in the replication package [87].
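The four refinement steps can be summarised as a selection funnel. The short sketch below uses only the counts reported in this section and computes the share of studies retained at each step; the stage labels and the `retention` helper are hypothetical names introduced for the illustration.

```python
# Counts as reported in the text; the stage labels are paraphrased.
FUNNEL = [
    ("initial search", 5022),
    ("step 1: automatic removal of duplicates and non-papers", 968),
    ("step 2: title screening", 126),
    ("step 3: abstract and conclusion screening", 92),
    ("step 4: full-text review", 33),
]


def retention(funnel):
    """Return (stage, count, share of the previous stage retained) per step."""
    rows = []
    for (_, previous), (stage, count) in zip(funnel, funnel[1:]):
        rows.append((stage, count, count / previous))
    return rows


for stage, count, share in retention(FUNNEL):
    print(f"{stage}: {count} studies ({share:.1%} of the previous stage)")
```

Reading the funnel this way shows that the sharpest reductions occur in the automatic step and the title screening, while the full-text review removes roughly two thirds of the remaining candidates.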
Snowballing: One forward and backward snowballing iteration [88] was then conducted by the first author on the 33 papers. Using backward snowballing, we extracted 941 references from the reference section of the studies. Using forward snowballing, we extracted 594 studies citing the 33 selected studies. After extracting the references and citations from the 33 studies, we conducted two major refinement steps, each containing additional minor refinement steps, as follows.
Refinement Step 5: We combined the forward and backward snowballing results and filtered out duplicate and incomplete records, along with books, patents, non-peer-reviewed studies and publications before 2015, resulting in 212 studies.
Refinement Step 6: We then compared the 212 studies with the 968 studies obtained from the literature search to further refine the studies. Further inclusion and exclusion criteria were applied to the full text of the studies. As a result, an additional two studies from SpringerLink and the Association for the Advancement of Artificial Intelligence (AAAI) were included in the study. The details of the snowballing process and its results are also included in the replication package [87].
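The merging and filtering of snowballing results can be sketched as below. This is a hedged illustration, not the procedure recorded in the replication package: records are reduced to hypothetical `(title, year)` pairs, duplicates are matched on a normalised title, and the comparison against the database search results is assumed to mean discarding titles that were already screened there.

```python
def normalise(title):
    """Case-fold and collapse whitespace so trivial title variants match."""
    return " ".join(title.lower().split())


def snowball_candidates(backward, forward, already_screened, min_year=2015):
    """Merge backward and forward results, then drop duplicates, items
    published before `min_year`, and titles already screened elsewhere."""
    screened = {normalise(title) for title in already_screened}
    seen, candidates = set(), []
    for title, year in list(backward) + list(forward):
        key = normalise(title)
        if key in seen or key in screened or year < min_year:
            continue
        seen.add(key)
        candidates.append((title, year))
    return candidates
```

The surviving candidates would still need the full-text inclusion and exclusion criteria applied by a reviewer; the sketch covers only the metadata-level filtering.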
Quality Assessment Criteria. Based on the quality assessment of primary studies proposed in [89], [90], the following quality assessment questions that correspond to the inclusion criteria are adopted to determine the quality of the studies concerning the objective, method, and coverage of the studies, respectively. Q1) Does the study's primary objective explicitly focus on the decomposition of monolith applications into microservices? Q2) Does the study include structured and preferably automatic or semi-automatic decomposition approaches? Q3) Does the study sufficiently describe the decomposition method, code, algorithm, and evaluation? These questions are implicitly used in the refinement stages. For each candidate study during a particular refinement stage, each question is answered (in the order of its appearance) using a numeric value 0 or 1, where the value 0 indicates that the study does not answer the specific question, and the value 1 indicates that the study answers the question. Studies that answer all three questions are included in the review.
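The binary scoring described above can be expressed compactly. The following is a minimal sketch under the assumption that each study's answers are held in a dictionary keyed by question identifier; the function names are hypothetical and do not correspond to any tool used in the review.

```python
QUESTIONS = ("Q1", "Q2", "Q3")


def first_failed_question(answers):
    """Check the questions in order of appearance and return the first one
    scored 0, or None when every question is scored 1."""
    for question in QUESTIONS:
        score = answers[question]
        if score not in (0, 1):
            raise ValueError(f"{question} must be scored 0 or 1, got {score!r}")
        if score == 0:
            return question
    return None


def is_included(answers):
    """A study is included only if it answers all three questions."""
    return first_failed_question(answers) is None
```

For example, `is_included({"Q1": 1, "Q2": 1, "Q3": 0})` evaluates to `False`, with `first_failed_question` reporting `"Q3"` as the reason.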
Munir et al. [91] define quality assessment for SLRs in terms of 'rigor and relevance'. Note that two of our quality assessment guidelines address relevance. However, in this case, 'rigor', referring to the quality of the empirical work performed in the primary studies, is not appropriate as it leads to an assessment of the quality of empirical results that are then synthesized with the 'whole' to address the RQ. Instead, our third criterion refers to the primary studies explicitly addressing the component phases, approaches, tools, methods, and (evaluation) metrics, datasets and benchmarks in monolithic application decomposition. This is in line with the stated research questions in this work, but does give a Systematic-Mapping-study feel to this SLR.
An additional, critical review of the overall methodology and research findings has been conducted by the eighth and ninth authors, who are FSSA project Advisory Board Members and have therefore been generally advising on FSSA technical implementations. This final critical review process was iterative in nature and required a number of review cycles and feedback.
Ultimately, 35 relevant studies (see Table II) are considered in this research. The search and snowballing process is summarised in Fig. 1. Note that from this section onward, we refer to the selected studies using their chronological study number as P1, P2,..., P35 to explicitly highlight and distinguish included studies from other references (refer to Table III).
As all the works included in this study are from leading academic dissemination fora (publishers/conferences/journals), where they have already been peer-reviewed for quality by expert reviewers, we did not see the need to re-iterate with ranking based on quality review for these papers, beyond the quality assessment criteria already performed. It is nevertheless important to emphasise that different sources, even among those subject to peer review, will not present with the same quality level. Indeed, it has been observed that there exists no standard definition of quality for software engineering research studies [92].
As a confirmative step, we examined works in the 2011-2014 time frame as a means to reduce the risk that any significant earlier work was not incorporated into this systematic review. The original search string was explored using Google Scholar. A total of 19 additional papers were discovered in this step. Nine studies did not relate to monolith to microservices decomposition at all and were excluded. Five studies were thesis reports, two of which were written in languages other than English; these five studies were excluded because of EC3. A further three results were book publications and were excluded due to EC2. Only two studies discussed monolith and microservices in the software engineering context: the first covers microservices development with a brief mention of monolith partitioning [93] and did not satisfy IC2. The second study presents the experiences and lessons learned in incremental migration and refactoring [94]. A revision of this second paper was published in 2016 and it is therefore covered in the 2015-2021 timeline.

C. Analysis of the Data
The analysis of data and the steps (refer to Fig. 2) that are employed to extract the data are discussed in the following sections.

1) RQ1: Grounded Theory Components:
The M2MDF analysis was primarily based on memoing, coding, constant comparison and theoretical sorting on the selected literature, proposed in [95], [96] as part of grounded theory (GT). Given the diverse, informal, and diffuse nature of the initial memos and codes, it would be impractical to list those memos and codes in the replication package, and ultimately of limited utility to the reader due to their overwhelming number. In addition, readers interested in replicating the approach in a truly GT fashion should not be guided/constrained by our codes. Instead, we present a snippet of a selected paper [P24] with the associated memos and codes, to give the reader a feel for our approach. We wish to highlight that, given the nature of the study, it would not have been appropriate to apply a full grounded theory process: GT activities such as theoretical sampling and theoretical saturation, for example, would have little meaning when applied to a pre-defined dataset (the 35 papers).
Obvious and Inviolable Element Identification: It should be noted that certain migration elements were obvious and inviolable (even if they had not been assembled in an end-to-end framework up to this point). For example, it is clear that monolith data collection is required. Analysis of that data, and identification of the resultant microservices are also fundamental migration steps. Beneath these (and other) migration phases, further subclassification was required. In some cases, these subclassifications were already explicitly provided in previous studies. For example, two of the four categories for data collection were previously identified in [97], while a further data collection category was identified in [P28]. A further example of the often-explicit presentation of sub-categories can be seen in [33], which explicitly identifies both Static Analysis (SA) and Dynamic Analysis (DA). [33] also identifies Model-Driven (MD) analysis which is also presented in our results, but where it is ultimately classified as Domain Analysis. In [33], we also note that Examples, Experiments and Case Studies are reported as the major categories for evaluation. These are noted elsewhere in the literature and directly feed into the derivation of these sub-categories.
Element Aggregation: Thus, the research is concerned with knitting existing, explicitly-identified and established categories into an end-to-end decomposition framework. Hence the elements of GT were combined with a light form of quantitative content analysis [98] in a mixed method. In studies where the information is not explicitly presented, we synthesised the appropriate phases and sub-categories based on GT, employing an analysts' lens of phases and approaches-within-phases, in the organisational spirit of [99]'s approach to GT. In studies where phases and sub-categories were explicitly identified, a light form of content analysis was performed where those phases and sub-categories were noted and quantified.
Iterative Refinement: At the end of each full-text review, this information was analysed, compared and assigned categories. A new category was assigned in cases where no existing category was deemed suitable. The framework was revised incrementally to accommodate the new categories and to refine existing categories.
2) RQ2-RQ4: To answer RQ2, RQ3, and RQ4, a data extraction template was generated based, in part, on the emerging M2MDF, but also on the specific RQs. After application of this template to individual sets of papers, review meetings presented the results of the data extraction process to the wider group for discussion and review. While the data analysis activity was undertaken by the first author, the analysis outputs were reviewed by all authors. Full-text versions of all the studies were shared among all authors, enabling them to assess the accuracy of the process.
Using the data collected from the data extraction stage, the data synthesis stage focused on comparing, organising and re-organising the extracted data in a format that answers the research questions. It is important to note that this process was also revisited by the group, every time a paper was reviewed.
Fig. 2. Data analysis steps. The data analysis step begins with reviewing the full text of each selected paper step-by-step by applying GT components, identifying and aggregating elements. The process involves several iterations of reviews and presentation meetings resulting in a continuous refinement of the M2MDF and extraction of the data required to answer the research questions. A simplified excerpt of the memoing, coding and aggregation process is included at the bottom of the diagram.
After several revisions conducted during the literature review presentation meetings, we reached the final analysis result.

IV. LITERATURE REVIEW
Initially, Section IV-A presents an overview of the literature reviewed in this work. Thereafter, Section IV-B discusses the organisation of existing monolith-to-microservices decomposition research, as per RQ1. Then Section IV-C addresses RQ2 by focusing on the collection and analysis of monolith data, and on the resulting microservices identification and optimisation. Section IV-D reports on the evaluation of approaches and thus addresses RQ3, with Section IV-E briefly reporting on deployment of identified microservices to complete the description of the organisational framework.

A. Overview of the Selected Literature
In this subsection, the profile of the research included in this review is presented in terms of date of publication and publication venue.
1) Temporal Distribution of the Studies: On the evidence of our analyses, research in the decomposition of monoliths to microservices has gained significant traction in recent years. Consistent with [47], since the term "microservice" was discussed in 2011, the number of studies has shown a steady increase, from just a single paper in 2016 to three papers in 2017. Semi-automatic migration was first identified in 2016 [24] and since then, both semi-automatic and automatic decomposition approaches have been suggested. The early works propose solutions to decompose monolith applications to microservices using static, dynamic, and evolutionary approaches. Certain recent papers introduce genetic algorithms [21], [25] and Neural Network methods [62]. These recent contributions examine the use of feature evolution to extract microservices.
Of the 35 papers included in this review, 12 were published by IEEE, 12 by Springer, five by ACM, three by ScienceDirect, and one each in Web of Science, Wiley InterScience, and the Association for the Advancement of Artificial Intelligence (AAAI). This analysis demonstrates that the publication of microservices migration research is fragmented across conferences ranging from generic software engineering to more niche web services events. This is perhaps not surprising as the core architectural migration challenge is fundamentally a software engineering one, but one that may be implemented in the context of taking greater advantage of web/cloud-based computing innovations.
Fig. 4. The maturity spectrum (x-axis) plots newer research areas further to the right, while the sequence of phases (y-axis) positions earlier phases closer to the bottom. For example, Model-based (MIC) data has been used for data input for some time and is utilised at the start of the monolith decomposition process (Phase I). In contrast, Version Analysis (VA) is a relatively recent research focus for decomposition studies, and it is part of Phase II, Monolith Analysis. This framework emerged from the literature following our approach discussed under Section III-C Analysis of Data, particularly focusing on RQ1.

B. A Framework for Classification and Comparison of Monolith Application Decomposition
Existing decomposition frameworks focus on specific aspects such as monolith structure analysis [100], microservices validation [101], and microservices assessment [100], [102], but no single study captures all phases and techniques available for robust decomposition. In order to address RQ1, this research has systematically produced the Monolith to Microservices Decomposition Framework (M2MDF).² Given the methodology employed, it can be argued that the proposed framework significantly reflects the phases and approaches proposed across the incorporated studies. It also serves as a taxonomy for the classification of the proposed approaches. The conceptual phases of the framework are as follows:
• Phase I-Input Collection.
• Phase II-Monolith Analysis.
• Phase III-Microservices Identification.
• Phase IV-Microservices Optimisation.
• Phase V-Microservices Evaluation.
• Phase VI-Microservices Deployment.
² The M2MDF and the detailed analysis of the literature essentially co-evolved in this research. Therefore, the M2MDF could be presented either prior to or following the detailed literature findings. The authors have elected to present the M2MDF prior to the detailed literature review findings as it will aid readers in visualising the broader landscape prior to progressing to the details.
On the left side of Fig. 4, the major phases of a monolith to microservices decomposition are presented. Although indicated as a sequential process, some studies entirely skip certain steps. For example, Phase IV Microservices Optimisation is not observed in all works. By explicitly identifying these phases, the M2MDF facilitates a scoped analysis of the individual works by logically partitioning the major activities associated with the end-to-end migration task. Along with the framework, we present a mapping of the studies when applicable (subsequently, in Fig. 5). Fig. 4 further depicts the relative maturity of individual techniques, where maturity relates to the relative newness of the technique in the context of monolith to microservice decomposition. Maturity in this context is viewed in light of software process maturity where a matured process is repeatable, defined, managed, and optimised [103]. This maturity perspective does not lend itself easily to quantitative representation, especially as the adoption of certain techniques will inevitably overlap. Therefore, the maturity dimension presented in Fig. 4 should be viewed as indicative and not as entirely discrete.
However, in more general terms, we can say that techniques closer to the left-hand side, such as domain analysis and static analysis, have been under consideration for a longer time frame, whereas techniques further to the right, such as the incorporation of version control data, are relatively new.
1) Phase I-Input Collection: The decomposition efforts commence by acquiring data that describes the essential characteristics of the monolith application, for example, its domain model, codebases, log files, or code versions. In practice, many of these characteristics of the monolith, particularly domain models, are identified by first reviewing design documents and architecture diagrams, or by interviewing the custodians of the systems. However, input collection from monolith codebases, log files and versions is often supported by sophisticated tools and methods.
One or more of these inputs are used in practice; however, no single work has been identified that combines all these inputs. This observation is likely related to the large effort required to implement any one of these collection techniques and the significant differences in the knowledge and skillsets required to successfully apply them. Theoretically, all the inputs could be used in combination by collecting data from domain-driven artefacts, the source/executable codebase, log files, and the different revisions of the monolith collected from version control systems such as Git. The combined use of these various data sources could be of significant importance, given that greater volumes of pertinent data hold the potential for improved understanding of the wider behaviour of a monolith ecosystem and could contribute to improved decisions concerning monolith decomposition.
2) Phase II-Monolith Analysis: The monolith analysis phase focuses on filtering and transforming the collected data into a representation that is suitable for the subsequent phases. This phase may adopt multiple stages of analysis: domain analysis and static analysis extract the structural relationships between the artefacts of the monolith, while dynamic analysis and version analysis enrich these relationships with the frequencies and associations observed during the execution of the codebase and across its evolution over time, respectively.
3) Phase III-Microservices Identification: The microservices identification phase uses heuristics to guide the microservices identification process by partitioning the monolith application into suitable microservice candidates [104]. Clustering algorithms are widely used to extract microservices by representing the monolith data as a graph/matrix and treating the problem as a clustering problem that tries to identify artefacts that can be grouped as microservices without any prior awareness of the number or size of the resulting microservices.
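To make the idea of clustering without a preset number of services concrete, the following minimal sketch treats the monolith as a weighted graph, discards weak edges, and reports the remaining connected components as candidate microservices. It is an illustrative stand-in and not the algorithm of any reviewed study; the class names, edge weights, `candidate_services` function and `min_weight` threshold are all assumptions introduced for this example.

```python
from collections import defaultdict

def candidate_services(edges, min_weight):
    """Keep edges with weight >= min_weight and return the connected
    components of what remains, each as a sorted list of artefacts."""
    nodes = set()
    adjacency = defaultdict(set)
    for (a, b), weight in edges.items():
        nodes.update((a, b))
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(nodes):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(sorted(component))
    return clusters

# Hypothetical weighted dependencies between classes of a monolith.
EDGES = {
    ("OrderController", "OrderService"): 9,
    ("OrderService", "OrderRepository"): 7,
    ("OrderService", "PaymentService"): 1,
    ("PaymentService", "PaymentGateway"): 8,
    ("ReportJob", "OrderRepository"): 2,
}

clusters = candidate_services(EDGES, min_weight=5)
for cluster in clusters:
    print(cluster)
```

The number and size of the resulting groups follow from the data and the threshold rather than from a value fixed in advance, which is the property highlighted above. Published approaches typically rely on more elaborate clustering algorithms.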

4) Phase IV-Microservices Optimisation:
Some emerging studies adopted a two-phased approach to microservice identification that first generates a large pool of microservice candidates in order to subsequently select the optimal microservice partitions. Attempts to optimise microservice configurations in this way have employed genetic algorithms in addition to cluster-based microservices identification methods. Genetic algorithms assist in the identification of optimal combinations of packages, classes or methods within the monolith application using objective functions based on constructs such as cohesion, coupling, and semantic similarity. Evolutionary algorithms also employ heuristics to guide the optimal selection of microservices. This microservices optimisation phase is often valuable when transforming large parts of an existing monolith application into microservices, where the efficiency of microservices interaction and execution in the target system is conceivably of significant concern. Where just a single microservice is to be extracted for a specific purpose, this optimisation phase may not be implemented as no significant microservices orchestration may be required in the target implementation.
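The two-phased idea described above can be illustrated by its scoring and selection step alone: given a pool of candidate partitions, each is scored with an objective that rewards cohesion and penalises coupling, and the best-scoring partition is retained. This sketch deliberately omits the crossover and mutation operators of a genetic algorithm. The objective (share of intra-service edge weight minus share of inter-service edge weight), the artefact names and the pool are assumptions for illustration, not the objective function of any particular study.

```python
def fitness(partition, edges):
    """Cohesion minus coupling, both expressed as shares of total edge weight."""
    owner = {
        artefact: index
        for index, service in enumerate(partition)
        for artefact in service
    }
    total = sum(edges.values())
    intra = sum(w for (a, b), w in edges.items() if owner[a] == owner[b])
    cohesion = intra / total            # weight kept inside services
    coupling = (total - intra) / total  # weight crossing service boundaries
    return cohesion - coupling

def select_best(pool, edges):
    """Return the candidate partition with the highest fitness."""
    return max(pool, key=lambda partition: fitness(partition, edges))

# Hypothetical weighted dependencies between five artefacts.
EDGES = {("A", "B"): 5, ("B", "C"): 4, ("C", "D"): 1, ("D", "E"): 6}

# Hypothetical pool of candidate partitions, e.g. from an earlier clustering step.
POOL = [
    [{"A", "B"}, {"C", "D", "E"}],
    [{"A", "B", "C"}, {"D", "E"}],
    [{"A"}, {"B", "C", "D", "E"}],
]

best = select_best(POOL, EDGES)
print(best, fitness(best, EDGES))
```

In a genetic algorithm, a function of this kind would be evaluated repeatedly as new candidates are generated; other constructs mentioned above, such as semantic similarity, could be added as further terms.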

5) Phase V-Microservices Evaluation:
Evaluation of the success of monolith decomposition into microservices has received limited attention in the literature to date. Most often the assessment of the resulting microservices is informally reported by experts evaluating the proposed candidate microservices. In recent years, more structured evaluation approaches have been proposed including examples, case studies, and experiments. Example-based evaluation uses illustrative examples to show how the identification process is applied. In case studies, a target monolith is used to evaluate the migration process in detail by looking at relevant cases, whereas in experiments, selected codebases are migrated to microservices and experimentally evaluated, using a range of quality metrics such as coupling, cohesion, modularity, and evolvability.

6) Phase VI-Microservices Deployment:
The deployment phase focuses on the implementation of the proposed microservices candidate(s) to evaluate if the proposed technique is successful or the extracted microservices candidate(s) exhibit(s) the desired properties. Many of the existing works have not reported how the proposed candidate services are built from the monolith code and delivered as a microservice. While the previous phases can propose effective microservices, it still requires a huge effort by the programmers to compose, build, and repackage the candidate microservice, in order to provide a complete microservice offering.

C. Existing Approaches, Tools and Methods
This section explores RQ2 by examining the approaches, tools and methods reported in decomposition studies. This includes the data gathered, the analyses employed, and the microservice identification and optimisation methods used in the literature.

1) Phase I. Input Collection:
The collection phase involves the acquisition of monolith application data that can later be used to identify candidate microservices from within the monolith. The related works are first categorised based on the data collection methods, and thereafter on the type of data collected.
a) Data Collection Methods: The data collection methods used in the selected studies are grouped into four categories: Model-based Input Collection (MIC) [97], Code-based Input Collection (CIC) [97], Version-based Input Collection (VIC) (also known as evolutionary coupling [P28]), and Log-based Input Collection (LIC).
MIC uses domain-driven design artefacts (e.g., data flow diagrams, entity relationship diagrams, and process models) to extract the features of the monolith application. CIC focuses on the collection of the structure and behaviour of the monolith using the source code or executable code of the application. VIC uses simultaneous or consecutive versions of the source code or domain-driven artefacts to extract features that are useful to characterise the monolith application (essentially because versioning data can be indicative of significant feature level contributions over time). Finally, LIC uses log files that are generated during the run time of the application.
Performance logs are one example of such log files; they contain information indicating method-level CPU and memory consumption (and possibly additional data) [70]. Alternatively, web access logs contain information such as client IP, requested resource URI, timestamp, HTTP status code, and document size [75]. In some cases, however, web access logs may include just a subset of this data, for example the URI, document size and response time [82].
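As an illustration of log-based input collection, the sketch below extracts the fields named above (client IP, timestamp, requested URI, HTTP status code and document size) from a single web access log line. It assumes that the log follows the widely used Common Log Format; the reviewed studies do not necessarily use this format, and the sample line, `LOG_PATTERN` and `parse_access_line` are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout (Common Log Format):
# host ident user [timestamp] "METHOD URI PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_access_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A size of "-" means that no body was returned.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

SAMPLE = '192.0.2.10 - - [10/Oct/2020:13:55:36 +0000] "GET /orders/42 HTTP/1.1" 200 2326'

record = parse_access_line(SAMPLE)
print(record["uri"], record["status"], record["size"])
```

Records of this kind can then be aggregated, for example per URI, to characterise how the monolith is used at runtime.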
Additionally, there are studies that combine two or more of the methods, particularly a combination of CIC either with MIC (CIC+MIC) or with VIC (CIC+VIC). From the analysis, it is clear that CIC is the dominant approach. This research did not identify any study where VIC has been used independently. A summary of the distribution of the studies based on the type of collected data is presented in Table IV.
b) Type of Collected Data: Two types of data are prominent in the literature: Static Data (SD) and Dynamic Data (DD). SD focuses on the source code of the application reflecting the structure, whereas DD deals with the data generated during the execution of the monolith application. Data may also be obtained using MIC and VIC. Model-based Data (MD) focuses on domain-driven design artefacts, and Version-based Data (VD) focuses on the change history of the source code. While SD is the most widely used type of data (42.9%), it is also used in combination with DD, VD, and MD (refer to Table V).
A frequent combination is the use of SD and DD (SD+DD) to increase the understanding of the monolith application and raise the efficiency of the microservices identification process. In [P2, P6, P10, P13], static data is used to represent the system with a call graph where the nodes represent components and the edges represent dependencies between them. Then, operational data is collected at runtime to identify dependencies and compute the weight of the edges. In [P16], source code packages are mapped to bounded contexts using static analysis, while dynamic data is employed to refine these contexts by computing their associated runtime frequency. These runtime frequencies can then be used in a postmortem analysis to further discover new candidate microservices.
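The combination of static and dynamic data described above can be sketched as follows: call edges obtained from source code are weighted by how often each call is observed at runtime, and calls that appear only at runtime are reported separately as newly identified dependencies. The edge set, the runtime trace and the `weight_call_graph` function are assumptions made for illustration and do not reproduce the procedure of [P2, P6, P10, P13].

```python
from collections import Counter

# Hypothetical static call edges (caller, callee) found by source code analysis.
STATIC_EDGES = {
    ("OrderController", "OrderService"),
    ("OrderService", "OrderRepository"),
    ("OrderService", "PaymentService"),
    ("ReportJob", "OrderRepository"),
}

# Hypothetical runtime trace: one (caller, callee) entry per observed call.
RUNTIME_CALLS = [
    ("OrderController", "OrderService"),
    ("OrderService", "OrderRepository"),
    ("OrderController", "OrderService"),
    ("OrderService", "PaymentService"),
    ("OrderController", "OrderService"),
    ("OrderService", "OrderRepository"),
    ("PaymentService", "AuditLogger"),  # not present in the static graph
]

def weight_call_graph(static_edges, runtime_calls):
    """Weight static edges by observed call frequency and collect
    dependencies that were only seen at runtime."""
    observed = Counter(runtime_calls)
    weights = {edge: observed[edge] for edge in static_edges}
    runtime_only = {
        edge: count for edge, count in observed.items()
        if edge not in static_edges
    }
    return weights, runtime_only

weights, runtime_only = weight_call_graph(STATIC_EDGES, RUNTIME_CALLS)
for edge, weight in sorted(weights.items()):
    print(edge, weight)
print("runtime only:", runtime_only)
```

Static edges that are never exercised keep a weight of zero, which may indicate rarely used or dead code paths, or simply a workload that did not cover them.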
In a broadly similar fashion to [P13], [P31] uses static code analysis to represent a call graph of the monolith application, subsequently overlaying dynamic information to indicate the frequency of edge execution. This tendency to combine different types of data is confirmed in earlier research [33], but in our analyses, we note an increase in combining not only static and dynamic data but also static and version-based data [P28, P32]. Some correspondence between the data collection methods and the type of data collected is observable. For example, while all the methods that use MIC collect MD, CIC methods are used to collect either SD or DD. Furthermore, DD collection is not restricted to CIC; it is also collected using LIC, as in [P17, P27], where log files are used.
2) Phase II. Monolith Analysis: Monolith analysis strategies essentially comprise three distinct perspectives. The first perspective concerns the type of analysis employed, the second perspective is the desired unit of analysis (for example, class-level or package-level), and the third perspective, depending on the type of data and unit of analysis, concerns the various forms of data representation that may be employed. This section outlines the findings with respect to all three perspectives.
a) Type of Analysis: The studies included in this SLR suggest that four major analysis types may be used in isolation or in combination: Domain Analysis (DomA), Static Analysis (SA), Dynamic Analysis (DA), and Version Analysis (VA). These analysis types correspond to the type of the collected data. Although log analysis appears in some of the studies [P12, P17, P27], the analysis employed is similar to dynamic analysis. Thus, we treat it as a subset of dynamic analysis.
DomA focuses on the analysis of various models of the monolith application that have been employed during the requirement analysis and modelling stage of software development [105], [106], [107], [108] (or those extracted using reverse engineering methods). DomA includes data flow diagrams, activity diagrams, use case descriptions, entity relationship diagrams, and other UML artefacts that describe the process and intent of the application.
SA is sometimes referred to as static code analysis and is performed by studying the (non-executing) source code of an application [109], [110]. Typically, supporting software-based tooling is employed. Fifteen of the studies relied entirely on SA, with an additional nine studies using SA in combination with DA, VA, or DomA. This indicates that SA is a prevalent and important choice for microservices identification. SA is fundamentally aligned with static data (SD).
The essential difference between SA and DomA is that DomA analyses the monolith using models employed to structure/build the source code rather than directly analysing the actual source code or associated executing programs. In areas outside of monolith decomposition [111], DomA has been used to refer to an SA technique that involves analysis of the flow of data in a program source file.³
DA is sometimes referred to as dynamic code analysis and involves the examination of the properties of a program while it is executing [112]. DA aids the understanding of the runtime behaviour of an application. DA of monolith applications can be online (where the program is examined as it executes) using code instrumentation features provided by IDEs (such as AspectJ Runtime, AspectJ Tools, and AspectJ Weaver) [113]. DA may also be conducted offline, where the runtime data is collected and stored for later analysis (for example, in log files).
DA collects rich information about executing programs, including the thread id, calling class, called class, calling methods, called methods, method parameters, timestamp, and database queries. Where DA is performed on non-object-oriented programs, collected data may refer to functions in place of methods and classes. Although DA is getting more attention recently, the published literature suggests that it remains less utilised. Only 20% of the studies used DA exclusively, with a further 20% using it in combination with SA or DomA. This could be due to the difficulty of collecting the dynamic data during the execution of the monolith and/or the overhead it introduces to the running monolith [113].
VA examines monolith versions to learn how the monolith was extended or changed over time. Although no study applied VA exclusively, VA has been applied in combination with SA. For instance, SA can produce a call graph that is supplemented with evolutionary coupling data (the code that was co-committed) [P28]. An alternative to this approach involves injecting VA into early-stage SA while the monolith's graph is still under construction [P32]. This alternative approach shares some conceptual underpinning with [P13] in that artefacts of the monolith are analysed using source code repository information to identify items that appear to co-evolve or evolve according to a similar timeline. Using this approach, increased co-evolution can raise the edge weighting between the affected items.
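A minimal sketch of how evolutionary coupling might supplement a statically derived call graph is given below: pairs of files that were changed in the same commit are counted, and each count raises the weight of the corresponding edge. The commit list, file names, `factor` parameter and both functions are hypothetical, and the simple additive weighting is an assumption rather than the scheme used in [P28] or [P32]. In practice, the per-commit file lists could be obtained from a version control system, for example with `git log --name-only`.

```python
from collections import Counter
from itertools import combinations

def co_change_counts(commits):
    """Count how often each unordered pair of files is committed together."""
    counts = Counter()
    for files in commits:
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return counts

def add_evolutionary_weight(call_graph, co_changes, factor=1.0):
    """Raise each call-graph edge weight by factor * co-commit count."""
    enriched = {}
    for (caller, callee), weight in call_graph.items():
        key = tuple(sorted((caller, callee)))  # co-change pairs are unordered
        enriched[(caller, callee)] = weight + factor * co_changes.get(key, 0)
    return enriched

# Hypothetical commit history: the files touched by each commit.
COMMITS = [
    ["Order.java", "OrderService.java"],
    ["Order.java", "OrderService.java", "Invoice.java"],
    ["Invoice.java", "PaymentService.java"],
]

# Hypothetical static call graph with call-count weights.
CALL_GRAPH = {
    ("OrderService.java", "Order.java"): 4,
    ("PaymentService.java", "Invoice.java"): 2,
    ("OrderService.java", "PaymentService.java"): 1,
}

co_changes = co_change_counts(COMMITS)
enriched = add_evolutionary_weight(CALL_GRAPH, co_changes)
for edge, weight in enriched.items():
    print(edge, weight)
```

Edges between artefacts that are frequently committed together gain weight, making it more likely that a subsequent clustering step keeps them in the same candidate microservice.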
[P16] combines SA, DA and DomA. Process documentation and interviews, along with context diagrams, use cases, domain context and context maps are first used to identify context boundaries. Static analysis and dynamic analysis are then applied in order to further refine the bounded contexts. Although this approach combines three of the four identified analyses, the effort required to conduct such detailed domain analysis may render it a less attractive solution (this may be particularly relevant for larger monolith applications).
³ https://glossary.istqb.org/en/term/data-flow-analysis
b) Unit of Analysis: Existing research on microservices decomposition uses a wide spectrum of granularity when extracting microservices. Unit of analysis in this review refers to the smallest unit of the system a study uses when seeking to identify microservices. Units can be as small as individual methods or functions, ranging up to classes and packages. An additional unit of analysis could involve, in the case of domain analysis, software artefact-based units. These could, for example, include use cases and data stores. Table VI presents an overview of the unit of analysis adopted by the studies identified in this review, with the following paragraphs providing a summary of the different approaches.
Method-level analysis explores the use of methods as the base unit of analysis, thereafter extracting microservices by recombining highly-related methods [P2, P6, P14, P31, P34]. In this approach, one method from a given class could be merged with methods from other classes to make a new microservice. However, method-level analysis, due to its fine granularity, requires relatively large computing resources. The implementation of method-level microservices may also require additional validation and testing, particularly when methods are extracted from different classes and merged into a new service.
This validation process includes, initially, verifying the behaviour of the method when it is run outside of its class and, subsequently, verifying how the composed methods function when they are used in combination with methods from other classes. Existing research has not yet addressed these challenges; however, method-level analysis can potentially produce highly-optimised microservices as it can cross class boundaries, thus identifying chains of related methods. Applying method-level analysis, it is possible to extract individual methods as microservices, which may be particularly well-aligned with Function-as-a-Service (FaaS) based cloud computing [14]. Certain utility methods, for example a method that provides a payment capability across a monolith, could also be identified using method-level granularity.
Class-level analysis elevates the base unit of granularity up to the level of a class. Given that classes perform a vital role in established monolith programming languages, for example, Java and C++, and that the computational analysis cost at a class-level is likely to be considerably less than at a method-level, it is perhaps unsurprising to discover that class-level analysis granularity is widely adopted in the literature [P1, P4, P5, P8-13, P15, P18, P20, P22, P24, P26, P28, P29, P30, P32].
Using class-level analysis, classes within a monolith are clustered based on several criteria including coupling and cohesion [P10, P21, P26], topic modelling [P1, P5], and business object modelling [P9, P30], leading to the identification of candidate microservices. The classes may be represented as nodes in a call graph, with edges in this case representing various object-oriented constructs, including object references, inheritance relationships, and evolutionary couplings. Later, this information is used to cluster closely related classes into candidate microservices.
Package-level analysis, as implemented in [P7, P16, P19, P25], adopts a higher-level granularity than class-level or method-level analysis. Source code packages (SCPs), which may be comprised of large numbers of classes and associated methods, are used as the base unit of analysis. The general approach with package-level analysis involves identifying packages that can be grouped together as microservices. Package-level analysis does not seek to split packages when projecting candidate microservices. This may be advantageous where packages are highly decoupled and exhibit high internal cohesion. However, if packages are very large or externally dependent for end-to-end service completion, the microservices resulting from this approach could be prohibitively large in size. For well-managed codebases where larger services are preferred, package-level analysis could offer a fast-track to miniservice or macroservice [114] identification. In such a scenario, product-level decomposition risks may be reduced as there is less fragmentation of the existing monolith.
Software artefact-based analysis [P3, P21, P33, P35] involves the use of artefacts for candidate microservices identification. Although the term software artefact refers to various by-products of a software development process [115], [116], in the decomposition context, it mainly refers to documents such as data flow diagrams, use cases, process models, and ER diagrams [P21, P33]. Source code and executable files are not included in this type of analysis. This is the least clear of all the proposed analysis approaches, perhaps because the mapping between actual source code classes and meta-level artefacts is not always obvious. Indeed, meta-level artefacts may not be produced in all settings, or they may become inconsistent with the codebase over time.
Analysis based on Uniform Resource Identifiers (URIs) [P17, P27] occurs in the context of web-based monoliths. URIs represent different resources available to a system. With URI-based analysis, a monolith is decomposed based on its association with, and the frequency of its association with, URIs at runtime. Finally, one paper [P23] does not explicitly report its unit of analysis beyond indicating that it used software components.
c) Input Data Representation: The input data is represented using a variety of data structures. For example, just over half of the studies (51.4%) [P2, P3, P4, P5, P9, P11, P13, P19, P22, P25, P26, P28, P29, P30, P31, P32, P33, P35] have used graphs for input data representation. Graph nodes are used to represent granular elements such as methods, classes, packages, and processes, while edges represent the call relationship between the nodes. Both directed edges (that indicate the flow of information or the direction of the call) and undirected edges are employed to represent rich data. The edges may carry further information including static and dynamic call frequencies or weights, process flows, and similarity and distance metrics. Graphs can also be used in combination with topic modelling that uses bag of words
3) Phase III. Microservices Identification: This section presents the microservice identification tools reported in the literature. Thereafter, the microservice identification methods are presented. These generally fall into two categories: rule-based approaches and cluster-based approaches.
a) Microservices Identification Tools: Some preliminary work has been carried out to build tools for microservices decomposition. One such tool is Service Cutter [P35], which has been used not just for microservices identification but also for initial attempts to benchmark effectiveness in [P13, P21, P33, P34]. Various other tools have been developed explicitly towards a microservice extraction agenda using static and dynamic data. Several further tools are also used to assist in the extraction of microservices: Kieker [P12, P24] extracts dynamic data from Java applications [117]; Dbeaver [P16] is used to extract database tables accessed by a monolith at runtime; ExploreViz [P16] is a tool for 3-D visualisation of candidate microservices based on static and dynamic data; and OpenAPI (formerly Swagger) is a language-agnostic interface that is used to define and extract web service interfaces. Each of these tools supports specific extraction algorithms which are identified in the respective papers. These data collection tools are typically built to support specific programming languages (usually Java) and are therefore not immediately applicable to codebases in other languages. Additionally, various in-house algorithms have been developed as part of the studies [P9, P18, P24, P30] and have been utilised in the related literature. This demonstrates that software-based tools are an important dimension when decomposing monolith applications into microservices.
b) Rule-Based Microservice Identification Methods: Rule-based identification methods involve varying degrees of direct human involvement in the process, through the provision of guidelines or rules about how to partition the monolith source code. Although they account for only 11.5% of the studies, rule-based methods [P11, P19, P21, P33] are observed in model-driven approaches, with the rules being defined based upon prior knowledge of the application domain and the experience of the decomposition engineers, such as in [P19, P21, P33]. Rules make extensive use of dataflows, use cases, and data stores as input, which could produce high-quality results. In [P33], the authors formulate four rules to apply on purified data flow diagrams. These include, for example, rules related to the number of inputs, and operations included in a single business process.
Although capable of producing microservices with fine-grained service boundaries [P21], rule-based approaches require significant manual effort owing to their high dependency on expert knowledge. This limitation may render rule-based approaches less attractive for the decomposition of larger and more complex codebases. Rule-based methods have also been used in combination with SA and DA, for example in [P11, P16], where abstract syntax trees are analysed, followed by manual inspection of the resulting candidate microservices.
c) Cluster-Based Microservice Identification Methods: We use the term cluster-based microservice identification methods to refer to various unsupervised machine learning algorithms. These algorithms are widely observed in the literature, and they tend to favour different forms of statistical clustering. Many clustering techniques have been proposed, including hierarchical, K-means, affinity propagation, Girvan-Newman, fast community, SArF, Minimal Spanning Tree (MST), Epidemic Label Propagation, collaborative clustering, and the Hungarian method as in [118]. Variants of hierarchical clustering algorithms are used to generate dendrograms, from which the cutting points for the desired number of microservices are identified. Some algorithms require prior selection of the cluster size, whereas others determine the cluster size automatically.
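To illustrate the idea of merging units until a desired number of candidate microservices remains, the following is a minimal sketch of single-linkage agglomerative clustering. It is not the implementation of any reviewed study: the class names, the call counts in `CALLS`, and the distance function `call_distance` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def agglomerative_clusters(items, distance, k):
    """Single-linkage agglomerative clustering, stopped at k clusters."""
    clusters = [frozenset([item]) for item in items]
    while len(clusters) > k:
        # Pick the two clusters whose closest members are nearest to each other.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(distance(x, y) for x in pair[0] for y in pair[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

# Hypothetical call counts between classes of a toy monolith.
CALLS = {
    ("Order", "Invoice"): 9,
    ("Order", "Cart"): 7,
    ("User", "Profile"): 8,
    ("Cart", "User"): 1,
}

def call_distance(x, y):
    """Assumed distance: shrinks as the symmetric call count grows."""
    count = CALLS.get((x, y), 0) + CALLS.get((y, x), 0)
    return 1.0 / (1.0 + count)

classes = ["Order", "Invoice", "Cart", "User", "Profile"]
candidates = agglomerative_clusters(classes, call_distance, k=2)
for cluster in candidates:
    print(sorted(cluster))
```

Here the number of clusters `k` is selected in advance, which corresponds to cutting the dendrogram at a chosen level; algorithms that determine the number of clusters automatically would replace this stopping rule.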
Heuristics-based clustering, as used in [P30], defines two rules related to subtype and common subgraph (a step which enables the grouping of call graphs). The CO-GCN (Clustering and Outlier-Aware Graph Convolution Network) approach has also been applied to cluster classes [P4]. This demonstrates that neural network (NN) based approaches are considered to have some value in microservice identification. Indeed, it is claimed that NN-based approaches can present improved results when compared with the widely used clustering variants [62]. Other contributions use genetic algorithms [P2, P12] to partition the classes into cohesive but loosely coupled clusters. This involves rearranging the groupings over a number of generations and mutations. Topic modelling, using Latent Dirichlet Allocation (LDA) and Seeded LDA (SLDA) algorithms, has also been applied in dependency graph analysis and topic detection for microservices identification [P25].
An overview of the reported unsupervised identification algorithms is presented in Table VII, which demonstrates that a wide variety of algorithms have been applied to identify microservices from monolith applications. However, it should be noted that the use of clustering algorithms for the identification of microservices from large-scale monolith applications has not yet been explored in-depth in the literature. This may be due to the lack of resources and access to large-scale (millions of lines of code) enterprise systems. Thus, further research is warranted to determine the efficacy of clustering based analysis for larger systems, especially when method-level granularity is selected.

4) Phase IV. Microservices Optimisation:
Although microservices optimisation can be integrated into the identification phase, recent studies have treated the optimisation method as a distinct phase [119]. Unsupervised clustering renders multiple microservices candidates which can be used as input to an optimising algorithm, for example, a genetic algorithm. Furthermore, fitness functions have been proposed as a way to evaluate proposed microservices.
a) Optimisation Algorithms: Distinct from the clustering algorithms applied to identify candidate microservices, genetic algorithms have been utilised to refine optimal combinations of software units (for example, classes) in pursuit of higher optimisation in the target microservices system [P12, P24]. Optimisation algorithms are less popular in the studies: 29 of the 35 studies (82%) have not used them. However, they are gaining popularity in recent studies (in the past 3 years), where three studies [P12, P24, P26] used NSGA-II (Non-dominated Sorting Genetic Algorithm II) [120], one study [P4] used NSGA-III, and another study [P23] used a combination of NSGA-II and SPEA II (a Strength Pareto Evolutionary Algorithm) [121]. The study that used CO-GCN [P4] has employed the ADAM optimiser [122].
Genetic algorithms have shown improvement in identifying microservices when used in combination with clustering algorithms [P24, P26]. However, a recent (and still ongoing) study suggests that the use of NSGA-III in multi-objective optimisation could contribute to efficient microservices candidate identification without going through the clustering step [119].
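The multi-objective algorithms named above rank candidate solutions by Pareto dominance. The sketch below shows only that core idea, the extraction of the first non-dominated front, and omits everything else that NSGA-II or NSGA-III involve (crowding distance or reference points, crossover, mutation). The candidate names and objective values are hypothetical, and both objectives are assumed to be minimised.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are assumed to be minimised.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(solutions):
    """Return the solutions that no other solution dominates."""
    return {
        name: objectives
        for name, objectives in solutions.items()
        if not any(dominates(other, objectives) for other in solutions.values())
    }

# Hypothetical candidate decompositions scored as (coupling, 1 - cohesion).
candidates = {
    "A": (0.20, 0.30),
    "B": (0.35, 0.10),
    "C": (0.40, 0.35),
    "D": (0.25, 0.30),
}

front = pareto_front(candidates)
print(sorted(front))
```

In this toy data set, candidate A dominates both C and D, while A and B trade coupling against cohesion and therefore both remain on the front.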
b) Connectivity Fitness Functions: Structural and Conceptual Intra-Connectivity and Structural and Conceptual Inter-Connectivity fitness functions are used to assess the optimisation process in [P24]. Structural Connectivity is calculated based on the number of edges between classes in the so-called functional atom groups, whereas Conceptual Connectivity is calculated based on shared terms in the class identifiers. Fitness has also been evaluated on the basis of CPU and memory consumption where candidate microservices are extracted at a class-level granularity using NSGA-II [P12].
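The following sketch illustrates how connectivity measures of this kind might be computed. The normalisations used here (edges inside a group divided by the squared group size, and crossing edges divided by twice the product of the group sizes) and the use of Jaccard overlap on identifier terms are assumptions made for illustration; they are not taken from [P24]. The class names and edges are likewise hypothetical.

```python
import re

def structural_intra(group, edges):
    """Edges with both ends inside the group, over |group|^2 (assumed normalisation)."""
    inside = sum(1 for a, b in edges if a in group and b in group)
    return inside / (len(group) ** 2)

def structural_inter(g1, g2, edges):
    """Edges crossing between two groups, over 2*|g1|*|g2| (assumed normalisation)."""
    crossing = sum(
        1 for a, b in edges
        if (a in g1 and b in g2) or (a in g2 and b in g1)
    )
    return crossing / (2 * len(g1) * len(g2))

def identifier_terms(name):
    """Split a camel-case class identifier into lower-case terms."""
    return {term.lower() for term in re.findall(r"[A-Z][a-z0-9]*", name)}

def conceptual_similarity(c1, c2):
    """Jaccard overlap of identifier terms (an assumed stand-in measure)."""
    t1, t2 = identifier_terms(c1), identifier_terms(c2)
    union = t1 | t2
    return len(t1 & t2) / len(union) if union else 0.0

# Hypothetical call edges and two candidate groups.
edges = [
    ("OrderService", "OrderRepository"),
    ("OrderService", "InvoiceService"),
    ("InvoiceService", "InvoiceRepository"),
]
orders = {"OrderService", "OrderRepository"}
billing = {"InvoiceService", "InvoiceRepository"}

print(structural_intra(orders, edges))
print(structural_inter(orders, billing, edges))
print(conceptual_similarity("OrderService", "OrderRepository"))
```

A fitness function in this style would reward high intra-connectivity and low inter-connectivity; how the structural and conceptual terms are weighted against each other is a further design choice.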

D. Phase V: Microservices Evaluation
In this section, we investigate the metrics, datasets, and benchmarks that are used for evaluating and validating the extraction process and its results. For the purpose of this work, we classify evaluation efforts into two categories for discussion: the metrics employed for evaluation and the benchmarking systems used as the basis for the evaluation of datasets.
Evaluation metrics allow us to estimate candidate microservice qualities, by characterizing and quantifying their relevant features. These metrics enable architects to understand, compare, and improve candidate microservices and decomposition methods. In contrast, benchmarks and datasets refer to the systems and system artefacts chosen as the subject matter for the decomposition.
The majority of the studies (88%) have reported the evaluation of their resulting microservices: 71% of the studies have used some evaluation metrics, whereas 17% reported the use of manual evaluation. All identified papers from 2021 [P1-P9] have used metrics to evaluate their results. This was not the prevailing case in earlier research, which incorporated higher levels of manual evaluation [P29, P33, P34, P35]. Only four studies either did not report an evaluation or did not evaluate the resulting microservice candidates at all.
1) Evaluation Metrics: Analysis of the literature suggests that evaluation metrics comprise coupling metrics, cohesion metrics, non-functional metrics, and cluster size metrics.
a) Coupling Metrics: The most prevalent and stable metrics with long-lasting literature support in the software development domain are coupling and cohesion [57], [123], [124]. Coupling measures the degree of dependence between different microservices [57]. Coupling and cohesion metrics in combination have become the most widely used metrics in recent publications [P2, P3, P4, P5, P15, P18, P21, P24, P30]. Performance-related metrics are also gaining popularity [P6, P7, P12, P18, P23, P27, P30], particularly in evaluating the performance of the candidate microservices in cloud environments. Although there is a rising interest in using these metrics, there are subtle differences in their calculation, particularly when they are used along with data collected from static, dynamic, and version-based datasets. A list of the metrics that are used by two or more studies is provided in Table VIII. The asterisk (*) in the "Studies" rows indicates that the study has used additional metrics that are not used in other studies.
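Because the calculation of these metrics differs between studies, the sketch below shows just one simple possibility and should not be read as the definition used by any reviewed paper. It assumes that coupling is the share of calls crossing service boundaries and that cohesion is the share of possible directed class pairs inside a service that are connected by a call. The class names, service names, and calls are hypothetical.

```python
def coupling(assignment, calls):
    """Share of calls whose caller and callee are assigned to different services."""
    crossing = sum(1 for a, b in calls if assignment[a] != assignment[b])
    return crossing / len(calls)

def cohesion(service, assignment, calls):
    """Share of possible directed class pairs inside a service that are connected."""
    members = [cls for cls, svc in assignment.items() if svc == service]
    n = len(members)
    if n < 2:
        return 1.0  # assumed convention: a single-class service is fully cohesive
    internal = {
        (a, b) for a, b in calls
        if a != b and assignment[a] == service and assignment[b] == service
    }
    return len(internal) / (n * (n - 1))

# Hypothetical assignment of classes to candidate microservices.
assignment = {
    "Order": "sales",
    "Cart": "sales",
    "Invoice": "billing",
    "Ledger": "billing",
}
calls = [
    ("Cart", "Order"),
    ("Order", "Invoice"),
    ("Invoice", "Ledger"),
    ("Ledger", "Invoice"),
]

print(coupling(assignment, calls))
print(cohesion("sales", assignment, calls))
print(cohesion("billing", assignment, calls))
```

Swapping the unweighted call list for static or dynamic call frequencies would change the resulting values, which is one source of the calculation differences noted above.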
2) Evaluation Benchmarks: Various codebases are available for the purpose of evaluation, and different validation methods have been utilised in different studies. Differences in programming languages also present as a distinct feature of evaluation efforts.
a) Evaluation Codebases: For consistent evaluation of proposed decomposition methods, the availability of datasets and benchmarking data is crucial, but this is potentially the least explored area [127], [128]. The selection of monolith applications for experimental purposes is driven by the availability of monolith source code (often Open Source Software), the equivalent microservices, and access to the monolith programming environment. Due to the absence of published and agreed-upon ...
Whilst at an earlier point, case studies were reported to be the primary validation techniques [126], it is clear that experiment-based validation is both of growing interest and now the prevailing method. This could be attributed to the progress made in the last few years towards the availability of widely applicable metrics and benchmarks, enabling comparison of different decomposition efforts.
... 2% of studies using one or more Java-based applications, there is clearly a strong tendency for decomposition studies to incorporate Java codebases. This may be because many of the existing decomposition tools (i.e., for dynamic and static data collection and analysis) are designed for use with Java [113]. Whilst this is somewhat appropriate given the prevalence of Java-based systems, it is also an example of the "researching under the lamppost" problem [129]. As a result, legacy technologies with possibly the greatest need for translation into microservices are studied less.

E. Phase VI: Microservices Deployment
Deployment focuses on the implementation of the extracted microservices and studying whether the identification process achieves its objectives. We found just a single study [P27] that addresses this deployment phase but it does not discuss the deployment process in detail. In the examined studies, the lack of technical coverage on the deployment of the extracted microservices leaves a major gap regarding the suitability for production environments. This could be addressed, in future research, with industrial case studies where monolith applications are transformed into microservices and their quality evaluated in a production environment. Finally, Fig. 5 presents the mapping of the studies relative to the M2MDF diagram.

V. GAPS AND FUTURE RESEARCH
This section answers RQ4 by presenting the research and practice gaps in the area of monolith decomposition into microservices. We categorise these gaps according to the phases of the M2MDF.

A. Phase I: Collection Gaps
[Table IX: Summary of research gaps and future research directions]
Existing research exploits MIC, CIC, LIC, VIC, or a combination of these data collection methods. These collection methods extract SD, DD, MD, or VD to represent the structure and behaviour of the monolith application. The data provides crucial information regarding monolith implementation and organisation; however, this research suggests the following gaps.
i) MIC approaches require significant manual input and experts with a detailed understanding of the monolith. For large enterprise applications, this may prove prohibitively slow or expensive. There would appear therefore to be significant opportunity for automation of domain-driven artefact collection. This process could benefit from studying the decision-making process of experts to understand the tasks they conduct and the artefacts they focus on when they try to identify microservices. It is important to note that such artefacts may not accurately (or even approximately) reflect the actual system.
ii) Although some studies address the issue to an extent [P13, P31], data collection methods that efficiently combine static, dynamic, and evolutionary data are not yet fully employed. This is further related to the absence of a common formal representation of the input data and its core content. Up to this point, call graphs extracted from static data have been somewhat enriched with dynamic data using the frequency of the artefacts involved in the call graph. Some papers compute the static and dynamic data separately, while others combine the two into one measure (usually in the form of a weighting). Combining collection techniques can improve the quality of the microservices identification process since additional perspectives bestow increased monolith knowledge (as identified in other domains [130]).
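As an illustration of combining static and dynamic data into a single weighting, the following minimal sketch blends a static dependency flag with a normalised runtime call frequency. The blending parameter `alpha`, the function name `combined_weights`, and the example edges are hypothetical; the reviewed studies use their own formulations.

```python
def combined_weights(static_edges, dynamic_calls, alpha=0.5):
    """Blend a static dependency flag with normalised runtime call frequency.

    static_edges: set of (caller, callee) pairs found by static analysis.
    dynamic_calls: dict mapping (caller, callee) to observed call counts.
    alpha: assumed weight given to the static part (0 to 1).
    """
    max_calls = max(dynamic_calls.values(), default=0)
    weights = {}
    for edge in set(static_edges) | set(dynamic_calls):
        static_part = 1.0 if edge in static_edges else 0.0
        dynamic_part = dynamic_calls.get(edge, 0) / max_calls if max_calls else 0.0
        weights[edge] = alpha * static_part + (1 - alpha) * dynamic_part
    return weights

# Hypothetical data: one edge is never executed, one is only seen at runtime.
static_edges = {("Cart", "Order"), ("Order", "Invoice"), ("Order", "Audit")}
dynamic_calls = {
    ("Cart", "Order"): 40,
    ("Order", "Invoice"): 10,
    ("Report", "Order"): 20,
}

weights = combined_weights(static_edges, dynamic_calls)
for edge, weight in sorted(weights.items()):
    print(edge, weight)
```

The edge that appears only in the static data keeps a non-zero weight even though no test exercised it, while the edge observed only at runtime is still represented; this is the additional perspective that a combined measure is intended to provide.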

B. Phase II: Analysis Gaps
This research has identified the following gaps in relation to the analysis phase.
i) When employed in isolation, dynamic data collection cannot guarantee full code coverage or that all operational use case scenarios have been catered for, with the result that the sole use of dynamic analysis carries increased risk of unintended operational issues in target microservices systems. To address this problem, one study [P24] proposes the use of comprehensive functional tests that ensure full functional coverage of the monolith code. However, this has not been systematically tried and tested. Furthermore, the design and implementation of tests conveying complete functional coverage can be very expensive [131] and error-prone. In addition, it is foreseeable that unimagined operational use case scenarios might be accidentally overlooked. Techniques to remedy this problem should be examined in future research.
ii) Criteria for determining the appropriate unit of analysis are not presently available. Researchers and practitioners would benefit from guidance in this area. For example, method level analysis may be preferred over package level analysis if the resulting microservices are to be relatively small in size, but the trade-off in complexity and monolith size needs to be considered.
iii) The analysis approaches identified in this work do not examine the efficiency of existing business processes. Rather the research focus is on a technical evaluation of the monolith to identify candidate microservices. A monolith-based system that is in use for many years may not be optimised in respect of the business processes that the monolith supports. For some firms, the process of pivoting to a distributed architecture from a monolith system may require additional business-level concerns. One of these additional concerns may be the redesign of business processes delivered in the monolith system. This however is outside the scope of the current research.

C. Phase III: Identification Gaps
We found no evidence of the application of supervised machine learning approaches and reinforcement learning methods. The absence of sufficient training data to support supervised learning methods is a major constraining factor. A further inhibitor to supervised learning techniques stems from the absence of a generalised conceptual representation of the decomposition model and process. Input data and expected output data formats and structures would need to be developed in order to support future supervised learning techniques.
However, it should not be inferred that these techniques have no potential value in this space. For example, supervised techniques might increase the opportunity for human involvement in the decomposition loop. This human agent could be an established expert for the monolith under examination, and as such, could raise the quality of decisions reached. Owing to their potential as complementary or independent techniques, supervised methods and reinforcement learning techniques have been identified as future research directions.

D. Phase IV: Optimisation Gaps
The optimisation phase has to date received only modest attention. NSGA-II [P12, P23, P24, P26] and NSGA-III [P4], [119] have both been employed. Defining optimisation functions that represent the desired characteristics of the final output requires further research, along with supporting metrics. A neural network-based approach has also been proposed [P4], perhaps indicating that deep learning algorithms hold some promise in terms of microservice candidate optimisation. Research on the use of such approaches in the decomposition process, either in combination with clustering or in isolation, can be considered to be at a very early stage. Nevertheless, advanced evolutionary approaches may become more prevalent in decomposition projects as the associated monoliths become larger and more complex.

E. Phase V: Evaluation Gaps
Having identified candidate microservices, evaluating their suitability is a vital task. Various supporting metrics have been proposed, including coupling and cohesion measures. A variety of evaluation methods have also been utilised, including experimental validation, case studies, and manual evaluations. However, current evaluations of candidate microservices generated during monolith decomposition exhibit a number of gaps.
i) There is an absence of datasets. Studies have mostly used small or medium scale (typically less than 200,000 lines of code) open-source applications, mainly developed using Java. Although Java is among the top programming languages used to build enterprise applications, C/C++ and Python are also widely used in this space [132]. COBOL, which accounts for billions of code lines in large monolith systems [133] has not received the attention it warrants. Furthermore, although recent papers show diverse test cases in their experiment [P5, P8, P24, P32], the limited overlap of test cases across the studies undermines robust evaluation. JPetstore is used in six studies, with DayTrader, Acme Air and PetClinic each used in four studies.
ii) There remains a requirement to determine appropriate decomposition evaluation metrics. Coupling, cohesion, and many other metrics are used frequently for the identification and evaluation of microservices [134]. But there is considerable variation in specific metric definitions. Standardised evaluation metrics together with benchmarking datasets are essential for candidate microservice evaluation. The availability of such resources would also enable supervised machine learning approaches by providing essential training data.
iii) Several unsupervised learning algorithms have been proposed, including variants of hierarchical clustering [P8, P15, P22, P24], K-means [P6, P18, P27, P31], and Girvan-Newman [P13, P35]. However, a comparative study of these algorithms when applied to monolith to microservices decomposition is not available. This frustrates the work of researchers and practitioners, especially in terms of algorithm selection. This gap needs to be addressed by making experimental data and the source code openly available (as in [P24]) to encourage comparison.
iv) The use of semantic similarity (sometimes referred to as conceptual similarity) has been increasing in recent years [P6, P18, P24, P32, P34]. Thus, there appears to be an opportunity for further use of these techniques as an enabler of program understanding, for example by integrating semantic similarity measures such as Wu & Palmer [135], Lin [136], and Resnik [137].
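As a brief illustration of one of the measures mentioned above, the Wu & Palmer similarity of two concepts is twice the depth of their least common subsumer divided by the sum of their depths. The sketch below applies it to a small hand-written taxonomy; the `PARENT` table is hypothetical, and depth is counted with the root at depth 1, which is an assumed convention. In practice, a lexical database would supply the taxonomy.

```python
# Hypothetical toy taxonomy: each concept maps to its parent (None for the root).
PARENT = {
    "entity": None,
    "document": "entity",
    "invoice": "document",
    "receipt": "document",
    "person": "entity",
    "customer": "person",
}

def path_to_root(concept):
    """Return the chain from the concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def depth(concept):
    """Number of nodes from the root to the concept (root has depth 1)."""
    return len(path_to_root(concept))

def wu_palmer(c1, c2):
    """2 * depth(LCS) / (depth(c1) + depth(c2)), with LCS the deepest shared ancestor."""
    ancestors = set(path_to_root(c1))
    lcs = next(c for c in path_to_root(c2) if c in ancestors)
    return 2 * depth(lcs) / (depth(c1) + depth(c2))

print(wu_palmer("invoice", "receipt"))
print(wu_palmer("invoice", "customer"))
```

Terms that share a nearby ancestor (here, invoice and receipt) score higher than terms that meet only at the root (invoice and customer), which is the property that makes such measures candidates for grouping classes by the vocabulary of their identifiers.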

F. Phase VI: Deployment Gaps
To date, just a single study considers the deployment of extracted microservices [P27]. This is undesirable because the ultimate test of microservice fitness will only arise once deployed. Through examination of deployments, errors and inefficiencies in microservice extraction methods can be identified.
Though not strictly a deployment concern, there is also an absence of end-to-end tooling to automate (or partly automate) the identification, extraction and deployment of microservices. Such tooling might support the selection of collection methods with different granularity levels (method, fragment, class, package). It might also support the selection of various other key considerations, including the clustering approach and the optimisation methods. A further capability of an end-to-end tool might permit the visualisation of resulting microservices candidates, along with quality metrics to compare the candidate microservices. Initial tooling implementations such as Service Cutter [24] and Kieker demonstrate the potential in this space, but much work is yet required.

A. External Validity
The primary threat to external validity is related to the selection and inclusion of primary studies, and to the search time frame which covers the period 2015-2021. As a result, selected studies may not be fully representative of the state of the art in the decomposition of monolith applications to microservices. Furthermore, and although the authors involved in the activity were engaged full time on the associated research programme and therefore well versed in the topic, the selection of studies for inclusion was based solely on consensus. Had the selection process incorporated inter-rating to record and formally process the individual author evaluations, the process as a whole would be both more robust and more transparent. To mitigate the potential impact of threats surrounding study selection, search keywords were carefully developed, evolved, and the major computer software databases were included in the scope of the searches.
Three search strings were used and the results were systematically combined. Alternative search platforms such as Publish or Perish, Google Scholar, and Semantic Scholar were used to identify potentially omitted articles. In addition, the SLR methodology was faithfully applied and later stages were subject to group review. Likewise, the analysis was subject to group review and snowballing was used to expand the paper collection. In combination, these methodological steps significantly reduce the possibility that papers were missed or that some significant earlier contributions were overlooked. As a subsequent mitigation following completion of the SLR, we performed a targeted literature search in the period 2011-2014. No additional works of relevance were identified in this subsequent search.
Supplementary materials, available online, are likely to exist, especially in the non-academic sphere and future work could look to integrate any industry-led innovations in this important research area. Industrial applications have been proposed to address monolith decomposition into microservices, for example IBM's Mono2Micro tool and Amazon's AWS Microservice Extractor for .NET. However, the technical implementation details for these industrial applications are not publicly available.
Patents and grey literature are also not within the scope of this work and if examined, they might contain additional relevant material. Although the inclusion and exclusion criteria were strictly applied, their application in Refinement Steps 1 and 2 was conducted solely by the first author, as was the snowballing element of Refinement Step 6. Had additional authors been engaged in these steps, it would have reduced the effect of author selection bias.
In terms of the quality of the selected works, only studies that underwent rigorous peer-review were included, where leading academic publishers were the central focus of the search. These included IEEE, ACM, Springer, and Science Direct. The review protocol should also largely assuage the effect of subjective paper selection decisions, as studies relevant to the topic were selected by the consensus of the majority. Furthermore, identified works were included in the research where they fell within the search timeline, providing a focus on more up-to-date and state-of-the-art work. To the knowledge of the researchers, all works of central relevance to the research objective and within the search window have been included.
Finally, and although appropriate techniques from Grounded Theory were employed to systematically identify the major phases and their application sequence in a monolith to microservices decomposition, it is inevitably the case that universal agreement on phase boundaries, naming and sequencing, will not be possible given the broad nature of the problem and the fact that individual firms and research efforts might design alternative and specific decomposition pipelines. Nevertheless, this work is the first published study that the authors are aware of that has attempted to systematically identify and classify the general problem space as presently reported in the prominent peer review literature to date.

B. Internal Validity
It is important to highlight that technology solutions, such as those that are the focus of this work, may ultimately only present a partial solution to a difficult challenge. At various points in our work, we have noted the important role of human experts as part of the decomposition process. We do not envisage an immediate future devoid of this important human contribution. Rather, as codebases to be migrated inevitably present as larger and more complex, the role of supporting technology may grow to support humans charged with this important strategic activity.
The transition from a monolith architecture to a microservices architecture can enable certain key strategic objectives, including increased frequency in the delivery of new features with less impact on an operational system (sometimes referred to as low perturbation). The realisation of such objectives is constrained by the cost of decomposition projects and the quality of the resulting microservices-based system. And while microservices architectures deliver certain desirable benefits for certain market offerings, they are not the only approach that may be adopted: more generalised and larger granularity service-oriented offerings, such as so-called modular monoliths, may be a more pragmatic alternative in certain settings. However, even for modular monolith transitions, the technologies and phases identified in this work are likely to be largely relevant.
This research has created and applied the M2MDF as a means to compare existing techniques adopted in up-to-now largely disparate migration studies. The framework itself is built based on steps derived by inspecting all the implicit and explicit decomposition steps used in the selected papers, using aspects of grounded theory. Although the framework has been systematically derived, it is possible that other research efforts might have reached different classification decisions.

VII. DISCUSSION
This systematic review has produced various important contributions for practitioners. While microservices architectures have been in use for some time, the task of decomposing a monolith application into microservices can vary significantly from context to context. Furthermore, with the passage of time, improved tooling has enabled the part-automation of aspects of the process. No previous work has been identified that consolidates all these concerns into a single work and maps the various existing contributions into a decomposition framework. This is one of the substantial contributions of this research, and it may be adopted by both practitioners and researchers to effectively explore the decomposition challenge in the context of the end-to-end process and the existing literature.
One of the greatest challenges of any monolith to microservices decomposition effort can be identified in the fact that there is uncertainty regarding the many aspects of the task. For practitioners more accustomed to designing, building and maintaining monolith-based systems, learning is required to understand the nature of microservices architectures. However, this is just the start of the decomposition activity, and a great deal of technical innovation may be required to effectively execute a decomposition project. The material presented in Section IV-C provides clear and categorised approaches to the various elements of the decomposition. In Table IV, the different methods for monolith data collection are presented, with Table V presenting the different types of analysis that may be employed. In both sections, there is a rich variety of techniques available and no single study has employed all of them. Indeed, economics may prevail in practice, and practitioners may need to choose which techniques they can deploy in line with budgets and timelines. The key point here is that all the known techniques are presented in the M2MDF, and therefore, practitioners can quickly evaluate their options without the need for an extensive literature review.
A further challenge for practitioners arises from the decision regarding the unit of analysis to be employed. In the context of a Java system, this could be at a package level, or it could be more granular, including down to the individual method level. This is an important consideration for practitioners as the unit of analysis will affect the size of the projected microservices. Packages may initially present as intuitively appealing fragmentation points (assuming low coupling across package boundaries), but in practice this might result in relatively large microservices that may not be suited to all manners of hardware deployment. For example, large microservices may not be suited to a Function-as-a-Service paradigm [14]. Table VI presents a summary of the identified units of analysis and maps them to individual studies.
Many different techniques can be employed when attempting to identify microservices in data collected from monolith systems, and a variety of tools exist to support this activity. Section IV-C3 addresses these dimensions, identifying the tools reported in the literature and systematically elaborating the various different algorithms that can be utilised (refer to Table VII). The provision of this information in one single source consolidates the field to date and will assist practitioners in quickly navigating the available tools and algorithms to support monolith decomposition into microservices.
Evaluating the effectiveness of a proposed decomposition is perhaps the most elusive problem in the migration agenda. As presented in Table VIII, there are various evaluation metrics in use across many different studies. While these metrics will be of use to practitioners, the broader challenge of devising a means to examine decomposition effectiveness is a task perhaps best suited to the research community. The materials presented in Section IV-D summarise the accumulated work to date, which is very much fragmented. It is nevertheless a starting point for future research seeking to work towards standardised techniques for evaluating the effectiveness of proposed microservices.
Some promising work on benchmarking has been conducted and it is presented in Section IV-C3. However, this work has some significant limitations, especially in the context of the programming languages examined. Furthermore, the absence of clear guidance on what constitutes an effective monolith decomposition contributes to challenges in this space. It is in these areas that future research can make important contributions. Without a clearer understanding of these concerns, it will not be possible to have robust and consistent examination of proposed decomposition techniques and their effectiveness in comparison to existing reported decompositions. Although this is a particularly challenging aspect of the problem domain, in the fullness of time more elements of the decomposition task may become automated and therefore, it will be essential to have consistent methods of evaluation (because the role of manual human input may be reduced).
Beyond evaluation and benchmarking, the microservice deployment dimension has received only modest attention to date (a single study). Future studies, especially in practice, should actively examine the deployment phase. Microservices should be both deployable and effective once deployed; otherwise, the substantial effort to identify microservices might be undermined.
While this research demonstrates that static analysis is the most commonly used analysis technique (refer to Section IV-C1), this does not necessarily imply that it is the most effective approach. Rather, it may be the case that static analysis is more accessible. Various tools exist to support static code analysis, which is a long-established field. Furthermore, there is no requirement to install a monolith system and generate traffic through it in order to obtain static data. In contrast, to obtain dynamic data, a monolith system must be executing and processing requests. This may require the isolation of test bed infrastructure and the execution of test cases. Although there is a greater set-up time and cost associated with the collection of dynamic data, the fact that the dynamic data (e.g., stack traces) identify the actual internal runtime behaviour of a monolith is potentially very valuable. Whereas static analysis can present the various theoretical considerations (e.g., all possible paths and dependencies), a robust set of test cases can identify the actual considerations (e.g., the actual paths navigated at runtime under expected traffic). It is perhaps the case therefore that the concurrent use of static and dynamic data may facilitate greater pragmatism and effectiveness in monolith decomposition.
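One possible way to use static and dynamic data concurrently is sketched below. This is an illustrative assumption rather than a method reported in the reviewed studies: every statically discovered dependency receives a base weight, and dependencies observed at runtime receive an additional weight proportional to their share of observed calls. The function name combine_edges, the weighting scheme and the sample class names are all hypothetical.

```python
from collections import Counter


def combine_edges(static_edges, dynamic_calls, static_weight=1.0, dynamic_weight=1.0):
    """Merge static dependencies and observed runtime calls into one weighted graph."""
    static = set(static_edges)
    observed = Counter(dynamic_calls)          # call counts per (caller, callee)
    total = sum(observed.values()) or 1        # avoid division by zero with no traces
    weights = {}
    for edge in static | set(observed):
        static_part = static_weight if edge in static else 0.0
        dynamic_part = dynamic_weight * observed[edge] / total
        weights[edge] = static_part + dynamic_part
    return weights


# Hypothetical data: three possible dependencies, two of which are exercised at runtime.
static_edges = {
    ("OrderService", "OrderRepo"),
    ("OrderService", "AuditLog"),
    ("Billing", "OrderRepo"),
}
dynamic_calls = [("OrderService", "OrderRepo")] * 3 + [("Billing", "OrderRepo")]
weights = combine_edges(static_edges, dynamic_calls)
```

Under these assumptions, a dependency that is theoretically possible but never exercised (OrderService to AuditLog) keeps only its static weight, whereas frequently exercised dependencies are weighted more heavily and are therefore less likely to be split across microservice boundaries.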
A key consideration for any decomposition project is the scope and type of data storage employed in an existing monolith. Where a monolith employs a large, centralised data store (for example, a relational database), the decomposition into microservices can be particularly complicated. If constructs such as database stored procedures are utilised, then some of the monolith program logic may in effect be implemented inside the database. This raises various issues, including the impact on static code analysis, which may not automatically examine internal database logic. Related research has partially addressed this challenge, for example through the use of DBeaver in [P16] (refer to Section IV-C3), but various other studies largely ignore the data layer, which may be a fundamental limitation. This research suggests that an examination of the data layer should be one of the first considerations for any decomposition project.
On a point of terminology, this research has identified that there is some terminological inconsistency across the research domain. Most notably, this arises in the case of the core focus: monolith decomposition into microservices. It is not uncommon for related research to refer to this as a migration, for example in the case of [4], [31], [32], as opposed to a decomposition. Clearly, either term could be utilised, but perhaps the term decomposition is more appropriate given that any firm seeking to rearchitect a monolith into microservices will also have various other migration-oriented tasks. For example, the firm may require organisational adjustments, and to do so business processes may need to be adapted in order to pivot to a more continuous form of software engineering [138]. In this work, therefore, we prefer the term decomposition over migration and it is for this reason that the framework produced in this study has adopted the title: Monolith to Microservices Decomposition Framework.

A. Post-Review Reflection
To evaluate whether the literature review was representative of the up-to-date material in the field, a further post-review reflection was undertaken. In this step, we applied the SLR protocol and conducted the literature search for the time period of October 2021 to April 2023. A total of 2712 additional studies were discovered, and following application of the original refinement steps, 21 new studies were identified. The temporal distribution shows a growth of publications with one additional study in 2021 and 18 studies in 2022. Two studies were identified in the period January 2023 to April 2023. All additional studies in the 2021 calendar year [134] and the 2023 year to April [139], [140] were included. Since the volume of literature in 2022 is large, eight studies were randomly selected for consideration [141], [142], [143], [144], [145], [146], [147], [148]. The first author extracted the information from the studies for use in evaluating the proposed M2MDF and, where possible, in highlighting new findings.
The largest group of studies [139], [144], [145], [147], [148] used CIC for data collection and collected SD to perform static analysis. Three studies [141], [142], [143] collected SD+DD and performed SA+DA on the data, and one study [140] collected DD and performed DA. One study [134] used CIC+VIC for data collection and performed SA+VA, and another study [146] used CIC+MIC to collect the data and applied SA+DomA to extract microservices. These findings are consistent with the data collection and data analysis findings reported in Section IV-C.
While classes remain the dominant unit of analysis [134], [139], [140], [141], [143], [144], [145], [148], other units of analysis are also employed, such as methods [142] and the combination of software artefacts and methods [146]. A new development here is that one study [147] based its unit of analysis purely on APIs extracted from the codebases using the OpenAPI (Swagger) tool. Concerning input representation, a graph is the most common input data representation [139], [140], [142], [143], [144], [145], [147], [148], followed by a matrix or vector [134], [141], and in one case the representation was unspecified [146]. In many of these studies, the graph representations are later converted into a matrix to serve as an input for clustering algorithms. Regarding microservices identification tools, Kicker [142], Java Call Graph [134], Service Cutter [146] and MoDISCO/DISCO [145], [146] are used. Several hierarchical clustering algorithms are used, including new variants of existing hierarchical algorithms. Neural network-based hierarchical clustering [146] and graph deep clustering [140] are new additions. One study [140] used a loss function to optimise its graph deep clustering algorithm.
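The conversion of a graph representation into a matrix for clustering, as described above, can be sketched as follows. This is a minimal illustration and not the procedure of any specific study: it assumes a weighted dependency graph between classes, treats coupling as symmetric, and applies a simple average-linkage agglomerative (hierarchical) clustering written in plain Python. The function names and sample classes are hypothetical.

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric weighted adjacency matrix from a dependency graph."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for (src, dst), weight in edges.items():
        i, j = index[src], index[dst]
        matrix[i][j] += weight
        matrix[j][i] += weight  # assumption: direction of the call is ignored
    return matrix


def agglomerate(nodes, matrix, k):
    """Merge the two most strongly connected clusters until k clusters remain."""
    clusters = [[i] for i in range(len(nodes))]

    def linkage(a, b):
        # Average connection strength between members of the two clusters.
        return sum(matrix[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        x, y = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(nodes[i] for i in c) for c in clusters]


# Hypothetical class-level dependency graph with call-count weights.
nodes = ["Order", "OrderRepo", "Invoice", "InvoiceRepo"]
edges = {("Order", "OrderRepo"): 5, ("Invoice", "InvoiceRepo"): 4, ("Order", "Invoice"): 1}
matrix = adjacency_matrix(nodes, edges)
services = agglomerate(nodes, matrix, k=2)
```

With this sample data, the strongly connected pairs are grouped first, so the two resulting candidate microservices are the order-related classes and the invoice-related classes. The number of clusters k is supplied by hand here; the reviewed studies use a variety of algorithms and stopping criteria.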
The results of this post-review reflection indicate that the proposed M2MDF captures the major methods for input collection, monolith analysis, microservices identification, and evaluation metrics.

VIII. CONCLUSION
This research has identified 35 studies using an SLR protocol and snowballing method, and systematically reviewed studies examining the decomposition of monolithic applications into microservices. The existing literature addresses various aspects of the decomposition task, but up to this point, there is no single published study that organises and addresses the end-to-end decomposition task. We find that monolith decomposition into microservices is a complicated, large, varied and relatively new task. However, with the demand for more frequent software delivery and innovations in cloud computing that support distributed software systems, it is the view of the authors that monolith decomposition into microservices is a domain of importance for software engineering. The evidence in the literature suggests growing interest in this domain.
This research observed an absence of a description of the end-to-end decomposition process. We therefore applied elements of Grounded Theory to systematically develop the M2MDF as a means to outline the key tasks that comprise monolith decomposition. This enabled the research to address RQ1 (What are the primary phases of monolith-to-microservices decomposition and the major constituent elements of those phases?). The primary phases of the M2MDF are: I. Input Collection, II. Monolith Analysis, III. Microservices Identification, IV. Microservices Optimisation, V. Microservices Evaluation, and VI. Microservices Deployment.
Within Input Collection, we identify four major methods for collection: Model-based (concerned with Specification/Design artefacts), Code-based (concerned with the source code), Log-based (concerned with logs produced by a system), and Version-based (concerned with changes to systems over time). Using the data obtained from these collection methods, we find that there are four major Monolith Analysis methods: Domain Analysis (concerned with discovering domains/themes in a system), Static Analysis (analysis conducted without executing a system), Dynamic Analysis (analysis of systems at run time), and Version Analysis (analysis of changes to systems over time).
The Microservices Identification phase employs two major methods: Rule-based methods (human input, defining partitioning rules) and Clustering methods (applying unsupervised machine learning algorithms). In the Microservices Optimisation phase, candidate microservices are optimised using two identified techniques: Genetic Algorithms (to refine optimal combinations of software units within microservices) and Neural Networks (as a means to cluster classes into microservices). Through systematic examination of M2MDF phases I-IV, this research has answered RQ2 (What are the existing approaches, tools and methods observed in the decomposition of monolith applications into microservices?). In Microservices Evaluation, we find three distinct evaluation approaches: Case Studies (in-depth evaluation of individual decomposition), Examples (manual evaluation of individual decomposition), and Experiments (experimenting with different decomposition approaches). By systematically investigating this M2MDF phase, this research has answered RQ3 (What are the metrics, datasets, and benchmarks used for evaluating and validating monolith decomposition into microservices?). Finally, the Microservices Deployment phase is concerned with observing extracted microservices in production systems. Just a single study reports attempting microservices deployment at this time.
Given the relative newness of this area and the potential for growing interest in this space, we identify key areas for future research. A summary of the gaps identified is presented in Table IX, which answers RQ4 (What research gaps can be identified in the current literature?). The review suggests that major gaps exist in several areas. Much of the reported work to date is focused on just a small number of programming languages and is heavily biased towards Java. Many large enterprise monolith systems have been built using technologies other than Java (e.g., COBOL and C/C++) and we suggest that this is an area that requires more attention.
Gaps also arise from the inconsistent use of metrics at the present time. Coupling and cohesion are regularly utilised, but they are not consistently measured across the studies. As a result, it is not possible to reliably compare the effectiveness of methods proposed in different studies. Future work could seek to establish a standard set of metrics for use in monolith analysis and microservices identification. Metrics are not only required in the analysis and identification phases. There is a clear absence of consistent evaluation of resulting microservices, for which the publication of datasets is warranted. A dataset should comprise the monolith source code and the extracted microservices, and should identify the metrics utilised at the various stages. In addition to the analysis and evaluation metrics identified above, non-functional metrics should also be employed, for example metrics that report on performance and hardware utilisation.
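To make the comparability problem concrete, the following sketch computes one possible pair of definitions for cohesion and coupling over a proposed decomposition. These particular definitions are assumptions chosen for illustration; as noted above, the reviewed studies measure these properties in differing ways. All function names and sample data are hypothetical.

```python
from itertools import combinations


def cohesion(service, edges):
    """Fraction of class pairs inside a service that are directly connected."""
    pairs = list(combinations(sorted(service), 2))
    if not pairs:
        return 1.0  # assumption: a single-class service is treated as fully cohesive
    connected = sum(1 for a, b in pairs if (a, b) in edges or (b, a) in edges)
    return connected / len(pairs)


def coupling(services, edges):
    """Number of dependencies whose endpoints sit in different services."""
    owner = {cls: idx for idx, svc in enumerate(services) for cls in svc}
    return sum(1 for a, b in edges if owner[a] != owner[b])


# Hypothetical class-level dependencies and two competing decompositions.
edges = {("Order", "OrderRepo"), ("Invoice", "InvoiceRepo"), ("Order", "Invoice")}
by_domain = [{"Order", "OrderRepo"}, {"Invoice", "InvoiceRepo"}]
by_layer = [{"Order", "Invoice"}, {"OrderRepo", "InvoiceRepo"}]

domain_coupling = coupling(by_domain, edges)
layer_coupling = coupling(by_layer, edges)
layer_cohesion = [cohesion(s, edges) for s in by_layer]
```

Under these definitions, the domain-oriented decomposition has one cross-service dependency and the layer-oriented decomposition has two, with one layer-oriented service having zero cohesion. A different definition (for example, one weighted by call frequency) could rank decompositions differently, which is why a shared, standardised definition matters for comparing studies.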
It is the view of the authors that the material produced by this research can have an important impact on a complicated pursuit in contemporary software engineering. The relentless drive towards better, faster, cheaper software has entered a disruptive and challenging new phase. Because they are large in size, legacy systems based on monolith architectures are slow to deploy and, when making small changes, large components might need to be rebuilt, retested and redeployed. Monoliths may be too large to be deployed to certain serverless infrastructure, for example Function-as-a-Service [14]. As a result, the potential economic benefits of services such as AWS Lambda, Google Cloud Functions and Microsoft Azure Functions (where hardware is available on demand and is charged in a pay-per-use model) are not realisable.
A word of caution is warranted. Although monolith architectures may inhibit faster and cheaper software objectives, they have proven resilient and useful over a lengthy time period. The structure and organisation permitted in monolith architectures can be amenable to developer understanding of systems. Furthermore, the development of monolith systems can be considered to be substantially refined at this point; many large operational systems are based on the monolith architecture and they are effective in delivering their functionality. While the march of technological innovation continues and economic arguments advocate emerging constructs such as serverless computing, monolith to microservice decompositions are large and expensive, and they are not risk free. It is also the case that to be successful, the broader migration activity may require adaptation to various other processes. For example, the hardware provisioning model may change and other work practices may need to be reengineered. Microservices may not be warranted in some contexts.
The work presented in this research can assist researchers and practitioners tasked with monolith decomposition into microservices. It can help firms understand decomposition tasks more completely, thereby assisting decisions surrounding proposed decompositions. Where decompositions are sanctioned, the M2MDF identifies the major options available to practitioners. For researchers, the present literature on monolith decomposition into microservices is fragmented. This research consolidates the available literature and organises the decomposition landscape. In addition to this important contribution, various significant gaps are identified. The research community can address these gaps in areas such as extending programming language support, introducing robust and standardised metrics, and in producing datasets for the consistent evaluation of migrations.
Glenn Jackson received the BA degree from the School of Computer Science, Trinity College Dublin, in 2015. He is a software developer working for Vectra AI, a company specialised in AI-driven threat detection and response. He has worked on the Future Software Systems Architecture (FSSA) Project that was aligned with Lero, the Science Foundation Ireland Research Centre for Software, School of Computing, Dublin City University.
Murat Yilmaz received the master's degree in software engineering from the University of Minnesota, with a particular focus on game theory in software engineering, and the PhD degree from Dublin City University. He is an associate professor with the Computer Engineering Department, Gazi University, where he also serves as the deputy director of the Informatics Institute. His professional journey spans more than thirteen years in the software development industry, complemented by a decade of academic experience, equipping him with a profound knowledge base and diverse expertise. He is actively engaged in numerous projects and has an extensive portfolio of academic works published in internationally recognized conferences and journals, with a primary focus on the software process, software management, empirical software engineering, algorithmic game theory, virtual reality, serious games, and gamification.
Jim Buckley received the PhD degree in computer science from the University of Limerick, in 2002. He is a professor with the Computer Science and Information Systems Department, University of Limerick, Ireland and is a co-principal investigator with Lero, the Irish Research Centre for Software. He was awarded the (Lero) Director's prize for Research Excellence in 2020 and his main research interests focus on supporting software developers who are tasked with maintaining and evolving software systems. Thus, specific areas of interest include feature location, software comprehension, and architectural analysis of such systems.
Paul Clarke is an associate professor with Dublin City University (DCU) and is a member of Lero, the Science Foundation Ireland Research Centre for Software. His research interests include software practices, software architecture, and artificial intelligence. He is principal investigator on the Future Software Systems Architectures Project, an initiative which brings together industrial practitioners and academia to collaborate on machine learning techniques for architectural remodelling. He is presently DCU BSc in computer science programme chair, and National Head of Delegation for Ireland to ISO/IEC Joint Technical Committee 1, Sub Committee 7, Systems and Software Engineering. He is currently serving as Steering Committee chair for the International Conference on Software and Systems Processes (ICSSP) and as an editor for the European System, Software and Service Process Improvement (euroSPI) Conference.