The Shape of Your Cloud: How to Design and Run Polylithic Cloud Applications

The dominant trend in IT today dictates deploying applications in the cloud: cutting monolithic software into small, easily manageable and independently developable components, and running them as microservices. These choices raise two questions: which cloud service type to choose from the several available options, and how to decompose the monolith so that it best exploits the selected cloud features. We propose a model that represents monolithic applications in a novel way and focuses on key properties that are crucial in the development of cloud-native applications. The model centers on the organization of scaling units, and it accounts for the cost of resources provisioned during scale-out periods and for the invocation delays among application components. We analyze disaggregated monolithic applications deployed in clouds that offer both Container-as-a-Service (CaaS) and Function-as-a-Service (FaaS) platforms. We showcase the efficiency of the proposed optimization by quantifying the reduction in operational costs in an illustrative example. We propose grouping components with similarly low scaling factors together in CaaS, while running dynamically scaled components in FaaS. This decreases the price by eliminating unnecessary memory provisioning, while application response time shows no degradation.

A central question is ''how to create'' the applications. It can be answered by following either the CaaS computing model or the FaaS paradigm, depending on the granularity level that the developer can consider when creating the software. Our contribution in this paper is two-fold: i) we propose a model to capture the trade-off between response time performance (i.e., latency) and cloud resource footprint when it comes to designing, packaging and deploying a cloud-native application, and ii) we evaluate the model in an illustrative example to provide insights into the extreme sides of this trade-off through a cost analysis of public CaaS and FaaS offerings.

This paper is organized as follows: in Section II we present the main differences between cloud services that offer various application deployment options for the cloud tenant, and we give an overview of relevant research findings; building on those observations, we propose an analytical cost model that accounts for deployment-related costs in Section III; afterwards, we analyze illustrative examples with optimized model cost parameters and currently advertised cloud service fees in Section IV; finally, we conclude the paper in Section V.

Virtualization techniques have brought abrupt changes not only to web applications, but also to how telecommunications systems are designed. Network Function Virtualization (NFV) offers the opportunity to move software running on traditionally expensive custom physical nodes into the cheap multi-purpose cloud, resulting in fast configuration and development cycles and cost-efficient scalability. The emergence of concepts like cloud computing, Software-Defined Networking (SDN), and, ultimately, NFV raises new possibilities for the management of telecommunications applications, with a positive impact in terms of agility and cost. From a telecommunications viewpoint, these concepts can help both reduce operational expenditure and open the door to new business opportunities [7], [9]. As a specific example, let us take the IP Multimedia Subsystem (IMS) that enables various types of media services to be provided to end-users using common, IP-based protocols. To protect and hide vulnerable details of the mobile operator's core network, the Border Gateway Function (BGF) is placed between the access and core networks, providing pinhole firewall and Network Address and Port Translation (NAPT) functionality. As such, it is responsible for filtering and transferring the RTP (Real-time Transport Protocol) based media streams exchanged by mobile subscribers. Traditional telecommunications nodes couple the states of the user sessions with the physical executors. Accordingly, if a physical entity fails, then the handled user sessions get lost. On the other hand, it is also common that each functionality is implemented on top of a dedicated hardware resource, e.g., a board or a DSP chip, which overall makes the system distributed and inherently more robust against hardware failures. In case of a failure, only those calls are affected that shared the same resources, which is an insignificant number of sessions. However, this does not apply to the cloud anymore, where a virtual machine may serve a much larger share of the user sessions.

Besides the pricing aspects, resource footprint and latency overhead have also been in the focus of the research community, particularly the cold-start latency that FaaS platforms suffer from when a new VM or container has to be launched to run the invoked task [12]. The size of the image to mount and the number of libraries and dependencies all have an impact on this latency [4]. Even though communication is significantly faster the closer the parties are (e.g., same data center, same rack, same server machine, same process), currently available platforms miss out on co-locating entities that often communicate with each other [4].

To the best of our knowledge, the trade-off between scale-out footprint and latency has never been addressed within the packaging options of cloud-native microservices. These two important aspects, which are at odds with each other, are tackled in this paper. On the one hand, the co-location of application components within the cloud results in lower operational delays, hence better QoS for the application user. We consider the strictest affinity policy that can be expressed in public clouds today: packaging components together within a container or a function that must be run on the same hardware. On the other hand, with more packaging comes less modularity, which results in superfluous resource consumption, especially during scale-out regimes. Our focus is on CaaS and FaaS; therefore, IaaS, Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS) systems (a high-level overview of those is depicted in Figure 1) are out of scope. The reason for this is that we are particularly interested in cloud services that offer application-agnostic, automatically scaled deployment options for the tenant. PaaS and SaaS are not well-suited to run proprietary code in the cloud, e.g., the telecommunications core functions of a mobile operator.

Emerging from the agile practitioner communities, the microservice-oriented architecture emphasizes implementing and employing multiple small-scale and independently deployable microservices, rather than encapsulating all functional capabilities into one monolithic application. The microservices architecture has become enormously popular because traditional monolithic architectures no longer meet the needs of scalability and rapid development cycles. However, performing the migration is not trivial. Most systems acquire too many dependencies between their modules, and thus cannot be sensibly broken apart. It is for this reason that studies providing information about the migration process to practitioners are necessary. A key challenge in this context is the extraction of microservices from existing monolithic code bases. While informal migration patterns and techniques exist, there is a lack of formal models and automated support tools in that area. Reference [13] tackles that challenge by presenting a formal microservice extraction model that allows algorithmic recommendation of microservice candidates in a refactoring and migration scenario. The results show that the produced microservice candidates lower the average development team size to half of the original size or less. Furthermore, the size of the recommended microservices conforms with the microservice sizes reported by empirical surveys, and the domain-specific redundancy among different microservices is kept at a low rate. In [14] the authors address the same challenge: they propose a top-down analysis approach and develop a dataflow-driven decomposition mechanism. Daoud et al. [17] propose an approach combining independent models that represent a business process's control dependencies, data dependencies, and semantic dependencies, respectively; the approach is also based on collaborative clustering. Reference [18] analyzes 20 migration techniques proposed in the literature. The results show that most proposals use approaches based on design elements as input, that 90% of the proposals were applied to object-oriented software (Java being the predominant programming language), and that the main challenge is to perform the database migration. Compared to this vast body of research, our work is novel in the sense that it addresses the repercussions of dissecting a monolith into too many microservices: the response time performance of the application potentially worsens due to the added delay of inter-microservice communication. Therefore, we propose a model that takes such aspects into account when designing a microservices-based cloud-native application.

Cloud deployment enables easy scaling to the actual application load. The cost of scaling is greatly determined by the organization of the application into scaling units. In this section, we propose an analytical model to reflect the resource footprint overhead at scaling, and the latency overhead of organizing application code into several deployment units. We show that these cost terms are opposing forces that steer the application designer towards organizing the application code in an optimal packaging setup, reaching the sweet spot in overall operational cost. In the example depicted in Figure 2, the scaling factor is 1, 3, 4, 6, and 7 for the individual modules from left to right, respectively.

The overall memory footprint of the whole application is the area under the curve, i.e., the sum of the rectangles' areas. By separating the application into modules and scaling those modules with different factors, the end-to-end application execution time, e.g., the response time for a web request, is greatly reduced, but the price to pay is the above-mentioned overhead: inter-module latency. Let us see how the memory footprint changes if some modules are merged into a joint scaling unit, for example, if the application designer decides to merge the module in the middle of Figure 2 with a module that has either a lower or a higher scaling factor. We assume that the designer does not want to make any compromises on the execution speed at scale-out regimes, so in the former case, the applied scaling factor will be the one dictated by the middle module; in the latter case, it will be that of the other module. In both cases there will be modules scaled to an unnecessary extent, leading to an extra scaling cost. In the specific example of Figure 2, the extra cost is represented by the dashed rectangles: if the middle module is packaged together with its left neighbor, then the scaling factor of the merged unit will be that of the middle module; similarly, if it is merged with its right neighbor, the latter will dictate the scaling factor.
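To make the footprint accounting concrete, the following minimal Python sketch computes the provisioned memory of a packaging as the area under the step curve. The scaling factors 1, 3, 4, 6, 7 come from the Figure 2 example, while the uniform 200 MB module size is our own illustrative assumption.

    # Memory footprint of a packaging of modules into scaling units.
    # Each unit must be scaled to the maximum factor of its members, so
    # merging modules with different factors provisions memory needlessly.
    factors = [1, 3, 4, 6, 7]        # per-module scaling factors (Figure 2)
    mem_mb = [200] * len(factors)    # per-module footprint in MB (assumed)

    def footprint(groups):
        """Total provisioned memory: every module in a unit is replicated
        as many times as the unit's largest scaling factor demands."""
        return sum(max(factors[i] for i in g) * sum(mem_mb[i] for i in g)
                   for g in groups)

    singletons = [[0], [1], [2], [3], [4]]    # one module per unit
    merge_left = [[0], [1, 2], [3], [4]]      # middle merged with left neighbor
    merge_right = [[0], [1], [2, 3], [4]]     # middle merged with right neighbor

    base = footprint(singletons)              # 4200 MB
    print(footprint(merge_left) - base)       # (4 - 3) * 200 = 200 MB extra
    print(footprint(merge_right) - base)      # (6 - 4) * 200 = 400 MB extra

The dashed rectangles of Figure 2 correspond exactly to these differences: the lower-factor member of the merged unit is over-provisioned up to the unit's maximum scaling factor.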

The overall operational cost therefore increases when modules of the application that require diverse scaling factors are merged. However, merging them might be necessary to meet the latency requirements dictated by the application SLA. The questions naturally arise: how many scaling units should the application designer account for, and which modules should be packaged together into those units? We formalize the answers in the following statements and provide hints on their proofs.
For every neighboring scaling unit pair j − 1, j, for which 1 < j < x, an ordering inequality between the units' scaling factors must hold: modules must be assigned to units in non-decreasing order of their scaling factors. It is straightforward to see that in case this inequality does not hold, swapping modules between the two units yields a lower overall scaling cost. In the continuous model, an analogous condition must hold for the scaling group borders, i.e., the points on the x-axis that fall on the borders of neighboring scaling groups:

ρ_L · dσ/dρ = σ_R − σ,

where ρ_L denotes the width of the scaling group to the left, σ is the scaling factor value that belongs to the scaling group on the left, and σ_R denotes the scaling factor of the scaling group to the right. Moreover, the minimal scaling cost decreases monotonically with the number of scaling groups.

Proof: The statement holds since any group that consists of at least 2 modules with different scaling factors can be split into 2 groups that have a lower overall scaling cost. As the superfluous resource footprint of the individual scaling units gets smaller, the amount of memory consumption scaled out unnecessarily when scaling them out is also smaller.

In contrast to the statement in Lemma 3, the communication costs increase monotonically with the number of scaling units, due to the resource overhead of virtualization and to the higher number of inter-module invocation delays. These opposing effects call for an optimization exercise in order to find the sweet spot in the operational costs of polylithic applications. However, as a joint optimization of all the listed costs, i.e., scaling and communication, is hindered by the complexity of the model, we propose a 2-step heuristic approach: in the first step, the minimal scaling cost is calculated for a set of scaling group numbers; in the second step, the communication costs are calculated for the same set of scaling group numbers; then the two cost terms are summed for each group number and the group number with the lowest total cost is selected. In the next section we present such a calculation for an illustrative example.
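The 2-step search itself is straightforward; below is a minimal Python sketch in which min_scaling_cost and comm_cost are hypothetical placeholders for the model's two cost terms, and the toy cost shapes only illustrate the opposing trends.

    # 2-step heuristic: evaluate the scaling and communication cost terms
    # separately for each candidate number of scaling groups, then pick
    # the group count with the lowest total cost.
    from typing import Callable, Iterable

    def best_group_count(group_counts: Iterable[int],
                         min_scaling_cost: Callable[[int], float],
                         comm_cost: Callable[[int], float]) -> int:
        """Return the number of scaling groups minimizing the total cost."""
        return min(group_counts,
                   key=lambda x: min_scaling_cost(x) + comm_cost(x))

    # Toy cost shapes: scaling cost falls with more groups (Lemma 3),
    # communication cost rises with more groups.
    x_opt = best_group_count(range(1, 11),
                             min_scaling_cost=lambda x: 640.0 / x,
                             comm_cost=lambda x: 10.0 * x)
    print(x_opt)  # 8 for these toy curves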

The limitation of the model is that it ignores the call graph among the modules: it groups together those modules that are close in scaling factor, not necessarily those that frequently invoke each other, or whose lifetimes overlap the most.

Publicly available measurement studies report on VM workloads [22] and FaaS usage characteristics [21]. In the latter, the authors focus on the key challenge of serverless platforms: the added latency due to cold starts. After a thorough analysis of usage data, the authors arrive at the conclusion that the resources the provider has to dedicate to each application are highly variable, and therefore the cost of keeping these applications warm, relative to their total execution (billable) time, can be prohibitively high, since the functions are very short-lived compared to other cloud workloads, e.g., VMs. 50% of the functions run for less than 1 s on average, and 50% of the functions have a maximum execution time shorter than 3 s; 90% of the functions take at most 60 s, and 96% of the functions take less than 60 s on average. In contrast, [22] shows that 63% of all VM allocations last longer than 15 minutes, and only ≈ 8% of the VMs last 5 minutes or less.
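To see why keeping rarely invoked, short-lived functions warm is so costly relative to their billable time, consider a back-of-the-envelope calculation; the concrete numbers below are our own round-number illustration in the spirit of the statistics above, not figures from [21].

    # Memory-time billed for execution vs. memory-time spent keeping a
    # function warm: a function running 1 s per invocation, invoked once
    # per hour, must stay resident the whole hour to avoid cold starts.
    exec_seconds_per_invocation = 1.0     # assumed short runtime
    invocations_per_hour = 1.0            # assumed infrequent invocation

    billable_s = exec_seconds_per_invocation * invocations_per_hour
    resident_s = 3600.0                   # kept warm for the full hour

    print(resident_s / billable_s)        # 3600x overhead of keeping warm

Even a modest per-GB-hour memory price therefore multiplies into a keep-warm bill that is orders of magnitude above the execution bill for such functions.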

Building on the statistics published in [21], we provide a numerical evaluation of the model presented in Section III. We need the following types of data for our scaling vs. communication cost analysis. Reference [21] reports that half of the applications have a single function, i.e., they are monoliths; 5% and 0.04% of the applications have more than 10 and 100 functions, respectively.

The number of function invocations per day scatters over a range spanning eight orders of magnitude. Half of the functions are invoked infrequently, i.e., once per hour or less; a fifth of them are invoked more than once per minute [21]. The inter-arrival times of invocations show extremely high variation, i.e., a coefficient of variation higher than 5, for 20% of the functions, which means that the invocation rate is hectic for many functions [21].

The authors of [21] found that 90% of the functions consume less than 400 MB of memory, and half of them consume less than 170 MB.

We build an illustrative example on the measurement data set of [21] to showcase our model's usability. First, we assume that all public cloud providers experience similar usage characteristics; second, and more importantly, we suppose that modern and future applications will follow a similar design in terms of modularity and deployment. In the following cost calculations, we take an imaginary example application for which we draw the following attribute values from the empirical distributions of [21]. We consider an application consisting of 10 modules, each having the same memory footprint of r_i = 200 MB for ∀i ∈ {1, . . . , 10}. As for scaling dynamics, for simplicity we distinguish off-peak hours and peak hours: during the former, each module runs at a steady pace with a scaling factor of 1; during peak hours, we assume 4 modules at scaling factor 1, 2 modules at factor 2, and 1 module each at factors 4, 8, 16, and 64. With our model's notation: s_1 = s_2 = s_3 = s_4 = 1, s_5 = s_6 = 2, s_7 = 4, s_8 = 8, s_9 = 16, s_10 = 64. The logarithmic steps in the modules' scaling factors are meant to reflect the high dynamics of invocations reported in [21]. In the case when the call graph density is low, marked as Sparse in Table 1, we assume that the communication cost of each inter-group invocation is low; in the denser regimes of Table 1, we assume that the communication cost is accordingly higher.
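As a concrete check of the Table 1 figures discussed below, the following sketch computes the superfluous scaling footprint (in units of the 200 MB module footprint) and the communication cost for a fusion into 4 groups. The group (1,1,1,1,2,2) is named in the text; the split of the remaining modules into (4,8), (16), (64) is our assumption that reproduces the reported scaling cost of 8, and we assume 10 ms of invocation delay per group as stated.

    # Scaling and communication costs for the 10-module example at peak.
    s = [1, 1, 1, 1, 2, 2, 4, 8, 16, 64]   # peak scaling factors

    def scaling_cost(groups):
        """Superfluous scale-out: every module is scaled to its group's
        maximum factor; the waste is the difference to its own factor,
        counted in units of the 200 MB module footprint."""
        return sum(max(g) - f for g in groups for f in g)

    groups = [[1, 1, 1, 1, 2, 2], [4, 8], [16], [64]]   # fusion factor 4
    print(scaling_cost(groups))   # 8: four modules waste 1 each, one wastes 4
    print(10 * len(groups))       # 40 ms: 10 ms invocation delay per group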

The results in Table 1 show that when the fusion factor is 4, the communication cost is 40, as there are 4 groups, each imposing 10 ms of invocation delay. In the same scenario, the scaling cost is 8 (in units of the 200 MB module footprint), because in the optimal grouping the modules with scaling factors (1,1,1,1,2,2) form a joint scaling unit.

Reports on production-scale deployments describe systems built from a large number of cooperating services (called modules in this paper); these services work together to serve more than 10 business domains and form thousands of call paths. A regular application's call path may include up to 100 dependent services, and the microservice system processes around 26 billion requests per day, i.e., 300 thousand requests per second. Luo et al. [19], [20] put more emphasis on the interconnection of the separate microservices, and presented an in-depth study of call graphs within the large-scale deployments of microservices at Alibaba clusters. Their main findings are that i) the size of microservice call graphs follows a heavy-tail distribution: around 10% of call graphs consist of more than 40 unique microservices (the largest call graph they found consists of more than 1500 microservices), and ii) a small percentage of microservices are hot-spots in call graphs; specifically, about 5% of microservices are multiplexed by more than 90% of online services and handle 95% of total invocations in Alibaba traces.

Based on the reported numbers, let us take a considerably large application consisting of 100 microservices (the 95th percentile in the cumulative distribution of the number of microservices per call graph reported in [20]). Furthermore, let us assume an exponential skewness in the invocation rate by combining the findings in [20] and [25]: a 400-fold difference between the most and least frequently called microservices, averaging to 100 calls per second. For simplicity, we assume a balanced memory footprint over the modules, each microservice instance taking 256 MB of memory. Following these assumptions, we approximate the scaling factor vs. memory footprint relationship by σ(ρ) = a exp(bρ), fitted such that σ spans the 400-fold difference over the total footprint of 100 × 256 MB and averages to 100. We depict the discrete scaling factor values and the continuous approximation in Figure 4. In summary, we consider a microservices-based application consisting of 100 components, each taking 256 MB of memory, and following an exponential curve in terms of scaling factor, ranging from 1 to 600 and averaging to 100.
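The fit has a closed form under these assumptions (a 400-fold spread and a mean of 100 over the footprint); the short sketch below recovers a and b and confirms the roughly 1-to-600 range of scaling factors quoted above.

    # Fit sigma(rho) = a * exp(b * rho) on [0, R], R = 100 * 256 MB, so that
    # sigma(R) / sigma(0) = 400 and the mean of sigma over [0, R] is 100.
    import math

    R = 100 * 256.0                    # total application footprint in MB
    spread, mean = 400.0, 100.0

    b = math.log(spread) / R           # from exp(b * R) = spread
    a = mean * b * R / (spread - 1.0)  # from mean = a*(exp(b*R)-1)/(b*R)

    print(round(a, 2))                 # ~1.5, least invoked microservice
    print(a * math.exp(b * R))         # ~600, most invoked microservice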

As the next step, we leverage Theorem 1 and solve the differential Equation 3 for various numbers of scaling groups.

With the assumed exponential function σ(ρ) = a exp(bρ), the equation that must hold at the scaling group borders becomes

ρ_L · ab exp(bρ) = σ_R − a exp(bρ),

where ρ is the position of the border on the x-axis.
In order to avoid an exhaustive search for the set of values of ρ_L and σ_R that satisfy the equation, we apply the following heuristic approach; we note, however, that other approaches might be applied as well, e.g., [26], [27], [28]. We iteratively search for border positions that satisfy the equation at every group border (Algorithm 1).
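One workable realization of this iterative search is a shooting scheme; the sketch below is our reading of the border condition, not necessarily identical to Algorithm 1 below. For σ(ρ) = a exp(bρ), the condition ρ_L · σ′(ρ_j) = σ(ρ_{j+1}) − σ(ρ_j) rearranges to ρ_{j+1} = ρ_j + ln(1 + b(ρ_j − ρ_{j−1}))/b, so every border follows from the previous two, and the first border can be bisected until the last one lands on the total footprint R.

    # Shooting-method sketch for scaling group borders under
    # sigma(rho) = a * exp(b * rho). Each inner border satisfies
    # (rho_j - rho_{j-1}) * sigma'(rho_j) = sigma(rho_{j+1}) - sigma(rho_j).
    import math

    def propagate(rho1, x, b):
        """From the first inner border rho_1, derive rho_2..rho_x by
        applying the border condition at each inner border in turn."""
        borders = [0.0, rho1]
        for _ in range(x - 1):
            prev, cur = borders[-2], borders[-1]
            borders.append(cur + math.log(1.0 + b * (cur - prev)) / b)
        return borders                # [rho_0, ..., rho_x]

    def find_borders(R, x, b, tol=1e-9):
        """Bisect on the first border until the last border equals R."""
        lo, hi = 0.0, R
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if propagate(mid, x, b)[-1] < R else (lo, mid)
        return propagate(0.5 * (lo + hi), x, b)

    R = 100 * 256.0                   # footprint of the 100-module example
    b = math.log(400.0) / R           # exponent from the fit above
    print([round(r) for r in find_borders(R, x=4, b=b)])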

Algorithm 1 Heuristic Algorithm for Finding Scaling Groups for Large Microservices-Based Applications
Require: σ(ρ), n (n: range of numbers of scaling groups)
Ensure: P ∈ R, the scaling group borders that satisfy ρ_L · dσ/dρ = σ_R − σ

The scaling groups might be deployed at different locations within the cloud, potentially resulting in additional delays during service invocations. As [20] reports, the end-to-end latency of online services increases linearly with the length of the critical path, which is usually proportional to the number of microservices of the cloud-native application. The reason behind this phenomenon is that invocation between a pair of microservices is usually performed via HTTP REST APIs, RPC calls, or message queues, and this can lead to a large communication overhead when many instances of these dependent microservices are located far away from each other. Indeed, various measurement studies [4], [23], [24] report non-negligible additional delays in end-to-end service response times due to the invocation paths of separately deployed virtualized components. In [20] the authors state that co-location of dependent microservices could improve response time performance by 22% on average. For demonstrative purposes we depict 3 different communication cost scenarios in Figure 5; the curves show the costs as a function of the number of scaling groups. The ''sparse'' scenario stands for the case in which the additional delay of inter-scaling-group invocation grows the most slowly with the number of scaling groups.

The grouping strategy proposed in this paper focuses solely on the scaling cost; however, one might consider a grouping strategy that builds the scaling groups with awareness of the call graph, further decreasing the communication costs.

Several related works, e.g., [4], [19], [29], propose co-locating microservices that frequently invoke one another in order to decrease the overall response time of the application.

In a set of experiments, we co-locate couples, triples, quadruples, and quintuples of microservices that frequently invoke one another, and calculate the resulting costs in terms of scaling memory overhead. For comparison, we calculate the scaling cost yielded by our optimization algorithm. The results are depicted in Figure 7. On the x-axis we show the number of microservices that are assigned into groups; the remaining components are assumed to be deployed as singletons. We see an increasing scaling cost as the fraction of co-located microservices grows, which is in line with Lemma 3. In fact, the larger the groups we create by co-locating components, the faster the memory overhead grows. Both phenomena are due to the fact that the scaling factors of the grouped microservices are not necessarily similar. For comparison, we depict the result of our proposed algorithm for the couples and quintuples cases, i.e., ''couples-opt'' and ''quints-opt'', respectively, in which the number and the sizes of scaling groups are the same as in the co-locating policy, but the grouping is performed along the scaling factor order of the microservices. The scaling cost is only a fraction of that of the co-locating setups. The reason for the significant difference is clear: while our proposed algorithm minimizes the scaling cost irrespective of its effects on the communication costs, the co-locating policy focuses only on the invocation graph, thus minimizing communication costs by grouping together modules with potentially diverse scaling factors. The more and larger groups of components with randomly picked scaling factors then lead to increased scaling costs.

The drawback of co-locating microservices with similar scaling factors, and not those that are tightly connected in the call graph of the application, is visible in Figure 8. We take the couples and quintuples cases from the previous experiments, and calculate the communication cost by assuming a medium-density call graph among the groups after co-locating the microservices that frequently invoke one another. The more microservices are grouped together, and the larger those groups are, the lower the communication cost. The trend is exactly the opposite of the scaling cost trend in Figure 7. However, when our algorithm sorts the microservices into the same number of groups with the same sizes, those strongly coupled components might end up in separate scaling groups (assuming no correlation between the scaling factors and network positions in the call graph), making the call graph of groups dense. Therefore, the communication costs resulting from our grouping algorithm are higher.

In Table 2 we calculate the hourly fee of the selected cloud services in the optimal fusion factor scenario under the medium call graph density regime (middle row of Table 1 in bold, as depicted in Figure 3). In the rightmost columns we account for keeping the functions warm, a premium feature available at Amazon (called Provisioned Concurrency) and Microsoft (called Premium Plan). However, we omit invocation fees (negligible in this setting, less than 1% of the total price assuming an invocation every 10 s), free tiers (offered by Google), data traffic fees, and management fees of CaaS (charged by Google). In order to give a basis for the processor fee calculation, we assume that the code is continuously running for an hour, each module taking 1 vCPU.
As Amazon does not allow provisioning memory and CPU independently, we consider allocating the memory that is necessary to get 1 vCPU dedicated to a module [30], i.e., 1800 MB instead of 200 MB for each module. We show the total cost in Table 2 for each selected provider. Memory consumption is computed as the total memory footprint of all modules, each multiplied by its respective group's scaling factor. In contrast, the CPU fee is calculated from the number of modules, each multiplied by its own scaling factor. Both memory and CPU unit prices have been collected from [1], [2], and [3].
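For transparency, the fee computation reduces to a few lines; the sketch below uses hypothetical unit prices (the real figures come from the price lists [1], [2], [3]) together with the example's module footprints, scaling factors, and the fusion-factor-4 grouping assumed earlier.

    # Hourly fee sketch for one provider: memory is billed per scaling
    # group at the group's scaling factor, CPU per module at the module's
    # own factor (1 vCPU per module, running continuously for an hour).
    MB_HOUR_PRICE = 0.000005       # $ per MB-hour (placeholder)
    VCPU_HOUR_PRICE = 0.04         # $ per vCPU-hour (placeholder)

    s = [1, 1, 1, 1, 2, 2, 4, 8, 16, 64]               # scaling factors
    mem = [200] * 10                                   # MB per module
    groups = [[0, 1, 2, 3, 4, 5], [6, 7], [8], [9]]    # fusion factor 4

    mem_fee = MB_HOUR_PRICE * sum(max(s[i] for i in g) *
                                  sum(mem[i] for i in g) for g in groups)
    cpu_fee = VCPU_HOUR_PRICE * sum(s)                 # 1 vCPU per module

    print(round(mem_fee, 3), round(cpu_fee, 2))        # 0.108 4.0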

Summarizing the figures of Table 2, we have 3 main observations. First, comparing CaaS to FaaS, we can firmly state that deploying the application in CaaS costs roughly half as much, but it is widely known that reacting to hectic demand with scale-out events is slower there than with pre-warmed FaaS [30]; this aspect does not show up in our analysis. Second, the FaaS offerings of Amazon and Microsoft come with warm starts, hence the price difference compared to Google's service, which is cheaper but lacks the pre-warm feature; the cost is therefore expected to appear on the application QoS side, with customers suffering from prolonged response times. Finally, the hybrid proposal, in which single modules (outliers regarding their scaling factors) are run in FaaS, while modules that are similar in scaling factor are packaged together in CaaS, yields a 10% saving compared to the FaaS-only scenario. The cost cut is due to efficient memory provisioning, and the hybrid solution does not compromise on fast scaling dynamics, as only modules with low scaling factors, for which scale-out rarely happens, are grouped.

While the choice of cloud computing is unquestionable when it comes to deploying an application, as public cloud providers spoil the tenants with more and more service models, it has become a difficult question for application architects which service to use. The two major choices are CaaS and FaaS, the latter originally tailored to running short-lived tasks serverless. We investigated this question from the perspective of the cost vs. latency trade-off, for which the stressful situations are scale-out periods. We proposed an analytical model that incorporates the memory footprint during these episodes for the possible packagings of the application into scaling units, together with the invocation delays among the deployed components. Our results suggest that application architects should profile the application under study in order to be able to take optimal decisions regarding its packaging and deployment.