Serverless on Machine Learning: A Systematic Mapping Study

Machine Learning Operations (MLOps) is an approach to managing the entire lifecycle of a machine learning model. It has evolved over recent years and has started to attract many researchers and businesses in industry. It supports the development of machine learning (ML) pipelines through the typical phases of data collection, data pre-processing, dataset building, model training, hyperparameter refinement, testing, and deployment to production. This complex pipeline workflow is a tedious process of iterative experimentation. Moreover, cloud computing services provide advanced features for managing ML stages and deploying them efficiently to production. Specifically, serverless computing has been applied in different stages of the machine learning pipeline. However, to the best of our knowledge, the suitability of serverless computing for the ML pipeline, and the benefits it can provide, have not yet been systematically assessed. In this paper, we provide a systematic mapping study of machine learning systems applied on serverless architecture that includes 53 relevant studies. During this study, we focused on (1) exploring the evolution trend and the main publication venues; (2) determining researchers' focus and interest in using serverless computing for machine learning; and (3) discussing the solutions that serverless computing provides to machine learning. Our results show that serverless usage is growing and that several venues are interested in the topic. In addition, we found that the most widely used serverless provider is AWS Lambda and that the primary application is the deployment of ML models. Additionally, several challenges were explored, such as reducing cost, scaling resources, and reducing latency. We also discuss the potential challenges of adopting ML on serverless, such as respecting service level agreements, the cold start problem, security, and privacy. Finally, our contribution provides foundations for future research and applications that involve machine learning in serverless computing.

Sections V and VI. With Section VII, we close the paper and discuss future work.

II. BACKGROUND
This section provides background information defining serverless computing and the ML pipeline, as we found during our systematic mapping.

A. MACHINE LEARNING PIPELINE
Microsoft team members introduced a typical ML pipeline [10], showing a series of steps chained together to form the essential stages of the machine learning workflow. These stages include data- and model-oriented artifacts, from data collection and cleaning to model evaluation and deployment, and together they constitute the ML pipeline lifecycle. Recently, with the commercial use of AI, the MLOps field has been introduced, aiming to automate the ML pipeline [11]. A standard ML pipeline broadly consists of the following stages, shown in Figure 1 (a minimal code sketch follows the list):
• Data retrieval: the process of identifying and extracting data from a database, based on a query provided by the user or application.

• Data preparation: the process of gathering, combining, structuring, and organizing data.

• Model training: the process of providing data features to an ML method or algorithm so that it reduces errors and generalizes the representations learned from the data.

• Model evaluation: assessing the built model against certain criteria to measure its performance. Model performance is usually a function defined to provide a numerical value that helps decide the effectiveness of a model.

• Hyperparameter tuning: choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value controls the learning process.

• Model deployment: the method by which a machine learning model is integrated into an existing production environment so that practical business decisions can be made based on data.

• Model monitoring: the close tracking of the performance of ML models in production.

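To make these stages concrete, here is a minimal, hypothetical sketch of such a pipeline using scikit-learn; the dataset, model, and hyperparameter grid are illustrative choices, not drawn from the surveyed studies.

```python
# Minimal, hypothetical illustration of the pipeline stages above
# using scikit-learn; dataset and model choices are examples only.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Data retrieval: load a toy dataset in place of a database query.
X, y = load_digits(return_X_y=True)

# Data preparation: split and scale the features.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

# Model training + hyperparameter tuning: grid search over C.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X_train, y_train)

# Model evaluation: a numerical score of model effectiveness.
print("accuracy:", accuracy_score(y_test, search.predict(X_test)))

# Model deployment/monitoring would wrap `search.best_estimator_`
# behind a serving endpoint and track its production performance.
```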
Cloud computing is a widely adopted paradigm for the delivery of computing services. Leading cloud platforms such as AWS, Google Cloud, and Microsoft Azure offer a variety of provisioning services that can be used for model serving. In addition, they provide several architectures with different access management models. The following are the most common cloud computing architectures.
the literature on machine learning usage on serverless architecture. In the following, we present the design of our study, including the search keywords, the search technique, the data sources, and the inclusion and exclusion criteria.

A. RESEARCH QUESTIONS
We set the following research questions as guidelines during the systematic mapping review:
[…] OF APPLIED MACHINE LEARNING ON SERVERLESS COMPUTING?
By answering this research question, we aim to provide a solid foundation for classifying existing research on machine learning in a serverless architecture.

As shown in Figure 2, we present our search and selection process. We designed a two-stage process with a systematic search similar to a previous study [18] to identify the current literature on serverless usage for machine learning. In Stage 1, we performed an automated search, since it is the typical search strategy for identifying relevant studies in a systematic mapping [19].

With the review goal defined, keywords were carefully selected to obtain relevant articles. In Stage 1, several keywords were formulated and later narrowed down based on the research objectives. We designed our search query around ''machine learning'' and ''serverless'' and executed it on Scopus. Since we are looking for a particular subject, we applied the default automatic search, covering the title, abstract, and keywords. We executed the query in June 2022 and found 198 studies. The papers were either included among the relevant articles or excluded as irrelevant for the review by studying their titles, abstracts, conclusions, and complete content.
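The exact query string is not reproduced in this excerpt; based on the stated keywords, it plausibly took a shape like the following sketch (TITLE-ABS-KEY is standard Scopus field syntax, not a quote from the paper):

```python
# Plausible reconstruction of the Stage 1 search string: the two
# keyword groups combined over title, abstract, and keywords.
SCOPUS_QUERY = 'TITLE-ABS-KEY("machine learning" AND "serverless")'
print(SCOPUS_QUERY)
```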

To extract only relevant articles for the review, certain inclusion (IC) and exclusion (EC) criteria were set. […] The selected papers of this study were then analyzed to determine the trends in publication and the thematic evolution.

Serverless computing has seen significant engagement over the past years [2]. This boost has been driven by industry, academia, and developers for several reasons [69]. With the appearance of MLOps, which involves continuous and repetitive tasks (i.e., code integration, training, and deployment [70]), serverless computing has started attracting ML developers. Researchers have been contributing to the usage of serverless on ML pipelines. Table 2 shows the various publication venues we found in the selected research papers.

Following the interpretation of the publications, the most productive and primary journal, symposium, conference, and workshop venues related to serverless computing can be identified. The list of journals we found is shown with their full names in Table 3. All eight journal venues were mentioned only once each. […] that tried to employ serverless in the end-to-end ML pipeline. These results confirm that using serverless in the different stages of ML is advantageous.

Table 6 presents the serverless platforms used in the research papers included in this study. It can be noticed that ''AWS Lambda'' has significant usage, appearing in 39 studies. We also found that ''Apache OpenWhisk'', ''IBM Cloud Function'', and ''Google Cloud Function'' are used in four, four, and three published papers, respectively. Each platform has its own set of features and differs from the others. We compare the different providers later in RQ3 (Section IV-C).

The main solved or discussed challenges are cost/pricing (37/53) and resource scalability (30/53), as reported in Table 7. The high number of studies that discussed (1) cost/pricing and (2) scalability might indicate that serverless provides a fair pricing architecture with a pay-per-use model that auto-scales on demand. Researchers also seem interested in using serverless for model deployment while keeping a reasonable inference latency (22/53); a minimal deployment sketch follows.
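To illustrate why AWS Lambda maps so naturally onto the deployment stage, the sketch below shows a hypothetical inference function. Only the handler(event, context) signature follows the standard AWS Lambda Python convention; the model file name and the request format are illustrative assumptions.

```python
# Minimal, hypothetical AWS Lambda handler serving an ML model.
# The bundled model file and the request schema are assumptions.
import json
import pickle

# Deserializing at module scope happens once per container (cold
# start); warm invocations reuse the already-loaded model.
with open("model.pkl", "rb") as f:
    MODEL = pickle.load(f)

def handler(event, context):
    # Pay-per-use: cost is incurred only while a request executes,
    # and the platform scales instances with the request rate.
    features = json.loads(event["body"])["features"]
    prediction = MODEL.predict([features]).tolist()
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```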

The cold start problem was discussed in 10/53 studies, which tried to mitigate it because serverless containers have start-up latencies ranging from hundreds of milliseconds to several seconds [71]. A significant number of studies (10/53) discussed the Service Level Objective (SLO). We note that an SLO is an agreement set by a serverless provider that defines the minimum service response time [40]. Interestingly, few papers considered security and privacy, with 4/53 each. Moreover, only 2/53 papers mentioned the portability and reproducibility of the run-time environment.

Machine learning frameworks helped researchers test their proposed solutions easily, without needing to understand the underlying algorithms; the choice of framework therefore depends on the complexity of the targeted task. As reported in Table 8, the predominant ML frameworks are TensorFlow (22/53), Keras (10/53), and MXNet (9/53). Other frameworks have also been used in recent studies, such as PyTorch (8/53), since it supports distributed training on parallel machines [61], NumPy (5/53), and OpenCV (3/53).

The type of machine learning used to train the models depends on the research goals. Table 9 shows the types of machine learning algorithms used to train the models. […]

It is interesting to see serverless providers evolving their services over the years. Carreira et al. [20] discussed serverless capacity limits, as they were not able to run TensorFlow [76] or Spark [77] functions on AWS Lambda due to size limits (3 GB of RAM). Today, the memory limit has increased to 10 GB per serverless function [78]. […] A function instance is shut down after a timeout set by the provider, and we can see that Amazon has the longest function timeout. We also noticed that the allowed deployment package size, i.e., the total size allowed for the function source code and model, is small and differs from one provider to another. Providers offer the possibility of hosting the deployed model in extra storage if it exceeds the serverless package size limit. For example, there is an option to use an external database or an S3 bucket to store large payloads and pass a data identifier to the function calls; however, this option adds latency to the system. For better service, serverless providers could offer stronger performance guarantees, especially around the function timeout, to keep instances warm and avoid cold-start latency. A sketch of both patterns follows.
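The package-size workaround and the warm-instance behavior just described can be combined, as in the hypothetical sketch below; the bucket name, object key, and model format are placeholders, and only the boto3 download call is standard API.

```python
# Hypothetical sketch: ship only code in the deployment package and
# fetch a large model from S3 at cold start, caching it for reuse.
import os
import pickle
import boto3

BUCKET = "my-model-bucket"      # placeholder bucket name
KEY = "models/large-model.pkl"  # placeholder object key
LOCAL_PATH = "/tmp/model.pkl"   # the function's writable scratch space

_model = None

def _load_model():
    global _model
    if _model is None:
        if not os.path.exists(LOCAL_PATH):
            # This download is the extra latency mentioned above; it
            # is paid once per container, not on every request.
            boto3.client("s3").download_file(BUCKET, KEY, LOCAL_PATH)
        with open(LOCAL_PATH, "rb") as f:
            _model = pickle.load(f)
    return _model

def handler(event, context):
    # A scheduled "keep-warm" ping can hit this branch periodically so
    # the container (and its cached model) survives between requests.
    if event.get("ping"):
        return {"warm": True}
    model = _load_model()
    return {"prediction": model.predict([event["features"]]).tolist()}
```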
Some studies [56], [67] ensured that the inference latency of their proposed solutions respected the SLO. For example, Amazon's SLO regarding inference latency is that at least 98% of inference queries must be served within 200 ms. Failing to comply with SLOs leads to compromised quality of service or even financial loss, e.g., end users will not be charged for queries not answered in time [79]. Regarding machine learning model inference, small models (e.g., MNIST, Textcnn-69) can respond within 50 ms under any memory configuration, but for larger models, such as Bert-v1, ResNet-50, and VGGNet, a small memory configuration leads to quite a long execution time (exceeding hundreds of milliseconds). Even when configured with the maximum allowable memory size, the execution time for a single request exceeds 200 ms, which makes it challenging to meet the latency SLO in a production environment [67]. Therefore, providers should share such agreements and statistics on service violations to help customers choose the best provider, leading to a competitive environment for better services.

However, serverless functions do not support customized scaling. Barista uses predictive scaling to achieve low latency. […] durations (e.g., one day). By tracking the application, they can select a pre-warming window and send inference requests to continuously keep the function instance alive. Their method helped reduce resource waste while avoiding cold starts.

Privacy and security are always major concerns in serverless computing, especially for managing and analyzing sensitive data, such as healthcare data. We observed that it is essential to set roles for every cloud function with specific security policies that provide only the necessary access and prevent non-permitted operations. For example, Kaplunovich and Yesha [49] applied special protection to the hyperparameter metadata spreadsheet, where the metadata is loaded directly during startup and stored safely and securely in a protected cloud location.

A Federated Learning (FL)-based architecture was proposed in the primary studies [46], [55], [65]. This computing model supports edge computing, where processing edges can learn from a shared machine learning model while keeping model training on remote clients, followed by a global aggregation of the updated model parameters. This keeps the training data local, which provides privacy and security benefits. Grafberger et al. [55] consider that the challenges of FL systems, such as scalability, complex infrastructure management, and wasted computing, can be solved with the Function-as-a-Service (FaaS) paradigm. However, it is necessary to be aware of the threats caused by malicious participants. For example, Tolpegin et al. [83] showed that a malicious subset of participants could decrease the accuracy of the model by injecting poisoned data when sending updates to the global model.
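The global aggregation step mentioned above is typically implemented as federated averaging; the sketch below is a generic FedAvg illustration, not the exact aggregation used in [46], [55], or [65].

```python
# Generic FedAvg-style aggregation: average the clients' updated
# parameters, weighted by each client's number of training samples.
import numpy as np

def federated_average(client_weights, client_sizes):
    """client_weights: one list of np.ndarray parameters per client;
    client_sizes: number of local training samples per client."""
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(n_layers)
    ]

# Example: two clients, one 2x2 weight matrix each; the client with
# 30 samples dominates the aggregate (0.75 everywhere).
clients = [[np.ones((2, 2))], [np.zeros((2, 2))]]
print(federated_average(clients, [30, 10])[0])
```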

Several additional security measures can be applied so that only authorized and authenticated entities can invoke client functions. A security practice between clients was applied in [55], where the FL server allows authenticated clients to read only from a shared global model and write back results, without access to other clients. Another security measure applied in [55] is that HTTP function request exchanges can be encrypted using Transport Layer Security (TLS). Rausch et al. [34] chose to transmit the base model to an edge device and refine it locally using a serverless function with the private data, ensuring data privacy. The edge computing paradigm allows training distributed machine learning models on local edge data to secure data privacy and save resources in the cloud [65]. Bac et al. [65] applied a federated learning approach on serverless edge computing, saving bandwidth and ensuring the data privacy of the edge nodes.
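On the client side, such a setup reduces to calling the function endpoint over HTTPS (which provides TLS) with an identity token; the URL and token below are placeholders, and the endpoint semantics are assumptions for illustration.

```python
# Hypothetical authenticated read of the shared global model over
# HTTPS; certificate verification is on by default in requests.
import requests

ENDPOINT = "https://fl-server.example.com/global-model"  # placeholder
TOKEN = "<identity-token>"                               # placeholder

resp = requests.get(
    ENDPOINT,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
global_model_params = resp.json()  # read-only view of the global model
```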

Moreover, Deese [12] handled user access by applying the AWS Cognito and Identity and Access Management services. These services allow a user to access and monitor only the Lambda function instances they created, which maintains the privacy of user training data and results.

Another essential factor that heavily impacts both the cost and the performance of ML serving inference is batching. […] operations, but a relatively small benefit from batch reads. Carreira et al. [26] find that data fetching latency becomes low when mini-batch buffers are applied. Wang et al. [27] considered that machine learning serverless functions should use different data batch sizes, since many training samples need to be processed by different workers in parallel. Zhang et al. [29] showed that inference serving can benefit significantly from batching when using costly hardware accelerators. A minimal sketch of such a batching buffer follows.
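A minimal, hypothetical buffer of the kind discussed above might look as follows; the model interface is an assumption, and the batch size would in practice be tuned per workload.

```python
# Hypothetical mini-batch buffer for inference serving: requests are
# accumulated and flushed as one batched model call, amortizing
# per-invocation overhead (especially on hardware accelerators).
import numpy as np

class BatchBuffer:
    def __init__(self, model, max_batch=32):
        self.model = model          # assumed to expose .predict(batch)
        self.max_batch = max_batch
        self.pending = []

    def submit(self, features):
        """Queue one request; flush when the batch is full."""
        self.pending.append(np.asarray(features))
        if len(self.pending) >= self.max_batch:
            return self.flush()
        return None  # caller waits for a later flush (or a timer)

    def flush(self):
        if not self.pending:
            return []
        batch = np.stack(self.pending)
        self.pending = []
        return self.model.predict(batch).tolist()
```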

Edge computing is a distributed computing paradigm that brings computation and data storage closer to the data sources, and it is especially popular in IoT device architectures. Edge computing has several benefits, such as reducing the latency and bandwidth associated with the public cloud [53], ensuring data privacy [34], and requiring fewer computational resources relative to public and private clouds [45].
The serverless edge computing platform, which provides appropriate support for defining AI workflow functions, has been extended to work at the edge of the network to reduce the response latency and bandwidth associated with the public cloud [34], [53], [65].

One of the primary purposes of using serverless with machine learning is cost reduction. High service cost is a major issue that papers try to reduce in different ways. Serverless is adopted to cut unnecessary costs and improve manageability, avoiding, for example, the allocation of virtual machines without full resource usage. For example, Wang et al.
[27] demonstrated that substantial cost savings can be achieved by replacing dedicated IaaS cloud clusters with a serverless architecture. They proposed a solution called SIREN to reduce the training cost compared to an MXNet architecture. AMPS-Inf achieves up to 98% cost savings without degrading response-time performance [48]. Chahal et al. [59] presented an architecture based on load balancing the ML inference workload to reduce costs.
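A rough, hypothetical back-of-envelope calculation shows why such savings are plausible for bursty workloads; all prices below are illustrative placeholders in the style of published VM and function pricing, not figures from the surveyed papers.

```python
# Hypothetical cost comparison: a dedicated VM billed per hour versus
# a serverless function billed per request and per GB-second.
VM_PRICE_PER_HOUR = 0.10        # assumed IaaS instance rate
FN_PRICE_PER_GB_S = 0.0000167   # assumed serverless compute rate
FN_PRICE_PER_REQ = 2e-7         # assumed per-invocation rate

requests_per_day = 10_000
fn_memory_gb = 1.0
fn_duration_s = 0.2             # 200 ms per inference

vm_daily = VM_PRICE_PER_HOUR * 24
fn_daily = requests_per_day * (
    fn_memory_gb * fn_duration_s * FN_PRICE_PER_GB_S + FN_PRICE_PER_REQ
)
print(f"VM: ${vm_daily:.2f}/day, serverless: ${fn_daily:.4f}/day")
# At low or bursty volumes pay-per-use wins by a wide margin; a
# sustained high load can tip the balance back toward dedicated VMs.
```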

Cost reduction is a primary concern for developers and researchers. The cost is related to the design of the architecture, computing, inference deployment, and read/write queries. Depending on the machine learning project, a serverless-based architecture can be an effective option to reduce cost.

Inference latency was well studied in the primary set of papers. Yu et al. [56] showed that inference latency increases as the model grows. They proposed a serving model and generated a parallelization scheme deployed on serverless platforms to achieve optimal inference latency.

Several studies have investigated serverless challenges. For example, Khatri et al. [90] presented a review of potential bottlenecks and measured the performance of serverless computing. Their work focused on serverless limitations such as peak and spike scenarios, scalability, cold start, and portability. In particular, they showed the difficulties of testing and measuring the performance of serverless applications and how machine learning can monitor and predict performance. Moreover, Hassan et al. [69] conducted a survey covering 275 research papers that examined the challenges serverless computing faces nowadays and how future research could enable its implementation and usage. Furthermore, Wu et al. [91] present several practical recommendations for data scientists on using serverless for scalable and cost-effective model serving.

A main challenge in serverless computing is reproducibility. Scheuner and Leitner [92] conducted a multivocal literature review on the evaluation of Function-as-a-Service performance, covering 112 studies. They evaluated these studies from the reproducibility perspective and found that most do not follow reproducibility principles in cloud experimentation. More challenges were discussed by Sadaqat et al. [93], who conducted a multivocal literature review to define the core components of serverless computing, its benefits, and its challenges. They found that serverless computing allows users to create functions that intercept and operate on data flows in a scalable manner without the need to manage a server, and they identified vendor lock-in, skilled workers, testing complexity, and monitoring as the most recurrent challenges. They also presented the expected evolution of serverless computing, such as its adoption by companies and the expected market growth.
The study in [94] identified 32 patterns for composing and managing serverless functions by applying a multivocal literature review to 24 selected papers. The patterns were classified into orchestration, aggregation, event management, availability, communication, and authorization. The authors show that the patterns may differ depending on the serverless provider; for example, AWS Lambda adapted its queue service (SQS) to enable FIFO messages, whereas in Azure FIFO messages still need to be managed manually. They present their work as a pattern catalog that provides a valuable basis for practitioners and researchers in serverless computing.

The different challenges identified in the literature related to serverless computing were discussed in our set of papers from a machine learning perspective.

We applied Petersen's guidelines to conduct our systematic mapping study [14]. However, threats to validity are unavoidable.

This section presents the main threats to the validity of our study and how we mitigated them. […] have high-quality publications. We carefully defined the inclusion/exclusion rules that respect the requirements of our study, with the agreement of all authors.

Internal validity. Internal validity relates to experimental errors and biases. We mitigated internal validity threats caused by author bias in selecting and interpreting data by applying well-assessed descriptive statistics to the collected data. Several re-verification steps between the authors were performed to ensure a well-classified dataset.

Construct validity. Construct validity relates to the degree to which an evaluation measures what it claims to measure. We mitigated this potential bias by carefully defining the research query on the Scopus database, which was preferred because it offers a more extensive list of modern sources [100]. In the keywording process, we included the different terms that can be used to refer to serverless computing, i.e., lambda architecture and Function-as-a-Service. We are also fairly confident about the construction of the search string, since the automatic search was followed by snowballing. Furthermore, we rigorously selected the potentially relevant studies according to well-documented inclusion and exclusion criteria. The first author performed this selection stage, a random sample was verified by the second author, and agreement was ensured.

Conclusion validity. Conclusion validity relates to random variation and the inappropriate use of statistics. To mitigate it, we rigorously defined and iteratively refined our classification framework, as suggested by [101], so that we could reduce potential biases during the data extraction process. In addition, we ensured alignment with our research questions and our main research objectives. We mitigated potential threats to conclusion validity by applying a verification agreement between the authors in ambiguous cases. We provide a public repository for the reproducibility of the study, so that other researchers can determine whether they obtain similar results.

This study provides a broad survey investigating the relationships among research contributions on machine learning usage on serverless architecture. Specifically, we performed a systematic mapping of 53 primary studies and produced an overview of the state of the art of machine learning applications on serverless architecture. We found that (1) serverless usage in machine learning applications is a growing field, rising from 5 published papers in 2018 to 20 in 2021, with more publication venues becoming interested in the subject; and (2) serverless was adopted across the different ML pipeline stages, especially ML model deployment, with 33/53 papers. The most used serverless provider is AWS Lambda, and the most used ML model type was the neural network. The main challenges of using serverless for ML were reducing cost and pricing (37/53), ensuring sufficiently scalable resources (30/53), and reducing inference latency (22/53). There are several potential challenges in adopting ML on serverless, such as respecting service level agreements, the cold start problem, security, and privacy.