Multi-Objective Prioritized Task Scheduler Using an Improved Asynchronous Advantage Actor-Critic (A3C) Algorithm in a Multi-Cloud Environment

Task scheduling is a crucial challenge in the cloud computing paradigm, as tasks with different lengths and runtime processing requirements, generated by heterogeneous devices, arrive at the cloud application console and affect system performance in terms of makespan, resource utilization, and resource cost. Traditional scheduling algorithms do not adapt well to this setting. Many authors have developed task schedulers using metaheuristic approaches to solve the task scheduling problem (TSP) and obtain near-optimal solutions, but TSP remains a highly dynamic challenge because it is NP-hard. To tackle this challenge, this paper introduces a multi-objective prioritized task scheduler based on an improved asynchronous advantage actor-critic (A3C) algorithm, which uses task priorities derived from task lengths and runtime processing capacities, and VM priorities derived from electricity unit cost, in a multi-cloud environment. The scheduling process is carried out in two stages. In the first stage, task and VM priorities are calculated at the task manager level; in the second stage, these priorities are fed to the MOPTSA3C scheduler, which generates scheduling decisions that map tasks onto VMs while optimizing resource cost, resource utilization, and makespan across the available clouds. Extensive simulations are conducted on the CloudSim toolkit using fabricated data distributions and real-time worklogs of the HPC2N and NASA datasets as input traces. To evaluate the efficacy of the proposed MOPTSA3C, it is compared against existing techniques, i.e., DQN, A2C, and MOABCQ. The results show that MOPTSA3C outperforms these algorithms in terms of makespan, resource utilization, resource cost, and reliability.


I. INTRODUCTION
The cloud computing paradigm gives users around the world seamless access to compute, storage, and network resources as services, accessed through a web browser from any type of device [1]. The services provided by a cloud service provider (CSP) through this paradigm are mainly categorized as Infrastructure as a Service, in which virtual infrastructure is provided to users so they can deploy their applications directly in the cloud environment and access them from anywhere in the world; Platform as a Service, in which the CSP provides users with a platform to develop their applications by supplying the necessary software, runtime, and development environment as a service, relieving users of setting up development environments, acquiring software licenses, and patching software, so they can focus on application development while saving time and infrastructure investment; and Software as a Service, which provides ready-made software to cloud users on demand [2], [3]. All of these services are delivered to cloud users on demand according to user requirements, i.e., the Service Level Agreement (SLA). The resources are made available to users through a technique known as virtualization, and the virtual resources must be available around the clock without downtime, which is possible only when they are properly managed by the CSP.

It is therefore important to employ an efficient task scheduler that schedules the variety of incoming tasks onto the resources provided by the CSP. The scheduler plays a major role in the cloud paradigm from the facets of both the provider and the user. It helps the CSP by automatically scheduling all the tasks/jobs from users around the world onto the available virtual resources, but choosing an algorithm that manages and schedules all tasks automatically is a difficult challenge, because the generated tasks differ in size and runtime processing capacity, and not all tasks can be processed on the same type of virtual resource. Choosing a proper virtual resource for each task is thus the main challenge. An efficient task scheduler executes user tasks on appropriate virtual resources, thereby providing quality of service and avoiding SLA violations.

The task scheduler affects various parameters, directly or indirectly, for both the CSP and cloud users. Resource utilization is one of the most important of these: if a scheduler is poorly chosen, it results directly in either overutilization or underutilization of resources. If utilization is very high and tasks cannot be accommodated on the existing infrastructure, the CSP must provision additional virtual resources, which increases resource costs and affects the availability of virtual resources to users. It is therefore very important to employ a scheduling algorithm that carefully checks task types and runtime capacities and maps tasks to suitable virtual resources accordingly; it is the CSP's responsibility to choose an algorithm that balances the interests of the CSP and its users and serves all requests efficiently, benefiting both sides.

Many existing task scheduling algorithms have been proposed using metaheuristic approaches, e.g., GA [4], PSO [5], ACO [6], and HEFT [7]. These metaheuristics generate near-optimal solutions, as the scheduling problem in cloud computing is NP-hard. Other authors have used machine learning and deep learning techniques, e.g., DRLBTSA [8], MOABCQ [9], and RATS-HM [10], and a few have hybridized AI and ML algorithms with metaheuristics, e.g., AINN-BPSO [11], [12]. All of these also generate near-optimal solutions from their own perspectives, addressing makespan, energy consumption, and resource utilization, but they still struggle to adapt to heterogeneous tasks: the environment is dynamic, and scheduling such varied tasks to the appropriate, precise VM while balancing resource utilization and resource cost in a multi-cloud environment remains challenging. To tackle this issue, in this paper we formulate a multi-objective task scheduling approach that considers task priorities based on task size and runtime capacity and VM priorities based on unit electricity cost. Schedules are generated using a deep reinforcement learning technique, the asynchronous advantage actor-critic (A3C) algorithm, in a multi-cloud environment, minimizing makespan and resource cost and improving resource utilization. The reason for choosing a multi-cloud environment is that, while scheduling tasks to virtual resources, a resource may be unavailable in one cloud, or its cost may rise there. Therefore, to minimize resource cost and improve resource utilization, our proposed MOPTSA3C scheduler checks the pricing and availability of the requested resource across multiple clouds and schedules each task into the cloud that minimizes resource cost.

A. MOTIVATIONS AND CONTRIBUTIONS
The task scheduling problem (TSP) plays a major role in the cloud computing paradigm, as it affects the quality of service rendered to customers and the CSP while improving resource utilization and minimizing makespan and resource cost. If an incoming task is scheduled to a virtual resource without considering task size and runtime capacity, the scheduler generates schedules that increase makespan and lead to improper utilization of resources. This causes a serious problem for the CSP, since poorly utilized virtual resources and increased makespan drive up resource cost, which in turn is a serious concern for cloud users. This motivated us to tackle the problem with a reinforcement learning approach (A3C) that takes priorities of tasks and of VMs (based on unit electricity cost), checks resource availability and the cost of virtual resources across multiple clouds, and generates schedules that address makespan, resource utilization, and resource cost. The main objectives and highlights of this manuscript are presented below.
1. A multi-objective prioritized task scheduling algorithm is formulated using a reinforcement learning strategy.
2. For an effective scheduling process, we incorporate priorities of tasks and of VMs, the latter based on unit electricity cost, to schedule tasks in a multi-cloud environment.
3. An improved asynchronous advantage actor-critic (A3C) algorithm is used as the methodology in this research to tackle the task scheduling problem in cloud computing.
4. Simulations are conducted on CloudSim to generate schedules, and the scheduler is compared against the existing DQN, A2C, and MOABCQ approaches.
5. Fabricated data distributions and the HPC2N and NASA worklogs are used as input to evaluate the efficacy of the approach.
6. Finally, we evaluate makespan, resource utilization, resource cost, and reliability using MOPTSA3C.
The rest of the manuscript is organized as follows. Section II discusses related works, Section III presents the system architecture and mathematical model, Section IV describes the asynchronous advantage actor-critic algorithm used as the methodology in this research, Section V discusses the results, and Section VI presents conclusions and future work.

II. RELATED WORKS
This section presents existing algorithms formulated by various authors to tackle task scheduling in cloud computing. To minimize total cost and energy consumption, the authors in [13] proposed a task scheduling algorithm based on a bidirectional GGCN to choose precise VMs on which to deploy jobs or requests from various users. They used a randomized dataset to evaluate the scheduler's capability. The approach was implemented on the COSCO framework using the DeFog benchmark for scheduling. The HunterPlus model was evaluated with different variations of CNNs, and the results showed a large improvement over the other variations in terms of energy consumption and job completion rate. The authors in [14] and [15] proposed a task scheduling algorithm for a multi-cloud environment that addresses trust-based parameters using FTTHDRL, a hybrid of Harris hawk optimization and a DQN model, the latter a reinforcement learning approach. Scheduling is performed in two stages: in the first stage, task selection and mapping to VMs are performed using the Harris hawk algorithm; in the second stage, scheduling optimization is performed by the DQN model to adapt to the dynamic nature of the cloud paradigm, in which it is difficult to identify and schedule tasks precisely. It was implemented on CloudSim, rigorous simulations were conducted using real-time worklog traces, and the approach was evaluated against state-of-the-art approaches. The results showed that FTTHDRL improves trust in the cloud provider through SLA-based parameters. The authors in [16] proposed a hybrid task scheduling mechanism that addresses makespan, resource utilization, and processing cost. They used three algorithms in total: task collection and prioritization are performed using HEFT, an initial solution is generated using GRASP, and schedules are generated using the BABC algorithm with a Pareto-front technique. It was implemented using WorkflowSim and evaluated against state-of-the-art algorithms, and the results showed that EBABC-PF dominates them for the above parameters.
In [17], a hybrid workflow scheduling algorithm was proposed using the HEFT and BAT approaches. It was implemented on WorkflowSim using random workloads, and the authors also considered various real-time scientific workflows to evaluate MOHBA. The approach was compared against contemporary approaches, and the results showed an improvement over existing algorithms in makespan and resource utilization. Minimizing energy consumption in datacenters to create a green computing environment is the target of the authors in [18] and [19], which led them to develop a VM placement algorithm that takes VM dependency and topology type as constraints. This VMP algorithm chooses the placement of a VM based on these constraints, letting unused switches become idle and reducing resource waste by improving resource utilization. Modified discrete Jaya optimization was used as the methodology. The authors developed a customized simulation environment covering various scenarios with different numbers of VMs to evaluate energy consumption, total task time, and makespan.
The authors in [20] proposed a hybrid task scheduling approach that combines wild horse optimization with a Levy flight operator. In the first stage, a task distribution model is developed based on schedule length, time, and cost. In the second stage, the generated schedules are optimized by the Levy flight operator to improve the local search process and avoid premature convergence. The CloudSim tool was used to implement the approach, which was compared against the existing state-of-the-art WOA, MSA, ALO, and MALO approaches for makespan and energy consumption. The simulation results showed that IWHOLF-TSC dominated all existing approaches for these parameters.
In [21], the authors proposed a hybrid approach, HWACO, which uses imposed weights to converge toward solutions more easily than conventional approaches. CloudSim was used as the simulation toolkit, and HWACO was compared against the conventional ACO, QANA, BPSO, and FCFS approaches using randomized workloads as input. The analysis showed that HWACO outperformed the conventional approaches in terms of cost, efficiency, and makespan. A trust-based task scheduling algorithm was developed using the firefly algorithm in [22] and [23] to address makespan, availability, and turnaround efficiency, which affect trust in the CSP. The authors used prioritized scheduling, considering task priorities to carefully map tasks to virtual resources. It was implemented on the CloudSim toolkit, with TAFFA taking input traces from the HPC2N and NASA worklogs; evaluation against state-of-the-art approaches showed that the above parameters were greatly improved. A fault-tolerance-aware scheduler with multiple objectives under QoS constraints, developed using GBFD, which minimizes expenditure cost for users and improves the success rate for the CSP, was proposed in [24]. Simulations with a real-world cluster trace as input were performed on CloudSim, and the scheduler was evaluated against existing task scheduling mechanisms, i.e., FCFS, CGDPS, and MBFD. The results showed that GBFD outperforms the other algorithms in various scenarios, improving fault tolerance and user satisfaction.
Minimizing task execution time by assigning each task to an appropriate virtual resource is discussed in [25]. To achieve this, a GA combined with a MapReduce architecture is proposed in [25] and [26]. This scheduler works in two stages: in the first stage, tasks are assigned to processors by scheduling them with the GA; in the second stage, the GA is combined with MapReduce to assign heterogeneous tasks to processors in parallel with the help of priority queues. Simulations were conducted in MATLAB using randomly generated tasks. The results showed that the GA combined with MapReduce greatly minimizes task execution time compared with the PSO, GA, IWD, and MFO algorithms.
In [27], a three-layered task scheduling model was proposed to minimize makespan in the cloud paradigm. The first layer uses an opposition-based learning technique with an adaptive mobility factor to expand the search strategy; the second layer uses a whale-optimization-based Gaussian approach to formulate a multi-objective task scheduling model that minimizes task completion time; and in the third layer, the GCWOA strategy is implemented to optimize the scheduling process. The model was implemented in MATLAB using random workloads, and GCWOAS2 improved resource utilization and makespan over the ACO, WOA, and PSO algorithms. Scheduling cost and time play a major role in cloud task scheduling from the facets of both the cloud user and the CSP. These issues were addressed by the authors in [28] and [29] by developing a scheduling algorithm based on improved whale optimization. Initially, a task scheduling and distribution model was developed considering scheduling time and cost constraints; then, using an inertia weight strategy, the whale optimization algorithm was applied to this model to choose the best whale, i.e., the best possible task-to-VM mapping. MATLAB was used for the simulations. The results showed that IWC greatly minimizes scheduling cost and time compared with PSO, ACO, and WOA. Energy consumption in datacenters is a crucial concern as the number of users increases, making it difficult to distribute tasks efficiently and balance load among VMs. This problem was tackled by the authors in [30] using a hybrid approach combining squirrel search with an improved GA. The proposed hybrid method improves makespan, execution time, and energy consumption when compared with the ACO, PSO, and GA algorithms.
In [31] and [32], the authors developed sub-models to improve the performance of the task scheduler. They used reinforcement learning and queuing models to formulate sub-models, i.e., a task scheduling model, an execution model, and a transmission model, to identify repetitive processes and optimize the scheduler's performance using aggregators. Experimentation was conducted in MATLAB. With this approach, task scheduling efficiency improved, taking server rate and task arrival rate as constraints, when compared against state-of-the-art approaches. Scheduling analytics jobs in the cloud paradigm is difficult because such jobs have diverse computing characteristics. The authors in [33] therefore proposed an RL-based framework for a Spark-deployed cloud cluster, consisting of two RL-based schedulers that tackle the multiple objectives of VM usage cost and job duration. The results showed a large improvement in VM usage cost and job duration over existing frameworks configured with conventional algorithms.
Resource utilization and task execution time have gained importance in cloud task scheduling as workloads have increased drastically, making automation of the scheduling process a must. To handle this, the authors in [34] proposed task schedulers based on different RL approaches, i.e., RL, RL-LSTM, DQN, and DRL-LSTM. Of these four, DRL-LSTM improved memory usage, CPU usage, and task execution time over SJF, RR, and IPSO.
In [35] and [36], an energy-efficient task scheduling algorithm was developed using deep reinforcement learning. The scheduler was implemented using the CloudSim toolkit, and the entire architecture was assumed to be a public cloud, because users in a public cloud can generate any number of tasks at any time. It was compared with conventional heuristic approaches in terms of energy consumption and response time, and the results showed that the DRL-based scheduler improves these parameters while scheduling jobs efficiently. Streaming applications pose many challenges in the cloud paradigm, as they must be scheduled onto virtual resources with streaming-specific configurations. This problem was solved by the authors in [37] by deriving a dynamic online task scheduler that must schedule tasks with large processing demands onto limited virtual resources while still rendering good QoS. The scheduler was modeled using a DDQN model, whose adaptive learning is especially valuable in the cloud model. DDQN-TS was evaluated against conventional metaheuristics on random workloads, Google workload traces, and Alibaba benchmarks, and improvements were observed in task completion rate and average response time over state-of-the-art approaches.
In [38], a bi-objective task scheduling algorithm was developed using DQL. Q-learning was combined with a deep neural network to retain the advantages of Q-learning. The primary concerns in formulating this scheduler were improving resource utilization and makespan. DQL was implemented on WorkflowSim and compared with the MIN-MIN, FCFS, MAX-MIN, and RR algorithms; the results revealed that the DQL-based scheduler improves resource utilization and makespan over the existing algorithms. In [39], an energy-aware task scheduler addressing multiple objectives was formulated using an AINN model, which schedules tasks from various sources by predicting a suitable VM for every incoming task. The input dataset, generated using a GA, consists of 18 million instances. MATLAB was used to simulate the model. It was evaluated against MIN-MIN, GA, and linear regression models, and the AINN scheduler improved average makespan, energy consumption, execution overhead, and active racks by 59%, 45%, 88%, and 70%, respectively, over the compared approaches.
In [31], the authors formulated a task scheduling mechanism focused on energy consumption and SLA violations. The chosen methodology is a deep reinforcement learning model with two stages. In the first stage, a deep learning model is deployed in which QoS features are extracted using autoencoders; in the second stage, a reinforcement learning approach uses collaborative learning, through which the characteristics of a task can easily be deduced and the task scheduled onto a virtual resource. Extensive simulations were conducted in MATLAB, and the results showed that the proposed approach outperformed state-of-the-art algorithms in terms of SLA violations, energy consumption, and QoS.
The authors in [40] and [41] proposed a cost-aware task scheduler for real-time workloads that minimizes the cost of running tasks on VMs. A DRL model, i.e., DQN, is used as the methodology to implement this cost model, with PyTorch used to train the model and evaluate the parameters. DQN was compared with conventional mechanisms for batch workloads, i.e., RR, Random, and Earliest schedulers, and the results showed that DQN surpassed the existing algorithms on success rate, average response time, and the cost of executing tasks on VMs. A three-level scheduling policy was designed in [42] to address makespan and cost, enhancing a deep Q-network model for the purpose. At the first level, a dynamic adaptive coefficient procedure precisely estimates the target value among diversified values, i.e., the precise VM for a set of tasks; at the second level, a pointer-based agent network selects sets of tasks and dispatches them to the respective VMs for processing; at the third level, a sensing mechanism identifies the objectives of each task set and preserves QoS in the environment. The TensorFlow framework was used for simulation. WDQN-RL was compared with the existing NSGA-II, MOPSO, and DQN-RL approaches, and the results revealed that it outperforms them in terms of makespan and cost. Energy consumption, makespan, and migration time were measured in [43] by formulating a hybrid task scheduling algorithm that uses capuchin search for the local search process and inverted ACO for the global search process. Simulations were conducted using CloudSim with real-time supercomputing worklogs as input; the outcomes showed that CapSA drastically improves the above parameters compared with the CSO, PATS, and FHCS approaches.
In [44], the authors formulated a hybrid workflow scheduling algorithm (PCP-ACO) that combines partial critical path and ant colony optimization. In the initial stage, the PCP heuristic calculates priorities based on the subtasks and deadlines involved in the workflow; in the final stage, the metaheuristic selects tasks based on the priorities generated by the heuristic. Simulations were conducted on WorkflowSim and the algorithm was evaluated against state-of-the-art algorithms; the execution cost of PCP-ACO improved by 19%, 17%, and 21% over the L-ACO, HP-GA, and IC-PCP approaches, respectively.
In [45], a multi-objective task scheduling model was formulated using an improved A3C algorithm incorporating an RCNN, with multiple threaded training models that help assign tasks to VMs in a dynamic environment. All experimentation was conducted on an edge-cloud co-simulator. The model was evaluated against the A3C, A3C+LSTM, and GOBI algorithms and outperformed them on average response time and energy consumption. The authors in [46] proposed an adaptive multi-objective scheduling strategy based on a metaheuristic: PSO with an adaptive acceleration coefficient, which explores the diversity of the search solution space and allots tasks to appropriate VMs based on the generated solutions. The CloudSim toolkit was used as the simulation platform, and the approach was evaluated against different metaheuristics; the schedules generated by AMTS improved resource utilization and energy consumption over the existing algorithms.
In [47], a two-layered scheduling strategy was developed using the EDA and GA metaheuristics. In the initial stage, task selection and assignment are performed using EDA, and the expandability of the search process is enhanced by the GA; finally, the scheduling process is optimized by combining both approaches. It was simulated on CloudSim and evaluated against the classical GA and EDA algorithms; the results showed that task completion time is greatly reduced and load balancing is improved over these classical approaches. The authors in [39] proposed a multi-objective scheduling mechanism that preserves QoS while allocating tasks to suitable VMs. A hybrid metaheuristic, HGA-ACO, was used, in which the GA operators are enhanced using ACO and the initialization of ACO is performed using the GA. All simulations were conducted using the CloudSim toolkit and compared against the classical GA and ACO algorithms; the results showed that HGA-ACO reduces response time and task completion time compared with the conventional mechanisms. Energy consumption is a crucial aspect for both the CSP and the user in the cloud paradigm. The authors in [48] and [49] aimed to minimize energy by formulating a task scheduler using NSGA-II and AINN techniques. In this approach, task characteristics and task selection are first identified using NSGA-II with the DVFS technique incorporated; an AINN is then used to predict the VM for the selected tasks. The simulation results showed a large improvement over existing approaches in minimizing energy consumption.
In [50], a task scheduling mechanism was formulated using two biologically inspired heuristics, GA and BF: the initial stage uses the GA with different operators to explore the search space, and the generated solutions are then scheduled using the BF approach. The resulting solutions were compared against conventional algorithms, and the results showed a large reduction in makespan and energy consumption.
In [51], a task scheduling strategy was formulated using an improved ACO that considers makespan and user budget as constraints and uses them as a feedback mechanism. The improved ACO incorporates this feedback in every iteration and evaluates makespan, cost, deadline violations, and resource utilization. The simulation results showed that IACO outperformed classical metaheuristics on these parameters. Makespan is one of the primary concerns in task scheduling, as it affects the QoS of the CSP. The authors in [52] therefore proposed three scheduling approaches that consider tasks from heterogeneous resources and schedule them based on the peak load at the respective CSP. The three approaches, MCC, MEMAX, and CMMNN, take makespan as the primary criterion, followed by resource utilization. They were evaluated using various synthetic datasets to check their efficacy against conventional approaches, and all of them outperformed the classical algorithms on makespan and resource utilization.
The authors in [53] designed a task scheduling approach that addresses datacenter infrastructure efficiency, CPU utilization, and SLA violations. The model is formulated using RL, which observes the reward at every iteration and makes decisions accordingly. It was implemented using CloudSim, and the results showed that the RL-EERA approach surpassed conventional approaches on the above parameters by effectively allocating resources to heterogeneous tasks. In [54], a task scheduler was formulated in two stages using a queuing mechanism and Q-learning, a reinforcement learning method. In the first stage, a task dispatcher uses an M/M/S queuing mechanism to assign tasks to virtual resources in the cloud paradigm; in the second stage, Q-learning is applied to the resulting assignments, producing optimized schedules that map each task to appropriate cloud resources. It was implemented using CloudSim and evaluated against classical approaches with the aim of minimizing energy consumption.
The authors in [55] designed a scheduling algorithm that addresses the multiple objectives of makespan and cost. These issues were tackled by the authors in [56] using a Markov gaming model, an AI approach that takes as input the number of requests from different workflows and the available VMs in the cloud model. Extensive simulations were conducted on AWS EC2 instances, and the results showed that the Markov-model-based algorithm greatly reduces makespan and cost compared with conventional algorithms.
In [57], a workflow scheduling mechanism was developed using DRL in two stages. In the first stage, task selection and assignment are performed using a Markov decision model; in the second stage, the generated schedules are given as input to a DDQN model to predict failures. It was simulated using WorkflowSim and compared against classical approaches, showing improvements in makespan, resource utilization, and fault tolerance. To optimize makespan in the cloud paradigm, the authors in [59] developed an ML-based task scheduling algorithm that uses the Q-learning and HEFT algorithms. The scheduling is divided into two phases: in the first, task sorting is performed using HEFT with upward rank, and schedules are generated according to the ranks; in the second, Q-learning is applied to the generated schedules to check whether better optimized results are achieved. The generated schedules may vary, as the Q-table is updated with different values based on the rewards obtained in previous iterations. QL-HEFT was compared with the HEFT_U, HEFT_D, and CPOP approaches; the results showed that QL-HEFT greatly minimizes makespan and improves the speedup ratio of tasks.
The authors in [59] and [60] proposed a multi-objective task scheduling algorithm that uses an enhanced version of the multi-verse optimizer, a metaheuristic approach. Their main aim is to address execution time, cost, and resource utilization, with an adaptive coefficient used to explore the search space. EMVO was compared with the MVO and PSO approaches; the results showed that EMVO minimizes cost and execution time while significantly improving resource utilization relative to the classical approaches.
The authors in [61] designed a task scheduling algorithm focused on task processing time and makespan. The framework is based on Q-learning, an RL approach. In the first phase, tasks are allocated to virtual servers based on server type; in the second phase, Q-learning-based scheduling is performed based on the past history and interactions of tasks with VMs, using an upper-confidence-bound parameter. The mechanism is entirely reward-based. It was compared against the classical PSO and RR algorithms, and the results showed that QMTSF significantly improves the above parameters over the classical approaches. A resource scheduling framework was developed by the authors in [62] using Q-learning and implemented in WorkflowSim. The approach mainly addresses time, cost, deadline analysis, and load balance in scheduling. Compared with PSO and CSO, resource utilization improved by 63%, and the task acceptance rate increased by 54% compared with the crow search mechanism.
In [63] and [67], a DRL-based scheduling approach was developed to address makespan, energy consumption, throughput, and resource utilization. It was implemented using the CloudSim toolkit and compared with the PSO, MVO, and EMVO algorithms; taking Google cloud job traces as input, the DRL-based approach outperformed all of them on the mentioned parameters. The authors in [64] and [68] proposed a container-based task scheduling algorithm using a two-fold approach. In the first phase, MMCO is used to choose a virtual container while preserving the SLA; for proper CPU allocation, an MPIO approach is used for task clustering, and a DCNN is used to allocate tasks accurately to suitable virtual servers. The approach was implemented using Kubernetes containers, and compared with classical approaches, DSTS improved makespan and allocated tasks efficiently to suitable virtual servers. In [65] and [69], an adaptive RL-based task scheduling algorithm was proposed, using gradient updating across different cloud environments to accelerate adaptation to each environment. MRLCC was compared against existing baseline algorithms, and the results showed an improved resource utilization rate.
From Table 1 it is clearly observed that many authors have used metaheuristic-, ML-, and DL-based approaches to solve the task scheduling problem in cloud computing. They addressed makespan, execution cost, task waiting time, energy consumption, power cost, fault tolerance, and resource utilization, and generated near-optimal solutions; yet the resource utilization problem, i.e., overutilization and underutilization, persists in the cloud model, as it is NP-hard. Many approaches fail to balance the interests of the CSP and the user: if overutilization occurs, resource cost increases drastically, and if resources are underutilized, the configured virtual resources are wasted, incurring large power consumption. This burdens the CSP as well as the user. Moreover, a virtual resource may not be available in a given cloud environment for a specific task, or the cost of the resource service may be high there. To tackle this situation, the MOPTSA3C task scheduler first carefully calculates the priorities of both the tasks arriving at the cloud application console and the VMs; tasks are sorted in the task manager according to task priorities. Prioritized tasks are then mapped to prioritized VMs, where VM priorities are computed as the ratio of the highest electricity unit cost among the datacenters to the electricity unit cost at the respective datacenter. In the second stage, these priorities are fed to the scheduler, which uses an improved A3C mechanism, an RL-based approach, to generate schedules for the collected prioritized tasks. A key aspect of this research is that the proposed MOPTSA3C is simulated in a multi-cloud environment so as to minimize resource cost by migrating tasks to VMs where the respective service cost is low.

III. MATHEMATICAL MODELING & SYSTEM ARCHITECTURE
This section discusses the mathematical model and system architecture of the proposed MOPTSA3C. For the mathematical formulation of the task scheduler, we consider $k1$ tasks, indicated as $t_{k1} = \{t_1, t_2, \ldots, t_{k1}\}$, to be run on $n1$ VMs, indicated as $v_{n1} = \{v_1, v_2, \ldots, v_{n1}\}$, hosted on $i1$ physical machines, indicated as $PM_{i1} = \{PM_1, PM_2, \ldots, PM_{i1}\}$, which are placed in $j1$ datacenters, indicated as $DC_{j1} = \{DC_1, DC_2, DC_3, \ldots, DC_{j1}\}$. The problem statement is that the $t_{k1}$ tasks should be mapped to the $v_{n1}$ VMs residing on the $PM_{i1}$ physical machines placed in the $DC_{j1}$ datacenters, assumed to form a multi-cloud environment, while minimizing makespan and resource cost and improving resource utilization. Fig. 1 shows the proposed system architecture of MOPTSA3C. Various tasks generated from heterogeneous resources arrive at the cloud application console, where they are captured by brokers, software agents employed in the cloud architecture on behalf of the CSP. The brokers submit these tasks to the task manager, in which we have introduced a process to calculate task priorities based on task size and to determine which VM each task should be assigned to; VM priorities are likewise calculated based on the unit electricity cost of the VMs. These two sets of priorities are fed together to MOPTSA3C, a deep-reinforcement-learning-based scheduler, which uses them to generate schedules according to the resources available across the multiple clouds. If a task arrives at the scheduler with the highest priority, it should be mapped to the VM with the highest priority, i.e., the VM with the lowest electricity cost at its datacenter in the multi-cloud environment. The scheduler first looks for a prioritized VM at the local datacenter; if it is not available, it looks for the same prioritized VM in a datacenter of another cloud, compares the prices of the requested service in both clouds, and migrates the task to wherever the resource cost is lower. If no datacenter has the required prioritized VM, the scheduler assigns a VM with the next priority, at the least cost for that service, in the cloud model. While scheduling tasks according to this procedure, MOPTSA3C addresses makespan, resource utilization, and resource cost. A hedged sketch of this VM-selection logic is given below. Table 2 lists all the notations used in the mathematical model.
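The following is a minimal sketch of the VM-selection logic just described: exact priority match first, then the closest priority, always preferring the lowest service cost across clouds. The catalog structure, field names, and `select_vm` function are illustrative assumptions, not the data model used in the simulations.

```python
def select_vm(task_priority, clouds):
    """Pick the cheapest available VM whose priority best matches the task's.

    `clouds` maps a cloud name to a list of VM dicts with (assumed) keys
    'priority', 'available', and 'cost' (the service price for the request).
    """
    candidates = [
        (name, vm)
        for name, vms in clouds.items()
        for vm in vms
        if vm["available"]
    ]
    if not candidates:
        return None  # no VM available in any cloud
    # Prefer an exact priority match; otherwise fall back to the closest
    # priority, breaking ties by the lowest service cost, as the
    # architecture prescribes.
    cloud_name, vm = min(
        candidates,
        key=lambda c: (abs(c[1]["priority"] - task_priority), c[1]["cost"]),
    )
    vm["available"] = False  # reserve the VM for this task
    return cloud_name, vm

# Example: the same VM class is cheaper in cloud B, so the task migrates there.
clouds = {
    "cloud_A": [{"priority": 3, "available": True, "cost": 0.12}],
    "cloud_B": [{"priority": 3, "available": True, "cost": 0.08}],
}
print(select_vm(3, clouds))  # -> ('cloud_B', {...})
```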
In this mathematical model, we assume all of these resources reside in multiple cloud environments. To calculate task priorities, it is first important to know how much workload is currently running on the VMs; the current workload on the VMs across the multiple clouds is calculated using Eq. (1).
After calculating the current running load on the VMs, and since these VMs are placed on physical machines, we also need the current running workload on all $i1$ considered physical machines across the multiple clouds; it is calculated using Eq. (2).
It is also necessary to know the processing capacities of the VMs in the different clouds, as task priorities depend on VM capacities; the capacity of each VM is calculated using Eq. (3).
After calculating the capacity of each VM, the total processing capacity of the $n1$ considered VMs across the multiple clouds is calculated using Eq. (4).
In our research it is important to calculate task priorities, as we choose a specific VM for each prioritized task. To evaluate a task's priority, we need to know the size of the tasks arriving at the cloud application console; it is evaluated using Eq. (5):
$$t^{l}_{k1} = t^{l}_{mips} \times t^{pr}_{k1} \qquad (5)$$

After the task sizes are obtained from Eq. (5), the priorities of the tasks are calculated using Eq. (6).
We calculate VM priorities based on unit electricity cost, which helps the scheduler map tasks to VMs efficiently; they are calculated using Eq. (7).
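The sketch below illustrates the two priority computations. Eq. (5) gives task length as the MIPS requirement times the processing requirement, and the VM priority follows the ratio stated in Section II's summary (highest electricity unit cost among datacenters to the unit cost at the VM's own datacenter). The normalization used for task priorities is an assumption, since the body of Eq. (6) is not reproduced here.

```python
def task_priorities(task_mips, task_proc_req):
    """Task lengths per Eq. (5); priorities normalized to [0, 1] (assumed form)."""
    lengths = [m * p for m, p in zip(task_mips, task_proc_req)]
    total = sum(lengths)
    return lengths, [l / total for l in lengths]

def vm_priorities(unit_costs):
    """VM priority = max unit electricity cost / local unit cost, so VMs in
    datacenters with cheaper electricity receive higher priority."""
    peak = max(unit_costs)
    return [peak / c for c in unit_costs]

# Example: three tasks and three VMs in datacenters with different tariffs.
lengths, tp = task_priorities([1000, 2000, 1500], [2.0, 1.0, 3.0])
vp = vm_priorities([0.10, 0.25, 0.15])   # unit cost at each VM's datacenter
```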
From Eq. (6) we obtain the priorities of tasks, and Eq. (7) gives the priorities of VMs based on the electricity cost at their datacenters. Both sets of priorities are fed by the task manager to the MOPTSA3C scheduler, which generates schedules for the incoming tasks using A3C while minimizing makespan and resource cost and improving resource utilization. Before calculating makespan, we identify the execution time of tasks, since makespan depends on it; the execution time of tasks is calculated using Eq. (8). Every task has a finish time, and we impose a deadline constraint in our work: every task should complete its execution before its deadline. Therefore, the finish time of a task must always be less than its deadline. The finish times of the $k1$ tasks are calculated using Eq. (9), and the deadline constraint is expressed in Eq. (10):

$$fin^{time}_{t_{k1}} < dl_{t_{k1}} \qquad (10)$$

After the mathematical formulation of the task and VM priorities, makespan is formulated in Eq. (11). Makespan is a primary concern of any task scheduler, as it depends on the execution times of all considered tasks; if makespan increases, the performance of the task scheduler degrades directly. It is therefore one of the parameters addressed in our research and is calculated using Eq. (11).
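A small sketch of the timing quantities in Eqs. (8)-(11), under the usual definitions (the equation bodies are not reproduced in the text, so the exact forms are assumptions): execution time as task length over VM speed, finish time as start plus execution time, and makespan as the latest finish time.

```python
def execution_time(task_length, vm_mips):
    return task_length / vm_mips                             # Eq. (8), assumed form

def finish_times(start_times, exec_times):
    return [s + e for s, e in zip(start_times, exec_times)]  # Eq. (9), assumed form

def deadlines_met(fin_times, deadlines):
    return all(f < d for f, d in zip(fin_times, deadlines))  # Eq. (10)

def makespan(fin_times):
    return max(fin_times)                                    # Eq. (11), assumed form
```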
In Eq. (12), $d_{i,j}$ is a decision variable that is set to 1 when a task $t_{k1}$ is assigned to a VM $v_{n1}$ and 0 otherwise. After formulating makespan, the next parameter we consider is resource cost. The main reason for choosing resource cost is that many cloud users face high resource costs, which inflate the bills for the services they consume; this is mainly due to inefficient mapping of the tasks/jobs requested by users in the cloud environment. To benefit both the customer and the CSP, we formulated a prioritized task mapping procedure that maps a high-priority task to a high-priority VM after checking the availability of that resource across multiple clouds and choosing the cloud where the resource cost is low. If the corresponding resource is not available, the task is assigned to the next-priority resource in the cloud where the cost is least. Resource cost is calculated using Eq. (13):

$$RC = \text{running cost of } t_{k1} \times \text{memory for } t_{k1} \text{ on } v_{n1} \times PM_{i1} \qquad (13)$$

After evaluating resource cost, we evaluate the utilization of resources in the cloud environment: if tasks are suitably mapped to an efficient VM, then makespan is minimized, which also affects resource cost and resource utilization in the cloud paradigm. This motivates us to formulate resource utilization using Eq. (14).
Resources in the cloud are of multiple types, i.e., CPU, I/O, and bandwidth. In this research we focus on the CPU load of the $i1$ considered physical machines, which is calculated using Eq. (14), where $s$ indicates the number of active tasks on a physical machine, $cpu_{i1}$ is the CPU capacity, and $usage(k1)$ is the usage of $cpu_{i1}$ on physical machine $i1$.

The reliability of the scheduler depends on a decrease in the number of faults. In any cloud model, short-term transient faults such as system crashes and software bugs can occur; the probability of occurrence of transient failures is assumed to follow a Poisson distribution. We do not consider transient faults of memory or network interfaces in this research; we concentrate mainly on the fault rate $\tau$, which depends on the operational frequency $freq_{op}$ of the computing node. The relation between operational frequency and fault rate is given in Eq. (15):

$$\tau(freq_{op}) = \tau_0 \cdot F(freq_{op}) = \tau_0 \cdot 10^{\frac{d\,(1 - freq_{op})}{1 - freq_{op}^{min}}} \qquad (15)$$

where $freq_{op}$ indicates the operational frequency, $\tau_0$ is the initial fault rate, $F(freq_{op})$ is a decreasing function, and $d > 0$ is a constant. The reliability of the system is defined in Eq. (16):

$$Re_{t_{k1}}(freq_{op}) = e^{-\tau(freq_{op}) \cdot ext_{k1} / freq_{op}} \qquad (16)$$
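The sketch below evaluates the fault-rate and reliability model of Eqs. (15)-(16) as reconstructed above. The exponent form of Eq. (15) is our reading of the garbled original, and all parameter values are illustrative assumptions.

```python
import math

def fault_rate(freq_op, tau0=1e-6, d=2.0, freq_min=0.4):
    """Eq. (15), assumed form: transient fault rate grows as the node slows."""
    return tau0 * 10 ** (d * (1.0 - freq_op) / (1.0 - freq_min))

def reliability(freq_op, exec_time):
    """Eq. (16): probability a task finishes without a transient fault;
    execution time stretches by 1/freq_op at lower frequency."""
    return math.exp(-fault_rate(freq_op) * exec_time / freq_op)

# Example: a task with 120 s of work is more reliable at full frequency.
print(reliability(1.0, 120.0))   # ~0.99988
print(reliability(0.5, 120.0))   # ~0.98896
```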

IV. METHODOLOGY USED IN PROPOSED MOPTSA3C
This section discusses the methodology used in the proposed MOPTSA3C, a reinforcement learning approach, i.e., the improved asynchronous advantage actor-critic (A3C) algorithm. It is composed of two components: an actor network, which maps the incoming state of tasks to the action space where tasks are to be mapped and executed, and a critic network, which evaluates the actions performed by the actor network. It is an asynchronous approach in which each actor network is evaluated in parallel on a different thread; after each thread completes a run, it evaluates the loss in its actor network and interacts with the global network by accumulating gradients. In this research an improved A3C approach is used because the conventional A3C struggles to learn features in dynamic, policy-based, complex environments. The improved A3C uses a residual convolutional neural network, which helps capture the complex relationships between sets of tasks and hosts and accelerates training toward appropriate scheduling decisions. Initially, the two-dimensional input data fed to the actor-critic network is flattened into one-dimensional form and passed to a fully connected hidden layer; the number of hidden neurons is set to 256, the kernel size to 2, and the stride to 1. The data passes through the hidden layers, and the output of the network is connected to a SoftMax activation function to keep the values in the range (0, 1). In the asynchronous advantage actor-critic, each agent runs on a different thread: it is a multi-threaded network in which each agent independently occupies a thread and, based on the outcome evaluated at each node, submits its result to the global network, which gives the reward. When the multi-threaded agents run in parallel, the training speed of the algorithm improves, as data is given as state space to every actor network. In our research, the scheduling interval at time $T$ is represented as $i_T$, the state space as $s_T$, and the action space as $a_T$; the next state is represented as $s_{T+1}$, and after evaluation of the input state sequences a reward $Rew_T$ is generated. The reward function gives either a positive or a negative result; therefore, a policy $\mu$ should observe the results, guide the agent, and adjust the reward so it is maximized. The reward is maximized and learned autonomously through repeated iterations of the model. It is expressed as $\mu\langle s, a, Rew, val_{fn}\rangle$.
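The following is a minimal PyTorch sketch of the actor-critic network just described: a 2-D scheduling state flattened to 1-D, a fully connected hidden layer of 256 neurons, a softmax policy head over VM choices, and a scalar value head. The residual-CNN front end of the improved A3C is omitted for brevity, and all sizes other than the stated 256 hidden neurons are assumptions.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, state_rows, state_cols, n_actions):
        super().__init__()
        self.body = nn.Sequential(
            nn.Flatten(),                                 # 2-D state -> 1-D vector
            nn.Linear(state_rows * state_cols, 256),      # 256 hidden neurons, as stated
            nn.ReLU(),
        )
        self.policy = nn.Linear(256, n_actions)           # actor head
        self.value = nn.Linear(256, 1)                    # critic head

    def forward(self, state):
        h = self.body(state)
        # Softmax keeps the action probabilities in (0, 1), as the text states.
        return torch.softmax(self.policy(h), dim=-1), self.value(h)

# Example: a state of 10 tasks x 8 features, 20 candidate VMs (illustrative sizes).
net = ActorCritic(10, 8, 20)
probs, value = net(torch.randn(1, 10, 8))
```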

A. STATE SPACE
In the above tuple, $s$ represents the state space, which consists of a set of states $s = \{s_1, s_2, \ldots, s_T\}$ corresponding to different tasks. Assume $stat_T = \{ftin_T^{PM_{i1}}, ftin_T^{t_{k1}}\}$, in which $ftin_T^{PM_{i1}}$ indicates the feature information of the physical hosts and $ftin_T^{t_{k1}}$ indicates the feature information of the tasks computed on the physical machines; both are represented as matrices.

B. ACTION SPACE
In the action space we represent all the actions that can be taken in each possible state, mapping tasks to the corresponding virtual resources. It is represented as $a = \{a_0, a_1, a_2, \ldots, a_T\}$, with $a_T = \{d_{ij}\}$, where $d_{ij}$ is the mapping action or decision variable in time interval $T$. The entire task mapping process depends on this decision variable.

C. POLICY
In the improved A3C approach, the policy $\mu$ controls the results obtained from the reward function so as to adjust them toward the maximum. It is represented as $\mu(a_T|s_T)$ [66] and is characterized as a policy function by a neural network, represented as $\mu(a_T|s_T; £_a)$.

D. REWARD FUNCTION
The reward function is the most important aspect of this reinforcement learning approach, as it gives the outcome of the mapping of tasks. It is indicated as $Rew_T(s_T, a_T)$ and is calculated using Eq. (17).
The reward function produces an outcome, and if the reward is negative, the cumulative discounted reward, indicated as $g_T$, is used; it is calculated using Eq. (18).

E. VALUE FUNCTION
The value function represents the expectation over the state and action sequences produced by the action and state spaces. The state value function is represented as $V_\mu(s_T)$ and is calculated using Eq. (19). The state-action value function is indicated as $q_\mu(s_T, a_T)$ and is calculated using Eq. (20). The expectation in these two functions is denoted $expect\{\}$.
The value function can also be computed by a neural network with network parameter $\theta_b$; this is calculated using Eq. (21).
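The sketch below computes the discounted cumulative reward $g_T$ of Eq. (18), which also serves as a Monte-Carlo target for the value estimates in Eqs. (19)-(21). The discount factor is an assumption, as the text does not give its value.

```python
def discounted_returns(rewards, gamma=0.99):
    """g_t = r_t + gamma * g_{t+1}, computed backwards over one episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return list(reversed(out))

# Example: rewards observed by one worker over five scheduling intervals.
print(discounted_returns([1.0, -0.5, 2.0, 0.0, 1.5]))
```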

F. TRAINING NETWORK
In the improved A3C approach, all threads run simultaneously with different agents and update their decisions to the global network. This process continues until all iterations are completed and returns the maximum reward value. Agents in the multiple thread networks run on their sample data and observe the rewards; the cumulative gradient is collected and submitted to the global network, which compares expected and actual values and guides each agent running in its thread toward good scheduling decisions. In the training process, each thread is given the policy function $\mu(a_T|s_T; £_a^\sim)$ and the value function $q(s_T, a_T; \theta_b^\sim)$, where $£_a^\sim$ and $\theta_b^\sim$ are the network control parameters; after every iteration, for state $s_T$, an action $a_T$ is taken and a reward $Rew_T$ is generated, which should be maximal. Applying the same policy function in different iterations may yield different gradient values; therefore, the gradient ascent method is used to obtain the cumulative gradient, which is calculated using Eq. (22). After the cumulative gradient is calculated, additional actions may cause the gradient value to grow; for every iteration the probability gradient should be greater than or equal to zero, and an increase in gradient value slows down the learning rate. The update should guide the actor network toward optimized schedules without slowing the learning process. For this reason, A3C uses the advantage function, indicated as $adv(s_T, a_T)$, which improves the gradient calculation by subtracting a baseline function $base(t)$; this keeps the estimate unbiased and helps the algorithm converge efficiently. It is calculated using Eq. (23).
In this process, each agent at state $s_T$ computes the reward $Rew_T$ and the value function $V(s_T; \theta_b^\sim)$; for the next state $s_{T+1}$, the value function is updated accordingly using Eq. (24). After the value function is updated in every iteration, the temporal-difference error is calculated using Eq. (25).
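A sketch of the advantage and temporal-difference quantities in Eqs. (23)-(25), using the standard one-step A3C forms; the exact equation bodies are not reproduced in the text, so this is an assumed but conventional reading in which the one-step advantage coincides with the TD error.

```python
def td_error(reward, value_s, value_s_next, gamma=0.99, terminal=False):
    """Eq. (25), assumed form: delta_T = Rew_T + gamma * V(s_{T+1}) - V(s_T)."""
    target = reward if terminal else reward + gamma * value_s_next
    return target - value_s

def advantage(reward, value_s, value_s_next, gamma=0.99, terminal=False):
    """Eq. (23): return estimate minus the baseline V(s_T); in the one-step
    case this equals the TD error above."""
    return td_error(reward, value_s, value_s_next, gamma, terminal)
```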

G. UPDATING PARAMETERS
This process is repeated: data is collected and tasks are mapped to suitable VMs using the improved A3C so as to obtain the maximum reward. After all gradients are collected, they are submitted to the global network, whose parameters are updated as calculated in Eq. (26).
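The following sketches the asynchronous parameter update of Eq. (26) in the standard A3C "push-and-pull" style: each worker accumulates gradients locally, hands them to the shared global network, and re-synchronizes. The optimizer is assumed to be constructed over the global network's parameters; the function name is illustrative.

```python
import torch

def push_and_pull(local_net, global_net, optimizer, loss):
    """Push local gradients to the global net, then pull its weights back."""
    optimizer.zero_grad()
    loss.backward()                      # gradients accumulate in local_net
    for lp, gp in zip(local_net.parameters(), global_net.parameters()):
        gp._grad = lp.grad               # hand the gradients to the global net
    optimizer.step()                     # update global parameters (Eq. 26)
    local_net.load_state_dict(global_net.state_dict())  # re-sync this worker
```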

H. PROPOSED MULTI-OBJECTIVE PRIORITIZED TASK SCHEDULER USING IMPROVED A3C
Fig. 2 shows the flow of the proposed MOPTSA3C. It starts with the initialization of the global network and the thread-specific actor-critic parameters. After initialization, the priorities of tasks and VMs are evaluated using Eqs. (6) and (7). The state space and action space values are given as input, the policy is applied, and the reward is observed through the value function using Eq. (17). After observing the reward, we check how far the parameter values are optimized; if they produce scheduling decisions that meet the training expectation, they are kept as the best optimized values and the global and local network parameters are updated. If not, the accumulated (cumulative) gradient value is calculated and a better scheduling decision is suggested to the policy function used in the approach. This process is repeated until all iterations are completed. A compact sketch of this worker loop is given below.
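The following is a compact, hedged sketch of the per-worker loop in Fig. 2, tying together the pieces sketched above (the actor-critic network, discounted returns, advantage, and the push-and-pull update from Subsection G). The `env` object and the rollout length `t_max` are illustrative assumptions; `env` is assumed to expose `reset()`, `step()`, and `done()` over the prioritized task-VM state of Sections III-IV.

```python
import torch

def worker(env, local_net, global_net, optimizer, t_max=5, gamma=0.99):
    state = env.reset()                          # prioritized tasks/VMs -> s_T
    while not env.done():
        states, actions, rewards = [], [], []
        for _ in range(t_max):                   # roll out up to t_max steps
            probs, _ = local_net(state)
            action = torch.multinomial(probs.squeeze(0), 1).item()
            next_state, reward, done = env.step(action)
            states.append(state); actions.append(action); rewards.append(reward)
            state = next_state
            if done:
                break
        # Bootstrap from the critic unless the episode ended (Eq. 18 target).
        with torch.no_grad():
            g = 0.0 if done else local_net(state)[1].item()
        loss = torch.zeros(())
        for s, a, r in reversed(list(zip(states, actions, rewards))):
            g = r + gamma * g                    # discounted return (Eq. 18)
            probs, value = local_net(s)
            adv = g - value.squeeze()            # advantage (Eq. 23)
            loss = loss + adv.pow(2)             # critic loss (Eqs. 24-25)
            loss = loss - adv.detach() * torch.log(probs.squeeze(0)[a])
        push_and_pull(local_net, global_net, optimizer, loss)  # Eq. (26)
```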

V. SIMULATION AND RESULTS
This section discusses the simulation and results of the proposed MOPTSA3C (Multi-Objective Prioritized Task Scheduler using improved A3C) algorithm. The entire simulation of the proposed approach was carried out using the CloudSim toolkit. The approach takes as input fabricated datasets with various data distributions, represented as u01, n02, l03, and r04 (uniform, normal, left-skewed, and right-skewed, respectively), and real-time supercomputing worklogs, represented as h05 for HPC2N and na06 for NASA. The per-thread training loop of the scheduler can be summarized as follows:

    T_strt = T
    repeat
        input status information to state space s_T and action space a_T
        apply policy µ(a_T | s_T; £_a~)
        get reward Rew_T and move to next state s_{T+1}
        increment the global shared counter and the step counter
    until t - T_strt == t_max or t == t_end
    evaluate the value function V(s_T) using Eq. (19)
    for i = t - 1, ..., T_strt do
        calculate the value function using Eq. (24)
        calculate dθ_b using Eq. (25)
        calculate d£_a using Eq. (23)

Subsection A discusses the simulation and configuration settings; Subsection B, the calculation of makespan using MOPTSA3C; Subsection C, the calculation of resource cost; Subsection D, the calculation of resource utilization; Subsection E, the calculation of reliability; and Subsection F, the analysis of results and discussion. The entire simulation ran for 100 iterations. Finally, the proposed approach is evaluated against the existing DQN, MOABCQ, and A2C algorithms on makespan, resource cost, and resource utilization.
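For concreteness, the following is a hedged sketch of how such fabricated task-length traces could be generated. The paper does not specify the generator parameters, so the ranges, means, and skew mechanics below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(42)
n_tasks = 1000

u01 = rng.uniform(1000, 10000, n_tasks)                      # uniform lengths (MI)
n02 = rng.normal(5500, 1500, n_tasks).clip(1000)             # normal, floored at 1000
l03 = (10000 - rng.gamma(2.0, 1500, n_tasks)).clip(1000)     # left-skewed (long left tail)
r04 = 1000 + rng.gamma(2.0, 1500, n_tasks)                   # right-skewed (long right tail)
```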

A. SIMULATION SETTINGS USED IN MOPTSA3C
This subsection discusses the simulation and configuration settings used in the proposed MOPTSA3C. Table 3 lists the simulation settings used in our simulation.
The MOPTSA3C algorithm's total time complexity encompasses several components: first, the task priority calculation, with a complexity of O(k1 log k1) for k1 tasks; second, the computation of VM priorities among n1 VMs, taking O(n1 log n1) time; third, establishing the mapping between the k1 tasks and the prioritized VMs, which demands O(k1 · n1) due to the reward calculation; and finally, executing the k1 tasks, which incurs a complexity of O(k1). The dominating factor emerges from the mapping step, O(k1 · n1), which drives the algorithm's computational load. Consequently, while the individual operations such as the task and VM priority calculations (O(k1 log k1) and O(n1 log n1), respectively) and task execution (O(k1)) are relevant, the overall complexity aligns predominantly with the mapping process, due to its dependence on both the number of tasks and the number of virtual machines. Therefore, the overall complexity is O(k1 · n1).
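Summing the four terms makes the dominance explicit; the collapse to the product term assumes, as is typical, that $n1 \geq \log k1$ and $k1 \geq \log n1$:

$$T(k1, n1) = O(k1 \log k1) + O(n1 \log n1) + O(k1 \cdot n1) + O(k1) = O(k1 \cdot n1)$$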
Table 4 lists the parameter settings used for training MOPTSA3C.

B. MAKESPAN EVALUATION BY MOPTSA3C
This subsection discusses the evaluation of makespan for MOPTSA3C. The reason to evaluate makespan is that it directly affects the scheduling process in the cloud paradigm: an inefficient task scheduler increases makespan and thereby degrades the QoS of the cloud service provider. This motivated us to evaluate the makespan of the MOPTSA3C scheduler in a multi-cloud environment using different statistical distributions and real-time worklogs. Fig. 3 and Table 5 present the evaluated makespan for MOPTSA3C using the uniform distribution.
Our proposed MOPTSA3C was first evaluated against the baseline DQN, MOABCQ, and A2C algorithms to check its efficacy in terms of makespan. We considered 100-1000 tasks for evaluating makespan with the fabricated uniform distribution of tasks (u01). The makespan generated by DQN for 100, 500, and 1000 tasks is 735.21, 828.57, and 912.35, respectively; by MOABCQ, 802.66, 836.75, and 926.77; by A2C, 712.08, 809.26, and 887.12; and by MOPTSA3C, 688.18, 709.27, and 723.38. From Fig. 3 and Table 5 it is clear that even as the number of tasks increases from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by minimizing makespan for the uniform distribution of tasks.
Table 6 and Fig. 4 indicate the evaluated makespan using the normal distribution. The makespan generated by DQN for 100, 500, and 1000 tasks is 935.78, 1326.77, and 1524.17 respectively; by MOABCQ, 912.67, 1245.71, and 1609.87; by A2C, 824.18, 1408.36, and 1527.15; and by MOPTSA3C, 705.26, 832.11, and 1096.36. From Fig. 4 and Table 6 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by minimizing makespan for the normal distribution of tasks.
Table 7 and Fig. 5 indicate the evaluated makespan using the left-skewed distribution. The makespan generated by DQN for 100, 500, and 1000 tasks is 824.56, 978.16, and 1413.22 respectively; by MOABCQ, 736.06, 1343.22, and 1487.35; by A2C, 718.66, 1098.43, and 1267.18; and by MOPTSA3C, 678.19, 725.32, and 1104.36. From Fig. 5 and Table 7 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by minimizing makespan for the left-skewed distribution of tasks.
Fig. 6 and Table 8 indicate the evaluated makespan using the right-skewed distribution. The makespan generated by DQN for 100, 500, and 1000 tasks is 643.39, 757.68, and 1426.18 respectively; by MOABCQ, 728.34, 851.18, and 1538.17; by A2C, 638.18, 732.07, and 1387.19; and by MOPTSA3C, 544.38, 612.21, and 1146.09. From Fig. 6 and Table 8 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by minimizing makespan for the right-skewed distribution of tasks.
Fig. 8 and Table 10 indicate the evaluated makespan using the NASA worklogs. The makespan generated by DQN for 100, 500, and 1000 tasks is 924.14, 1089.26, and 1437.58 respectively; by MOABCQ, 853.07, 1107.26, and 1756.93; by A2C, 765.64, 1082.15, and 1643.62; and by MOPTSA3C, 627.09, 876.33, and 1347.22. From Fig. 8 and Table 10 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by minimizing makespan for the NASA worklogs.

C. RESOURCE COST EVALUATION BY MOPTSA3C
This subsection discusses the evaluation of resource cost using our proposed MOPTSA3C. The reason for evaluating resource cost in multi-cloud scheduling is that an effective scheduler chooses the precise VM to generate optimized schedules while keeping resource cost low; ineffective scheduling increases resource cost, which burdens the CSP as well as the cloud users. This motivates us to evaluate the resource cost of MOPTSA3C in a multi-cloud environment against the existing baseline DQN, MOABCQ, and A2C algorithms using different statistical distributions and real-time worklogs. Fig. 9 and Table 11 below show the evaluated resource cost for MOPTSA3C using the uniform distribution. The resource cost generated by DQN for 100, 500, and 1000 tasks is 5.27, 7.08, and 8.14 respectively; by MOABCQ, 6.12, 7.26, and 8.28; by A2C, 4.98, 5.87, and 6.22; and by MOPTSA3C, 4.41, 5.25, and 6.72. From Fig. 9 and Table 11 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms the existing approaches by minimizing resource cost for the uniform distribution.
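The resource-cost metric can be sketched as the total price of the time each VM spends executing tasks. The per-hour prices below are hypothetical, standing in for the electricity-unit-cost-driven VM prices that MOPTSA3C uses for VM prioritization:

```python
# Hedged sketch of the resource-cost metric: VM busy time priced per VM.
def resource_cost(busy_time, unit_cost):
    """busy_time: VM id -> hours busy; unit_cost: VM id -> price per hour."""
    return sum(busy_time[vm] * unit_cost[vm] for vm in busy_time)

busy_time = {"vm0": 1.5, "vm1": 2.0}   # hours each VM executed tasks (assumed)
unit_cost = {"vm0": 0.8, "vm1": 1.2}   # per-hour prices (assumed)
print(resource_cost(busy_time, unit_cost))  # 1.5*0.8 + 2.0*1.2 = 3.6
```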
Fig. 10 and Table 12 present the evaluated resource cost using the normal distribution. The resource cost generated by DQN for 100, 500, and 1000 tasks is 7.26, 6.87, and 5.98 respectively; by MOABCQ, 6.23, 7.88, and 8.24; by A2C, 5.83, 6.84, and 7.36; and by MOPTSA3C, 5.43, 6.37, and 7.25. From Fig. 10 and Table 12 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by minimizing resource cost for the normal distribution.
Fig. 11 and Table 13 indicate the evaluated resource cost using the left-skewed distribution. The resource cost generated by DQN for 100, 500, and 1000 tasks is 8.56, 9.35, and 10.47 respectively; by MOABCQ, 7.98, 9.08, and 10.06; by A2C, 7.34, 8.36, and 9.46; and by MOPTSA3C, 7.02, 7.94, and 9.12. From Fig. 11 and Table 13 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by minimizing resource cost for the left-skewed distribution.

Fig. 12 and Table 14 indicate the evaluated resource cost using the right-skewed distribution. The resource cost generated by DQN for 100, 500, and 1000 tasks is 9.78, 8.94, and 10.27 respectively; by MOABCQ, 8.57, 9.22, and 10.02; by A2C, 7.87, 8.54, and 9.51; and by MOPTSA3C, 7.29, 8.07, and 9.23. From Fig. 12 and Table 14 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by minimizing resource cost for the right-skewed distribution.
Fig. 13 and Table 15 indicate the evaluated resource cost using the HPC2N workload. The resource cost generated by DQN for 100, 500, and 1000 tasks is 12.57, 13.36, and 15.47 respectively; by MOABCQ, 13.45, 11.32, and 12.35; by A2C, 10.37, 11.27, and 12.74; and by MOPTSA3C, 9.07, 10.46, and 11.33. From Fig. 13 and Table 15 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by minimizing resource cost for the HPC2N workload.
Fig. 14 and Table 16 indicate the evaluated resource cost using the NASA workload. The resource cost generated by DQN for 100, 500, and 1000 tasks is 14.22, 12.87, and 13.21 respectively; by MOABCQ, 12.86, 11.53, and 10.67; by A2C, 10.26, 11.08, and 12.25; and by MOPTSA3C, 9.57, 10.09, and 11.29. From Fig. 14 and Table 16 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by minimizing resource cost for the NASA workload.

D. RESOURCE UTILIZATION EVALUATION BY MOPTSA3C
This subsection discusses the evaluation of resource utilization using our proposed MOPTSA3C. The reason for evaluating resource utilization is that improper assignment of tasks to VMs in the cloud paradigm leads to overutilization or underutilization of resources. This mainly affects the CSP adversely, leading to high energy consumption and power cost. Therefore, we evaluated the resource utilization of the proposed MOPTSA3C scheduler against the DQN, MOABCQ, and A2C algorithms using different statistical distributions and real-time worklogs. Fig. 15 and Table 17 below show the evaluated resource utilization for MOPTSA3C using the uniform distribution.
The resource utilization generated by DQN for 100, 500, and 1000 tasks is 60.07, 70.09, and 78.74 respectively; by MOABCQ, 62.12, 67.28, and 69.44; by A2C, 71.64, 76.38, and 80.12; and by MOPTSA3C, 82.09, 85.36, and 88.47. From Fig. 15 and Table 17 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by improving resource utilization for the uniform distribution.

For the normal distribution, the resource utilization generated by DQN for 100, 500, and 1000 tasks is 64.28, 68.17, and 70.62 respectively; the full results are given in Table 18. For the right-skewed distribution, the resource utilization generated by MOPTSA3C for 100, 500, and 1000 tasks is 85.27, 91.64, and 93.32 respectively. From Fig. 18 and Table 20 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by improving resource utilization for the right-skewed distribution.

For the HPC2N workload, the resource utilization generated by DQN for 100, 500, and 1000 tasks is 52.26, 63.18, and 70.23 respectively; by MOABCQ, 54.06, 61.76, and 67.65; by A2C, 62.4, 69.98, and 71.43; and by MOPTSA3C, 82.32, 87.71, and 90.26. From Fig. 19 and Table 21 it is clear that MOPTSA3C outperforms all existing approaches by improving resource utilization for this workload. Likewise, from Fig. 20 and Table 22 it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by improving resource utilization for the NASA workload.
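As a rough sketch, average resource utilization can be computed as the fraction of the schedule window (the makespan) during which the VMs were actually busy; the values below are illustrative, not simulation outputs:

```python
# Sketch of mean resource utilization over the makespan window.
def utilization_pct(busy_time, makespan):
    """busy_time: VM id -> time busy. Returns mean utilization in percent."""
    return 100.0 * sum(busy_time.values()) / (len(busy_time) * makespan)

busy_time = {"vm0": 6.0, "vm1": 4.0}             # assumed per-VM busy times
print(utilization_pct(busy_time, makespan=6.0))  # (6+4)/(2*6) = 83.3%
```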

E. RELIABILITY EVALUATION BY MOPTSA3C
This subsection discusses the evaluation of the reliability of the scheduler using MOPTSA3C. The main reason to evaluate reliability is that it directly impacts the QoS of the cloud service provider, on the basis of which users choose that vendor's services. Reliability depends directly on the fault rate of the system; for a scheduler, this is the fault rate of tasks that are not executed properly by the model. For this reason, we calculated reliability using MOPTSA3C. We ran the simulation of MOPTSA3C with 100, 500, and 1000 tasks, and the proposed scheduler is evaluated against the existing DQN, MOABCQ, and A2C algorithms with both fabricated workloads and real-time supercomputing worklogs.
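Since reliability is taken here as one minus the task fault rate, a minimal sketch of the metric (with assumed task counts) is:

```python
# Reliability as the fraction of tasks that executed successfully,
# i.e. one minus the task fault rate. The counts below are assumed.
def reliability(total_tasks, failed_tasks):
    return 1.0 - failed_tasks / total_tasks

print(reliability(1000, 80))   # 0.92, on the scale reported for MOPTSA3C in Table 23
```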
Initially, we evaluated the reliability of MOPTSA3C using the uniform workload distribution. The reliability generated by DQN for 100, 500, and 1000 tasks is 0.2, 0.15, and 0.23 respectively; by MOABCQ, 0.134, 0.08, and 0.136; by A2C, 0.52, 0.27, and 0.38; and by MOPTSA3C, 0.89, 0.91, and 0.92. From Fig. 21 and Table 23 below it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by improving reliability for the uniform workload.
The reliability of MOPTSA3C using the normal workload distribution is calculated below. The reliability generated by DQN for 100, 500, and 1000 tasks is 0.52, 0.36, and 0.49 respectively; by MOABCQ, 0.167, 0.135, and 0.179; by A2C, 0.64, 0.48, and 0.73; and by MOPTSA3C, 0.85, 0.96, and 0.97. From Fig. 22 and Table 24 below it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by improving reliability for the normal workload.
The reliability of MOPTSA3C using the left-skewed workload distribution is calculated below. The reliability generated by DQN for 100, 500, and 1000 tasks is 0.35, 0.78, and 0.21 respectively; by MOABCQ, 0.154, 0.178, and 0.127; by A2C, 0.73, 0.56, and 0.81; and by MOPTSA3C, 0.91, 0.94, and 0.98. From Fig. 23 and Table 25 below it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by improving reliability for the left-skewed workload.
The reliability of MOPTSA3C using the right-skewed workload distribution is calculated below. The reliability generated by DQN for 100, 500, and 1000 tasks is 0.47, 0.81, and 0.38 respectively; by MOABCQ, 0.127, 0.142, and 0.156; by A2C, 0.68, 0.54, and 0.73; and by MOPTSA3C, 0.913, 0.925, and 0.978. From Fig. 24 and Table 26 below it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by improving reliability for the right-skewed workload.
The reliability of MOPTSA3C using the parallel computing workload (HPC2N) is calculated below. The reliability generated by DQN for 100, 500, and 1000 tasks is 0.43, 0.52, and 0.73 respectively; by MOABCQ, 0.163, 0.157, and 0.131; by A2C, 0.87, 0.78, and 0.79; and by MOPTSA3C, 0.926, 0.941, and 0.987. From Fig. 25 and Table 27 below it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by improving reliability for the HPC2N workload.
The reliability of MOPTSA3C using the parallel computing workload (NASA) is calculated below. The reliability generated by DQN for 100, 500, and 1000 tasks is 0.81, 0.59, and 0.79 respectively; by MOABCQ, 0.168, 0.149, and 0.182; by A2C, 0.88, 0.91, and 0.93; and by MOPTSA3C, 0.946, 0.972, and 0.99. From Fig. 26 and Table 28 below it is clear that even when tasks increase from 100 to 1000, MOPTSA3C learns the policies posed in the scheduler and outperforms all existing approaches by improving reliability for the NASA workload.

F. ANALYSIS OF SIMULATION RESULTS
This subsection discusses the analysis of the simulation results of MOPTSA3C. Extensive simulations were conducted on the CloudSim toolkit, and the proposed approach was evaluated against the state-of-the-art DQN, MOABCQ, and A2C algorithms with different fabricated workload distributions and the HPC2N and NASA real-time worklogs. As the results in the preceding subsections of Section V show, the proposed approach outperforms the existing approaches on all evaluated parameters. In this subsection, a detailed analysis is performed for each parameter. Tables 29, 30, and 31 below indicate the improvement in makespan, resource cost, and resource utilization respectively for the proposed MOPTSA3C over the state-of-the-art algorithms. From Table 29, it is clearly observed that the proposed MOPTSA3C improves makespan over the existing algorithms.
From Table 30, it is clearly observed that the proposed MOPTSA3C reduces resource cost relative to the existing algorithms.
From Table 31, it is clearly observed that the proposed MOPTSA3C improves resource utilization over the existing algorithms.
From Table 32, it is clearly observed that the proposed MOPTSA3C improves reliability over the existing algorithms. From this result analysis, we observe that the improved A3C in a multi-cloud environment learns features very quickly even when the number of tasks increases or decreases drastically. We evaluated MOPTSA3C with different fabricated statistical distributions and the real-time HPC2N and NASA workloads. In all cases, MOPTSA3C outperformed the DQN, MOABCQ, and A2C approaches in terms of makespan, resource cost, resource utilization, and reliability.
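The improvement percentages in Tables 29 through 32 follow the usual relative-improvement formula; assuming that convention, a quick sketch using the 1000-task uniform-distribution makespans from Subsection B as sample inputs:

```python
# Relative improvement of the proposed scheduler over a baseline
# (assumed formula; the tables' exact derivation is not restated here).
def improvement_pct(baseline, proposed):
    return 100.0 * (baseline - proposed) / baseline

# 1000-task uniform makespans from Subsection B: DQN 912.35, MOPTSA3C 723.38.
print(round(improvement_pct(912.35, 723.38), 2))   # approx. 20.71% lower makespan
```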

VI. CONCLUSION AND FUTURE WORK
The task scheduling problem (TSP) is a prodigious challenge in cloud computing due to the variable incoming tasks arriving at the cloud application console. It is an important concern for a CSP to employ a dynamic and effective task scheduler that takes care of the suitability of tasks mapped to VMs in the cloud environment. An ineffective task scheduler in the cloud paradigm affects various parameters, i.e., makespan, resource cost, and resource utilization. Many existing authors used metaheuristic approaches to develop task schedulers, obtaining near-optimal approximate scheduling decisions that may not always fit all conditions, as it is a dynamic environment. Therefore, to tackle this situation, in this research we used a reinforcement learning technique, the improved Asynchronous Advantage Actor-Critic (A3C) algorithm, to model the MOPTSA3C scheduler in a multi-cloud environment. There are two stages in this scheduling approach.
In the first stage, all tasks coming to the cloud application console are captured and their priorities evaluated, and VM priorities are also evaluated based on the unit electricity cost at the datacenters. In the second stage, these priorities are fed to the scheduler, which, integrated with the reinforcement learning model, generates scheduling decisions and rewards based on multiple thread workers running on actor networks. The critic network then evaluates the generated rewards, computes the cumulative gradient based on the policy applied in the network, and guides the agent towards better scheduling decisions according to its training. Finally, we compared MOPTSA3C with the existing state-of-the-art DQN, MOABCQ, and A2C approaches by varying the number of tasks from 100 to 1000. In all cases, MOPTSA3C minimizes makespan and resource cost and improves resource utilization and reliability over the existing approaches. In future work, we plan to deploy this scheduler in a real-time cloud environment such as OpenStack to check its efficacy.

FIGURE 9. Evaluation of resource cost using u01.
FIGURE 10. Evaluation of resource cost using n02.
FIGURE 11. Evaluation of resource cost using l03.
FIGURE 14. Evaluation of resource cost using na06.
FIGURE 15. Evaluation of resource utilization using u01.
FIGURE 24. Evaluation of reliability using r04.
FIGURE 26. Evaluation of reliability using na06.

TABLE 1. Task scheduling algorithms proposed by existing authors.
TABLE 2. Notations used in mathematical modeling of MOPTSA3C.
TABLE 3. Configuration settings for simulation.
TABLE 6. Evaluation of makespan using normal distribution.
TABLE 11. Evaluation of resource cost using uniform distribution.
TABLE 12. Evaluation of resource cost using normal distribution.
TABLE 13. Evaluation of resource cost using left skewed distribution.
TABLE 16. Evaluation of resource cost using NASA workload.
TABLE 17. Evaluation of resource utilization using uniform distribution.
TABLE 18. Evaluation of resource utilization using normal distribution.
TABLE 22. Evaluation of resource utilization using NASA workload.
TABLE 23. Evaluation of reliability using uniform distribution.
TABLE 24. Evaluation of reliability using normal distribution.
TABLE 25. Evaluation of reliability using left skewed distribution.
TABLE 26. Evaluation of reliability using right skewed distribution.
TABLE 27. Evaluation of reliability using HPC2N workload.
TABLE 28. Evaluation of reliability using NASA workload.
TABLE 30. Improvement of resource cost (%) over existing algorithms.
TABLE 31. Improvement of resource utilization (%) over existing algorithms.