To Migrate or not to Migrate: An Analysis of Operator Migration in Distributed Stream Processing

One of the most important issues in data stream processing systems is to use operator migration to handle highly variable workloads in a cost-efficient manner and to adapt on demand to the needs at any given time. Operator migration is a complex process that involves changes to the state and stream management of a running query, typically without any loss of data and with as little disruption to the execution as possible. This survey provides an overview of solutions for operator migration from a historical perspective as well as from the perspective of the goal of migration. It introduces a conceptual model of operator migration to establish a unified terminology and to classify existing solutions. Existing work in the area is analyzed to separate the mechanism of migration from the decision to migrate. For the latter, we emphasize the cost-benefit analysis, which is important for operator migration but is often only implicitly addressed, or neglected altogether. A description of the available solutions provides the reader with a good understanding of the design alternatives from an algorithmic viewpoint. We complement this with an empirical study to provide quantitative insights into the impact of different design alternatives on the mechanisms of migration.


INTRODUCTION
Stream processing has been researched for more than 20 years, and is becoming ubiquitous in application domains where just-in-time decision-making is essential [1], like the Internet of Things (IoT), fraud and anomaly detection, smart cities, and autonomic systems. An indicator of the wide use and importance of stream processing is that most cloud vendors offer support for deploying managed stream processing pipelines [2], and a sign of its future relevance is the economic impact of the IoT industry, estimated to be between $3.9 trillion and $11.1 trillion a year by 2025, around 11% of the global economy [3].
Stream processing engines (SPE) come in several flavors, are deployed in different environments (i.e., cloud, fog, edge, in-network), and are called data stream management systems, real-time stream analytics, event stream processing, and complex event processing (CEP). The common denominator in all these systems is that data arrive continuously (generally as tuples) from multiple sources, and need to be processed as soon as they arrive (in memory) to enable immediate decision-making. Thus, the response time must be short even in case of large loads.
SPEs take queries as input and compile them into operator graphs. These are directed acyclic graphs (DAGs) that represent the logical execution of a query, which includes the state management of sub-queries and stream management, i.e., the dependencies between them. If these operators are mapped to several physical hosts and form an overlay network, a distributed stream processing system (DSPS) is established. Incoming tuples to an operator are processed in some way, by transforming, filtering, aggregating, or running a user-defined function.
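To make the notion of an operator graph concrete, the following minimal Python sketch models a query as a DAG of operators. The `Operator` class and its fields are our own illustrative construction, not the internal representation of any particular SPE.

```python
from dataclasses import dataclass, field

@dataclass
class Operator:
    """A vertex in the operator graph (e.g., filter, join, aggregate)."""
    name: str
    stateful: bool = False          # stateful operators need state migration
    downstream: list = field(default_factory=list)

def build_query_graph():
    """Build a toy DAG: source -> filter -> windowed aggregate -> sink."""
    source = Operator("source")
    flt = Operator("filter")
    agg = Operator("window_agg", stateful=True)
    sink = Operator("sink")
    source.downstream.append(flt)
    flt.downstream.append(agg)
    agg.downstream.append(sink)
    return source

def stateful_operators(root):
    """Collect the stateful operators -- the ones whose migration
    requires state management in addition to stream management."""
    seen, stack, out = set(), [root], []
    while stack:
        op = stack.pop()
        if op.name in seen:
            continue
        seen.add(op.name)
        if op.stateful:
            out.append(op.name)
        stack.extend(op.downstream)
    return out
```

In this toy graph, only the windowed aggregate is stateful, so it is the only operator for which a migration mechanism must move state.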
A key requirement for a DSPS is the ability to handle system dynamics, like changes in workload or resource availability, and potentially mobility. The key mechanism for handling such changes through load balancing and elasticity is operator migration. It is used, for example, to react when applications do not get the guaranteed quality of service (QoS), to optimize hardware utilization in data centers, to react to failures, and to move operators closer to the data sources when they move. Operator migration entails (1) state management to move the state of the operator from an old host to a new host, and (2) stream management to change data stream routing in the overlay network and, potentially, to buffer data tuples during migration. Decisions on when to migrate and where to migrate to are key aspects of operator migration. The potential approaches to state management, stream management, and decision-making, as well as their combinations, result in a large design space for operator migration algorithms.

Author's address: Espen Volnes, Thomas Plagemann and Vera Goebel, University of Oslo, Department of Informatics, P.O. box 1080, Gaustadalléen 23 B, 0316 Oslo, Norway, {espenvol,plageman,goebel}@ifi.uio.no.
The goal of this work is to give the reader a good understanding of existing solutions to operator migration and the effects of the relevant design decisions. To this end, we develop a conceptual model that captures the fundamental components of operator migration, i.e., the components on which all solutions are based. This model should contribute to operator migration algorithms by providing a unified terminology and establishing a taxonomy. Based on this, we review the literature on operator migration.
Operator migration introduces some form of cost, like freeze time during migration or increased resource consumption to move the state of the operator. Thus, keeping these costs low is a core requirement in the design of operator migration algorithms. Furthermore, during decision-making, it is important to balance the costs of migration against its benefits.
There is a general awareness of this trade-off, but surprisingly few studies have explicitly described how costs and benefits are considered in the migration algorithm and decision-making. Therefore, we place particular emphasis on costs and benefits in our analysis of work in the area. This leads us to two research questions that structure our survey of studies on operator migration:
• Which mechanisms are used to perform migration?
• How is the migration decision made?
These research questions are generic and can be applied to all applications that perform operator migration. However, in this survey we focus on streaming applications and categorize solutions for operator migration based on the conceptual migration model. The criteria for inclusion in the literature search are that a study must be related to operator migration, the cost of migration must be significant, and the QoS must be handled or evaluated with regard to migration. Works in which migration is handled only peripherally and the QoS is not important are excluded.
In addition to this functional view of operator migration, we perform an empirical study to gain quantitative insights into different operator migration and decision models. This empirical quantification demonstrates the need for a comprehensive migration model beyond the contribution of the literature. We use Apache Flink [4] and Siddhi [5], implement two operator migration algorithms, and apply part of the NEXMark benchmark [6] as workload to measure performance and resource consumption. The aim is to illustrate the quantitative effect of different design decisions.
Available surveys. The available surveys on data stream processing that include operator migration [7-14] do not answer the above research questions, and do not give the reader insight into the quantitative impact of certain design decisions. Although they do investigate the issue of online reconfiguration of SPEs, none of the studied surveys has adequately investigated the different types of operator migration algorithms or the relationships among the cost of migration, the decision to migrate, and the migration algorithm.
Lakshmanan et al. [7] studied reconfigurations in stream processing and defined a migration model that has similarities to the one developed here. They distinguished between solutions based on where the change is made: either in the network, data, or flow graph. Moreover, different triggers for migrations were studied, such as thresholds, constraint violations, and periodic re-evaluations. However, they did not investigate the different varieties of operator migration in any detail. Hummer et al. [8] focused on challenges and techniques of high-throughput streaming applications. Hirzel et al. [9] cataloged different types of stream processing optimizations, and To et al. [11] studied state management in stream processing systems but did not examine state migration in much detail. Similarly, De et al. [10] explored migration in relation to stream processing and edge computing but did not conduct a critical comparison of the different solutions. We suggest that studying solutions using our conceptual migration model makes such a comparison possible, and deem it necessary to fully answer our research questions. Röger et al. [12] investigated operator migration but focused on elasticity, whereas we investigate operator migration as a whole. Qin et al. [13] defined a taxonomy for different live reconfigurations in SPEs, including operator migration, but performed only a superficial analysis. Bergui et al. [14] surveyed geo-distributed frameworks and discussed several challenges pertaining to geo-distributed data analytics, where operator migration plays only a minor role in some of the solutions.
The main contributions of this work are as follows:
• We propose a conceptual model of operator migration that provides a unified terminology and leads to a taxonomy of operator migration. Moreover, this model facilitates the development of new operator migration algorithms.
• We provide a survey of work on operator migration that not only analyzes current stream management and state management solutions, but also emphasizes a cost-benefit analysis of the migration decision.
• We report an experimental study involving two migration algorithms on Apache Flink and Siddhi to gain insight into the quantitative aspects of operator migration.
The remainder of this survey is structured as follows: In Section 2, we introduce the conceptual model of operator migration. In Section 3, we analyze the literature according to the above research questions. In Section 4, we present the experimental evaluation of two migration algorithms for Apache Flink and Siddhi. In Section 5, we discuss future directions of research in the area, and Section 6 provides the conclusions of this survey.

A CONCEPTUAL MODEL OF OPERATOR MIGRATION
In this section, we establish a conceptual model of operator migration to capture the basic concepts and elements on which consensus has been achieved in the literature, and to form a unified terminology for operator migration. To keep the presentation of the model concise, we largely refrain, in this section, from referring to the original studies and the terminologies that they use. This is done extensively in Section 3. Since operator migration is the means to change the placement of at least one operator in a DSPS, we start with a description of the initial placement problem before analyzing operator migration. This analysis identifies the basic components of operator migration. These are grouped into two major concerns: (1) stream management to stop, buffer, redirect, and start streams; and (2) state management to establish the current state of the operator at the new operator host, which may require moving the state from the old host to the new one, and starting a replica of the operator on the latter before the former finishes. The cost of operator migration plays an important role in the design of migration mechanisms and in the decision on whether to migrate. Therefore, we cover in Section 2.3 the common cost parameters and, in Section 2.4, the migration decision. The latter is triggered, for example, by a change in the system load, and comprises the calculation of a new operator placement as well as a comparison of the benefits of the new placement versus the costs of the migration itself.
Table 1 lists the studies considered in this work that form the foundation of the conceptual model. It classifies them according to the environment of their deployment and the goal of migration, which are important factors for the migration decision and placement. The most common deployment environments for distributed stream processing (DSP) are cloud, fog, and edge networks. Cloud has been used to classify data center applications that might handle very high throughputs and can scale the systems both horizontally and vertically to handle variable traffic loads. Fog and edge are relatively new terms that seem similar but have some significant differences. Edge computing often focuses on offloading heavy tasks from local resource-constrained devices to either a nearby base station or a data center. With edge computing, heavy tasks, like machine learning-based inference and video games, can be supported on smartphones and laptops. Fog is an extension of the cloud in which the computing tasks of an application are distributed over multiple devices, including end devices, edge resources, and the cloud itself [15]. As such, clients may send most information to a server close to them instead of a centralized data center to reduce energy consumption, congestion on the Internet, and response times for clients.

[18, 29, 46, 47, 58, 71] | QoS [19, 37, 38, 41, 42, 44, 47, 49, 51-53, 55, 56, 59, 60, 62, 63, 65, 67, 68, 73]
Table 1. Overview of studies on the categories of operator migration

Initial placement
A DSP can be considered to be a set of collaborating SPEs that form an overlay network to process queries over data streams. Consider Figure 1 as an example of such an application. SPEs run on network nodes that provide the computational and networking resources for the DSP overlay. The objective of the initial operator placement task is to distribute the processing of a query over network nodes such that the goals of the system can be met as adequately as possible. The first step is to transform a query into an operator graph, which can be modeled as a DAG in which the operators derived from the query are represented as vertices. The placement of these operators in a network, i.e., finding appropriate network nodes to host the operators, is typically driven by an objective function. Such an objective function typically includes (contradicting) optimization criteria, like low latency of event delivery, low resource consumption (e.g., bandwidth and energy), reliability, and fault tolerance. Typically, a placement function is used to calculate a score for a placement based on the optimization criteria. Finding the optimal placement is an NP-hard problem, and heuristics are often used to find close-to-optimal solutions. To simplify the discussion, we use the term "optimal" to also cover results that are "close to optimal" based on heuristics. Both centralized and decentralized versions of operator placement can be used to establish an operator network, which is generally implemented as an overlay for the DSP. Nodes in an operator network can be static or mobile, and have one or more of the following roles:
• Data producer: Examples include sensors that convert analog signals into data tuples, often with a fixed sampling rate, and software monitors that might create data tuples at a dynamic rate. Obviously, it is important that all produced tuples can be forwarded into the operator network for further processing.
• Data consumer: These are nodes that request a service, and typically have some QoS requirement, such as tuple latency.
• Host: Hosts process at least one operator and contribute to event forwarding in the operator network, i.e., they map the input events (from upstream nodes) of the operators they process to output events, and forward them to downstream nodes in the operator network.
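To make the role of the placement function concrete, the following sketch scores candidate hosts with a weighted sum of latency and load and greedily picks the lowest-scoring one. The weights, metric names, and host values are purely illustrative; a real objective function would include further criteria such as bandwidth, energy, and reliability.

```python
def placement_score(host, w_latency=0.7, w_load=0.3):
    """Lower is better: weighted sum of network latency and CPU load.
    Weights and metrics are illustrative assumptions."""
    return w_latency * host["latency_ms"] + w_load * host["cpu_load"] * 100

def greedy_placement(candidates):
    """Heuristic 'close to optimal' placement: pick the lowest-scoring host."""
    return min(candidates, key=placement_score)

# Hypothetical candidate hosts for a single operator
hosts = [
    {"name": "edge-1",  "latency_ms": 5,  "cpu_load": 0.9},
    {"name": "fog-1",   "latency_ms": 15, "cpu_load": 0.3},
    {"name": "cloud-1", "latency_ms": 60, "cpu_load": 0.1},
]
best = greedy_placement(hosts)
```

With these values, the heavily loaded edge host and the distant cloud host both score worse than the fog host, illustrating how the criteria can contradict each other.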
DSP is typically performed in a dynamic context involving variable workload, resource availability, and mobility.As such, initial placement might, after some time, become sub-optimal and the operator network should be adapted by migrating one or several operators to a new host.

Migration mechanism
The two major concerns of migration mechanisms are state management and stream management. State management is relevant to operators that derive their output from multiple tuples, e.g., looking for a sequence of tuples with CEP, joining streams, or aggregating tuples over windows. In its simplest form, the state can be thought of as the set of input tuples that still affect future output. In practice, the internal state of the operator is typically optimized to include only the necessary information for the given operator, such as the aggregate value for the extent of a window, or a finite state machine in CEP. Therefore, operator migration differs markedly from virtual machine (VM) migration, where the entire VM must be transferred to the new host. While some solutions to operator migration, such as MCEP [62], do include VM migration, VM migration is not the focus of this survey. The simplest method of operator migration for a stateful operator is to move it to the new host and replay all necessary historical tuples from the upstream nodes. This technique is used with current publish-subscribe systems, such as Kafka [74], to achieve fault tolerance in stream processing systems like Flink [4]. This technique also makes it possible to migrate the operator to a different stream processing system, which is usually not possible when extracting the state from the system, because the internal state is system specific. However, as the state can grow very large, it is often undesirable to replay all tuples. Therefore, this survey focuses on operator migration techniques that extract the state from the stream processing system and move it to the new host.
The task of state management is to establish, at the new host, an operator with the state of the operator at the old host when switching the processing from the old host to the new host. In a moving state algorithm, the old host extracts the state of the operator and sends it to the new host. Some algorithms do not need to perform this task, either because they manage stateless operators, e.g., filter operators, or because the old and the new host can schedule a handover.
In a parallel-track algorithm, originally a term used by Zhu et al. [75], both the old and new hosts receive the same tuples for some time during migration. The handover from the old to the new host is carried out gradually such that the downtime of the operator is minimized. The cost of this approach is that upstream nodes must send twice as many tuples during some part of the migration. A parallel-track algorithm with moving state is called state-recreation, and one without moving state is called window-recreation. These terms are inspired by StreamCloud [64]. In a single-track algorithm, the upstream nodes send tuples either to the old host or to the new host.
Stream management deals with notifying upstream and downstream nodes of changes made to the topology. Typically, nodes have to update their routing table to reflect the new topology at the upstream node, and this results in a redirection of the outgoing stream to the new operator host. To prevent tuples from getting lost when the operator is down, streams might be stopped and tuples need to be buffered. There are three locations at which tuples can be buffered: upstream nodes, the old host, and the new host. These tasks of redirecting streams, stopping streams, buffering streams, and restarting streams are coordinated among the hosts involved through control messages. Both centralized and decentralized coordination are possible. As such, there are several design options that can be implemented for a particular operator migration solution.
Figure 2 illustrates the concepts and building blocks that constitute migration algorithms and the relevant decision-making. All algorithms require some stream management functions, such as stop, start, buffer, and redirect. State management differs in that there are many more variations among approaches. Migration decisions first require a trigger for when to make a migration decision and then a placement mechanism to determine the placement that yields the best performance. The cost of the migration must be weighed against its benefit. The degree to which the migration decision should be proactive (e.g., before a host becomes overloaded) or reactive (e.g., when a host is overloaded) must be determined, where proactive decisions have higher uncertainty and reactive ones incur a higher cost of migration.

Figure 3 shows the different migration algorithms that are formed using the above building blocks. The small text in brackets under some of the categories denotes a term for the given type of algorithm. For instance, a pause-drain-resume algorithm is a single-track algorithm without moving state, and, as described earlier, a parallel-track algorithm with moving state is a state-recreation algorithm. The most basic operator migration algorithm is a pause-drain-resume algorithm, and it works only with stateless operators or in cases where some state inconsistency is permitted. The operator to migrate is first started on the new host while the old host is still running it. Then, upstream nodes redirect their output streams from the old to the new host. After this, the old host can stop executing the operator. Since no state needs to be moved, migration occurs without any downtime. A few control messages must be sent: (1) from the controller to the old host, (2) from the old host to the new host, and (3) from the old host to the upstream nodes. As there is no downtime for the operator, any delay caused by these messages is negligible.
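The control-message sequence of this stateless migration can be sketched as follows; hosts are modeled as plain dictionaries, and all names are hypothetical.

```python
def pause_drain_resume(log, upstream, old_host, new_host):
    """Sketch of the stateless migration sequence: start the operator on
    the new host, redirect upstream streams, then stop the old host."""
    # (1) controller tells the old host to initiate the migration
    log.append(("controller", old_host["name"], "migrate"))
    # (2) old host asks the new host to start a replica of the operator
    log.append((old_host["name"], new_host["name"], "start_operator"))
    new_host["running"] = True
    # (3) old host tells the upstream node to redirect its output stream
    log.append((old_host["name"], upstream["name"], "redirect"))
    upstream["route"] = new_host["name"]
    # once redirected, the old host stops -- no state to move, no downtime
    old_host["running"] = False
    return log

# Hypothetical setup: one upstream node, operator moves from host A to host B
upstream = {"name": "U", "route": "A"}
old, new = {"name": "A", "running": True}, {"name": "B", "running": False}
log = pause_drain_resume([], upstream, old, new)
```

Note that the replica on the new host runs before the redirect, so no tuple is lost even though no buffering takes place.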
When state must be moved and a single track is used, the operator has some downtime. Specifically, tuples can be processed neither on the old host nor on the new one while the state is in transmission. During this process, tuples from the upstream nodes must be buffered before they can be processed by the new host. The buffering can be carried out on the upstream nodes, the old host, or the new host. In many cases, the old host may still receive tuples after query processing has been stopped; these tuples need to be forwarded from the old host to the new host.
Fig. 3. Migration algorithms

Partial state movement involves splitting the state to be migrated into several parts and moving these parts to the new host while the operator is still processing on the old host. This approach avoids having to stop operator processing for the entire state transfer. If the state is periodically checkpointed and distributed on different nodes, this is called checkpoint-assisted migration, which can substantially reduce or eliminate the downtime of the operator. Either the entire state already exists on the new host, or an incremental checkpoint is extracted before the operator shuts down and is then sent to the new host. While the last checkpoint is sent to the new host, the operator stops for a much shorter time than when the entire checkpoint is sent at once. A single-track moving state solution can never avoid downtime, which is why parallel-track solutions have been developed.
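A minimal sketch of the partial state movement idea, assuming the state is a key-value dictionary: the bulk of the state is transferred in chunks while the operator keeps running, and only the incremental delta that accumulated in the meantime is sent during a short pause. The function name and parameters are illustrative, not taken from any surveyed system.

```python
def partial_state_transfer(state, chunk_size, updates_during_transfer):
    """Move the state in chunks while the operator keeps processing on the
    old host; only the final incremental delta crosses the network while
    the operator is paused."""
    transferred = {}
    keys = list(state)
    # Phase 1: background transfer, operator still running on the old host
    for i in range(0, len(keys), chunk_size):
        for k in keys[i:i + chunk_size]:
            transferred[k] = state[k]
    # State entries modified during phase 1 form the incremental checkpoint
    delta = dict(updates_during_transfer)
    # Phase 2: short pause -- only the (small) delta is sent now
    transferred.update(delta)
    return transferred, len(delta)

# Hypothetical state and one update that arrived during the bulk transfer
state = {"a": 1, "b": 2, "c": 3, "d": 4}
merged, delta_size = partial_state_transfer(state, chunk_size=2,
                                            updates_during_transfer={"b": 20})
```

The pause length now depends on the size of the delta rather than on the full state, which is the key benefit of checkpoint-assisted approaches.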
Parallel-track algorithms differ from single-track algorithms in a fundamental way. They can achieve zero downtime, but at the cost of running the old and new hosts with duplicate input streams and, sometimes, duplicate output streams.
A moving-state parallel-track algorithm performs state-recreation, which means that the new host receives the state from the old host while also receiving the same input tuples as the old host from upstream. A parallel-track algorithm without state migration performs window-recreation, which means that the new host receives the same tuples as the old host until they both have the same tuples in their windows. At this point, the upstream nodes redirect their streams to the new host, and it takes over without any buffering of tuples or waiting time.
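The handover condition of window-recreation can be illustrated with a small simulation over a count-based sliding window; the function and its parameters are our own illustrative construction.

```python
from collections import deque

def window_recreation_handover(tuples, window_size, migration_start):
    """Both hosts receive the same tuples from migration_start on; handover
    is safe once the new host's window holds window_size tuples, i.e., its
    content equals the old host's window. Returns the handover index."""
    old_window = deque(maxlen=window_size)
    new_window = deque(maxlen=window_size)
    handover_at = None
    for i, t in enumerate(tuples):
        old_window.append(t)           # old host processes every tuple
        if i >= migration_start:
            new_window.append(t)       # new host receives duplicates
        if handover_at is None and len(new_window) == window_size:
            # windows now coincide; the new host can take over
            assert list(new_window) == list(old_window)
            handover_at = i
    return handover_at
```

For a window of three tuples and a migration starting at tuple 4, the windows coincide after two further tuples, so the handover happens at tuple 6 with zero downtime.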
An important motivation for establishing the terminology and building blocks in Figure 3 is that existing work has described the same concepts by different names. For instance, what Zhu et al. [75] call parallel-track is described as window-recreation in StreamCloud [64], smooth migration in Enorm [32], and seamless minimal state in TCEP [63]. In StreamCloud, a different algorithm called state-recreation is also parallel-track but additionally involves moving state.
In contrast to parallel-track, single-track with moving state is called disruptive migration in Enorm [32] and Pause & Resume in [78] because it leads to downtime, as opposed to smooth migration that eliminates downtime. Instant migration is single-track migration without moving state. Checkpoint-assisted algorithms have been described in [28,47,71]. Even though these algorithms feature partial state movement, they differ from Megaphone [45] in that they maintain distributed copies of the state on replicas. As a result, only the minimal state needs to be sent during migration. TCEP has a single-track moving state algorithm called moving fine-grained state, but it does not use partial state movement as the name suggests. Instead, operators in the operator graph are migrated one at a time, each moving its state all at once.

Cost
Operations performed as part of state management and stream management lead to two classes of migration cost, related to resource consumption and to temporal aspects. The latter are caused by the fact that the operator is not operational during state extraction, state serialization, state movement, state deserialization, and runtime initialization.
The bandwidth required to move the state from the old to the new host is the most commonly considered resource in geo-distributed cases. The computational requirements of extracting the state from the old host and the messages needed to coordinate stream management are more commonly considered in centralized data centers. Stream management messages may also have an impact on the duration for which an operator cannot work, e.g., streams from upstream nodes are stopped and no events arrive at the operator until they have been redirected and started again. It should be noted that latency spikes reflect the cost much better than freeze time, since it is possible that the operator would not have produced any event during the freeze period anyway. Examples of such a case are when incoming events during this period do not match the pattern that triggers the operator to produce an event, or when a tumbling window implemented by the operator is much larger than the migration time, such that all delayed incoming events can be processed on the new host before the window expires.
From the descriptions of the different types of algorithms above, it is easy to see that they differ in the cost of migration. Operator downtime or the latency of the output tuple can be considered a reasonable definition of the cost of migration for single-track state movement algorithms. However, for parallel-track algorithms, this definition of cost can result in excessively frequent migrations because the cost is always close to zero. Therefore, it is necessary to define the cost of migration in such a way that migrations do not become too frequent.
With parallel-track, operator replicas need to be executed during migration, and upstream nodes must send duplicate streams to the old and new hosts. Migrations may also take a significant amount of time when using window-recreation, which, in addition to requiring operator replicas during this time, might result in a significant increase in monetary costs. Therefore, it makes sense to consider the monetary cost when using parallel-track algorithms.
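As an illustration of such a cost-benefit trade-off, the following sketch estimates the downtime of a single-track moving-state migration and weighs the accumulated latency benefit over a planning horizon against it. The formula, parameter values, and penalty factor are illustrative assumptions, not taken from any surveyed system.

```python
def single_track_downtime(state_mb, bandwidth_mbps, serialize_s, deserialize_s):
    """Rough downtime estimate for single-track moving state: the operator
    is frozen during serialization, state transfer, and deserialization."""
    return serialize_s + (state_mb * 8) / bandwidth_mbps + deserialize_s

def should_migrate(downtime_s, latency_gain_s_per_tuple, tuple_rate_tps,
                   horizon_s, penalty=1.0):
    """Migrate only if the accumulated latency benefit over the planning
    horizon outweighs the migration cost (scaled by a tunable penalty,
    e.g., to fold in monetary or duplicate-stream costs)."""
    benefit = latency_gain_s_per_tuple * tuple_rate_tps * horizon_s
    return benefit > downtime_s * penalty

# Hypothetical numbers: 100 MB of state over a 100 Mbit/s link
d = single_track_downtime(state_mb=100, bandwidth_mbps=100,
                          serialize_s=0.5, deserialize_s=0.5)
```

With these numbers the downtime is 9 s; a 1 ms per-tuple latency gain at 1000 tuples/s pays that off over a 60 s horizon, but not over a 5 s horizon, showing how the horizon choice governs migration frequency.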

Migration decision
Most studies have handled migration as part of an adaptation mechanism, where the goal is to improve execution or to recover it in case of node failure. Regularly collected metrics can be used to indicate the need for an adaptation. Recent surveys have focused on adaptation mechanisms [2,13], whereas in this work, migration is the main object of study.
Migration is usually the most costly aspect of an adaptation, and this perspective can be useful for better understanding adaptations. Even though adaptations differ in some aspects, they largely share the same cost components.
2.4.1 Migration goal. This section describes four of the most common goals of migration: load balancing to distribute the load evenly on the available nodes, elasticity to efficiently leverage computational resources, fault tolerance to ensure that the DSPS can continue processing in the event of failures, and improving the QoS. This is not a comprehensive set of migration goals, but it constitutes categories that fit the surveyed papers. While all goals of migration can be applied to any deployment environment, the surveyed solutions with load balancing and elasticity are mainly aimed at cloud-based DSPSs and executed within a single data center, whereas QoS optimization is normally carried out when an operator experiences backpressure and needs an adaptation to improve the QoS, which can happen in any deployment environment. While migration is relevant for fault tolerance, few solutions describe it as a mechanism to facilitate reliable execution. Instead, solutions often use an upstream rollback approach [79] that replays to the new host the tuples belonging to the failed operator. This method is not covered in this survey.
Table 2 lists the goals of migration and the overlap between studies in the area in terms of percentage. For instance, 45% of the surveyed papers on elasticity also consider load balancing. This is a common combination because load balancing can be used after a scaling operation to redistribute the load. Few fault tolerance-based solutions describe migration mechanisms, but it is natural that fault tolerance overlaps with load balancing or elasticity, as they are often cloud-based solutions, and the steps to restore the state of a failed node are similar to those of a scale-in operation. Approaches using QoS constraints on operators to determine when to migrate are often combined with load balancing, as one way to know that the workload must be rebalanced is that the QoS guarantees of an operator have been violated. Below, we analyze all the goals of migration except fault tolerance.

QoS-driven migration is used to improve the QoS. Typically, the most important QoS parameters that need to be optimized are bandwidth, availability, and latency. In a mobile setting, the goal of operator placement is naturally to remain close to the data producers due to the requirement of low latency, and operators attempt to do so by migrating when a new placement fulfills goals related to latency and bandwidth better than the current one. Another important parameter in mobile settings is energy preservation for resource-constrained nodes. In a cluster setting, the goal is often to ensure that the nodes are not overloaded and that tuple latency is not too high. If the latency of an operator increases significantly, it might be migrated to a node that can provide lower latency.
The basic goal of migration is to improve the QoS. Most solutions are more specific about the goal of migration because finding the optimal solution is usually an NP-hard problem, which is infeasible to solve for networks of most sizes. A simpler approach is to add constraints to the operator. If the operator cannot fulfill these constraints, it must be relocated. This is typically a much more scalable solution that looks for a placement that is good enough, instead of looking for the optimal solution. It is a push-based way of letting the coordinator know when the operator needs to be relocated. It should be noted that constraints or thresholds are also often used to achieve the other goals of migration.
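A push-based constraint check of this kind can be sketched as follows; the metric names and limits are hypothetical.

```python
def violated_constraints(metrics, constraints):
    """An operator compares its measured QoS metrics against per-operator
    constraints and reports the violations, which prompt the coordinator
    to consider relocating it."""
    violations = []
    for name, (kind, limit) in constraints.items():
        value = metrics[name]
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            violations.append(name)
    return violations

# Hypothetical constraints on a single operator
constraints = {
    "latency_ms": ("max", 50),    # tuple latency must stay below 50 ms
    "throughput": ("min", 1000),  # tuples/s must stay above 1000
}
report = violated_constraints({"latency_ms": 80, "throughput": 1500}, constraints)
```

Here the latency constraint is violated while throughput is fine, so the operator would push a relocation request without the coordinator having to poll it.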
One characteristic of QoS-based migration is that it is mainly related to the migration of individual operators.
Load balancing is a necessity in distributed streaming systems because many network nodes are available and in use.Because streaming systems often have variable workloads, the coordinator should monitor the resource usage on nodes to ensure that neither the network nor the CPU cores become bottlenecks for the performance of the operators.
If resource usage on the nodes is unbalanced, the coordinator moves some of the tasks among the nodes. If these are stateful processes, the tasks to be moved must be paused, moved, and restarted on the new node. Load balancing-driven migration differs from QoS-based migration in the sense that the data consumers do not necessarily benefit much from the balancing, and in that multiple operators are usually migrated through load balancing. However, the decision on when to perform load balancing and where to migrate operators must still take into account the same concerns as for constraint violation, i.e., whether the cost of migration is worth the benefit of the new placement.
Elasticity refers to adding or removing operator replicas that facilitate parallel processing (also called operator scaling). For instance, a query with stateful windows that are grouped by a key can be run in parallel in different threads, where each thread is responsible for a subset of the keys. Four scaling operations are commonly used:
• Scale up: Create a new process and migrate some partitions of existing threads to it.
• Scale down: Migrate all partitions of a thread to the other threads and shut it down.
• Scale out: Create a new worker to which some threads can be moved.
• Scale in: Remove a worker and move its threads to existing workers.
In a cloud setting, scaling out a streaming system means adding more servers to a cluster. The streaming system then automatically decides which operators to move to the new server and potentially scales up. Scaling in means the opposite: a server is removed from the cluster. First, all of the server's operators are moved to other servers, and some scale-down operations might be performed. Scaling in and out can be modeled as special cases of load balancing. When scaling out, a new container or virtual machine is started on a new machine, which is then added to a load balancing pool. The load balancer can then use this new machine for load balancing. When scaling in, a machine is eliminated from the load balancing pool, and at least its state must be migrated to the other nodes.
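The scale-out and scale-in operations above can be illustrated as key-partition moves between workers. The sketch below is a minimal model under simplifying assumptions; the `Worker`, `scale_out`, and `scale_in` names are illustrative and not from any real DSPS.

```python
class Worker:
    def __init__(self, name):
        self.name = name
        self.partitions = {}  # key partition -> operator state

def scale_out(workers, new_worker):
    """Add a worker and migrate half of the most loaded worker's partitions to it."""
    workers.append(new_worker)
    donor = max(workers[:-1], key=lambda w: len(w.partitions))
    for _ in range(len(donor.partitions) // 2):
        key, state = donor.partitions.popitem()
        new_worker.partitions[key] = state  # the state moves with its key

def scale_in(workers, victim):
    """Remove a worker; its partitions are migrated to the survivors first."""
    workers.remove(victim)
    for i, (key, state) in enumerate(list(victim.partitions.items())):
        workers[i % len(workers)].partitions[key] = state
    victim.partitions = {}
```

Note that in a real system each move would involve pausing, transferring, and restarting stateful tasks, which this sketch elides.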

Starting the migration decision process.
To determine whether migration should be performed, it is necessary to compare the current placement with an alternative placement to estimate the benefits of migration. If these benefits are significantly greater than the costs, migration is beneficial. However, the calculation of a new placement, its benefits, and the related costs might require a non-negligible amount of resources. As such, the naive approach of scheduling a migration decision with a fixed frequency might be too costly. Instead, some form of context awareness needs to be supported to detect changes in the system (e.g., related to workload, resource availability, or mobility) that indicate that there might be a good chance of determining a better placement. The relevance of such changes is generally implied by the goal of migration. Monitoring the runtime system is an important task in the context of detecting such changes.
The DSPS can also perform some book-keeping, like the number of operators a node hosts, and trigger a migration decision if a threshold is reached.
Load balancing systems make balancing decisions when the load imbalance of the system is above a certain threshold. For elasticity-based solutions, checks on whether to automatically scale in or out are similarly performed using thresholds. If the system has a balanced load and its utilization is still above a given threshold, the system might decide to scale out. If the utilization is below a threshold, the system can scale in. If an operator has latency constraints that are not fulfilled, the coordinator can be notified that migration must occur. The coordinator can either be the node hosting the operator in a decentralized solution or a centralized controller in a data center. In all scenarios, a unit collects metrics from the runtime system in order to make a decision.
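The threshold-based triggering described above can be sketched as a single decision function. The thresholds and the function name are illustrative assumptions, not values from any surveyed system.

```python
def decide(utilizations, imbalance_thresh=0.2, high=0.8, low=0.3):
    """Return which decision process to start, or None.

    utilizations: per-node utilization in [0, 1].
    """
    avg = sum(utilizations) / len(utilizations)
    imbalance = max(utilizations) - min(utilizations)
    if imbalance > imbalance_thresh:
        return "rebalance"   # load imbalance above threshold
    if avg > high:
        return "scale-out"   # balanced but overloaded
    if avg < low:
        return "scale-in"    # balanced and underutilized
    return None              # no migration decision triggered
```

For example, a balanced cluster running near full utilization triggers a scale-out check, while an imbalanced one triggers rebalancing first.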
2.4.3 Proactive versus reactive. Migration decisions can be made reactively or proactively. In the former case, a system migrates when the given situation calls for a change, such as when QoS guarantees for an operator are not fulfilled. Proactive migration decisions rely on predictions about future changes that require migrations.
In several cases, the need for migration scales with its cost. For instance, if the migration is triggered when the tuple rate exceeds a limit and causes QoS violations, more tuples are affected by operator downtime when the need for migration is more pressing. In other words, the more pressing the need to migrate, the higher the cost of migration.
If a node is overloaded and cannot handle a higher input rate for a given operator, the operator benefits from being migrated to another node. If this situation is detected only when the input rate is already too high, the migration results in latency spikes for the affected tuples. However, if it is possible to predict that the tuple rate will increase, one can reduce the cost of migration by migrating proactively before the tuple rate becomes too high.
The cost-benefit analysis for making migration decisions is not trivial, as the cost of migration is a one-time investment while the benefit from better performance is accumulated over time. When confronted with dynamic surroundings in stream processing scenarios, it makes sense to consider a given placement only for a given amount of time. This time can be regarded as the horizon for which predictions are made. Migration decisions are then made in such a way that the new placement amortizes its cost during that time. As such, this time horizon is called the amortization time. The notion of working with a limited future horizon for making optimization decisions is also used in model predictive control (MPC), and has been applied by De et al. [68] to make proactive scaling decisions.
The more tuples are impacted, the more heavily the given option is penalized.
However, the number of tuples impacted is an estimate that depends on the accuracy of the prediction. One can assume that tuples are sent evenly across the time window of the horizon, in a single burst, as fast as possible, or in a mix of these patterns. To make such predictions, it is necessary to collect metrics from upstream nodes to determine the density of the distribution of the transmitted tuples.
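Under the two extreme arrival assumptions mentioned above, the number of tuples impacted by a migration downtime can be estimated as follows. This is a deliberately simple model with illustrative names, not a formula from the surveyed papers.

```python
def impacted_tuples_even(rate_per_s, downtime_s):
    """Even arrival: tuples sent during the downtime are affected."""
    return rate_per_s * downtime_s

def impacted_tuples_burst(burst_size, burst_during_downtime):
    """Single burst: either the whole burst is affected or none of it."""
    return burst_size if burst_during_downtime else 0
```

A mixed assumption would interpolate between the two, weighted by the predicted density of the transmitted tuples.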

Cost versus benefit.
Once the decision process has been initiated, it is necessary to determine a better placement and relate its benefits to the costs of the migration to determine whether to migrate. One clear approach to calculating a new placement is to re-run the original placement algorithm with the same objective function. Some of the data needed for calculating a new placement might be available from the monitoring component that triggers the migration decision. In most cases, additional live data must be collected, which represents a substantial part of the overhead of making the migration decision.
The gain in performance owing to a new placement is generally reflected in the output of the objective function of the old placement versus that of the new placement. By optimizing the objective function during placement, a new placement that delivers the best performance is identified. The problem with simply migrating to the host with the best performance is that the cost of migration might be so high that it is not worth migrating. It might be that a sub-optimal placement is preferred in terms of the objective function owing to a lower migration cost, or maybe that no migration is worth it at all. What makes the comparison of cost and placement performance challenging is that they are not directly comparable. On the one hand, different metrics can be used to determine cost and performance, and on the other, the cost of migration is a one-time investment while placement performance continuously increases the overall benefit as long as there are no changes in the system. As such, there is a need to distinguish between the benefits of placement and migration. The benefit of placement simply expresses the difference in placement scores between a new host and the given host, while the benefit of migration is calculated based on (1) the cost of migration, (2) the placement performance, and (3) the amortization time.
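The deliberation above, weighing a one-time cost against a benefit accumulated over the amortization time, can be sketched in a few lines. The units and function name are illustrative assumptions; the point is only the structure of the comparison.

```python
def migration_worth_it(old_score, new_score, migration_cost, amortization_time):
    """Migrate only if the accumulated placement benefit amortizes the one-time cost."""
    placement_benefit = new_score - old_score        # benefit per time unit
    accumulated = placement_benefit * amortization_time
    return accumulated > migration_cost
```

A small placement improvement can thus justify an expensive migration if the amortization time is long, and conversely a large improvement may not justify it over a short horizon.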
Three common ways to avoid excessively frequent migrations have been discussed by Lakshmanan et al. [7]: (1) a threshold to ensure that the score of the new placement is significantly better than that of the current placement; (2) triggering a migration only when the QoS guarantees of an operator are violated, so that migrations are performed only when necessary; and (3) periodic re-evaluation of the objective function with a reasonably long interval.
In a more recent example, Buddhika et al. [36] regularly calculated interference scores of operators that describe the need for migration, and migrated them to a node where they were subjected to less interference. However, neither Lakshmanan et al. [7] nor Buddhika et al. [36] performed an explicit cost-benefit analysis. This is of interest to us not only to avoid excessively frequent migrations, but to understand why migration is worth it in some cases and not in others based on its costs and benefits. If a placement is an improvement over the given placement, we want to be able to state exactly why the migration is worth performing (or not) in a meaningful and understandable way. The amortization of the cost of migration is a simple goal to understand as long as one weighs the one-time cost of migration against the benefit of the continuous performance of the new placement, but this deliberation is often not presented explicitly.

EXISTING SOLUTIONS
We explore existing work on operator migration, focusing on the migration solutions and the calculation of the costs and benefits of migration. We use the two research questions posed in Section 1 to guide the literature search.
In general, the investigated solutions assume complete consistency of the state of the operator.This means that after migration, the new host runs the operator with exactly the same state as the old host did before the migration without any loss of data.

Which mechanisms are used to perform migration?
As the volume and velocity of data have increased with the emergence of big data [80], the simple single-track moving state algorithm has become inadequate. Specialized and innovative solutions that cause no downtime, as well as solutions that leverage fault tolerance mechanisms such as periodically performed backups, have been designed. In this section, we explore state-of-the-art migration algorithms and provide a historical perspective on the innovations proposed.
Migration algorithms are characterized by their state and stream management. This involves executing certain tasks, such as redirecting, buffering, and pausing streams, and moving state between nodes. Moreover, it is important to specify whether these tasks can be executed in parallel and where it is most beneficial to execute them. The most important properties identified in Section 2.3 are whether the algorithms require state migration and how this is performed, and whether they are single-track or parallel-track. Most of the investigated migration algorithms can be derived from these properties. For instance, some algorithms are centralized and rely on a coordinator, such as [16,47,64], whereas others are decentralized and initiate migration on the operator host, e.g., [50,60]. In some cases, multiple dependent migrations are planned and performed in sequence, but the details of managing multiple migrations are not investigated in this survey. Examples include load balancing, where many keys of an operator may be moved to a new location, and geographically distributed operator graphs where several operators are migrated, e.g., in TCEP [63].
The most basic algorithms are single-track without state migration, single-track with state migration, and parallel-track without state migration, i.e., window-recreation. These were introduced together by Zhu et al. [75] and were later applied to the SPE CAPE [26]. The authors discussed the steps of migration and cost models of the different algorithms.
They called them moving state, parallel-track, and pause-drain-resume migration algorithms.Using the terminology established in Section 2, the moving state algorithm is a single-track moving state, the parallel-track algorithm is a parallel-track without state migration, and the pause-drain-resume algorithm is single-track without state migration.
The paper by Shah et al. [16] forms the basis for load balancing, and presented a means of repartitioning keys in a key-value-partitioned operator state, which is relevant for cluster-based systems. We characterize this algorithm as a single-track moving state algorithm, but one in which the operators are already running on the destination node. In contrast to some studies, e.g., by Qin et al. [13], we do not consider state movements in load balancing and operator migration to be fundamentally different, and posit that only the entities being migrated differ: in load balancing, the entities are often a set of keys and their associated states, whereas in operator migration, they are usually an operator and its associated state.
Control messages used for migration are typically embedded into the data streams. They tell the nodes that a migration will be performed, and might be used for other coordination tasks. Sometimes, this message is sent only to the old or the new host; at other times, it is sent to both the old and the new host, or to the upstream nodes, the old and new hosts, and the downstream nodes. There may be many reasons for notifying different nodes about a migration, such as updating the view of where key partitions are maintained, and routing streams. For each algorithm, we describe only a subset of the control messages, namely those covering the essential stream and state management tasks to execute during migration. In the illustrations, the control messages described are shown in blue, whereas the other messages are shown in red. In addition to the visual representation of the topology and the messages sent among nodes, the essential tasks to execute during migration are shown in a listing. The first blue control message from the coordinator, which features in step one of each algorithm, is shown in this listing. All other control messages represent subsequent steps in the algorithm that are described in the first blue control message.
3.1.2 Standard moving state. The standard moving state algorithm (see Figure 3) uses direct state movement between the old and the new hosts, and the entire state is sent all at once. Aside from moving the state, migration requires changing the stream routing. Figure 4 shows the steps involved in the standard moving state algorithm developed by Shah et al. [16]. They proposed an operator called flux that can adapt the state partitioning of dataflow pipelines using a state movement algorithm. Other state movement algorithms perform mostly the same steps, but may differ in how stream management is performed or in the functions of particular nodes.
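The common skeleton of such moving-state algorithms (pause the streams, move the state, rewire, resume) can be sketched with in-memory objects standing in for network coordination. All class and function names here are illustrative, not from flux or any other system.

```python
class Upstream:
    def __init__(self, next_hop):
        self.next_hop, self.paused = next_hop, False
    def pause(self): self.paused = True          # stop sending tuples
    def resume(self): self.paused = False        # resume the stream
    def redirect(self, old, new):
        if self.next_hop is old:
            self.next_hop = new                  # reroute to the new host

class Host:
    def __init__(self): self.state = {}
    def extract_state(self): return dict(self.state)
    def install_state(self, state): self.state = state

def moving_state_migration(upstreams, old_host, new_host):
    for up in upstreams:
        up.pause()                                        # 1. pause upstream streams
    new_host.install_state(old_host.extract_state())      # 2. move the entire state at once
    for up in upstreams:
        up.redirect(old_host, new_host)                   # 3. rewire the streams
        up.resume()                                       # 4. resume processing
```

Variants that redirect and buffer at the new host instead of pausing upstream would reorder steps 1 and 3.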
In our interpretation of the moving state algorithm's blue migration control message from Step 1 in Figure 4 [16], the task of migrating the state from the old host to the new host is issued to the former, after which the streams are resumed. Instead of stopping the upstream nodes, other solutions [47,64] redirect streams from the upstream nodes to the new host. The new host buffers the streams and starts to process them when the state from the old host has been received and installed. Other solutions send the control message to the old host instead of the upstream nodes [50], or even to the new host [20].
3.1.3 State recreation. A state-recreation algorithm is presented in [32] and a checkpoint-assisted state-recreation algorithm in [28]. ChronoStream [71] performs a checkpoint-assisted state-recreation migration of state slices to provide horizontal elasticity. UniMiCo [77] (uninterruptable migration of continuous queries) is a direct window-recreation algorithm that can handle both time-based and tuple-based window semantics.
StreamCloud's [64] state-recreation and window-recreation algorithms are shown in Figure 5 and Figure 6. In both algorithms, a handover between the old and new hosts is scheduled using a timestamp. In window-recreation, the handover is performed such that the old host empties its windows while the new host fills its own in parallel, resulting in a smooth handover. For this purpose, the upstream nodes send tuples to both the old and the new hosts. In state-recreation, the old host sets the handover timestamp immediately before serializing and transmitting the state to the new host. Any subsequent tuples with a timestamp lower than the handover timestamp are processed by the old host, and the other tuples are processed by the new host. Operator downtime can be avoided here if the handover timestamp is set to a time after the new host is expected to have received the state and started its execution. When the state is received by the new host, it processes all tuples it receives from the upstream nodes in parallel with the old host, but produces only tuples caused by input tuples with a timestamp higher than the handover timestamp. Our interpretation of the window-recreation algorithm's blue migration control message from Step 1 in Figure 5 is described in Listing 2. The control message is sent by the coordinator to the upstream nodes. From there, it is forwarded to the old host, which schedules the takeover time for the new host and sends it to the upstream nodes. From then on, the upstream nodes send tuples to both the old and the new hosts. The new host processes the same tuples as the old host, but only for windows newer than the ones on the old host. It can be assumed that the new host knows that it should not produce any tuple until the extents of its windows have been filled, i.e., until after the old host has stopped processing.
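The timestamp handover just described amounts to a simple routing predicate: during the parallel phase, both hosts see every tuple, but each produces output only on its own side of the handover timestamp, so each tuple contributes to output exactly once. A minimal sketch (the function name is illustrative):

```python
def route_output(tuple_ts, handover_ts, is_new_host):
    """True if this host should emit output for the tuple.

    Old host owns tuples strictly before the handover timestamp,
    the new host owns those at or after it.
    """
    return tuple_ts >= handover_ts if is_new_host else tuple_ts < handover_ts
```

Downtime is avoided if the handover timestamp is chosen late enough for the new host to have installed the state by then.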
Our interpretation of the state-recreation algorithm's blue migration control message from Step 1 in Figure 6 is described in Listing 3. The control message is sent by the coordinator to the upstream nodes. This algorithm requires slightly more coordination between the old and the new host than window-recreation because the old host must move its state to the new host, and the latter needs to know the takeover time.

Listing 3. State-recreation
StartQuery(query)
Schedule(TakeoverTime(query)
  StartStreams(Streams(query))))
ControlMessage(Upstream,
  Schedule(RemoveNextHop(Streams(query), OH), TakeoverTime(query))
  AddNextHop(Streams(Upstream), NH)))
MoveState(query, NH))

3.1.4 Indirect state movement. Gedik et al. [76] described a state migration algorithm for load balancing that has been used as the basis in several studies [31,33,68]. They proposed an operator, called a splitter, that outputs to multiple replicas partitioned by keys and can decide to change the distribution of the keys, which requires state migration between replicas. Moreover, they introduced a two-phase approach to migration: donate and collect. In the donate phase, the state to be migrated is moved from the old host's in-memory store to a backing store. In the collect phase, the new host retrieves the state from the backing store. This method was subsequently used by Cardellini et al. [31] and Li et al. [33] to implement elasticity features in Apache Storm. The drawback of this method is that streams from the upstream nodes are paused during the migration. De et al.
[66,68] defined a similar state migration algorithm. However, their implementation contains a number of improvements; e.g., the splitter can send new tuples during state movement instead of blocking until migration is complete. The benefit of the two-phase approach is that it offers an API where an operator only needs to implement methods to extract its state and send it to a backing store, instead of requiring intricate communication among operators. Moreover, it can use existing fault tolerance mechanisms that periodically create checkpoints of states in the backing store.
In the donate phase of the algorithm proposed by Gedik et al. [76], replicas place the state to be moved into packages, one for each replica that takes over the state. The data are moved away from the in-memory store of the replicas to a backing store. A vertical barrier is used across the replicas to ensure that they do not progress to the next phase until all packages have been donated. In the collect phase, the replicas check the backing store for any packages that contain the state that they take over and restore it. Following this, a horizontal barrier is used to prevent the splitter from sending any tuples until the migration process has been completed.
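The donate/collect phases can be sketched as follows, with a plain dict standing in for the backing store and sequential execution standing in for the vertical barrier between the phases. The function names mirror the phase names; everything else is an illustrative assumption.

```python
backing_store = {}  # stands in for the shared backing store

def donate(replica_states, reassignment):
    """Phase 1: replicas package the keys they give up into the backing store.

    reassignment: destination replica -> keys it takes over.
    """
    for keys in reassignment.values():
        for key in keys:
            for state in replica_states.values():
                if key in state:
                    backing_store[key] = state.pop(key)
    # vertical barrier here: no replica collects before all donations finish

def collect(replica_states, reassignment):
    """Phase 2: each replica restores the packages for the keys it now owns."""
    for replica, keys in reassignment.items():
        for key in keys:
            replica_states[replica][key] = backing_store.pop(key)
```

In the real protocol, the horizontal barrier additionally holds back the splitter's tuples until collection has completed.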

3.1.5 Partial state movement.
With partial state movement, the state is partitioned and each partition is moved individually. The aim is to minimize operator downtime in state movement algorithms. MigCEP [60] is an algorithm designed for frequent migrations to minimize downtime. The state is split into two parts: immutable and mutable. An immutable or static state includes the operator and, possibly, databases whose data do not change during migration.
A mutable state consists of tuples that are being processed in the operator.
A further improvement involves sending the last checkpoint of the state to the new host before the operator goes down. This is the case in ChronoStream [71] and Rhino [47], where the state is split into the state existing before the migration starts and an incremental checkpoint that covers the new state built up after the first part has been extracted. This can be seen as analogous to the immutable and mutable states described in MigCEP [60].
Hoffmann et al. [45] introduced another technique, called Megaphone, for migrating many keys in an efficient way to minimize latency spikes. In this case, the state is split into many equal-sized parts, each of which causes some downtime when moved. However, while the total migration time increases, the spikes in tuple latency are substantially reduced compared with sending the entire state all at once. Fragkoulis et al. [2] distinguished between all-at-once and continuous state movements, which are identified in this survey as all-at-once and partial state movements, respectively. Megaphone, Rhino, and ChronoStream are characterized in this survey as exemplars of partial state movement, whereas Fragkoulis et al. categorized Megaphone as using continuous state movement, and Rhino and ChronoStream as using all-at-once state movement. The reason for this difference is that ChronoStream and Rhino rely on distributed checkpoint replication, and need to send only the state that has been built up since the last checkpoint. In this survey, migration is further divided by distinguishing between solutions that use distributed checkpoint replication and those that do not. Megaphone does not use it, and sends the entire state directly from the old host to the new host, whereas ChronoStream and Rhino depend on distributed checkpoint replication. If Rhino and ChronoStream did not use distributed checkpoint replication, the initial checkpoint would have to be sent from the old host to the new host instead of already existing on the new host. Therefore, using partial state movement is not necessarily an indication that multiple states are sent during migration. This is demonstrated in Section 4, where an implementation inspired by Rhino is presented that uses partial state movement without distributed checkpoint replication: when migration starts, the newest checkpoint is sent, followed by an incremental checkpoint.
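The base-plus-increment idea above (ship the large checkpoint while the old host keeps processing, then pause briefly to ship only the delta) can be sketched as follows. This is a simplified model with illustrative names, not Rhino's or ChronoStream's actual implementation; `apply_tuple` is a hypothetical callback that updates the state and returns the keys it touched.

```python
def migrate_partial(old_state, tuples_during_transfer, apply_tuple):
    base = dict(old_state)            # 1. ship the large base checkpoint
    changed = set()
    for t in tuples_during_transfer:  # 2. old host keeps processing meanwhile
        changed.update(apply_tuple(old_state, t))
    delta = {k: old_state[k] for k in changed}
    base.update(delta)                # 3. short pause: ship only the small delta
    return base, len(delta)           # reconstructed state, size of the delta
```

Only the keys touched during the transfer are sent twice, so the downtime-critical second transfer stays small regardless of the total state size.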
3.1.6 Distributed checkpoint replication. Some solutions leverage fault tolerance mechanisms to improve the scalability and performance of migration using periodically updated, distributed, and replicated checkpoints of the stream processing state. Because these algorithms use checkpoint solutions that may already exist, they are called checkpoint-assisted algorithms. If the target of migration is a host that already contains the state, a migration algorithm can be as simple as loading the checkpoint into memory and replaying the upstream tuples on the new host. This requires exactly-once guarantees, as provided by pub-sub systems such as Kafka [74]. A parallel-track algorithm can work similarly but, instead of stopping the old host before replaying tuples on the new host, runs both hosts until the latter takes over. In this process, output tuples need to be filtered to remove duplicates.
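The "load checkpoint and replay" idea can be sketched in a few lines; the function name is illustrative, and exactly-once replay of the logged tuples is assumed (e.g., via a pub-sub system such as Kafka).

```python
def recover_on_new_host(checkpoint, log_since_checkpoint, apply_tuple):
    """Rebuild the operator state from a local checkpoint plus a replay log."""
    state = dict(checkpoint)          # load the replicated checkpoint into memory
    for t in log_since_checkpoint:    # replay tuples logged since the checkpoint
        apply_tuple(state, t)
    return state
```

No state crosses the network during migration itself; only the replay log must be available, which is what makes the checkpoint-assisted variant attractive.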
ChronoStream [71] uses distributed checkpoint replication to implement a parallel-track algorithm, Rhino [47] to realize a single-track algorithm, and the proposal of Madsen et al. [28] to carry out both. The algorithms often also use partial state movement when updating checkpoint replicas to send as little state as possible.
Del Monte et al. [47] introduce a checkpoint-assisted single-track migration mechanism that can migrate state sizes of up to terabytes 15 times faster than state-of-the-art solutions by using incremental checkpointing. Their algorithm is shown in Figure 7. Most of the state is sent before the old host is stopped. Afterward, the old host sends an incremental checkpoint that represents the changes to the original state. In this way, only tuples that arrive after the migration has started need to be migrated in the incremental checkpoint. This is a cluster-based migration algorithm that is executed by a handover manager (HM). The HM informs all workers about the migration and about what will happen by injecting a control message into the source streams, a functionality inspired by Chi [41]. Afterward, the source nodes are redirected. When the old host has received a control message on all of its incoming streams, it sends the state to the new host. The intermediary hosts, including the old and the new host, send control messages to their next-hop nodes. When the nodes have completed their tasks, including the redirection of streams and the migration of state, they acknowledge the HM. The migration is complete when all nodes have acknowledged the HM. Our interpretation of the checkpoint-assisted moving state algorithm's blue migration control message from Step 1 in Figure 7 is described in Listing 4. The control message is issued to the upstream nodes, which forward a control message to all downstream nodes. We describe tasks that the old host might be assigned. The main difference between this algorithm and the standard moving state algorithm is that most of the state is assumed to be on the new host before the migration starts. As such, when the state is moved, it is moved using the partial state movement task MoveIncrementalState instead of MoveState.

ControlMessage(NH, StartStreams(Streams(query)))))

Wu et al.
proposed ChronoStream [71], a checkpoint-assisted state-recreation migration algorithm that provides horizontal elasticity, as illustrated in Figure 8. The states of all tasks on a node are periodically backed up and sent to the other nodes. As a result, migration only involves updating a subset of the backed-up state, which significantly reduces the amount of state to be moved. This process is split into four phases: migration preparation, state rebuilding, dataflow rerouting, and resource release. The first phase sets up a container for the operator on the destination node if this has not been done already. In the second phase, the new host fetches the operator's state locally or remotely, rebuilds it, and notifies the master node when finished. In the third phase, the master tells the data sources to send tuples to the new host as well, including any tuple that is not included in the state that the new host received. At this point, the new host participates in the processing and produces the same tuples as the old host, and duplicate output tuples are filtered out by downstream operators based on the sequence numbers of the tuples. Finally, the controller tells the old host to release its resources such that the new host is the only node running the operator. Our interpretation of the checkpoint-assisted parallel-track algorithm's blue migration control message in Figure 8 is described in Listing 5. The main difference between the parallel-track checkpoint-assisted algorithm and a non-checkpoint-assisted one is that the immutable state is sent before anything else happens.

Listing 5. Checkpoint-assisted parallel-track

# Bootstrapping
ControlMessage(OH, ReplicateCheckpoint(NH))

# Migration
ControlMessage(US,
  AddNextHop(Streams(query), NH)
  RemoveNextHop(Streams(query), OH))
ControlMessage(NH,
  ControlMessage(OH, MoveImmutableState(query, NH)))
ControlMessage(OH, StopQuery(query))
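The duplicate filtering by sequence numbers used during ChronoStream's parallel phase can be sketched as a simple downstream filter. The function name is illustrative, and monotonically increasing sequence numbers on output tuples are assumed.

```python
def dedup_by_seq(tuples):
    """Yield only the first tuple seen for each sequence number.

    tuples: iterable of (seq, payload), with seq non-decreasing, as produced
    by the old and new hosts running in parallel.
    """
    last_seen = -1
    for seq, payload in tuples:
        if seq > last_seen:       # drop any tuple already emitted by the other host
            last_seen = seq
            yield (seq, payload)
```

Because both hosts emit identical output for the overlap period, dropping repeated sequence numbers is sufficient to restore exactly-once output.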

How is the migration decision executed?
We first provide an overview of the parameters of optimization, and the cost and benefit metrics used in existing work (Table 3 and Table 4). We then describe (1) how cost values are estimated and measured, (2) approaches for optimization to increase benefits, and (3) reactive and proactive methods.
Even though there are many different definitions of the parameters of optimization, they are often related. Therefore, we group them in Table 3 into five categories: network performance (e.g., bandwidth, bandwidth-latency product), tuple performance (e.g., tuple latency, tuple rate), load, migration costs, and monetary costs. Since the goal of migration is important for optimization, we differentiate among the categories of optimization parameters in research according to the goals of migration. The most prominent goal is load balancing, and load is the most commonly used optimization parameter. While monetary cost is the least commonly used optimization parameter, migration is often used to avoid the need for over-provisioning, and thus indirectly saves money. Table 4 gives an overview of the metrics used to define the estimated and measured costs of migration, the estimated placement benefit, and the measured benefit of migration. Ideally, the measured and estimated metrics should be identical, but they are not. One reason for this mismatch is that it is much easier to measure values for certain metrics, like tuple latency and tuple rate, than to predict them. Values for costs and benefits need to be estimated for each migration decision before the migration is performed, whereas values of the evaluation metrics are measured during migration. The mismatch between the estimated cost and benefit and the measured evaluation metrics might also help complement future migration decisions and the assessment of migration using further metrics. The most commonly used parameters to determine the cost of migration are the migration time and state size; few systems use more precise cost models.

Migration cost.
Accurately defining the cost of migration is essential for making the correct migration decisions.
The column for the modeled cost of migration in Table 4 describes the metrics used to represent cost in existing solutions.
The column for the measured cost of migration to its right describes the metrics used in the evaluation of existing solutions. The measured cost can manifest as any kind of degradation of execution, such as decreased throughput or increased tuple latency. The table shows that the metrics used to measure the cost of migration are not the same as the ones used to model it.
The vast majority of solutions use migration-specific metrics to model its cost, as opposed to metrics that are used for measuring its cost and benefit. Tuple processing performance is used in many cases to model and measure benefit, but very few approaches have used it to calculate the cost of migration. Heinze et al. [25] modeled and predicted tuple latency as part of the cost of migration but provided no solution to model the tuple rate, because it is much easier to measure tuple processing performance than to predict it. Operator downtime is an indicator of spikes in latency but also depends on the tuple rate.
The migration time and the size of the state to be moved are the most common cost-of-migration metrics. Migration time is typically calculated as a function of state size, bandwidth, and latency. In environments where the bandwidth and latency are stable, such as within data centers, the state size is often interchangeable with migration time. In most cases, the migration time is assumed to be easy to model, and no calculation for it is given. Some solutions can migrate multiple operators at a time, and thus define the migration time as the maximum time it takes to move any of the operators [42,56]. Cardellini et al. [42] used a data center-based solution to define the operator downtime based on the type of adaptation made, the size of the state to be moved, and the round-trip delay between the nodes and the resources. WASP [56] is a wide area network solution that defines the time it takes to move an operator based on the state size and the bandwidth between links, the latter of which is significantly more limited and variable in a wide area network than in a data center. Zhu et al. [75] focused on the time needed for each step of migration, such as the time spent cleaning the accumulated tuples, state matching, moving the state, and recomputing it.
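Such a migration-time model, including the maximum over concurrently migrating operators, can be sketched as follows (function and parameter names are illustrative, not taken from any surveyed system):

```python
def migration_time(state_size_bytes, bandwidth_bytes_per_s, rtt_s, control_messages=0):
    """Estimate migration time as state transfer time plus one round trip
    per control message that must be acknowledged."""
    transfer_time = state_size_bytes / bandwidth_bytes_per_s
    return transfer_time + control_messages * rtt_s

def concurrent_migration_time(state_sizes, bandwidth_bytes_per_s, rtt_s):
    """When several operators migrate in parallel, the overall migration time
    is the maximum over the individual operators (in the spirit of [42, 56])."""
    return max(migration_time(s, bandwidth_bytes_per_s, rtt_s) for s in state_sizes)
```

In a data center, where bandwidth and round-trip times are stable, the first term dominates, which is why state size is often used as a stand-in for migration time.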
Using state size as the cost of migration is among the easiest ways of defining cost because it requires only looking at the size of the state to be migrated. The solutions that we survey that use state size as a cost metric are all cloud-based, which makes sense since data centers feature a high and stable bandwidth between nodes, in contrast to geo-distributed environments. The state size is frequently used as part of the objective function when making migration decisions [20,27], as part of a constraint to prevent costly solutions from being selected [37], and can even be the only criterion to minimize when making load balancing decisions [40,43].
Luthra et al. [63] used the number of control messages during migration as part of the definition of the cost of migration. This parameter is significant because, if nodes have to wait for acknowledgments for these messages, the total migration time depends on the distance between nodes. When the cost of migration is defined in terms of migration time, only the time taken to move the state is generally included in the equation, which might result in an inaccurate view of the cost.
The bandwidth-delay product is a measure of how much data can be sent in a given duration. As part of the cost of migration, it represents the amount of data that can be sent while a migration is underway. The more tuples that can be sent, the higher the cost of migration, and the less desirable a migration is. MigCEP [60,62] uses the average bandwidth-delay product during migration as its cost. This represents the utilization of the network due to migration.
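MigCEP's use of the average bandwidth-delay product as a cost can be sketched as follows (a simplified illustration; sampling link statistics during the migration is our assumption):

```python
def avg_bandwidth_delay_product(link_samples):
    """link_samples: (bandwidth_bytes_per_s, latency_s) pairs observed while a
    migration is underway. The bandwidth-delay product of each sample is the
    amount of data that can be 'in flight' on the link; averaging over the
    migration yields a MigCEP-style network-utilization cost."""
    products = [bw * lat for bw, lat in link_samples]
    return sum(products) / len(products)
```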

Benefit.
In this section, we discuss the goals of optimization of different migration solutions. This includes the most important metrics used for determining the benefit of migration. The benefit of migration is based on performance in terms of the placement, the amortization time, and the cost of migration (as explained in Section 2.3). This is either explicitly defined or implicit in the decision-making, where the goal is to maximize performance in terms of the placement and minimize the cost of migration.
One could argue that all goals of optimization are relevant to all goals of migration. However, some are more tightly coupled than others. For instance, load balancing involves using the load of a system to make balancing decisions.
QoS solutions, in contrast, are not bound specifically to any goal of optimization. Elasticity-based solutions aim to minimize resource usage while maintaining the QoS. In other words, they use as few resources as possible for an application, and trigger a scaling operation when the load is above or below a given threshold. Fault tolerance-based solutions involve migrations when nodes fail and the operators must be migrated to new or existing nodes.
Network performance as a goal of optimization means using the quality of the network links to determine performance in terms of placement. Important metrics in this context include the bandwidth between links in the overlay topology, the latency between nodes, and the bandwidth-delay product. Tuple processing performance in query processing is the most popular indicator of the quality of an adaptation, as shown by the number of studies that have measured the benefit of migration in terms of tuple latency or rate. If a node is overloaded in a data center, the tuple latency might exceed acceptable levels, leading to QoS violations. A long migration time might temporarily worsen performance, but if the general gain in performance outweighs the degradation, the migration is considered worth it. The load of a system is an important goal of optimization that makes it possible to run as many operators on a node as it can handle, and to make changes when the workload is above or below a given threshold. The cost of migration is essential to consider when making migration decisions, to avoid excessively frequent migrations and to ensure that the benefit of the new placement outweighs the cost of migration. When the cost of migration is used to calculate its benefit, the result is the modeled benefit of migration. Monetary cost can be useful as a goal of optimization to make a tradeoff between the cost of resources and the performance of the system.
Network. In decentralized fog and edge computing solutions, network usage as well as the bandwidth and latency between links are crucial metrics. Pietzuch et al. [73] developed an overlay network that can make network-aware placement and migration decisions. Parameters like the latency and bandwidth of overlay links and the load on nodes are used as criteria of optimization when placing and migrating operators. Rizou et al. [53] implemented a similar method that converges to the optimal placement in fewer migrations than Pietzuch's solution.
Tuple performance. In a resource-constrained environment, tuple latency can be an indicator of energy consumption, and the goal of minimizing latency can implicitly lead to energy reduction. For most surveyed approaches, the goal of migration directly or indirectly involves improving performance. Most elasticity-based and load balancing-based solutions are cluster-based, and are more concerned with the load on the system than with the bandwidth of or latency between links. The tuple rate of a data stream is an indicator of the load on the nodes, and can be used to calculate the variance in load. For instance, Buddhika et al. [36] proposed a methodology to reduce the interference between stream processing operators using migration. To achieve this, the interference score of an operator is calculated: the higher the score, the greater the need for migration. This interference score is based on the prediction of future packet load. Repantis et al. [51] defined latency constraints on the operators and used tuple latency to determine when an operator must be migrated.
Load. Unsurprisingly, all load balancing solutions use either load as a parameter when making decisions or tuple performance to estimate load. Gedik et al. [27] propose two methods. The first method minimizes the variance in load between nodes in a cluster. In this case, a coordinator monitors the load on the systems and, when a balancing decision has to be made, selects the configuration with the least variation in load. The second method triggers a load balance when the imbalance has crossed a given threshold, and re-balances the load to at least below another threshold. In other words, load balancing is used as a constraint. In this case, the goal is to minimize the cost of migration by redistributing the minimum amount of load needed to achieve an acceptable load balance. The benefit of the latter over the former method is that the former might require expensive migrations of large loads among many nodes, and redistribute loads that are not the cause of the imbalance. In contrast, the latter method achieves an acceptable load balance while moving the smallest load.
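The first of Gedik et al.'s methods, selecting the candidate configuration with the least load variance, can be sketched as follows (the data layout, mapping node names to load values, is our assumption):

```python
from statistics import pvariance

def least_variance_configuration(configurations):
    """Select the candidate configuration whose per-node loads have the
    smallest population variance, i.e., the most balanced assignment."""
    return min(configurations, key=lambda cfg: pvariance(cfg.values()))
```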
Elasticity-based solutions increase or reduce the number of resources used by an application based on its variable workload. If a cluster is overloaded after load balancing, this is a sign that the system should scale out [65]. In decentralized fog-based solutions, the load of a system is not known beforehand, and therefore there might be a tradeoff between latency and load. Pietzuch et al. [73] introduced a cost space model in which a topology of systems is constructed based on the latency and bandwidth between nodes as well as the load on systems. If the load of a system is large, the relevant node appears farther away in the cost space when mapping an operator graph to a physical topology, and thus is less likely to be selected.
Monetary costs. In the cloud-based model, followed by fog and edge models, users mostly pay based on usage. Users can allocate a certain amount of resources and scale out or in whenever more or fewer resources are needed, respectively. A complicated issue in this case is balancing the monetary costs with the benefits of improved placement. None of the load balancing solutions uses monetary cost as an optimization criterion. This makes sense, as the load balancing problem involves evenly distributing the load over a fixed amount of resources, whereas elasticity can increase or reduce the amount of resources. In terms of hardware resources, there is nothing to optimize, as they are already paid for. What can be minimized is the monetary cost of moving state during migration.
Typically, this is implicitly done by designing the objective function to minimize the amount of state that needs to be moved. Elasticity-based solutions require a tradeoff between resource usage and monetary costs [30,35,54]. An elastic solution might use a threshold for the load to determine when to scale out. However, deciding when to scale in might be more complex, considering that it requires a certain downtime for the worker to be removed.

Costs of migration.
Most studies prevent the cost of migration from affecting the QoS by implicitly minimizing the number of migrations, their frequency, or their magnitude. Zhou et al. [19] emphasized the need to minimize the time needed for query migration but did not describe a means of implementing this in their solution. Lombardi et al. [38] defined the cost of migration in terms of the time it takes to perform different steps but did not attempt to minimize it. The cost of migration can be minimized by either using single-objective optimization [27,43,46,56], or simple additive weighting (SAW) with multiple objectives [20,22,42,54,59]. If only the cost of migration is minimized, constraints have to be placed on the quality of the placement to ensure that the selected placement is acceptable. With load balancing, minimizing the cost of migration while maintaining constraints on the load imbalance is a good way to ensure a balanced load that minimally affects the performance of the system.
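Simple additive weighting over multiple objectives can be sketched as follows (a generic SAW implementation, not the exact formulation of any cited system; all objectives are assumed to be "lower is better", e.g., migration cost and latency):

```python
def saw_best(candidates, weights):
    """candidates: dicts mapping objective name -> raw value.
    Each objective is min-max normalized across the candidate set, the
    weighted normalized values are summed, and the lowest total wins."""
    objectives = list(weights)
    lo = {k: min(c[k] for c in candidates) for k in objectives}
    hi = {k: max(c[k] for c in candidates) for k in objectives}

    def score(c):
        total = 0.0
        for k in objectives:
            span = hi[k] - lo[k]
            normalized = 0.0 if span == 0 else (c[k] - lo[k]) / span
            total += weights[k] * normalized
        return total

    return min(candidates, key=score)
```

Normalization is what makes the weighting meaningful: without it, an objective measured in seconds would be combined directly with one measured in bytes.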
Minimizing the number of migrations is a similar goal to minimizing the cost of migration. Repantis et al. [51] proposed a hotspot alleviation-based solution with the goal of minimizing the number of migrations that leads to an acceptable QoS for the operators. Rizou et al. [53] implemented a relaxation algorithm similar to the one in [73], and showed that it requires fewer migrations before converging to the optimal placement and fewer control messages. The easiest way to prevent needless migrations is to use a threshold that ensures that they are beneficial. Load balancing systems commonly use thresholds of load imbalance to ensure that the load is redistributed only when the load imbalance is above a threshold. A different type of threshold targets the migration itself to ensure that its benefit is worth its cost.
Pietzuch et al. [73] proposed a method that migrates data only when the benefit in terms of network capacity is higher than a threshold based on the cost of migration.
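A threshold check of this kind can be sketched as follows (in the spirit of Pietzuch et al.'s check; the symbol names are ours):

```python
def worth_migrating(current_score, candidate_score, migration_cost, threshold=0.0):
    """Migrate only if the estimated improvement of the new placement exceeds
    the migration cost by more than the threshold; otherwise stay put."""
    improvement = candidate_score - current_score
    return improvement - migration_cost > threshold
```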
Using the cost of migration as a goal of optimization means penalizing a placement alternative based on it. Even if a placement is better than the given placement, it might not be preferred because the cost of the reconfiguration is too high. In load balancing-based approaches, the cost of migration is commonly minimized, but most often as an implicit goal rather than as part of the objective function. The goal is generally to achieve an acceptable load distribution as quickly as possible, and the redistribution itself constitutes the highest cost. The cost of migration can be minimized while maintaining the load balance [43]. The number of migrations can be minimized while fulfilling QoS requirements [51]. Another way is to maximize the improvement in a query plan by dividing the improvement in performance by the cost of migration [22]. Gedik et al. [27] explored three ways of making load balancing decisions using load and the cost of migration. First, they minimized the cost of migration while the conditions on load balancing were used as constraints. Second, the ideal cost of migration was used as a constraint while minimizing load imbalance. Finally, a flexible solution was proposed that combines both load imbalance and the cost of migration as part of an objective function.
In elasticity-based approaches, the cost of migration is often considered in the same way as in load balancing because scaling can be considered an extension of load balancing. Zacheilas et al. [30] minimized the monetary costs of computational resources, the cost of migration, and the cost of missing tuples. In this approach, a tradeoff is made between the cost of resources, the cost of missing tuples, and the migration time. A reinforcement learning-based approach was used in [54] that minimizes the cost of reconfiguration, the performance penalty due to QoS constraints, and the cost of the computational resources.

Proactive migration decisions.
Current migration solutions generally use reactive approaches to make migration decisions. For instance, a migration might be triggered if a node is overloaded and QoS guarantees are violated, such as when the tuple latency increases excessively. Most proactive solutions predict whether the node can sustain the workload.
Some solutions predict QoS violations and adapt before they occur [51,65]. Repantis et al. [51] used linear regression and the incoming tuple rate to predict QoS violations of the end-to-end execution time. They predicted QoS violations to prevent them. Lohrmann et al. [65] built a predictive latency model using queuing models to make scaling decisions.
Zacheilas et al. [30] estimated the load and expected latency of Esper to make scaling decisions by using Gaussian processes [82], because they can help estimate the uncertainty in predictions. However, this method has cubic computational complexity due to the use of matrix inversion. Wang et al. [39] predicted resource usage in real time to choose the configuration that can minimize CPU and memory resources while fulfilling QoS guarantees. This is done using incremental learning techniques based on Weka [83] and MOA [84]. De et al. [68] used model predictive control (MPC) to predict optimal scaling decisions over a future horizon. Buddhika et al. [36] used prediction rings to forecast the interference score that expresses the degree to which a system is expected to be overloaded. Lombardi et al. [38] used a reactive and a proactive mode for their Elysium system. In the former case, the tuple rate is used as the basis for decisions, and in the latter, the input load is predicted over a certain time, called the prediction horizon. Liu et al. [44] predicted the load of operators as the number of tuples that operators need to process during a prediction horizon. In WASP [56], the expected input and output rates of the operators are estimated as an alternative to backpressure monitoring for estimating load. Backpressure is weaker, as it is based on the observed load instead of the actual workload, and this may lead to less accurate adaptation decisions [85]. A composition of reactive, proactive, and delayed migrations was presented in [86]. The results of this empirical study indicated that knowledge of the window state can be used to schedule a migration when the state is minimal (i.e., after completing a tumbling window, as in [63]), or when no output tuple is affected by the migration.
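The simplest of these proactive models, fitting a trend to recent observations and extrapolating over a prediction horizon, can be sketched with ordinary least squares (a generic illustration, not the exact model of [51] or [38]):

```python
def predict_load(samples, horizon):
    """Fit y = a + b*t by least squares to equally spaced load samples and
    extrapolate `horizon` steps past the last observation."""
    n = len(samples)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(samples) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, samples))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return intercept + slope * (n - 1 + horizon)
```

If the predicted load exceeds what the node can sustain, a migration or scale-out can be scheduled before the QoS violation materializes.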

EMPIRICAL QUANTIFICATION OF CORE CONCEPTS OF MIGRATION
In this section, we apply the core concepts defined in Section 2 and surveyed in Section 3 to gain empirical insights. This is useful for two aspects of migration: the migration algorithm and the decision model for migration. We argue that it is important to model the tradeoff between the cost of migration and its benefit, and that this can be quantified and evaluated. Minor tweaks to the migration algorithm can result in significant changes in its performance and the correctness of its queries, such as an inability to correctly buffer tuples. This highlights the importance of using a common language when defining and using migration algorithms.
We first define two direct moving state migration algorithms: (1) one that uses partial state movement, and (2) another that sends the entire state at once. They are defined in an abstract way such that they can be implemented in the Apache Flink and Siddhi SPEs. We then define decision models to determine when and where to migrate.
We conducted a real migration experiment to analyze the migration algorithms based on the NEXMark benchmark [6].
We also show a use case of the decision models for migration to illustrate their effect on decision-making.

Migration algorithms
The difference between the partial state movement algorithm and the all-at-once state movement algorithm is that the former splits the state into a large static state and a small dynamic state. The static state is transmitted while the operator is still running and processing tuples, followed by the extraction of the dynamic state. As such, static state transmission involves little or no overhead in query processing, and constitutes only one additional step in the algorithm. Note that a partial state movement algorithm might split the state into more than two parts, such as in Megaphone [45].
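The effect of this split on freeze time can be illustrated with a back-of-the-envelope model (our simplification: only the state moved while the operator is stopped contributes to the freeze time):

```python
def freeze_time(state_size_bytes, dynamic_fraction, bandwidth_bytes_per_s, partial):
    """With all-at-once movement the whole state is transferred while the
    operator is stopped; with partial movement only the small dynamic part is,
    because the static part was already shipped while the operator kept running."""
    if partial:
        moved_while_stopped = state_size_bytes * dynamic_fraction
    else:
        moved_while_stopped = state_size_bytes
    return moved_while_stopped / bandwidth_bytes_per_s
```

For a 1 GB state on a 100 MB/s link with a 2% dynamic part, this model predicts a freeze of 10 s for all-at-once movement versus 0.2 s for partial movement.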
We use the algorithms described in Section 3 as a basis. In particular, we divide the algorithms into functions that are executed by different nodes participating in the network according to their roles. When moving the state, the old host provides the next hops for the query. Thus, there is no need to add them explicitly in these tasks. These tasks follow a similar format to that used in Expose [87], which is a framework and toolset for efficiently defining and executing DSPS experiments. Wrappers for different SPEs are provided such that all SPEs support a common set of tasks. Expose has been extended with additional tasks to enable operator migration.
Listings 6 and 7 describe the tasks we use to define the all-at-once state movement algorithm and the partial state movement algorithm, respectively. They differ slightly from similar algorithms in Section 3.1 in some respects, such as the ways in which streams are managed. It is possible to send a batch of tasks to upstream nodes, as in Flux [16], to the new host, as in [20], and to the old host, as in [50]. The only difference between the all-at-once state movement and partial state movement algorithms is that the latter involves sending the static part of the state before redirecting the upstream nodes.
Listing 6. All-at-once state movement.
Since Siddhi extracts the entire state into a single byte array, there are limitations on how large the state can be. In contrast, Flink writes the state as a set of checkpoint files, each of which does not exceed a configurable size. Therefore, the state to migrate with Flink can be larger than in Siddhi.
The implementation of the other tasks, including BufferStreams, StopStreams, ControlMessage, and AddNextHop, is supported through simple tasks defined in the SPE wrapper in Expose [87].
The standard moving state algorithm is implemented in Flink and Siddhi, but only Flink supports partial state movement, since this requires the ability to split a given state into a large, immutable state and smaller incremental checkpoints. This feature is supported by one of the state backends in Flink, called RocksDB [88]. Flink with RocksDB is also used for the checkpoint-assisted algorithm in Rhino [47], which uses partial state movement. Another benefit of RocksDB is that it does not store the entire state in memory while the system is running, but instead writes it to file and minimizes its size based on multiple criteria.

Decision models
As discussed in Section 3.2, there are many different ways of making the migration decision. Our solution is to make the decision process as transparent and meaningful as possible by optimizing the QoS. The goal is to maximize the performance of a placement while penalizing it based on the cost of migration, which varies for different nodes and is zero for the current host. In this way, it is clear why a new placement is selected over the old one, for reasons other than simply that the old host is over-provisioned or the new placement delivers better performance.
The amortization time (T_a) varies depending on the reliability of the placement score for the operator on a given host.
If the placement score is stable over time, the amortization time increases, since it is less likely that the placement becomes suboptimal shortly after the migration. For instance, a mobile node might have a less consistent placement score than a server located in a data center, and as such, it is even more important that the migration is worth its cost.
When defining the cost of migration, operator downtime alone is not sufficient, because it does not reveal how many tuples, if any, are affected by the downtime. Therefore, we use the tuple rate during the migration as a foundation for the cost of migration. Since the data sink waits for tuples from the operator, we consider the number of expected output tuples N_out(o, T_a) that are affected by the migration to calculate its cost. Buddhika et al. [36] describe a tuple prediction method that can be applied here.
N_out(o, T_a) = N_in(o, T_a) * sel(o), where N_in(o, T_a) is the predicted number of input tuples for operator o during amortization time T_a and sel(o) is the selectivity of operator o.
The cost of migration can be calculated as the operator downtime divided by the amortization time. Since we focus on output tuples from the query, the cost of migration C(o, h_old, h_new) is defined as the ratio of the output tuples predicted from the query during migration, N_out(T_mig(h_old, h_new, o)), to the output tuples predicted from it during the amortization time, N_out(T_a(h_old, o)):

C(o, h_old, h_new) = w_c * N_out(T_mig(h_old, h_new, o)) / N_out(T_a(h_old, o))

The cost has a weight w_c associated with it, meaning that the system can dynamically change how much the cost of migration matters based on the selected policy. If w_c is set to one, this suggests that the performance of a placement should be reduced in proportion to the number of tuples that are received during operator downtime. If w_c is set to 1.5, the placement is penalized further. This makes sense, as buffered tuples may take some time to process, during which no new tuples may be processed.
Given the amortization time, the benefit of migration B(o, h_old, h_new) of a placement is its estimated performance penalized by the cost of migration, instead of a general placement score. Of two placements with the same migration cost, the one with the higher placement score is selected. The only difference arises when two placements have different costs of migration, for instance, when comparing the given placement, with zero cost of migration, with another placement that requires a migration.
The benefit of migration can be calculated as

B(o, h_old, h_new) = P(h_new, o) * (1 - C(o, h_old, h_new)),

where P(h_new, o) is the estimated placement score for the new host h_new running operator o.
The above functions show how the migration decisions are made. Migration checks are periodically performed by calculating the placement score. Following this, the benefit of migration of each placement is calculated by penalizing its placement score based on the cost of migration. We define the host h that maximizes B(o, h_old, h) as the potential host with the maximum benefit for the given operator. This host is selected as the future host for the operator, and triggers a migration if it is not the given placement.
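The decision model can be sketched end to end as follows (notation follows the text: N_out for predicted output tuples, C for migration cost, P for placement score, B for benefit; treating the cost as a multiplicative penalty on the score is our reading of the model, so the exact form is an assumption):

```python
def predicted_output(n_in, selectivity):
    """N_out = N_in * sel(o): expected output tuples over a window."""
    return n_in * selectivity

def migration_cost(out_during_migration, out_during_amortization, weight=1.0):
    """C = w_c * N_out(T_mig) / N_out(T_a): the share of the amortization
    window's output affected by the migration, scaled by the policy weight."""
    return weight * out_during_migration / out_during_amortization

def benefit(placement_score, cost):
    """B = P * (1 - C): the placement score penalized by the migration cost."""
    return placement_score * (1.0 - cost)

def select_host(scores, costs):
    """Pick the host with the maximum benefit; the current host should appear
    in `costs` with cost 0, so staying put is always a candidate."""
    return max(scores, key=lambda h: benefit(scores[h], costs.get(h, 0.0)))
```

Because the current host carries zero migration cost, a candidate host must beat it even after the penalty, which is exactly the behavior exercised in the use case below.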

Empirical evaluation
We quantitatively analyzed our proposed decision models for migration through a use case, and our migration algorithm through experiments. The goal was to show the usefulness of incorporating the cost of migration into the process. We considered a use case for the decision models because it makes the analysis and discussion of the results easier. In contrast, implementing and running the migration algorithms on SPEs is necessary to understand the impact of migration.
Figure 9 illustrates our evaluation scenario: Figure 9a is the operator graph used for both the use case and the migration experiment, and Figure 9b is the DSP overlay topology. The mapping from the operator graph to the physical topology is demonstrated using the decision models in Section 4.3.1, and an experiment involving the migration of state from the join operator on one node to another is described in Section 4.3.2.

Decision model use case.
The decision models for migration were assessed in this use case. They were applied using a prediction model oracle with 100% accuracy to make migration decisions. We expect that the migration time can be predicted based on periodically updated topological information and network statistics. By using knowledge of the number of tuples sent in the time window and the migration time, we can predict the total end-to-end latency of the tuples during a given time window. The parameters of the use case are provided in Table 5. We considered two source nodes A and B, three potential hosts C, D, and E, and a sink node F.
Results. Table 6 shows the results of the use case for the configurations given in Table 5. The predicted tuples (PT) during the migration kept increasing through the runs, leading to more disruptions to potential migrations. Therefore, the benefit of migration of Node D was also the only one that changed significantly in the different runs.
Node E had a better QoS than Node C, at 2.4-2.7 compared with 1.4-1.7. The cost of migration reduced the score of Node E by 50% because it did not have enough time during the amortization window to pay off the cost of migration.
The two right-most columns show the optimal host when the cost model (CM) was considered and when no cost model (NCM) was considered. Node E was the optimal host in all cases in which the cost model was ignored, and was never optimal when the cost model was used, because the cost of migration was so high. As Node D incurred a relatively low cost of migration and eventually reached a high QoS, it became the optimal host in Scenarios 3 and 4 when the cost model was used. Because the QoS was stable for Node E, and was significantly better than that for Node C, it was possible to dynamically increase the amortization time for nodes that had demonstrated their stability in terms of the predicted QoS. Node D, in contrast, had a significantly variable QoS, which increased above that of Node C with a value of 1.6 in the second row, but yielded a lower benefit of migration of 1.36, and was not selected as the new host. With a score of 2.5 that was reduced to 2.125 given the migration cost, it beat the given host, and was selected as the new host.

Migration experiment.
In this experiment, we evaluated the proposed migration algorithms by analyzing their execution and comparing the results obtained with two SPEs. The all-at-once state movement runs were used to send 100,000, 1,000,000, and 5,000,000 tuples. The partial state movement runs were used to send 1,100,000 and 5,100,000 tuples. The additional 100,000 tuples with partial state movement were sent during migration, and were part of the dynamic state to be sent. The experiment used a simplified version of the topology in Figure 9b in two ways. First, there was only one upstream node. Second, there were only two hosts: the old and the new host.
The experiment tested the cost of migration by varying the size of the state to be moved. For runs of the partial state movement, the numbers of tuples that were migrated during static state migration and dynamic state migration were varied. The dataset of the NEXMark stream processing benchmark [6] was used in the experiment. NEXMark is based on an auction scenario, where three streams are used: a person, a bid, and an auction item stream. For this experiment, only one of the queries was used, one that joined the person and bid streams. We used this query because a join query makes it easier to test the migration algorithm and adjust the size of the state to migrate. One can simply send a given number of tuples of the first stream, migrate the operator to the new host, and send a single tuple of the second stream to the new host. If this triggered the correct number of output tuples to be produced, the migration was considered to have been successful.
Four processes with different roles were used in the experiment: a data producer node, a host running the operator to be migrated, a new host that contained the operator after migration, and the data sink that consumed the output tuples of the operator. We used two machines for the experiment, one for the old host, and the other to run the data producer, data consumer, and new host. The specifications of the machines are shown in Table 7. In the experiment, the data producer generated a certain number of auction tuples that were sent to the old host. The state was then migrated to the new host, and the data producer sent a single person tuple that joined with all the auction tuples to trigger the same number of output tuples to be sent to the data sink as auction tuples that were sent prior to the migration. The query we used was a modification of NEXMark's [6] Query 8. Originally, this query does not select the itemName of the auction, but chooses the person's name. Each auction tuple was augmented with 1 kB of a randomized string to increase the size of the state to be migrated. In all runs, we counted the number of tuples that were migrated, the state size, the state extraction time, the state transfer time, and the state loading time. For the partial state movement algorithm, the same parameters were used for the static and dynamic states. The state to be migrated ranged from 1 to 5 GB; however, Siddhi has a limit of 1 GB because it extracts the entire state into a single byte array, whereas Flink's state backend RocksDB splits the state into multiple files.
Results. Tables 8 and 9 show the results of the experiments, where the former shows the outcomes of the all-at-once state movement algorithm and the latter those of the partial state movement algorithm. Siddhi and Flink migrated operator states of different sizes depending on the query and the number of tuples that were processed. The state transfer times of Siddhi and Flink were similar because they used similar implementations of the TCP socket. Siddhi performed slightly better, which can be explained by the fact that Flink read the checkpoint from multiple files, and state transfer in it was executed in parallel with reading the files. State extraction appeared to scale relatively poorly for both Siddhi and Flink with the all-at-once state movement algorithm, but with the partial moving state, Flink had a significantly lower state extraction overhead. Moreover, state loading using partial state movement was much faster than without it. Note that these results do not represent the general performance of the SPEs, but the outcomes for a specific join query that was used with specific systems. Another query might have yielded different results. For instance, this query was very write-heavy: all the received auction tuples were part of the migrated state, and were read only once, i.e., after the person tuple had been received on the new host. In this case, the partial state movement algorithm performed better in all respects.
One might expect the all-at-once state movement algorithm to have faster state loading because it produces a monolithic checkpoint, but this was not the case. We attribute this result to the fact that incremental checkpointing uses RocksDB's native checkpoint files, whereas Flink's full snapshot approach iterates through the RocksDB state and creates its own files. RocksDB is designed to be efficient and performs indexing to increase its efficiency; this benefit was lost in the full snapshot approach.
If we assume that the tuples received during the freeze time arrived at a fixed rate, the average additional tuple latency caused by the migration was equal to half the freeze time. The maximum additional tuple latency was approximately equal to the freeze time, and the minimum was close to zero. The number of affected tuples could vary significantly, ranging from zero to hundreds of thousands per second.
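Under the fixed-rate assumption, these statistics follow directly; a small sketch with illustrative values (not taken from the experiment):

```python
def added_latency_stats(freeze_time_s, rate_per_s):
    """Additional latency for tuples arriving at a fixed rate during the
    freeze: a tuple arriving at offset t waits freeze_time_s - t until
    the operator resumes on the new host."""
    n = int(freeze_time_s * rate_per_s)
    waits = [freeze_time_s - i / rate_per_s for i in range(n)]
    return n, sum(waits) / n, max(waits), min(waits)

n, avg, mx, mn = added_latency_stats(2.0, 10_000)
# avg is ~half the freeze time, mx ~ the freeze time, mn close to zero
```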
The partial state movement algorithm performed much better than the all-at-once algorithm in terms of freeze time, even without counting the poor state loading performance of the all-at-once algorithm. The reason is that most of the state was moved before the operator was shut down. This difference in performance is especially striking considering how similar the algorithms are as described in Listings 6 and 7: only one task was added in Listing 7, namely to migrate the immutable state before the streams are redirected by the upstream nodes.
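A back-of-the-envelope model illustrates why moving the immutable state early shrinks the freeze time. The state sizes and bandwidth here are our own illustrative values, and the model assumes transfer time dominates (ignoring extraction and loading):

```python
BANDWIDTH = 100e6  # bytes per second, illustrative

def transfer_time(n_bytes):
    return n_bytes / BANDWIDTH

state = {"immutable": 4_000_000_000, "mutable": 50_000_000}  # bytes

# All-at-once: the entire state moves while the operator is stopped.
freeze_all_at_once = transfer_time(sum(state.values()))

# Partial: the immutable part moves before the streams are redirected,
# so only the mutable remainder contributes to the freeze time.
freeze_partial = transfer_time(state["mutable"])

assert freeze_partial < freeze_all_at_once
```

The larger the immutable share of the state, the larger the gap between the two algorithms.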
This leads to an important conclusion: the literature can benefit from a common language for defining and using migration algorithms. Exactly which tasks are executed during the migration, in particular those that increase the freeze time, can be described using, e.g., the concepts of the migration model in Section 2.
The proposed migration model can foster the development of new migration algorithms and help avoid duplicate solutions. One explanation for why multiple similar migration algorithms have been developed is that the algorithm is typically only one part of an approach's contribution, alongside decision models for migration, the system itself, and its evaluation.

REFLECTIONS AND FUTURE DIRECTIONS
The historical development in operator migration, from the early single-track all-at-once state migration solutions to checkpoint-assisted partial state movement and parallel-track solutions without state movement, has been driven by the deployment of SPEs to cloud environments and by improvements to them to achieve fault tolerance and dynamic scalability. Cloud environments provide large amounts of computational resources (even though at different scales), and their servers are interconnected with low-latency, high-bandwidth networks. Therefore, advanced state management solutions in cloud-based systems might work well in fog environments.
However, in geo-distributed fog environments, the connections between hosts have substantially lower available bandwidth and higher latencies, which can impact the cost and benefit of operator migration and require adapted migration mechanisms. Periodic checkpointing and the replication of checkpoints are used in some cluster-based SPEs to facilitate fault tolerance and fast migrations, but it is not always feasible to replicate and distribute checkpoints, especially on resource-constrained IoT devices. For future in-network processing solutions on mobile platforms, e.g., advanced crowd-sensing applications, energy is an important factor to consider for operator placement, the design of migration mechanisms, and the calculation of cost and benefit. This survey shows that energy is not yet addressed by state-of-the-art operator migration approaches.
Clearly, the smaller the size of the data to be migrated, the less energy is consumed. Therefore, it is important to schedule operator migration at a point in time when the state is small or even empty. This can be achieved, for example, through delayed migration, i.e., waiting until a tumbling window is emptied [62], or through proactive migration.
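A minimal sketch of delayed migration (our own illustrative window model, not the mechanism of [62]): a migration request arriving mid-window is deferred until the tumbling window closes, at which point the window contents have just been emitted and no tuples need to be moved.

```python
class TumblingWindowOperator:
    """Buffers tuples per tumbling window; migration is deferred until
    the current window closes and its state has been emitted."""
    def __init__(self, window_size):
        self.window_size = window_size
        self.window_end = window_size
        self.state = []                  # tuples of the current window
        self.migration_requested = False
        self.migrated_state_size = None  # tuples moved at migration time

    def on_tuple(self, timestamp, value):
        while timestamp >= self.window_end:       # close expired window
            self.state = []                       # window emitted, state empty
            self.window_end += self.window_size
            if self.migration_requested:
                self.migrated_state_size = len(self.state)  # nothing to move
                self.migration_requested = False
        self.state.append(value)

op = TumblingWindowOperator(window_size=10)
for ts in range(5):
    op.on_tuple(ts, ts)
op.migration_requested = True  # request arrives mid-window: wait
op.on_tuple(12, 12)            # window [0, 10) closes first, then migrate
assert op.migrated_state_size == 0
```

A reactive migration at timestamp 5 would instead have had to move the five buffered tuples.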
Another alternative is to tolerate some state inconsistency by not migrating all of the state to the new host. In some cases, aggregation operators can be moved without their state, resulting in zero freeze time. Alternatively, load shedding techniques can be applied to send only some of the state, or state components can be assigned priorities such that only the most important state is migrated while the less significant part is omitted. However, a thorough investigation of the pros and cons of reactive, delayed, and proactive migrations in different environments with different workloads and consistency guarantees is still missing.
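One way to realize such priority-based shedding is a greedy selection under a byte budget; this sketch and its component names are purely illustrative:

```python
def select_state_to_migrate(components, budget_bytes):
    """components: list of (priority, size_bytes, payload), where higher
    priority means more important. Greedily migrate the most important
    components that fit within the budget; the rest is shed."""
    migrated, used = [], 0
    for prio, size, payload in sorted(components, key=lambda c: -c[0]):
        if used + size <= budget_bytes:
            migrated.append(payload)
            used += size
    return migrated, used

components = [(3, 40, "window-aggregates"),
              (1, 80, "per-key history"),
              (2, 30, "recent tuples")]
kept, used = select_state_to_migrate(components, budget_bytes=100)
assert kept == ["window-aggregates", "recent tuples"] and used == 70
```

The shed component ("per-key history" here) is the source of the resulting inconsistency, which the application must be able to tolerate.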
Another gap in research is the analysis and comparison of stream management techniques. Several aspects are important for such an investigation: (1) the sequence of tasks such as stopping, buffering, redirecting, and starting streams; (2) the locations where streams are buffered; (3) the delivery semantics, i.e., at-least-once, at-most-once, or exactly-once, as well as ordered or out-of-order delivery; and (4) tasks related to buffer management and transport protocols.
The quality of decision-making on migration depends on the data available to calculate its cost and benefit, as well as on the freshness of these data. The continuous collection and dissemination of monitoring data in a DSPS can be expensive.
Efficient monitoring solutions, and leveraging other sources of monitoring data that are used, for example, for network and system management, have the potential to reduce the overall cost of a DSPS and ensure good decision-making.
Leveraging historical data to perform predictions with advanced statistics or modern machine learning, as is done for traffic prediction in network management [89] and data prediction in wireless sensor networks [90], is another subject that deserves research attention. Both proactive migration and the use of amortization time in the cost model require some form of prediction. The paradox of operator migration, i.e., that the need for migration arises precisely when the cost of migration is high, can be avoided with proactive migration. Furthermore, proactive migration can be used to schedule a migration while the state is still small. However, both traffic and data patterns might change during the deployment of a DSPS, and appropriate and efficient online learning solutions for operator migration need to be investigated.
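As a minimal illustration of such a predictor, consider exponential smoothing of the observed load to trigger a migration before a host becomes overloaded. The smoothing factor and threshold are illustrative and not taken from any surveyed system:

```python
def should_migrate_proactively(load_samples, threshold, alpha=0.5):
    """Exponentially smoothed load forecast; migrate if the forecast
    exceeds the threshold. alpha weighs recent samples more heavily."""
    forecast = load_samples[0]
    for load in load_samples[1:]:
        forecast = alpha * load + (1 - alpha) * forecast
    return forecast > threshold

rising = [0.2, 0.3, 0.5, 0.7, 0.8]    # load trending upward
steady = [0.4, 0.35, 0.4, 0.38, 0.4]  # load stable below threshold
assert should_migrate_proactively(rising, threshold=0.6)
assert not should_migrate_proactively(steady, threshold=0.6)
```

An online-learning variant would additionally adapt `alpha` (or replace the model entirely) as traffic and data patterns drift.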

CONCLUSIONS
DSP is becoming increasingly important for handling data with high velocity and large variety. The variety of data from different sources and over time, as well as other system dynamics, e.g., resource availability, require adapting distributed stream processing accordingly. Operator migration is the mechanism for keeping a DSPS in an "optimal" configuration over its lifetime. This survey provided an overview of solutions for operator migration from a historical perspective and from the perspective of the goal of migration. Both perspectives show that the deployment environment and the purpose of the system have a strong impact on the design of migration mechanisms. Unfortunately, the terminology in this area is not always consistent. Therefore, we introduced a conceptual model of operator migration, based on the common denominator in the literature, to establish a unified terminology. The model facilitated the classification of existing solutions and structured their description with respect to two research questions: (1) Which mechanisms are used to perform migration? (2) How is the migration decision made? Emphasis was placed on the costs and benefits of migration. These aspects are important for operator migration but are often only implicitly addressed or neglected altogether. The description of existing solutions should provide the reader with a good understanding of the design alternatives from an algorithmic viewpoint. We complemented this with an empirical study to give the reader quantitative insights into the impact of different design alternatives for migration mechanisms (i.e., all-at-once and partial state movement) and the impact of the choice of stream processing engine (i.e., Siddhi and Apache Flink).

3.1.1 Algorithm descriptions. This section describes the relevant algorithms in a concise and systematic manner. Since details of what happens in migration algorithms are typically omitted from research papers, our descriptions may deviate to some extent from the original implementations of the migration algorithms considered. For the most significant variants of these algorithms, we show how migration is performed using a figure that illustrates the stream processing topology and the communication between nodes. The following types of nodes are used: old host (OH), new host (NH), upstream nodes (US), and downstream nodes (DS). US and DS can each represent one or more nodes, but for the sake of simplicity, only one of each is shown in the figures. Each figure is accompanied by an enumerated description of the steps of the algorithm on the right-hand side, and each step is indicated in the figure. The contents of the control messages sent are formatted as a list of tasks that must be executed (shown in the subsequent listings).

Listing 1. Single-track moving state

ControlMessage(OH,
  ControlMessage(Upstream,
    BufferStreams(Streams(query))
    StopStreams(Streams(query))
    Redirect(Streams(query), OH, NH)
    ControlMessage(OH, MoveState(query, NH))
    Resume(Streams(query))))

3.1.3 Parallel-track. There are two types of parallel-track algorithms: state-recreation and window-recreation algorithms. The difference between them is that state-recreation involves state migration and window-recreation does not. Zhu et al. [75] introduced the window-recreation parallel-track migration algorithm. Gulisano et al. [64] presented both a window-recreation and a state-recreation algorithm, and Ottenwalder et al. [60] performed state-recreation migrations based on changes in mobility. Madsen et al. proposed a direct window-recreation algorithm in Enorm.

Listing 3. State-recreation

ControlMessage(Upstream,
  ControlMessage(OH,
    ControlMessage(NH,
      StopStreams(OutputStreams(query))

Listing 4. Checkpoint-assisted single-track

# Bootstrapping
ControlMessage(OH, ReplicateCheckpoint(NH))
# Migration
ControlMessage(Upstream,
  ControlMessage(NH,
    BufferStreams(NH, Streams(query))
    StopStreams(NH, Streams(query)))
  ControlMessage(OH,
    Redirect(Streams(query), OH, NH)
    MoveIncrementalState(query, NH)

Table 5. Parameters of the use case

Parameter name               Parameter value
Amortization time            5 s
Bandwidth C<->D              200 Mbit/s
Bandwidth C<->E              100 Mbit/s
Bandwidth D<->E              100 Mbit/s
Bandwidth Leader<->Hosts     200 Mbit/s
Latency between all links    1 ms
Control message size         168 bytes
Migration cost (C)           0
Migration cost (D)           0.1
Migration cost (E)           0.5

Fig. 9. Evaluation scenario

The operator is further suspended during state extraction at the old host, moving the state from the old to the new host, and installing it at the new host. Two metrics are used to assess temporal cost: freeze time and latency spikes. Freeze time quantifies the duration for which an operator cannot work, i.e., freeze time = t_resume - t_stop, where t_stop is the point in time at which the old host stops the operator and t_resume is the point in time at which the new host resumes it. Latency spikes quantify the increased latency of event delivery caused by a non-working operator. This is often approximated by the time needed for state movement, i.e., the duration for which the state is in transit between the old and the new host: state movement time = t_received - t_sent, where t_sent is the time at which the old host starts sending the state, and t_received is the time at which the entire state has been received at the new host. The state movement time depends on the size of the state and the available bandwidth between the old and the new host. Thus, the state size can be seen as related to the costs of both resources and time.
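With illustrative timestamps, the two temporal-cost metrics can be computed as follows (the values and the symbol names t_stop, t_send, t_recv, and t_resume are our own, not from the use case):

```python
# Timestamps in seconds since migration start (illustrative values).
t_stop = 1.0    # old host stops the operator
t_send = 1.2    # old host starts sending the state
t_recv = 3.2    # entire state received at the new host
t_resume = 3.5  # new host resumes the operator

freeze_time = t_resume - t_stop        # duration the operator cannot work
state_movement_time = t_recv - t_send  # duration the state is in transit

assert abs(freeze_time - 2.5) < 1e-9
assert abs(state_movement_time - 2.0) < 1e-9

# With state size S and bandwidth B, state movement time is roughly S / B:
S, B = 250e6, 100e6 / 8  # 250 MB over a 100 Mbit/s link (bytes, bytes/s)
assert S / B == 20.0     # ~20 s of state transit
```

Note that the freeze time always contains the state movement time plus the extraction and loading phases on either side.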

Table 2 .
Goals of migration and their overlap across studies in the area

Table 3 .
Metrics of optimization grouped by the goal of migration, such as latency spike and performance penalty. While the columns in the table may indicate the goals of optimization used, it differs from Table 3 in that it considers only which metrics are used. Approaches like [22] are listed as placement optimization based on the cost of migration, but do not appear in the modeled cost of migration column, as no metric is used to describe the cost of migration in the paper.

Table 6 .
Results of the use case

Table 7 .
Server specification

Table 8 .
Results of all-at-once moving state experiment

Table 9 .
Results of the partial moving state experiment