Learning Performance Models of Distributed Computer Vision Methods for Decision Making in Detection and Tracking Algorithms in UAVs

Unmanned aerial vehicles (UAVs) have found increasingly widespread use in recent years. However, low-cost commercial UAVs may not possess enough computational power to run the state-of-the-art algorithms needed to perform certain tasks, which degrades performance. Remote computational systems, to which heavy processing tasks can be offloaded, emerge as a solution. However, they introduce latency, which can be undesirable for real-time tasks. Furthermore, if the task is simple, using a local algorithm with worse performance may be acceptable to avoid that latency. As such, a method to decide which algorithm to use is of great importance. We consider the use case of computer vision tasks, in particular detection and tracking, where image properties, such as brightness, contrast, motion blur, and clutter, affect algorithm performance. Our proposed methods use a combination of neural networks and kernel machines to estimate the performance of each algorithm given the input image. An appropriate cost function is then used to identify the best algorithm for the task, given the input image, the task deadline, and the uncertainty in the variables of the algorithm, in particular its computing time and error rate. Results show that our method matches or outperforms similar state-of-the-art methods, complying with time restrictions while delivering increased performance.


I. INTRODUCTION
THE USES of unmanned aerial vehicles (UAVs) have grown significantly in recent years. In most scenarios, they are used to accomplish tasks that must be done in real time and require computationally heavy algorithms to be performed effectively. This is a challenge for low-cost, low-power UAVs, which often have to resort to simpler algorithms, resulting in decreased performance. To deal with this issue, remote computation has emerged as a possible alternative, but it in turn adds latency, which can also be a problem for some tasks.
Several past works propose frameworks to solve the offloading problem, usually formulated as an optimization problem [1], [2]. However, these fail to deal with the uncertainty associated with communication latency, algorithm computation times, and algorithm performance. Moreover, most frameworks assume that the same algorithm runs locally and remotely, but in general it is possible to run different algorithms with different degrees of effectiveness and performance. Since algorithms almost always trade off speed against performance, there is usually no single algorithm that is better than all others, so even choosing the correct algorithm for the task, disregarding the offloading decision, can be difficult. In this work, we aim to address these shortcomings and provide an improved decision method, not just choosing whether or not to offload the data, but also choosing the algorithm itself.
In short, the problem we want to solve is the following: given a set of algorithms for a task, each with different computation times and error rates in both local and remote instantiations, how do we choose the best one? The error rate is essentially a performance metric, measuring how well the algorithm accomplishes the task. Our method chooses the algorithm that minimizes the computing time and error rate, taking into account the possibility of using remote systems to perform computations. Of course, minimizing the computing time and minimizing the error rate are very often contradictory goals, so most of the time we want the best tradeoff between them, expressed through an appropriate cost function. The most important step, and something that distinguishes this work from others, is the use of an estimate of the distributions of the error rate and the computing time, conditioned on the input, obtained using a combination of neural networks and kernel machines. This allows us to leverage more powerful algorithms and remote resources only when required and, as a result, obtain better performance for the same computing time.
As an application, we consider the use of our methods for computer vision tasks, in particular, detection and tracking. In these tasks, image properties, such as brightness, contrast, motion blur, and clutter affect the algorithm performance, making the case for the estimation of the error rate and computing time based on the input image.
Our contributions include introducing:
1) a new general method to select the best algorithm for a certain task and where to run it, which matches or outperforms other state-of-the-art methods where a tradeoff between error rate and computing time is critical;
2) a problem formulation that considers the possibility of using a set of different algorithms, either locally or remotely;
3) a strategy to quantify the uncertainty of both the error rate and the computing time;
4) an algorithm to determine the best relative weight of the error rate and computing time when faced with a computing time deadline.

II. STATE OF THE ART
Several works propose methods that exploit remote resources to perform various computer vision tasks in a UAV. In most cases, the methods are highly dependent on the tasks themselves.
The work in [3] uses a convolutional neural network (CNN) to detect upcoming intersections and dead-ends, helping with the navigation process of a UAV. Given the long computation times, they perform the computation on a remote server over a Wi-Fi link. Similarly, in [4] the computations are also performed remotely: a CNN is used for object detection, trained to detect a single class, and a simple tracking algorithm is introduced so that the UAV can follow a target over time. The work in [5] uses a Wi-Fi link to send a JPEG-compressed stream to a preconfigured graphics processing unit (GPU)-enabled cloud virtual machine for object detection using deep learning algorithms. Compression proved important, leading to overall smaller latency.
The performance of three different architectures is analyzed in [6]: 1) an on-board embedded GPU system; 2) an on-board GPU-constrained system; and 3) an off-board GPU-based ground station. In the last architecture, the drone's local system only streams the frames, and all the computation is done in the ground station, equipped with a powerful GPU. The computer vision tasks considered are target detection and tracking. For detection, they consider you only look once (YOLO) [7] and several improved versions, as well as single shot detector (SSD) [8], regions with CNN features (R-CNN) [9], Faster R-CNN [10], and Mask R-CNN [11]. For each architecture, they choose a subset of these algorithms and report the results. For tracking, they used the deep SORT algorithm [12] applied to YOLOv3 [13]. The best results were obtained with Mask R-CNN and YOLOv3. When comparing the architectures, the one with remote components offers better results in terms of frames per second (FPS).
In [14], there is a better division of computation between local and remote resources. Once again, the proposed detection algorithm is based on CNNs; in particular, Faster R-CNN was chosen. This algorithm runs on a remote system but, unlike in the previous papers, not all frames are sent to that system for analysis. Locally, the system does the computations necessary for navigation, which involve position estimation and control. The most interesting local computation is the use of an algorithm to evaluate the objectness of a frame, that is, the likelihood that there is an object in the frame, whatever the object is; only frames with a high likelihood are sent to the remote system. The algorithm used is binarized normed gradients (BINGs) [15]. The relevant results concern object detection accuracy and computation time. In terms of accuracy, Faster R-CNN is compared with YOLO and SSD and obtained better results than both when trained and tested in the same way. In terms of computing time, even with the extra latency associated with communication, the total remote computing time is less than the local one, even considering a lighter local algorithm. It should be noted that the computational capabilities of the remote computer are very high, which is a key factor in these results.
While all the previous works show that exploiting a remote computation system can offer great results for UAVs, they do so for particular tasks and do not offer a framework for general tasks. The work in [16] tries to do just that for tasks that use neural networks, proposing a task distribution framework. An optimization problem is formulated to determine the optimal data offload ratio β, the fraction of times a task is offloaded to a remote server. The goal of the optimization problem is to minimize the weighted sum of the error rate and the computing time, with the error rate weighted by a relative importance factor ρ. This way, we can obtain the best tradeoff between the two values, weighted by a factor we deem adequate. A closed-form solution is also proposed. The result is a single constant offloading ratio that depends on factors such as the average error rate of the algorithm, the average computing speeds of the local and remote components, the average communication time, and the relative importance factor ρ. Performance evaluation based on simulations confirms that the solution offers better performance than no offloading or full offloading.
The work in [17] is based on both [16] and [18] and introduces the idea that low-quality data will lead to a higher error rate. It adds to the optimization problem the probability of low-quality data, which conditions the expected error rate for each method, but this probability is still a constant value rather than adapting to each input, and the optimal offloading rate is still constant. Furthermore, what defines low versus high quality is not clear. The work hints at the idea of giving priority in remote computation to tasks based on the data quality but does not yet propose a method to do so.

A. Literature Gaps
Throughout this section, we reviewed relevant state-of-the-art works. The most recent works in the topic of UAV distributed computing already introduce a general method for the offloading choice. This general method assumes that a different algorithm can be run locally or remotely, but the choice of algorithm is tied to the choice of offloading. It does not provide a general problem formulation for more than one option in the local and remote case.
Recent works hint that image quality should be taken into account, but do not yet propose a method to do so. In these works, the decision does not depend on the input at all, only on average properties of the algorithms and the data. We believe this is a major factor that holds back these methods, as we can intuitively understand that different cases can require algorithms with different degrees of effectiveness.
Finally, uncertainty in the computing time and error rate are not taken into account. These are the gaps that we aim to address in this work.

III. METHODOLOGY
Throughout this section, we will go over the methods we propose to create a new general task offloading framework, which chooses the algorithm to use for the task, taking into account both algorithm information and the properties of the input data.

A. Decision Method
Formally, we can define the task offloading problem as follows: a task $T$ is a 3-tuple $(I, S, (A_1, \ldots, A_N))$, where $I$ is the input data, with size $S$. This task can be accomplished by any one of $N$ algorithms, namely, $(A_1, \ldots, A_N)$.
An algorithm $A$ is defined by a 3-tuple $(p(t_l), p(t_r), p(\epsilon))$: respectively, the local computing time probability density function (PDF) $p(t_l)$, the remote computing time PDF $p(t_r)$, and the error rate PDF $p(\epsilon)$. This formulation is flexible in allowing algorithms with both local and remote components, but also allows algorithms that have only one component; in those cases, the computing time of the unused component is 0. The same underlying algorithm (e.g., YOLO) run only locally or only remotely is considered as two different algorithms under this formulation.
We also need to consider the communication channel bit period $R$, representing the speed of the channel, and the compression ratio $\gamma$, representing the ratio between the size of the data sent to a remote system and the size $S$ of the input data. We further define the indicator variable $\zeta$, which is 1 if the algorithm has a remote component and 0 otherwise. These values, alongside those introduced in the previous two paragraphs, constitute the information values, which will be used to make a decision.
A simple diagram exemplifying the proposed architecture is shown in Fig. 1.
Since the goal is to choose the algorithm to use, we can define the action space as the set $\mathcal{A} = \{a_1, \ldots, a_N\}$, where $a_i$ corresponds to the choice of algorithm $A_i$. We now need to define the cost function for each action, considering the information values. This cost function should penalize both long computation times and high error rates.
The computation time $\tau$ of an algorithm is given by

$$\tau = t_l + S R \zeta \gamma + t_r \qquad (1)$$

comprising, respectively, the local computation time $t_l$, the communication time $S R \zeta \gamma$, and the remote computation time $t_r$. To create a cost function, we can incorporate the error rate $\epsilon$ by introducing a weighting factor $\rho$, similar to the work done in [16]. This way, we obtain an appropriate cost

$$c = \tau + \rho\,\epsilon \qquad (2)$$

where both long times and high error rates are penalized, and which can be used as the cost function. The expected cost of action $a_i$ is given by

$$E[c(a_i)] = E[\tau_i] + \rho\, E[\epsilon_i] \qquad (3)$$

which can, for the cost function (2), be simplified as

$$E[c(a_i)] = E[t_{l,i}] + S R \zeta_i \gamma + E[t_{r,i}] + \rho\, E[\epsilon_i]. \qquad (4)$$

We choose the action $a^*$ such that

$$a^* = \operatorname*{arg\,min}_{a_i \in \mathcal{A}} E[c(a_i)]. \qquad (5)$$
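As a minimal sketch of this decision rule, the following Python snippet evaluates (4) for each candidate algorithm and picks the minimizer as in (5). The dictionary structure and the numeric values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def expected_cost(E_tl, E_tr, E_eps, S, R, gamma, zeta, rho):
    """Expected cost of one algorithm, as in (4): local time +
    communication time + remote time + rho-weighted error rate."""
    return E_tl + S * R * zeta * gamma + E_tr + rho * E_eps

def choose_algorithm(algorithms, S, R, gamma, rho):
    """Pick the action a* minimizing the expected cost, as in (5).
    Each entry of `algorithms` is a dict of expected values plus the
    remote-component indicator zeta (a hypothetical structure)."""
    costs = [expected_cost(a["E_tl"], a["E_tr"], a["E_eps"],
                           S, R, gamma, a["zeta"], rho)
             for a in algorithms]
    return int(np.argmin(costs))

# Illustrative values: a fast local algorithm vs. a slower, more
# accurate remote one; S in bits, R in seconds per bit.
algos = [
    {"E_tl": 0.020, "E_tr": 0.000, "E_eps": 0.45, "zeta": 0},
    {"E_tl": 0.000, "E_tr": 0.060, "E_eps": 0.20, "zeta": 1},
]
best = choose_algorithm(algos, S=2e6, R=1e-8, gamma=0.1, rho=0.5)
```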

B. Distribution Prediction
Given that we have the option of using any of a set of N algorithms, and since the error rate and the computation time of an algorithm not only depends on the algorithm but also depends on the data, we need a way to estimate the PDF of the computing times and error rates for each algorithm given the input data I. Taking the computing times and error rates as y, the problem can be formulated as estimating the conditional probability p(y|I).
The work in [19] reviews several techniques to solve the problem of estimating conditional probabilities. First, we consider that we have access to a class probability estimator that was trained and can provide class probabilities p(c|X), where c is the class value and X is a set of relevant features extracted from I. The basic idea is to discretize the continuous target values y into several intervals that can be treated as class values c. We then use p(c|X) to obtain a weight for each y, conditioned on the X features.
Consider $c_y$ to be the bin (i.e., the class) that contains the target value $y$, and let $p(c_y|X)$ be the predicted probability of that class given $X$, obtained from the class probability estimator. A simple univariate density estimator, and perhaps the most obvious one in this context, is a histogram estimator based on the bins provided by the discretization, where the density is assumed to be constant within each bin $c$. Based on the bin's width $r_c$ (the difference between the upper and lower boundaries of the bin) and the class probability assigned to it, the final conditional PDF is given by

$$p(y|X) = \frac{p(c_y|X)}{r_{c_y}}. \qquad (6)$$

Using this methodology, we need to define two things: first, what the $X$ features will be and how they can be extracted from the input $I$; and second, the class probability estimator, which takes the features $X$ as inputs and provides estimates of $p(c|X)$.
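A minimal sketch of the histogram estimator (6), assuming the bin edges come from the discretization step and the class probabilities from the trained estimator:

```python
import numpy as np

def conditional_pdf(class_probs, bin_edges, y):
    """Histogram estimator (6): probability of the bin containing y,
    divided by that bin's width r_c."""
    # Find the bin c_y containing y (clipped to valid bin indices).
    c = np.clip(np.searchsorted(bin_edges, y, side="right") - 1,
                0, len(bin_edges) - 2)
    width = bin_edges[c + 1] - bin_edges[c]
    return class_probs[c] / width

# Example: 3 bins over [0, 1]; probabilities come from the estimator.
edges = np.array([0.0, 1/3, 2/3, 1.0])
probs = np.array([0.1, 0.2, 0.7])             # p(c|X)
density = conditional_pdf(probs, edges, 0.8)  # 0.7 / (1/3) = 2.1
```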
A graphical representation of the general method architecture can be seen in Fig. 2. For the sake of generality, aside from image features that can be extracted from the image using a given feature extractor, we also consider extra context features, related to information that can be obtained immediately from the image (metadata such as image dimensions), or from other external sources.
1) Feature Extractor: Since we deal with image data in our use case, a potential choice of feature extractor is the lower layers of a pretrained CNN, similar to the work done in [20]. Since this feature extractor is arguably the heaviest step of our method, and we want to minimize the overhead, we looked for a light network suited to resource-constrained environments. After some research, we decided to use the MobileNetV2 network [21].
The idea is to drop the last two layers of the network, so that we obtain a 1×1280 image feature vector from the input image.
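A possible sketch of this extractor in Keras: instead of literally dropping the last two layers, we instantiate MobileNetV2 without its classification head and apply global average pooling, which also yields a 1×1280 vector. The input size and preprocessing are standard MobileNetV2 choices, not confirmed by the paper.

```python
import numpy as np
import tensorflow as tf

# MobileNetV2 without its classification head; global average pooling
# collapses the last feature map into a single 1280-dim vector.
extractor = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", weights="imagenet",
    input_shape=(224, 224, 3))

def extract_features(image):
    """image: HxWx3 uint8 array; returns a (1, 1280) feature vector."""
    x = tf.image.resize(image, (224, 224))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(x)
    return extractor(tf.expand_dims(x, 0)).numpy()

features = extract_features(np.zeros((480, 640, 3), dtype=np.uint8))
```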
2) Context Features: Aside from image features that can be extracted directly from the image, there could be other available relevant context information not captured with just the input image, as well as data about the image that can be immediately obtained.
A seemingly useful feature that can be immediately obtained is the dimensions of the image, since they can affect the computation time (i.e., a larger image will usually take more time to process). Depending on the situation, other features can be added. For example, if we have information on the weather, we could also use it as a feature, since foggy weather and sunny weather will most likely influence the tasks in different ways. In this particular use case, the image dimensions are expected to be constant, meaning information about them will not be very useful. Furthermore, because extra information is not available when training on the chosen data sets, we will not be using any context features, but in a more general case they deserve consideration.
3) Class Probability Estimator: We aim to use a simple class probability estimator, since most of the heavy work will be done by the feature extractor, and because we will need one estimator for each variable that we want to model. Examples of classifiers that could be used include decision trees, naive Bayes, support vector machines (SVMs), and a single-layer perceptron. We decided to use the latter, with a softmax activation, taking the image feature vector obtained in Section III-B1 as input.
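A minimal sketch of such an estimator as a single dense softmax layer in Keras; the training configuration (optimizer, loss) is an assumption:

```python
import tensorflow as tf

def make_class_probability_estimator(n_bins, n_features=1280):
    """Single-layer perceptron with softmax output: maps an image
    feature vector to per-bin class probabilities p(c|X)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(n_bins, activation="softmax",
                              input_shape=(n_features,)),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy")
    return model

# One estimator per modeled variable (e.g., one algorithm's error rate).
estimator = make_class_probability_estimator(n_bins=4)
# estimator.fit(feature_vectors, bin_labels, epochs=...)  # labels from III-B4
```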
4) Label Discretization: As mentioned before, for the distribution prediction we estimate discrete classes, also known as bins, instead of estimating the values directly. This implies that aside from training the classifier itself, we also need to choose the most adequate number of bins, which can be thought of as a hyperparameter chosen in advance based on the training data. There are two main strategies for discretization. The first is uniform discretization, where the bins are chosen so that all bins have the same width. The other is quantile discretization, where the bins are chosen so that the number of instances in each bin is approximately the same.
The quantile strategy may lead to extremely narrow bins if the values are too close to each other, and the classifier might not be able to differentiate between them, leading to poor performance. In more extreme cases, if a sufficient number of values are exactly the same, this strategy is simply incapable of discretization; one example would be choosing two bins (the lowest we can go) when more than 50% of the values are identical. We will see from the distribution examples in Section IV-E that this situation does in fact happen. For these reasons, we use the uniform strategy.
During training, for each variable, we tried between two and five bins, beginning with a high number of bins and lowering it if the model performed poorly.
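A sketch of the uniform binning using scikit-learn; the target values here are placeholders:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Placeholder training targets (e.g., per-image error rates in [0, 1]).
y = np.random.rand(1000, 1)

# Uniform strategy: all bins have the same width over the observed range.
disc = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="uniform")
bin_labels = disc.fit_transform(y).ravel().astype(int)  # class labels c_y
bin_edges = disc.bin_edges_[0]                          # boundaries -> widths r_c
```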

C. Data Augmentation
Data augmentation is a powerful technique that greatly helps the generalization ability of models. Particularly for image data, there are several possible transformations that can be used for this goal. The work in [22] reviews several possibilities.
Considering the use case, we look for transformations that can somehow influence the image quality, since this is an important factor that affects performance. With that in mind, we used contrast and brightness changes, as well as kernel filtering to obtain more images for the model training.
For contrast and brightness changes, we simply apply

$$I'_{i,j} = \alpha I_{i,j} + \beta \qquad (7)$$

where $I$ and $I'$ correspond to the before and after pixel intensities, and where $i$ and $j$ indicate that the pixel is located in the $i$th row and $j$th column. The parameters $\alpha > 0$ and $\beta$ can be used to control contrast and brightness, respectively. We considered $\alpha = 0.2$ and $\beta = 80$. Kernel filtering is done by convolving the image with a filter kernel; a $5 \times 5$ Gaussian kernel was chosen to perform the filtering.
Examples of both these operations are shown in Fig. 3.
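Both operations map to standard OpenCV calls; a sketch under the parameters above (the input path is hypothetical):

```python
import cv2

img = cv2.imread("frame.jpg")  # hypothetical input frame

# Contrast/brightness change of (7): I' = alpha * I + beta,
# saturated to the valid [0, 255] intensity range.
degraded = cv2.convertScaleAbs(img, alpha=0.2, beta=80)

# Kernel filtering: convolution with a 5x5 Gaussian kernel
# (sigma is computed by OpenCV from the kernel size when set to 0).
blurred = cv2.GaussianBlur(img, (5, 5), 0)
```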

D. Kernel Estimation
An alternative to the method proposed in Section III-B consists of using kernel methods to fit a distribution over all possible values of the variable $y$. In this case, we estimate $p(y)$ instead of $p(y|I)$, and so we make the assumption that $y$ does not depend on the image, which might not be true. However, it is still a way to model uncertainty in the methods. We will test this method as a way to understand which variables truly depend on the input image $I$; for those that do not, kernel estimation is a considerably simpler alternative.
Kernel methods allow the estimation of distributions that do not fit into well-known cases, such as a normal distribution. This is especially useful for distributions that have two or more peaks (multimodal distribution).
The specific method used is the kernel density estimation (KDE) [23], using a Gaussian kernel.
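A minimal sketch using SciPy's Gaussian KDE; the sample data are placeholders, and the bandwidth is left at the library default:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Placeholder samples of one variable, e.g., an algorithm's computing times.
times = np.random.gamma(shape=2.0, scale=0.01, size=500)

kde = gaussian_kde(times)               # Gaussian kernel, default bandwidth
grid = np.linspace(0.0, times.max(), 200)
density = kde(grid)                     # estimated unconditional p(t)
expected_time = float(np.mean(times))   # mean used by the decision rule
```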

E. Additional Constraints
Aside from minimizing the cost function in (2), we might want to consider additional constraints regarding the average computing time during a time window, expressed by a deadline $\tau_{\max}$.
In a sequence of tasks, this can be written as a constraint on the average computing time. Considering a sequence of $M$ tasks, the constraint can be formulated as

$$\frac{1}{M} \sum_{m=1}^{M} (\tau^*_\rho)_m \le \tau_{\max} \qquad (8)$$

where $(\tau^*_\rho)_m$ is the true computing time for task $m$, using the algorithm chosen by solving (5), which depends on the parameter $\rho$. A large $\rho$ favors more precise but slower algorithms, whereas a small $\rho$ leads to faster but less precise algorithms. Since $(\tau^*_\rho)_m$ cannot be known in advance for future tasks, we can instead tune the $\rho$ parameter on a training data set until these restrictions are met, so that at runtime, the decision (5) will indirectly take the restrictions into account.
1) Choosing Parameters: While $\rho$ can be tuned directly, we propose as a possible alternative choosing the parameter $\rho$ based on a set of training data, such that the constraint on the time is followed while the error is minimized, formulated as

$$\rho^* = \operatorname*{arg\,min}_{\rho} \frac{1}{M} \sum_{m=1}^{M} (\epsilon^*_\rho)_m \quad \text{s.t.} \quad \frac{1}{M} \sum_{m=1}^{M} (\tau^*_\rho)_m \le \tau_{\max}. \qquad (9)$$

Furthermore, since faster algorithms are typically more prone to errors and vice versa (algorithms present a tradeoff between error rate and computing time), a decrease in computing time will generally lead to an increase in error rate. As such, the solution that minimizes the error rate is typically the one that maximizes the computing time while still satisfying the restriction, and problem (9) can be simplified as

$$\frac{1}{M} \sum_{m=1}^{M} (\tau^*_\rho)_m = \tau_{\max} \qquad (10)$$

meaning the computing time should be equal to the deadline.
In practice, since we deal with a finite set of values, we want the computing time to be as close as possible to the deadline. However, two factors still need to be taken into account. First, since there will inevitably be some deviation of the computing time when applying the decision method to a new scenario, choosing the average computing time to exactly match the deadline can be too risky, as in some new cases the average time will exceed it. To remedy this, we can be more restrictive during training, considering a smaller deadline $\delta \tau_{\max}$, where $\delta < 1$ is a value called the margin. Second, it is entirely possible that the time constraint is not feasible at all; in this scenario, we still want to get as close as possible to the deadline. Incorporating both factors, we obtain a final optimization problem for the training, formulated as

$$\rho^* = \operatorname*{arg\,max}_{\rho} \frac{1}{M} \sum_{m=1}^{M} (\tau^*_\rho)_m \quad \text{s.t.} \quad \frac{1}{M} \sum_{m=1}^{M} (\tau^*_\rho)_m \le \delta\, \tau_{\max}. \qquad (11)$$

Since raising $\rho$ increases the weight of the error rate, it leads to a higher average computing time and a lower average error rate, while lowering $\rho$ has the opposite effect. As such, we can tune the parameter iteratively until we reach the desired values. We consider a number of sequences, each with length $M$; for each sequence, we find the best value of $\rho$ and average the obtained values to obtain the final $\rho$.
To find the best value of ρ for a given sequence, we can use a simple variation of the bisection method for a certain number of iterations. The pseudocode is shown in Algorithm 1, with N iterations.
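Since the pseudocode of Algorithm 1 is not reproduced here, the following is our hedged reading of it: a bisection on $\rho$ driven by the average computing time measured on a training sequence. The search bounds and the `avg_time` callback are assumptions.

```python
def tune_rho(avg_time, tau_max, delta=0.9, rho_lo=0.0, rho_hi=10.0, n_iter=12):
    """Bisection search for rho on one training sequence.
    avg_time(rho): runs the decision method (5) over the sequence and
    returns the average computing time. Since a larger rho favors slower,
    more accurate algorithms, avg_time is treated as increasing in rho."""
    target = delta * tau_max
    for _ in range(n_iter):
        rho_mid = 0.5 * (rho_lo + rho_hi)
        if avg_time(rho_mid) > target:
            rho_hi = rho_mid   # over budget: weight the error less
        else:
            rho_lo = rho_mid   # within budget: can afford more accuracy
    return rho_lo              # highest rho observed to respect the deadline
```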

IV. EXPERIMENTAL SETUP
In this section, we will go over the steps to test our methods, including the specific algorithms we will consider for detection and tracking, the data sets for training and testing, relevant metrics for training and evaluating our methods, as well as some parameters. We also present some computation time and error distributions.

A. Detector and Tracker
The focus of this work is not to improve any existing detection or tracking algorithms. Instead, several known state-of-the-art algorithms will be tested. For more diversity, we use both a simple algorithm and a more complex but slower algorithm for each task. For detection, we propose to use YOLO and Tiny-YOLO [7]. For tracking, we propose to use minimum output sum of squared error (MOSSE) [24] and kernelized correlation filter (KCF) [25].

B. Data Sets
Regarding data sets, necessary both for training and testing, we decided to use the PASCAL visual object classes (VOCs) [26], the Microsoft common objects in context (COCO) [27] and the ImageNet [28] data sets for detection, as well as the large-scale single object tracking (LaSOT) [29] and the VOT [30] data sets for tracking.

C. Error Metrics
An important evaluation metric for both detection and tracking is the bounding box overlap [31], known as intersection over union (IOU), which essentially measures how close the predicted bounding box is to the ground truth bounding box. For tracking, we can simply take $(1 - \text{IOU})$ as the error rate. For detection, we also consider the confidence $\xi$, that is, the estimated probability of the correct class, and define the error rate as $(1 - \text{IOU} \cdot \xi)$, penalizing both low confidence and low IOU.
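For concreteness, a sketch of these error metrics with axis-aligned boxes in (x1, y1, x2, y2) format (the box format is an assumption):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def tracking_error(pred, truth):
    return 1.0 - iou(pred, truth)

def detection_error(pred, truth, confidence):
    # Penalizes both low overlap and low class confidence xi.
    return 1.0 - iou(pred, truth) * confidence
```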
Since the error rate in Section III-B has to correspond to a single frame, during training, if a frame has more than one object, we consider the median of the error rates. During testing, however, we consider that a detection was successful if the IOU is higher than a given threshold. This applies to both tasks, detection and tracking.

D. Success Metrics
To evaluate our methods and compare them with other state-of-the-art works, we propose using two metrics.
1) Average Algorithm Success Rate: For a certain sequence, we consider the ratio of objects where the detection/tracking was successful. The average algorithm success rate is the average of this ratio over all sequences.

2) Time Constraint Success Rate: The ratio of sequences where the computing time restriction (8) was followed.
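A minimal sketch of both metrics, assuming per-sequence success ratios and average times have already been measured:

```python
import numpy as np

def average_algorithm_success_rate(success_ratios):
    """Mean over sequences of the per-sequence ratio of objects
    successfully detected/tracked (IOU above the threshold)."""
    return float(np.mean(success_ratios))

def time_constraint_success_rate(avg_times, tau_max):
    """Fraction of sequences whose average computing time satisfied
    the deadline, i.e., constraint (8)."""
    return float(np.mean(np.asarray(avg_times) <= tau_max))
```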

E. Distribution Examples
In Figs. 4 and 5 we present a few overall distributions for the error rate and computation time, for both a detection and a tracking algorithm, directly using the images from the data sets in Section IV-B. Overall, we can see that the computation time distribution is quite sharp, while the error distribution has a much wider spread, with two distinct peaks far away from each other. This will help us understand some results we obtain in the next section.

F. Considered Methods
We consider the following methods: 1) the full deep learning method, where the conditional distributions of both the error rate and the computing time are predicted with the method of Section III-B; 2) the time kernel method, where the computing time is modeled with the kernel estimation of Section III-D and the error rate with the deep learning model; 3) the error kernel method, where the error rate is modeled with the kernel estimation and the computing time with the deep learning model; and 4) the state-of-the-art method of [17].

G. Parameters
Unless otherwise stated, we consider a sequence length M of 200. This value should be high enough so that there is enough variety in the sequence, and we found it sufficient for our experiments. When applicable, for Algorithm 1 we consider 12 training iterations, which in our experiments is enough for the algorithm to converge.
The deadline $\tau_{\max}$ is set to 100 ms for tracking and 400 ms for detection. These deadlines are chosen with the goal of motivating an intelligent choice of algorithm: they are around the average of the computation times of the possible algorithms, highlighting the need to trade off between multiple options. Regarding IOU thresholds, we consider 0.5 for both detection and tracking, which is a common value for this metric [32].

V. RESULTS AND DISCUSSION
In this section we present the results of our methods, alongside the state-of-the-art method presented in [17], which works by intelligently choosing an optimal data offload ratio β for all sequences, based on the average properties of the input data and the algorithms.
We begin by evaluating examples of the estimated conditional distributions, using the trained deep learning model, particularly for the error rate, shown in Figs. 6 and 7. To exemplify the effect that the image properties have on the prediction, we consider the original image, as well as the same image after a change of contrast and brightness, and the same image after Gaussian filtering, similar to Fig. 3.
We can see that the error rate is more likely to be higher when the quality drops, as we intuitively expected. This is a good indication that our method will be able to differentiate images with lower quality and prioritize these when making the decision of which algorithms to use and where to run them.
We now examine the results using the overall decision methods, regarding the tradeoff between algorithm performance success and the average computing time for all sequences. To this end, we present in Figs. 8 and 9 the relationship between these two variables, obtained by sweeping over several values of ρ, also using bands to illustrate the standard deviation.
Our methods require heavier computation than the state-of-the-art methods, primarily during the feature extraction step. This overhead is directly included in the computing time for all our methods, in order to present a fair comparison of results. On detection, looking at Fig. 8, we can first see that the full deep learning method has similar performance to the time kernel method. Compared to the method in [17], our method tends to perform better where there is need for a more intelligent choice, that is, in the cases where we want to find a better balance between the error and the time; in these cases, it outperforms or at least matches the state-of-the-art method. In the edge cases, where the same algorithm is always chosen because we want to only minimize time or only minimize error, there is no advantage in using our method; in fact, it becomes a clear disadvantage, due to the overhead, for what is a simple decision. It also seems that in some data sets, in these edge cases, our method is not always capable of deciding to always use the same algorithm. Finally, the error kernel method is always worse than the state-of-the-art method: it seems to lead to similar decisions but has added overhead, making it always disadvantageous to use.
We can see that using kernels for time has almost no effect on the performance of the method, while using kernels for error causes a dramatic drop in performance. It stands to reason that the main improvement of the methods comes from being able to obtain a conditioned estimate of the error, while obtaining a conditioned estimate of the computing time does not seem to help much. The explanation for this behavior is that the computing time does not vary much depending on the image, as was seen in Fig. 5. And intuitively, we expect that the image quality will have a much higher impact on the error than on the time.
On tracking, looking at Fig. 9, the results are similar; however, the time kernel method seems to have a slight edge over the full deep learning method. Also, looking at Fig. 9(a), on the LaSOT tracking data set, the time kernel method and the full deep learning method outperform the state-of-the-art method, even when we try to only minimize the error.
Looking at the standard deviation, we can see that it is much higher in the tracking task, but less so when using the state-of-the-art method. The reason for this is that the state-of-the-art method works by intelligently choosing a fixed offload ratio for all sequences, which leads to greater consistency across all sequences. The overall higher standard deviation in the tracking task is due to the fact that, unlike in detection, there is a significant computing time disparity between the two possible tracking algorithms.
To evaluate the effectiveness of Algorithm 1 in complying with time restrictions, we evaluate the effect of the margin on the time restriction success. As we can see in Fig. 10, on detection, with all methods except the error kernel, the margin proves to be an effective way to comply with the restriction, as even a small margin is enough to reach 100% time restriction success. On the other hand, looking at Fig. 11, on tracking, except for the state-of-the-art method, the margin needs to be more significant. This goes back to the fact that the standard deviation of the time in tracking is higher with our methods, so the margin has to be higher to compensate. This is especially the case looking at Fig. 11(a), on the LaSOT tracking data set, where the standard deviation is very high. Looking at Fig. 11(b), it is also noteworthy that on the VOT tracking data set, the error kernel method does seem to work well, but it is far too unstable on the other data sets to be considered a good method.

VI. CONCLUSION
We presented new methods to solve the offloading problem. Looking at the results, the main takeaway is that predicting the error rate for each image improves the ability of the methods to trade off error rate and computing time. On the other hand, predicting the computing time does not lead to an improvement, as it depends very little on the image and mostly on the algorithm that is used. With this improvement, the main advantage of our method is that it outperforms or matches state-of-the-art methods when a tradeoff between error rate and computing time is necessary. However, the overhead cost of predicting the error rate makes it unsuitable for simpler decisions, where only the error rate or only the computing time needs to be minimized; in those cases, simpler methods perform better.

Our methods also have the advantage of estimating the full distributions of the time and the error. Even though we currently only use the mean, future work could further explore the use of the uncertainty of the time and error; more complex cost functions could be conceptualized for this end. The overhead of our methods comes mainly from the feature extraction stage, so future work could also research more efficient feature extraction methods, closing the gap in the edge cases where the overhead causes our methods to perform worse than the state-of-the-art methods. Finally, the proposed algorithm to select the relative weight of the time and error based on a chosen deadline proved effective.
