Dynamics of targeted ransomware negotiation

In this paper, we consider how the development of targeted ransomware has affected the dynamics of ransomware negotiations to better understand how to respond to ransomware attacks. We construct a model of ransomware negotiations as an asymmetric non-cooperative two-player game. In particular, our model considers the investments that a malicious actor must make in order to conduct a successful targeted ransomware attack. We demonstrate how imperfect information is a crucial feature for replicating observed real-world behaviour. Furthermore, we present optimal strategies for both the malicious actor and the target, and demonstrate how imperfect information results in a non-trivial optimal strategy for the malicious actor.


I. INTRODUCTION
Computer security is a rapidly developing field, with new threats emerging and evolving constantly. As computer security providers develop their methods for detecting malware (malicious software), the malicious actors behind the various strains of malware are forced to refine their techniques for avoiding detection, prompting further development from the computer security industry. As a result of the interaction between these competing agendas, problems in computer security can give rise to rich dynamical behaviour. By analysing these dynamics, we can provide insights that assist in understanding phenomena observed in computer security. The current paper seeks to provide insight into recent developments in ransomware by using game theory to explore the dynamics they have introduced.
Ransomware is a type of malware designed to extort a ransom from the victim [1], [2], usually by denying the victim access to their computer or data until the ransom has been paid. In the past, ransomware relied on extorting a small amount of money from a large number of victims. The ransom itself would be fixed at a price low enough that nearly anyone could pay, but was typically non-negotiable, as negotiating a small ransom with many victims would not be worth the effort. In recent years, a new phenomenon known as targeted ransomware has emerged [3], [4]. Malicious actors operating targeted ransomware specifically target large organisations, which can be extorted for significantly higher ransoms [5], [6]. However, such organisations are also likely to have a higher level of computer security, and so the malicious actors are forced to make a significant investment in breaching that security. Given the effort involved in breaching security, the malicious actors will invest further in calculating the highest ransom they believe their target will pay. As a result of the larger sums of money at stake, malicious actors are willing to negotiate the ransom demand to facilitate payment. We consider these negotiations to be a crucial feature of targeted ransomware, as their outcome has an enormous impact on both the malicious actors and the targeted organisation.
In this paper, we develop a model of targeted ransomware negotiations based on game theory. Game theory is a branch of mathematics which studies strategic interactions between rational decision-makers [7]. A game is a mathematical model with a clearly defined set of rules where two or more players make strategic decisions to influence the outcome of the game to their own personal benefit. By analysing the decisions available to players, optimal decision-making strategies can be determined which offer the best outcome for each player. Game theory can be applied to many real-life scenarios that involve competing interests by formulating a game as a mathematical abstraction of the given scenario. By analysing the decisions taken in the game logically, optimal strategies can be determined, granting insight into real-world behaviour. As such, game theory is highly applicable to fields such as ecology [8], [9], [10], economics [11], [12], [13], [14] and politics [15], [16].
Very recently, ideas from game theory have been found to be useful in the study of ransomware. While all ransomware follows the same fundamental principles, there is sufficient variety in observed phenomena to merit a variety of models, such as defence and deterrence [17], [18], [19], [20], [21], iterative negotiations [22], price discrimination [23], incentive to return encrypted data [24], and sale of stolen data [25]. In this paper, we examine how game theory can be applied to the growing threat of targeted ransomware. We construct a model of ransom negotiations as a game played between a malicious actor and their target. The game focuses on the strategic behaviour and investments that the malicious actor must commit to in order to implement a successful targeted ransomware attack. By analysing our model, we demonstrate how imperfect information is crucial for replicating observed real-world behaviour, and provide new insights into the real-world behaviour and strategy of malicious actors operating strains of targeted ransomware.

II. BACKGROUND
In order to construct a model of targeted ransomware negotiations, we first consider the key features of targeted ransomware. On a technical level, one of the main differences between untargeted and targeted ransomware is how it spreads [26]. Untargeted ransomware is distributed indiscriminately, relying on victims with weak security to make a mistake that allows the ransomware to infect their computer. Such a strategy may be appealing to a malicious actor, as it requires a low investment of effort to infect victims. However, this indiscriminate strategy is relatively easy to defend against. A potential victim can reduce their chances of being infected through practices such as maintaining good security, careful internet usage, and maintaining up-to-date data backups. Large organisations with valuable data are likely to have such practices implemented across a complex network of computers. In order to target such organisations, a malicious actor must invest significant effort in circumventing their security and spreading their ransomware across the computer network. This typically involves multiple attack vectors, such as targeted phishing, remote desktop protocol (RDP), and searching for weak passwords [27]. The malicious actor may spend days or weeks escalating their level of access to the target's network; in 2019, the mean dwell time (the duration a threat is present in a system before it is detected) of ransomware was 43 days [28]. Only when the malicious actor has a level of access high enough to compromise the target's backups will they start encrypting files, ensuring that the only way to restore the data is by paying the ransom. While this process of circumventing security requires a large investment of time and effort, it enables the malicious actor to extort organisations for ransoms far larger than previously possible [29].
A key feature in any ransomware negotiation is the reliability of the malicious actor; the victim's willingness to pay the ransom demand must be affected by the likelihood of getting their data back afterwards. There are two major reasons why a victim might not get their data back after paying. The first is that the malicious actor chooses not to return the data, perhaps to avoid the cost incurred by doing so. This is not a sound business practice, as it results in a loss of perceived reliability that reduces a victim's willingness to pay, and hence the malicious actor's profits [24]. It also does not fit with the business-like stance that malicious actors operating ransomware have adopted [30]. Therefore, we assume that our malicious actor will always attempt to return the victim's data. The second reason is that the return of the data fails unintentionally. Modern ransomware operates by encrypting the victim's data using public-private key encryption, rendering the data unreadable until it has been decrypted [31]. In order to decrypt their data, the victim must pay the malicious actor in exchange for the decryption key. As the encryption phase of the attack must proceed rapidly to avoid notice, flaws may arise during the process. If such flaws occur, then the decryption key will fail to decrypt the affected data. This results in some or all of the victim's files becoming permanently irretrievable, which is unlikely to be discovered until after the victim has paid the ransom. If a strain of ransomware is known to have a history of failure, the target's willingness to pay is reduced, and so the reliability of the malicious actor's ransomware is an important factor in the negotiations. Ransomware strains can vary quite significantly in their reliability, depending on how heavily the malicious actor invested in the development of the ransomware. They may even invest in providing "customer service" to their victims, walking them through the decryption process to further improve their image of reliability [32].
The potential for negotiation adds a significant feature from game theory to the dynamics of targeted ransomware: information asymmetry. The two parties to the negotiation have different degrees of information about each other, information that is crucial to determining the outcome of the negotiation. The malicious actor does not know exactly how much the target's data is worth to them. This is a significant factor, as the malicious actor seeks to set the ransom as high as possible in order to justify the significant investment they make in their attack. Therefore, estimating the value of the target's files accurately is of great importance. To aid them in this, the malicious actor can make use of both publicly available data, including shareholders' reports and valuations, and private data found on the target's computer network, such as up-to-date finances and business plans. However, uncovering and interpreting an organisation's private documents is not effortless, and so producing an accurate estimate for the value of the target's data requires further investment from the malicious actor. While investing greater effort can lead to a more accurate estimate, it is highly unlikely that the malicious actor will ever achieve perfect accuracy.
Even so, we might expect the target to be at a distinct disadvantage in terms of information asymmetry. However, as a result of developments in the computer security industry in response to targeted ransomware, this is not entirely true. The potential for negotiation of large ransoms has led to the emergence of organisations that offer professional ransomware negotiation services [33]. The negotiators have experience in dealing with malicious actors behind the various strains of ransomware, allowing them to negotiate effectively on behalf of targeted organisations. The negotiators also offer specialised knowledge. This can include statistics such as the reliability of a given strain of malware, but it can also include less quantifiable information, such as how aggressively the malicious actor behind a particular strain will negotiate.
Operators of targeted ransomware are typically willing to reduce their demand in the interest of getting paid, but not all are equally receptive to negotiations. Some malicious actors are likely to perceive a low counteroffer to their demand as an insult, and may react aggressively to punish the target. The severity of the reaction depends on how aggressive the malicious actor is, which varies depending on individual and cultural factors. In extreme cases, they may even react by abandoning the negotiations, and their ransom, so that future targets will be less likely to negotiate. This aggressive behaviour is a double-edged sword: if the malicious actor is not aggressive enough, then their targets will not pay them a large ransom. If they are too aggressive, their inclination to punish their targets for perceived insults will cost them ransoms. In order to be successful, the malicious actor must balance their aggression with their investments in their strain of targeted ransomware. In the next section, we construct a model of ransomware negotiation that is based on these key features.

III. MODELLING
We propose to study the dynamics of targeted ransomware by modelling the negotiations as a two-player game. This approach was inspired by Selten's analysis of a two-player game modelling the interaction between a hostage taker and a hostage negotiator [34]. Our two players are the attacker A and the defender D. Player A is a malicious actor (or group of malicious actors) operating a strain of targeted ransomware. Player D is an organisation targeted by player A, assisted by a professional negotiator hired to negotiate the ransom.

A. PLAYER A'S INVESTMENT
As noted in the previous section, there are three areas in which player A must invest in order to pull off a successful targeted attack:
• The circumvention of player D's security
• The reliability of player A's ransomware
• The estimation of the value of player D's data
We will not be considering player A's investment in circumventing player D's security here. While it is likely to be an important factor in the overall dynamics of the system, it has little bearing on the negotiations. Once player D's computer network has been infected, this investment only affects player A's net profit, and does not affect player D at all. However, the other two investments are very significant to the negotiations.
By the time player A launches their attack, they have already developed their strain of ransomware. In particular, they have invested in the reliability of their ransomware. Investing in reliability is important; if the decryption process is likely to fail, then player D will not be willing to pay very much for the decryption key. This is a significant upfront development cost; it does not scale with the number of targets player A attacks. For the sake of simplicity, we assume that player A can amortize this investment over the targets that they will infect with their strain of ransomware. We refer to this investment as $I_\beta$. As $I_\beta$ increases, so does the reliability of player A's ransomware. Let $\beta$ be the probability that the decryption key successfully decrypts data encrypted by the ransomware. We choose $\beta$ such that
$$\beta = \frac{I_\beta}{I_\beta + I_{50}},$$
where $I_{50}$ is the amount of investment required to achieve a reliability of 50%, or $\beta = 0.5$. $I_{50}$ can be considered as an economic scaling factor, and determines the amount of development that player A can purchase for a given investment. This choice of $\beta$ results in diminishing returns for large $I_\beta$, so that $\beta \to 1$ as $I_\beta \to \infty$, as shown in Fig. 1. Note that $I_{50}$ and $I_\beta$ are dimensionless, expressed as fractions of the value of the targeted files, so that our model remains generally applicable. For example, $I_{50} = 0.02$ means that an investment of 2% of the value of the targeted files will yield a decryptor that is 50% reliable. While player D doesn't know the value of $I_\beta$, their negotiator can provide an estimate of $\beta$ from their experience of individual ransomware strains.
The final area in which player A can invest is in their estimation of how much player D's data is worth. Without an accurate estimate, player A is unlikely to choose an optimal ransom demand, and so investing in producing an accurate estimate is an important aspect of their strategy. We let $x$ be the value that player D attaches to their encrypted data, and let $\hat{x}$ be player A's estimate of $x$. We refer to player A's investment in data value estimation as $I_\sigma$. As $I_\sigma$ increases, so does the probability that $\hat{x}$ will be close to $x$. In this paper, we choose to model $\hat{x}$ as a random variable following a $\mathrm{Lognormal}(\mu, \sigma^2)$ distribution [35] with probability density function
$$f(\hat{x}, \mu, \sigma) = \frac{1}{\hat{x}\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln \hat{x} - \mu)^2}{2\sigma^2}\right),$$
where $\mu = \ln x$ and $\sigma$ decreases as $I_\sigma$ increases. With the chosen parameters, $\hat{x}$ has median $x$. As $I_\sigma \to \infty$, $\sigma \to 0$, the variance $(e^{\sigma^2} - 1)e^{2\mu + \sigma^2} \to 0$ and the mean $e^{\mu + \sigma^2/2} \to x$. The Lognormal distribution has previously been used for modelling positive quantities that are determined by human behaviour [36]. As $\hat{x}$ is player A's estimate of how much player D values their data, we believe that this is an appropriate choice. Fig. 2 shows the probability density function $f(\hat{x}, \mu, \sigma)$ for $x = 1$ and varying levels of investment $I_\sigma$. $I_\sigma$ is scaled similarly to $I_\beta$. As $I_\sigma$ increases, the distribution narrows around $x$.
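The two investments can be illustrated with a minimal Python sketch. It rests on assumptions that should be read as such: the function names are hypothetical, the saturating form used for the reliability is one simple curve satisfying the stated properties ($\beta = 0.5$ at $I_\beta = I_{50}$ and $\beta \to 1$ as $I_\beta \to \infty$), and $\sigma$ is passed in directly rather than derived from $I_\sigma$, since only the qualitative dependence of $\sigma$ on $I_\sigma$ is used here.

```python
import math
import random


def reliability(i_beta, i_50=0.02):
    """Decryption reliability for investment i_beta.

    Assumed saturating form: equals 0.5 when i_beta == i_50 and
    approaches 1 as i_beta grows, with diminishing returns.
    """
    return i_beta / (i_beta + i_50)


def sample_estimate(x, sigma, rng):
    """Draw an estimate x_hat ~ Lognormal(ln x, sigma^2), whose median is x."""
    return rng.lognormvariate(math.log(x), sigma)


if __name__ == "__main__":
    for i_beta in (0.02, 0.1, 0.5):
        print(f"I_beta={i_beta:.2f} -> beta={reliability(i_beta):.3f}")

    rng = random.Random(0)
    for sigma in (0.5, 0.1):
        draws = sorted(sample_estimate(1.0, sigma, rng) for _ in range(20001))
        spread = draws[int(0.95 * len(draws))] - draws[int(0.05 * len(draws))]
        # Smaller sigma (larger investment) gives a narrower spread around x = 1.
        print(f"sigma={sigma}: median={draws[len(draws) // 2]:.3f}, 5-95% spread={spread:.3f}")
```

Passing $\sigma$ directly keeps the sketch independent of any particular mapping from $I_\sigma$ to $\sigma$; any decreasing mapping can be composed with `sample_estimate`.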

B. NEGOTIATION
Once player A has infected player D's computer network with ransomware, the negotiation begins. Player A issues a ransom demand $R$, and player D must decide how to respond. We now examine the negotiation process between player A and player D that occurs if player D is willing to pay, but does not wish to pay the full demand. This is a time-sensitive issue, particularly for player D. Player D cannot conduct business while their data is encrypted, but they still incur costs, which can grow significant over a protracted negotiation. Player A does not suffer such ongoing costs, but they have invested significant resources in the attack, and the longer the negotiation continues, the more time player D has to consider their position. This makes a rapid negotiation process highly desirable, and so we model the negotiation in the simplest way possible, in the manner used by Selten [34]. Player A issues a ransom demand, player D responds with a counteroffer $C$, then player A decides whether to accept $C$ and hand over the decryption key, or reject $C$ and abandon the negotiations. This is a substantially simplified description of the negotiation process and should not be taken literally. In reality, there may be a series of offers and counteroffers that take place over time. However, player D has finite capital with which they can absorb the costs incurred by not being able to conduct business; a drawn-out negotiation for a lower ransom may be more costly than a prompt negotiation for a higher ransom.
Why would player A ever reject $C$? In doing so, they lose their potential earnings and waste any investment they've made in the attack, which is clearly an undesirable outcome. However, player A must maintain their status as a threat; if they appear to be willing to accept low counteroffers, they will only receive low counteroffers. In order to maintain their credibility and their profits, player A may punish player D for making a low counteroffer. Therefore, we must expect that with a positive probability $\alpha$, A will perceive a counteroffer $C < R$ as an affront, to which they react aggressively by abandoning the negotiations. It is reasonable to suppose that $\alpha$ will be greatest for $C = 0$ and lowest for $C = R$. We define $\alpha$ as
$$\alpha = 1 - \left(\frac{C}{R}\right)^a, \tag{4}$$
where $a > 0$ is the aggression parameter of player A, quantifying their tendency to perceive a low counteroffer as an affront and react aggressively. This aggressive reaction is different to that implemented by Selten [34], where $\alpha = a\left(1 - \frac{C}{D}\right)$ and $a \in [0, 1]$ so that $\alpha \le a \le 1$. The reason for this is that in Selten's game, the aggressive reaction of the hostage taker is to kill the hostage, while in our game, the aggressive reaction is merely to not decrypt data. It is reasonable to assume that a hostage taker, faced with their ransom demand being disregarded, might still refrain from killing their hostage. However, a malicious actor, divorced from the consequences of their actions, would have no reason not to react aggressively and abandon the negotiation. This choice of $\alpha$ allows for a wide range of behaviour from player A, with very lenient negotiations for $a < 1$, scaling up to very aggressive negotiations as $a$ increases. Larger $a$ causes $\alpha$ to increase more rapidly as the difference between $C$ and $R$ increases, as shown in Fig. 3. As with $\beta$, player D can estimate $a$ through the negotiator's experience of interacting with individual ransomware operators.
If player A does not react aggressively to a counteroffer $C < R$, player A receives payment $C$ and player D receives the decryption key, which they then attempt to use to decrypt their data. The probability of successful decryption is $\beta$. If the decryption process is successful, player D retrieves their data. If the decryption is unsuccessful, their data is rendered permanently irretrievable. The possible outcomes of the game are detailed in Table 1. In the next section, we consider how these outcomes may arise through rational decisions made by both players.
Table 1. Payoff table for the targeted ransomware negotiation game.
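The behaviour of the aggressive reaction can be tabulated with a short Python sketch. The function name is hypothetical, and the power-law form $\alpha = 1 - (C/R)^a$ is an assumption of the sketch: it is chosen because it is greatest at $C = 0$, vanishes at $C = R$, grows faster for larger $a$, and yields the optimal counteroffer $a\beta x/(1+a)$ derived in the Analysis section.

```python
def rejection_probability(counteroffer, demand, aggression):
    """Probability that player A abandons the negotiation.

    Assumed form: alpha = 1 - (C / R) ** a for 0 <= C < R, and 0 for C >= R.
    """
    if demand <= 0 or counteroffer < 0 or aggression <= 0:
        raise ValueError("require demand > 0, counteroffer >= 0, aggression > 0")
    if counteroffer >= demand:
        return 0.0
    return 1.0 - (counteroffer / demand) ** aggression


if __name__ == "__main__":
    ratios = (0.25, 0.5, 0.75, 0.9, 1.0)
    print("a     " + "  ".join(f"C/R={r:<4}" for r in ratios))
    for a in (0.5, 1.0, 5.0, 10.0):
        row = "  ".join(f"{rejection_probability(r, 1.0, a):8.3f}" for r in ratios)
        print(f"{a:<5} {row}")
```

Each row shows how quickly the risk of abandonment rises as the counteroffer falls below the demand; lenient attackers ($a < 1$) tolerate low counteroffers, while aggressive ones punish even small discounts.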

C. SUMMARY OF GAME RULES
The model variables are summarised in Table 2. The rules of the game are summarised as follows:
1) Player A incurs cost $I_\beta + I_\sigma$ to infect player D's computer system and make a ransom demand $R$.

IV. ANALYSIS

A. OPTIMAL CHOICE OF C
In the subgame beginning with player D's choice of $C$, player D knows that making a counteroffer $C < R$ will provoke an aggressive reaction from player A with probability $\alpha$. Player D has no incentive to make a counteroffer $C > R$, so they rationally choose $C$ to maximize their expected utility $U$:
$$U = (1 - \alpha)(\beta x - C).$$
The probability of successful decryption is $\beta$; therefore, the expected value of the encrypted data is $\beta x$, and so player D will not offer more than $\beta x$ for the decryption key. We assume that there exists a critical value $C_{max} \in [0, \beta x]$ such that player D's expected utility is maximized by making counteroffer $C_{max}$ if $R > C_{max}$. By substituting from Eq. (4), player D's expected utility for $C < R$ simplifies to
$$U = \left(\frac{C}{R}\right)^a (\beta x - C).$$
To calculate the optimal value of $C$, we calculate
$$\frac{\partial U}{\partial C} = \frac{C^{a-1}}{R^a}\left(a\beta x - (1 + a)C\right).$$
By setting $\frac{\partial U}{\partial C} = 0$ we find that for $0 < C < R$, $U$ achieves its maximum at $C = \frac{a\beta x}{1+a}$. $\frac{\partial U}{\partial C} < 0$ for $C > \frac{a\beta x}{1+a}$; hence, for any $R > \frac{a\beta x}{1+a}$, player D's optimal counteroffer is $C = \frac{a\beta x}{1+a}$. $\frac{\partial U}{\partial C} > 0$ for $C < \frac{a\beta x}{1+a}$; however, player D will never make a counteroffer $C > R$. If $R < \frac{a\beta x}{1+a}$, player D's optimal counteroffer is $C = R$. Thus, $C_{max} = \frac{a\beta x}{1+a}$, yielding player D's optimal counteroffer $\hat{C}$:
$$\hat{C} = \begin{cases} R, & R \le \frac{a\beta x}{1+a}, \\ \frac{a\beta x}{1+a}, & R > \frac{a\beta x}{1+a}. \end{cases} \tag{5}$$
This scenario is demonstrated graphically in Fig. 4. Any choice of $C_{max} \ne \frac{a\beta x}{1+a}$ results in a drop in player D's expected utility.
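The closed-form counteroffer can be checked numerically with a brute-force search. The following Python sketch assumes the acceptance probability $(C/R)^a$ and the utility $(1 - \alpha)(\beta x - C)$, measured relative to losing the data; the function names and the parameter values ($a = 10$, $\beta = 5/6$, $x = 1$) are illustrative choices rather than values taken from the analysis.

```python
def expected_utility(c, r, a, beta, x):
    """Player D's expected utility for counteroffer c against demand r.

    Assumes acceptance probability (c / r) ** a for c < r and 1 for c >= r.
    """
    accept = 1.0 if c >= r else (c / r) ** a
    return accept * (beta * x - c)


def optimal_counteroffer(r, a, beta, x):
    """Closed form: pay the demand if it is low enough, otherwise offer C_max."""
    return min(r, a * beta * x / (1.0 + a))


def grid_argmax(r, a, beta, x, steps=10000):
    """Brute-force search over counteroffers in [0, r]."""
    candidates = (r * k / steps for k in range(steps + 1))
    return max(candidates, key=lambda c: expected_utility(c, r, a, beta, x))


if __name__ == "__main__":
    a, beta, x = 10.0, 5.0 / 6.0, 1.0  # illustrative values
    for r in (0.5, 0.8, 1.0, 1.5):
        print(f"R={r}: closed form={optimal_counteroffer(r, a, beta, x):.4f}, "
              f"grid search={grid_argmax(r, a, beta, x):.4f}")
```

For demands below $C_{max}$ the search settles on $C = R$, and for demands above it the search settles on $C_{max}$ regardless of $R$, matching the two branches of the optimal counteroffer.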

B. OPTIMAL CHOICE OF R
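In the subgame beginning with player A's choice of $R$, player A knows that under rational decision-making, player D will optimally make counteroffer $\hat{C}$. Player A rationally chooses $R$ to maximize their expected profit $P$:
$$P = (1 - \alpha)\hat{C} - I_\beta - I_\sigma.$$
Substituting from Eq. (5) yields
$$P = \begin{cases} R - I_\beta - I_\sigma, & R \le \frac{a\beta x}{1+a}, \\ \left(\frac{a\beta x}{(1+a)R}\right)^a \frac{a\beta x}{1+a} - I_\beta - I_\sigma, & R > \frac{a\beta x}{1+a}. \end{cases} \tag{6}$$
By differentiating with respect to $R$ we find that $\frac{\partial P}{\partial R} > 0$ for $R < \frac{a\beta x}{1+a}$ and $\frac{\partial P}{\partial R} < 0$ for $R > \frac{a\beta x}{1+a}$. Therefore, the optimal ransom demand $R = \frac{a\beta x}{1+a}$ is the highest ransom that player D is willing to pay. If player A can reliably make this demand, player D will always pay. Player A will always make their maximum profit, and there is no risk of an aggressive reaction from player A, trivialising the negotiation. Under such conditions, player A's profit is $\frac{a\beta x}{1+a} - I_\beta - I_\sigma$. As $\frac{a}{1+a}$ increases towards 1 with $a$, this would suggest that, in order to maximise their profit, player A should be infinitely aggressive, rejecting any counteroffer even slightly lower than their demand (i.e. $a \to \infty$).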
Of course, this "ideal" scenario is unrealistic, as it ignores the often-significant effect of imperfect information [37], [38], [39]. In reality, negotiations are not trivial affairs, and the risk of an aggressive reaction is always present. Player A does not know $x$, only $\hat{x}$, which, lacking any alternative, is what they use to calculate their ransom demand. Therefore, under optimal play while accounting for imperfect information, player A's ransom demand is $R = \frac{a\beta \hat{x}}{1+a}$. The potential for error in player A's estimate $\hat{x}$ gives rise to the necessity of negotiations that may result in an aggressive reaction. We can illustrate this by substituting $R = \frac{a\beta \hat{x}}{1+a}$ into Eq. (6). Under optimal play,
$$P = \begin{cases} \frac{a\beta \hat{x}}{1+a} - I_\beta - I_\sigma, & \hat{x} \le x, \\ \left(\frac{x}{\hat{x}}\right)^a \frac{a\beta x}{1+a} - I_\beta - I_\sigma, & \hat{x} > x. \end{cases}$$
Thus, it is the potential for error in the estimate $\hat{x}$ which prevents player A from playing optimally. The effect of error in $\hat{x}$ on player A's profit is shown in Fig. 5. Here, investment levels are fixed at $I_\beta = I_\sigma = 0.1$, so that the maximum profit depends on $a$. The effect of error differs depending on whether player A underestimates or overestimates the value of the data. If they underestimate $x$, then their profit decreases linearly with $\hat{x}$. If they overestimate $x$, then the possibility of an aggressive reaction emerges, and its probability increases with both $a$ and the error in $\hat{x}$. High aggression might increase player A's capacity for demanding large ransoms, but at an increased risk of an aggressive reaction.
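The asymmetry between underestimating and overestimating $x$ can be seen by evaluating the profit for a range of estimates. This Python sketch assumes the acceptance probability $(C/R)^a$ and a demand of $a\beta\hat{x}/(1+a)$; the function name is hypothetical, and $\beta = 5/6$ with $I_\beta = I_\sigma = 0.1$ are illustrative inputs.

```python
def attacker_profit(x_hat, x, a, beta, i_beta, i_sigma):
    """Player A's expected net profit when demanding a*beta*x_hat/(1+a).

    Assumes player D replies with min(demand, C_max) and that the
    counteroffer is accepted with probability (C / R) ** a.
    """
    demand = a * beta * x_hat / (1.0 + a)
    c_max = a * beta * x / (1.0 + a)
    if demand <= c_max:
        expected_payment = demand  # demand is paid in full, no rejection risk
    else:
        expected_payment = (c_max / demand) ** a * c_max
    return expected_payment - i_beta - i_sigma


if __name__ == "__main__":
    x, beta, i_beta, i_sigma = 1.0, 5.0 / 6.0, 0.1, 0.1  # illustrative values
    estimates = (0.5, 0.75, 1.0, 1.1, 1.25)
    for a in (1.0, 5.0, 10.0):
        row = ", ".join(
            f"x_hat={e}: {attacker_profit(e, x, a, beta, i_beta, i_sigma):+.3f}"
            for e in estimates
        )
        print(f"a={a}: {row}")
```

Below $\hat{x} = x$ the profit falls off linearly, while above it the decline steepens sharply as $a$ grows, which is why high aggression is costly when the estimate is poor.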
In order to further understand how error and aggression interact, we consider player A's expected profit. First, let player A's gross profit be
$$P + I_\beta + I_\sigma = \begin{cases} \frac{a\beta \hat{x}}{1+a}, & \hat{x} \le x, \\ \left(\frac{x}{\hat{x}}\right)^a \frac{a\beta x}{1+a}, & \hat{x} > x. \end{cases}$$
We let $x$ follow a distribution with probability density function $g(x, M)$ for $x > 0$ and mean $M$; the shape of the distribution is unimportant in this case. With probability density functions $g(x, M)$ for $x$ and $f(\hat{x}, \mu, \sigma)$ for $\hat{x}$, player A's net profit for a given combination of aggression and investments can be written as a convolution of $x$ and $\hat{x}$ in double integral form. However, due to our choice of $\alpha$ and Lognormal $\hat{x}$, by making the substitution $y = \frac{\hat{x}}{x}$ we can reduce the double integral to a single integral:
$$\mathbb{E}[P] = \frac{a\beta M}{1+a}\left[\int_0^1 y\, f(y, 0, \sigma)\, dy + \int_1^\infty y^{-a} f(y, 0, \sigma)\, dy\right] - I_\beta - I_\sigma. \tag{8}$$
By making this substitution, we can see that what appears to be a convolution of $x$ and $\hat{x}$ depends merely on the ratio $\frac{\hat{x}}{x}$. Player A's strategy is consistent across all values of $x$, so it is only the mean $M$ of the distribution of $x$ that remains in the final expression. In this form, we can clearly see where each element of player A's strategy $(a, I_\beta, I_\sigma)$ comes into play. As aggression $a$ increases, the multiplicative term $\frac{a}{a+1}$ increases, but the second integral in the sum, where player A has overestimated $x$, converges to 0. Increasing $I_\beta$ increases costs, but leads to increased $\beta$, which may increase profit. Increasing $I_\sigma$ also increases costs, but narrows the distribution of $\hat{x}$ around $x$, allowing for greater aggression at decreased risk of aggressive reaction. Thus, through our choice of $\alpha$ and $\hat{x}$, we can more clearly demonstrate how player A's strategy depends on the interaction between the various elements of their strategy.
The optimal counteroffer and expected profit for parameters corresponding to various strategies are shown in Table 3. Fig. 6 shows player A's profit for varying parameters $a$, $I_\beta$ and $I_\sigma$. The left column shows results calculated via numerical integration of Eq. (8) for $M = 1$. The right column verifies these calculated results with an agent-based simulation, where results are averaged over $n = 10000$ runs with constant target data value $x = 1$. These figures demonstrate that player A's optimal strategy is $(a, I_\beta, I_\sigma) = (4.68, 0.091, 0.104)$, rather than the naive maximal aggression $a \to \infty$ noted previously; in order to realise their potential profits, the attacker must be willing to negotiate.
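The comparison between integration and simulation can be sketched in Python. Several assumptions are made and should be read as such: the acceptance probability is taken as $(C/R)^a$, $\sigma$ and $\beta$ are passed directly rather than derived from $I_\sigma$ and $I_\beta$, and the two integrals over $y = \hat{x}/x$ are evaluated in closed form for a Lognormal$(0, \sigma^2)$ variable, namely $e^{\sigma^2/2}\Phi(-\sigma)$ and $e^{a^2\sigma^2/2}\Phi(-a\sigma)$ with $\Phi$ the standard normal distribution function. Function names and parameter values are illustrative.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_profit(a, beta, sigma, i_beta, i_sigma, mean_value=1.0):
    """Expected net profit with y = x_hat / x ~ Lognormal(0, sigma^2).

    Suitable for moderate a * sigma; the exponential overflows for very large values.
    """
    under = math.exp(sigma ** 2 / 2.0) * normal_cdf(-sigma)            # E[y; y <= 1]
    over = math.exp((a * sigma) ** 2 / 2.0) * normal_cdf(-a * sigma)   # E[y^-a; y > 1]
    return a * beta * mean_value / (1.0 + a) * (under + over) - i_beta - i_sigma


def simulate_profit(a, beta, sigma, i_beta, i_sigma, x=1.0, runs=10000, seed=0):
    """Agent-based estimate: play the one-round negotiation `runs` times."""
    rng = random.Random(seed)
    c_max = a * beta * x / (1.0 + a)
    total = 0.0
    for _ in range(runs):
        x_hat = rng.lognormvariate(math.log(x), sigma)   # attacker's estimate
        demand = a * beta * x_hat / (1.0 + a)            # ransom demand R
        counter = min(demand, c_max)                     # defender's counteroffer
        if rng.random() < (counter / demand) ** a:       # attacker accepts
            total += counter
    return total / runs - i_beta - i_sigma


if __name__ == "__main__":
    beta, sigma, i_beta, i_sigma = 0.8, 0.2, 0.1, 0.1  # illustrative values
    for a in (1.0, 4.0, 16.0):
        exact = expected_profit(a, beta, sigma, i_beta, i_sigma)
        approx = simulate_profit(a, beta, sigma, i_beta, i_sigma)
        print(f"a={a}: integral={exact:.4f}, simulation={approx:.4f}")
```

Sweeping $a$ for a fixed $\sigma$ in this sketch shows the expected profit rising and then falling, so an interior level of aggression outperforms both very lenient and very aggressive play.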

V. CONCLUSION
We have constructed a simple game-theory model of targeted ransomware negotiations between two agents: an attacker, a malicious actor operating targeted ransomware, and a defender, their target. Our model focuses on three key elements of the attacker's strategy which have a direct effect on the negotiations: aggression, investment in the reliability of their ransomware, and investment in their estimation of target data value. By thoroughly analysing our model, we demonstrate how the necessity of negotiation arises from optimal decisions made by the agents under imperfect information. We show how the key elements of the attacker's strategy interact with each other, and demonstrate by numerical integration and by agent-based simulation that the attacker's profit depends on developing a balance of investment and aggression. While ransomware strains do show a significant variety of behaviour, even within the subgroup of targeted ransomware, the features present in this model are quite generic, and so we expect the insights provided to be broadly applicable in the study of targeted ransomware.
In this paper we have focused solely on the negotiations of targeted ransomware that follow the infection of an organisation in order to construct a simple model that can be analysed in detail. In this scenario, the attacker has greater agency to determine the outcome, and so our analysis is focused on the decisions of the attacker. As noted in the Modelling section, this model neglects the attacker's investment in circumventing security, as it has no direct effect on the negotiations. By extension, the defender's investment in security is omitted. One could grant greater agency to the defender by considering how the defender's investment in security acts as a deterrent to a potential attacker. Every additional effort, such as redundant backups, data exfiltration countermeasures, and employee training, makes an organisation harder to hold to ransom, and so less attractive as a target. Such deterrents, and other solutions to the growing problem of ransomware, are of vital importance to strengthening our shared internet security. Our model contributes to this goal by improving our understanding of various negotiation strategies through the use of game theory. As cybercrime continues to threaten our increasingly technology-dependent lives, we hope that the application of game theory to targeted ransomware will be of interest to a wide audience.

Figure 3. $\alpha$, the probability of an aggressive reaction from player A, depends on $a$ and the ratio of counteroffer to ransom demand $\frac{C}{R}$.

2) Player D makes a counteroffer $C$.
3) Player A aggressively rejects player D's counteroffer with probability $\alpha$.
4) If player A does not aggressively reject the counteroffer, player A receives the counteroffer $C$ and player D receives the decryption key.
5) Player D's data is successfully decrypted with probability $\beta$.

Figure 4. Player D's expected utility for varying $R$ and different values of $C_{max}$ where $a = 10$, $I_{50} = 0.02$ and $I_\beta = 0.1$.

Figure 5. Player A's expected profit as a function of $\hat{x}$ for varying $a$ when $x = 1$, $I_{50} = 0.02$, and $I_\beta = I_\sigma = 0.1$.

Figure 6. Player A's profit for varying strategy parameters $(a, I_\beta, I_\sigma)$ when $I_{50} = 0.02$. In each plot, the hidden parameter is set to its optimal value. The maximum mean profit achieved is marked by a black dot. The red curve marks where the mean profit is equal to 0.

Table 2. Table of game variables with detail on the information asymmetry in the targeted ransomware negotiation game.

Table 3. Model variables for optimal and various sub-optimal strategies.

This research was supported by the Irish Research Council and McAfee LLC through the Irish Research Council Employment-Based Postgraduate Programme. Pierce Ryan is an employee of McAfee LLC. Any opinions expressed in this work are solely those of the authors, and do not necessarily reflect the views of the supporting organisations.