SECTION I

## INTRODUCTION

Surveillance of financial exchange markets to prevent market abuse has attracted significant academic and industrial attention since the financial crisis of 2008, and especially since the flash crash of 2010. The abuse of financial markets can occur in a variety of ways, all of which can be extremely damaging to the proper functioning and integrity of the market. Trade-based manipulation, where the manipulation tactic is carried out simply by buying and selling [1], is one of the primary forms. Price and volume are the two major targets of manipulation, and the former, price manipulation, is thoroughly studied in [2]–[6]. The other form of trade-based abuse is volume manipulation, in which the manipulator inflates the transaction volume to give a false impression of high trading activity on the market [1], [7]. The major form of volume manipulation is the wash trade, which occurs when the same individual or a group of collusive clients is on both the sell and buy sides of trading in a financial instrument (e.g., a stock). While there is no change in beneficial ownership, wash trading creates a misleading appearance of active interest in the stock [8].

The remainder of this paper is organized as follows. Section II provides a review of wash trade manipulation and the corresponding detection methods. The features of all types of wash trade scenarios as well as the proposed detection approach are analyzed, formulated, and characterized in Section III. The performance evaluation of the proposed approach is provided in Section IV. Finally, Section V concludes the paper and discusses potential improvements and future work.

SECTION II

## WASH TRADE AND ITS DETECTION

### A. Wash Trade

In capital markets, a limit order indicates the trader’s intention to buy or sell a given volume of a specific equity at a specific price or better [9] (a better price is a higher selling price or a lower buying price). A transaction occurs when eligible orders satisfy the order-matching rules. The outstanding unmatched limit orders are recorded in the order book of the exchange, in which the highest buying price determines the best bid price and the lowest selling price determines the best ask price. The gap between the best bid and best ask prices is defined as the bid–ask spread [10]. In most exchange markets, the matching rule selects the earliest order with a matching price for execution. In the example in Table I, three limit orders, #01, #02, and #03, are submitted in sequence to the exchange. According to the matching rule, order #03 is first executed for 300 shares against #01, which has the same price as order #02 but was submitted earlier, and the remaining 100 shares are then executed against #02.

Table I Limit Order Sequences
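
As a minimal sketch of the matching rule described above, the following Python function fills an incoming limit order against resting opposite-side orders, taking the earliest order first among those at the same price. The `Order` record, the `match_incoming` name, and the example prices and volumes are hypothetical; they are chosen only to reproduce the 300 + 100 split described in the text. Serving the best-priced resting order first is also an assumption, in line with common price-time priority, rather than something the text states.

```python
from collections import namedtuple

# Hypothetical order record; the field names are assumptions for illustration.
Order = namedtuple("Order", ["oid", "side", "price", "volume"])

def match_incoming(resting, incoming):
    """Fill `incoming` against `resting` orders (given in arrival order).

    Returns (fills, unfilled_volume); each fill is (resting_oid, volume, price).
    """
    if incoming.side == "buy":
        # A buy can execute against sells priced at or below its limit.
        candidates = [o for o in resting
                      if o.side == "sell" and o.price <= incoming.price]
        candidates.sort(key=lambda o: o.price)    # stable sort: ties keep arrival order
    else:
        # A sell can execute against buys priced at or above its limit.
        candidates = [o for o in resting
                      if o.side == "buy" and o.price >= incoming.price]
        candidates.sort(key=lambda o: -o.price)   # stable sort: ties keep arrival order
    fills, remaining = [], incoming.volume
    for o in candidates:
        if remaining == 0:
            break
        qty = min(remaining, o.volume)
        fills.append((o.oid, qty, o.price))       # executes at the resting order's price
        remaining -= qty
    return fills, remaining

# Two same-priced sells followed by one buy for 400 shares (hypothetical values).
resting = [Order("#01", "sell", 125.0, 300), Order("#02", "sell", 125.0, 200)]
fills, left = match_incoming(resting, Order("#03", "buy", 125.0, 400))
print(fills, left)
```

With these inputs, the earlier order #01 is exhausted before any volume is taken from #02.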

Wash trades follow the same matching rules as legitimate transactions, with the special feature, defined by the Financial Conduct Authority (FCA), of no change in beneficial interest or market risk, or the transfer of beneficial interest or market risk only between parties acting in concert or collusion, other than for legitimate reasons [11]. The Committee of European Securities Regulators (CESR) further indicates that a wash trade is a deliberate arrangement in concert or collusion [12]. On August 28, 2014, the Chicago Mercantile Exchange (CME) released a new rule [adopted by the U.S. Commodity Futures Trading Commission (CFTC)], termed Rule 575 [13]. Rule 575 clearly states that no person shall enter messages to the market as prearranged collusion (wash trade) with intent to mislead other participants. The definition in Rule 575 in the U.S. is consistent with that of CESR in Europe: prearranged collusive trading is a wash trade and shall be strictly prohibited. Although they clearly define wash trade activity, the regulators (FCA, CESR, and CFTC) do not provide any quantitative approach for detecting it.

As illustrated by the example in Table II, the simplest format of wash trade is the simultaneous submission of two opposite limit orders with identical price (125 in Table II) and similar volume (495 in Table II) from one trader $\boldsymbol {A}$. By the matching rules, orders #01 and #02 match, and 495 shares are executed immediately after submission. Wash trades can also be carried out with multiple orders and multiple traders, as shown in Tables III and IV. In Table III, order #03 is matched and executed with #01 and #02 sequentially, so that a transaction of 490 shares is artificially created by trader $\boldsymbol {A}$. In Table IV, two transactions are created by four matched orders between traders $\boldsymbol {A}$ and $\boldsymbol {B}$. After the transactions (450 matched volumes), there is almost no effective transfer of beneficial interest between the two traders.

Table II Basic Format of Wash Trade
Table III Wash Trade With Multiple Orders
Table IV Wash Trade With Multiple Traders

Summarizing the typical formats in Tables II and IV as well as the definitions from the regulators, we obtain three features of a successful execution of a wash trade manipulation as follows.

1. Tight submission intervals between the matched buy and sell orders (to minimize the risk of the orders being unintentionally picked up by other traders).
2. Executable prices (to make the orders an immediate execution).
3. Mostly matched volumes (to minimize the risk of loss from the unmatched volumes executed with other traders).

Perfectly matched orders, which have the same price, volume, and submission time according to the summarized features, guarantee execution but are obviously easy for regulators to flag as market abuse. Therefore, to avoid easy detection, smart manipulators design the wash trade orders to be mostly matched, as in the examples in Tables II and IV, where around 99% of the volumes are executed. Similarly, given the matching rules in most exchange markets [14], under which a buy (sell) limit order matches sell (buy) limit orders with the same or a lower (higher) price, the limit prices in the example in Table IV, which are different but executable, are also deliberately designed to avoid inspection. In Table IV, order #02 can be executed with order #01 at price 125, and order #04 can be executed with order #03 at price 125.5. The prices 125 and 125.5 are the execution prices of the two possible transactions; we refer to such prices as transaction prices.

### B. Wash Trade Detection

To the best of our knowledge, there is no related work on the detection of wash trade activities in capital markets. The only analogous research is on the detection of collusive cliques based on similar trading behaviors, that is, traders buying and selling equities in a similar way. A spectral clustering-based approach was developed in [15], where a trading-behavioral network is generated and any behavior that deviates from the network is reported as an irregularity. That approach assumes strong consistency between a trader’s current behavior and his/her previous trading network. A graph clustering algorithm for detecting a set of collusive traders was proposed in [16]. The relationship between traders is constructed as a stock flow graph, and those with heavy trading within their network are clustered as a collusion set.

A new trading collusion detection approach, the correlation matrix of one trading day, was presented in [17], where the trader behavior was represented by an aggregated time series of signed volumes of submitted orders. The similarities of behaviors among multiple traders are measured by Pearson’s product-moment coefficient, and the cliques with a coefficient higher than a user-specified threshold were considered as suspicious collusions. The experiments of this study evaluated the real order data of futures traded in the Shanghai Futures Exchange. The signed order volume is constructed by volumes and directions (buy/sell) of the order. The order price information is ignored according to the assumption that the order prices are not related to the trader’s behaviors [17]. However, the market impact measure shows that the order price significantly impacts the market [18] so that the market moves caused by the traders’ own actions (orders) become the principal part of the transaction costs [19]. It is, therefore, unacceptable to ignore the order price information, which not only distinguishes traders’ intention, but is a key feature of wash trade manipulation tactics.

A technique developed by the CME to prevent wash trades at the engine level was rolled out in the middle of 2011 [20] and updated in the summer of 2013 [14]. However, it only monitored the same-priced buy/sell orders from trading accounts with the same beneficial ownership [14] (example in Table II). The lack of the surveillance mechanisms for wash trades with multiple orders or traders (example illustrations in Tables III and IV) left it possible for collusive parties to create a number of transactions that give a false appearance of large trading volumes.

SECTION III

## WASH TRADE DETECTION METHODOLOGY

### A. Analysis Terminologies

To analyze the strategic behavior of wash trades, the definitions and terminology in [24] are adopted and revised to formalize the trading properties and market changes. The effect of a wash trade can be represented by the position of the whole trading collusion, where position is the amount of equities held by a trader. As a wash trade is a fraudulent activity rather than a true trading action, each participating trader tends to keep his own position unchanged to minimize unnecessary financial loss, and therefore, the position of the whole collusive group is also unchanged. During the wash trade process, the position change is caused by a number of orders from the traders in the collusive group and can be defined as

\begin{equation*} \textrm{Position} + \textrm{Orders} \to \textrm{Position}. \end{equation*}

A position comprises a sequence of orders

\begin{equation*} \textrm{Position} = \{(\mathrm{Order}_{1}), (\mathrm{Order}_{2}), \ldots, (\mathrm{Order}_{n})\} \end{equation*}

where each order is defined as

\begin{equation*} \textrm{Order} = (\textrm{Trader\_ID}, \textrm{Type}, \textrm{Price}, \textrm{Volume}) \end{equation*}

where Type = buy $\vert$ sell. Representing the order types buy and sell by positive and negative signs, respectively, and affixing the sign to the Trader_ID and Volume, a sell order can be represented as

$$\textrm{Order} = (-\textrm{Trader\_ID}, \textrm{Price}, -\textrm{Volume}). \tag{1}$$

In this notation, the orders in Table IV can be written as

\begin{align*} \textrm{Position}=&\{ (A, 125, 500), (-B, 124.2, -450), \\ &~ (B, 125.5, 450), (-A, 125, -500) \}. \end{align*}

The buy/sell orders with matched prices can be merged as

\begin{align*} \textrm{Position}=&\{ ( A-B, 125, 500-450=50 ), \\ &~ ( B-A, 125.5, 450-500=-50 ) \}. \end{align*}

As discussed in Section II-A, prices 125 and 125.5 are the transaction prices. The difference between the executable limit prices is taken as the margin of each transaction price. In this case, the transaction price 125 has the margin $125-124.2=0.8$, and the transaction price 125.5 has the margin $125.5-125=0.5$. We merge the potential transactions whose price margins overlap; here, the range from 124.2 to 125 and the range from 125 to 125.5 meet at 125. After the merge, the position is re-expressed with the range between 124.2 and 125.5 written as $124.85 \pm 0.65$:

\begin{align*} \textrm{Position}=&\{ A-B+B-A,\ 124.85\pm 0.65,\ 50-50 \} \\ =&\{ \text{“0”},\ 124.85\pm 0.65,\ 0 \} \end{align*}

where the Trader_ID calculation is carried out as a symbolic operation, 0.65 is the transaction margin $\delta ^{T}$, and 124.85 is the transaction price $P^{T}$. The zero-valued signed trader ID implies that each collusive trader transacts on both sides (buy and sell) of the market, and the zero signed volume indicates that the net amount transacted across both sides is zero. No equity is really bought or sold. Therefore, the unchanged position, represented by a zero-valued signed trader ID and signed volume, indicates wash trade activity within a collusion.
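
The aggregation above can be checked numerically. The sketch below is only an illustration under stated assumptions: the `aggregate` function and its tuple layout are hypothetical, the symbolic sum of signed trader IDs is modeled with a `Counter`, and the merged price range is taken simply as the span between the lowest and highest limit prices of the group.

```python
from collections import Counter

def aggregate(orders):
    """Aggregate signed orders given as (trader, price, signed_volume).

    A positive volume marks a buy and a negative volume a sell.
    Returns (unbalanced_traders, net_volume, mid_price, half_width).
    """
    trader_sum = Counter()
    net_volume = 0
    for trader, _price, volume in orders:
        trader_sum[trader] += 1 if volume > 0 else -1   # symbolic +T / -T
        net_volume += volume
    prices = [price for _trader, price, _volume in orders]
    low, high = min(prices), max(prices)
    unbalanced = {t: s for t, s in trader_sum.items() if s != 0}
    return unbalanced, net_volume, (low + high) / 2, (high - low) / 2

# The four Table IV orders as written in the text.
table_iv = [("A", 125.0, 500), ("B", 124.2, -450),
            ("B", 125.5, 450), ("A", 125.0, -500)]
unbalanced, net_volume, mid, half = aggregate(table_iv)
print(unbalanced, net_volume, round(mid, 2), round(half, 2))
```

An empty dictionary of unbalanced traders together with a zero net volume corresponds to the “0” trader ID and zero signed volume in the final expression.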

### B. Wash Trade Among Multiple Traders

As the FCA and CESR pointed out in their consultation reports [11], [12], a wash trade is difficult to distinguish because the format of trading collusions varies and the collusive transactions can be buried in the mass of normal trading activity, as in the complex network reported by NANEX on May 31, 2013 [25], where vertices represent traders and directed connections between vertices represent transactions between traders. We adopt the idea in [25] and represent submitted limit orders (from a number of traders) by a graph, where vertices represent traders, the short arrows affixed to a vertex represent the orders submitted by that trader (buy and sell orders are represented by arrows pointing inward and outward, respectively), and the dotted arrow lines represent the possible executions according to the matching rule discussed in Section II-A. An example of wash trade action mixed with legitimate trading orders is shown in Table V and illustrated by the graph in Fig. 1. Among the 14 orders submitted by 6 traders in this example, four pairs (#1–#4 in Table V) of wash trade orders are deliberately submitted by four traders with tight submission intervals, executable prices, and mostly matched volumes, so that the orders in each pair are suspiciously easy to match and execute. In Fig. 1, the possible executions are illustrated by four dotted arrow lines: each dotted arrow line connects one pair of matched orders, and the arrowhead indicates the transaction direction of the equity, i.e., $\boldsymbol {A}$ pointing to $\boldsymbol {B}$ means that trader $\boldsymbol {A}$ sells shares to trader $\boldsymbol {B}$. As Fig. 1 shows, when participating in wash trade activities, the traders ($\boldsymbol {A}$, $\boldsymbol {B}$, $\boldsymbol {C}$, and $\boldsymbol {D}$) connect as a closed simple cycle (dotted arrow lines), and continuous transactions flow around the cycle in one direction (either clockwise or counterclockwise), with each trader along the pathway passing the parcel [26]. After a complete transaction loop, the beneficial interest has been transferred around the collusive group, and no trader in the group has an actual position change.

Fig. 1. Closed connection cycle of traders and the possible execution flow along the cycle in wash trade action (14 orders in Table V are mapped to the graph).
Table V Example of Wash Trade in a Sequence of Limit Orders From a Number of Traders

That no collusive trader’s beneficial interest changes in a wash trade can also be calculated with the terminology defined in Section III-A, as in (2).

Equation (2) shows the possible execution [dotted arrow line (1) in Fig. 1] of the two orders in pair #1 in Table V under the matching rule discussed in Section II-A, by which execution occurs on the earliest orders with matched prices. Similarly, the executions of matched pairs #2–#4 in Table V [dotted arrow lines (2)–(4) in Fig. 1] are represented in (2). The aggregated result of those executions is also calculated in (2), where 50 shares remain because of the mostly matched volumes tactic used between any two neighboring manipulators to avoid regulatory inspection [26]. The unmatched volume (for example, 2%) can then be defined as the matching margin $( {\delta }_{v} )$. Similarly, the differences between the limit order prices and the transaction prices can be defined as the limit price margin $( {\delta }_{p} )$ and the transaction margin $( {\delta }_{p}^{T} )$, respectively. In the following case, ${\delta }_{p}^{T}=0.005$:

\begin{align}
\textrm{Position}=&\{ (-\boldsymbol{A}, 125.00, -1450), (\boldsymbol{B}, 125.01, 1500),\notag \\
&~ (-\boldsymbol{B}, 124.95, -1500), (\boldsymbol{C}, 125.01, 1450),\notag \\
&~ (-\boldsymbol{C}, 125.00, -1450), (\boldsymbol{D}, 125.01, 1500),\notag \\
&~ (-\boldsymbol{D}, 125.01, -1450), (\boldsymbol{A}, 125.01, 1450) \}\notag \\
=&\{ (-\boldsymbol{A}+\boldsymbol{B}, 125.00+0.01, +50),\notag \\
&~ (-\boldsymbol{B}+\boldsymbol{C}, 124.95+0.06, -50),\notag \\
&~ (-\boldsymbol{C}+\boldsymbol{D}, 125.00+0.01, +50),\notag \\
&~ (-\boldsymbol{D}+\boldsymbol{A}, 125.01+0, 0) \}\notag \\
=&\{ -\boldsymbol{A}+\boldsymbol{B}-\boldsymbol{B}+\boldsymbol{C}-\boldsymbol{C}+\boldsymbol{D}-\boldsymbol{D}+\boldsymbol{A},\notag \\
&~ 124.95+0.06,\ 50-50+50+0 \}\notag \\
=&\{ \text{“0”},\ 125.005\pm 0.005,\ +50 \}. \tag{2}
\end{align}

Furthermore, as shown in Table V, the time intervals between different pairs can vary like random events within a single trading day. To avoid being detected as suspicious trading, in practice, smart manipulators place the pairs at separated points in time, as in the examples in Table V, where the time differences between pairs are all different and appear random. To achieve this, manipulators carefully design each pair of matched orders to minimize the possible financial loss from price changes over the period (e.g., from 9:00 to 10:50 in Table V) and to keep the position of the whole collusive group at zero. Separating the matched pairs in this way makes a wash trade harder to detect in a mixed environment of normal and manipulative trades.

In addition to the example in Table V and Fig. 1, the matched pair between any two manipulators can also be constructed from a number of limit orders, as shown in Table III, rather than from a single one-to-one sell and buy order (as in the pairs in Table V). For example, matched pair #1 in Table V can be constituted by four sell orders and one buy order, as shown in Table VI and Fig. 2.

Fig. 2. Multiple matched orders between two manipulators in wash trade action (14 orders in Table V and 5 orders in Table VI are mapped into the graph).
Table VI Example of Matched Pair Composed of Multiple Orders in Wash Trade Activity

In this example, the submission of four sell orders is followed closely by one large buy order, which matches, potentially executes, and removes all (or most) of the volume of the previous four sell orders. The graph of the traders and the transaction flow is revised in Fig. 2, where matched pair #1 between $\boldsymbol {A}$ and $\boldsymbol {B}$ is illustrated by four short outward arrows affixed to $\boldsymbol {A}$ connecting, through the dotted arrow, with one short inward arrow affixed to $\boldsymbol {B}$; the rest of the closed cycle of traders remains unchanged. In the example in Table VI, since buy order #05 is submitted later than the sell orders, it is executed at the prices of the four sell orders, i.e., order #05 is first executed for 450 shares at 124.99 with order #01, then for another 450 shares at 124.98 with order #02, and so on.

### C. Wash Trade Features

From the discussion in Sections III-A and III-B, the strategy that constructs a wash trade activity has the following two key features.

1. Feature 1: Matched orders. As the first step of wash trade manipulation, traders deliberately submit matched orders to the market within tiny time intervals to guarantee execution; those orders can be matched one-to-one (examples in Table V) or one-to-many (example in Table VI). This feature corresponds to the dotted arrow lines in Figs. 1 and 2.
2. Feature 2: Closed transaction cycle. A single execution of matched orders does not constitute wash trade manipulation unless those executions form a closed cycle, as illustrated in the examples in Figs. 1 and 2. This feature corresponds to the closed cycle of dotted arrows among the traders in Figs. 1 and 2.

Considering the example in Table V, the manipulators set up the matched orders from the #01 order at time 9:00:000, but the wash trade is not completely constructed until the submission of the #12 order at time 10:50:001, which closes the transaction cycle. Therefore, a wash trade can be detected through detecting the matched orders and closed cycle in two steps.

1. Step 1: Detect the suspiciously matched order pairs $S$ according to the matching rule and the wash trade features (tight submission intervals, executable prices, and mostly matched volumes):

   \begin{equation*} \textrm{Order Pair} = \left\{ \sum \textrm{Orders} \right\} = \big\{ +T_{m}-T_{n},\ P^{T}\pm \delta_{p}^{T},\ \pm \delta_{v} \big\} \end{equation*}

   where $+T_{m}-T_{n}$ represents trader $T_{n}$ selling shares of equity to $T_{m}$, and $\delta_{v}$ and $\delta_{p}^{T}$ represent the matching margins of volume and of the transaction price $P^{T}$, respectively.
2. Step 2: Among $S$, find the order pairs whose transaction price margins overlap. If some of those pairs fulfill the condition

   \begin{equation*} \textrm{Position} = \left\{ \sum\limits_{k\in S} \textrm{Order Pair}_{k} \right\} = \big\{ \text{“0”},\ P^{T}\pm \delta_{p}^{T},\ \pm \delta_{v} \big\} \end{equation*}

   a wash trade alert is triggered.

To further formulate those features, we define the #k order $L_{k}$ submitted by trader $T_{n}$ at time $t_{k}$ as

\begin{equation*} L_{k}=(t_{k},\pm T_{n}, P_{k}, \pm V_{k}) \end{equation*}

where $P_{k}$ and $V_{k}$ are the price and volume of order #k, respectively, and the positive and negative signs represent buy and sell operations, respectively. The matching margin $\delta$ is defined as a vector $\delta =[ \delta _{p},\delta _{t},\delta _{v} ]$ of three small positive values for price, time, and volume, respectively. If buy order #$\boldsymbol {K}$ is matched with $\boldsymbol {K} - 1$ sell orders, #1 to #$\boldsymbol {K}-1$, they have the following features.

1. Tiny time interval

   $$| t_{1}-t_{\boldsymbol {K}} |<\delta _{t}. \tag{3}$$
2. Executable tiny price difference

   $$P_{\boldsymbol {K}}-\min ( P_{1},\ldots ,P_{\boldsymbol {K}-1} )<\delta _{p}. \tag{4}$$
3. Mostly matched volume

   $$\left | \sum \limits _{k=1}^{\boldsymbol {K}-1} V_{k} -V_{\boldsymbol {K}} \right |<\delta _{v}. \tag{5}$$
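
The three conditions listed above translate directly into a predicate. The sketch below is a hedged illustration: the `LimitOrder` type, the `is_matched` name, and the example orders are hypothetical, volumes are treated as unsigned sizes, and the extra check that the buy price is at least every sell price is an added assumption to make the "executable" part of the price condition explicit.

```python
from typing import NamedTuple, Sequence

class LimitOrder(NamedTuple):
    t: float       # submission time in seconds
    trader: str
    price: float
    volume: int    # unsigned order size

def is_matched(sells: Sequence[LimitOrder], buy: LimitOrder,
               delta_t: float, delta_p: float, delta_v: int) -> bool:
    """Check the time, price, and volume conditions for one buy order
    against the earlier sell orders it may be matched with."""
    if not sells:
        return False
    # Tiny time interval between the first sell and the buy.
    tiny_interval = abs(sells[0].t - buy.t) < delta_t
    # Executable (assumed: buy price >= every sell price) and tiny price gap.
    executable = buy.price >= max(s.price for s in sells)
    tiny_gap = buy.price - min(s.price for s in sells) < delta_p
    # Mostly matched volume.
    mostly_matched = abs(sum(s.volume for s in sells) - buy.volume) < delta_v
    return tiny_interval and executable and tiny_gap and mostly_matched

# Hypothetical one-to-many pair: four sells from A followed by one buy from B.
sells = [LimitOrder(0.000, "A", 124.99, 450), LimitOrder(0.001, "A", 124.98, 450),
         LimitOrder(0.002, "A", 124.97, 450), LimitOrder(0.003, "A", 124.96, 450)]
buy = LimitOrder(0.004, "B", 125.00, 1790)
print(is_matched(sells, buy, delta_t=1.0, delta_p=0.05, delta_v=50))
```

The same predicate covers the one-to-one case by passing a single sell order.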

If $\boldsymbol {K}$ orders among $N$ traders constitute a wash trade, they meet the following condition, where $r_{nk}$ is an indicator with $r_{nk}=-1$ if order #k from trader $T_{n}$ is a sell order and $r_{nk}=+1$ if it is a buy order:

\begin{align}
\textrm{Position}=&\left\{ \sum\limits_{n=1}^{N} r_{nk}T_{n},\ P^{T}\pm \delta_{p}^{T},\ \pm \delta_{v} \right\}\notag \\
=&\big\{ \text{“0”},\ P^{T}\pm \delta_{p}^{T},\ \pm \delta_{v} \big\}. \tag{6}
\end{align}

The features in (3)–(5) are detected in Step 1, and the feature in (6) is detected in Step 2.

### D. Problem Formulation

To discover a wash trade before it is completed (fulfilling the recent regulations on preventing attempted wash trades), the detection approach is applied to the limit order stream instead of the trade records. The order stream is the sequence of limit orders received by the trading platform from numerous traders. The stream is updated by order events, which can be submission, modification, cancellation, or execution. As shown in Table I, an order includes ID, trader ID, time, buy/sell sign, price, and volume. In this paper, we assume that the orders in the stream are for one specific stock. Thus, the stock information in the stream can be ignored once the specific stock is determined. This assumption, on one hand, narrows the scope of this study to the underlying problem and, on the other hand, conforms to the practical trading platform environment, where the algorithm can easily be applied to a selected equity.

Step 1, detecting the suspiciously matched order pairs according to (3)–(5), is termed coarse detection, while Step 2, recognizing the closed cycle based on (6), is termed fine detection. The limit order stream must first be preorganized for these two tasks. A physical-time sliding window of size $\theta _{T}$ is specified, and the order stream is split into two queues of consecutive orders: 1) the buy order queue $Q_{b}$ and 2) the sell order queue $Q_{s}$, each of which maintains a size of $\theta _{T}$. That is, if a new order $L_{k}$ is a buy order, it is pushed into $Q_{b}$; otherwise, it is pushed into $Q_{s}$. If the length of the updated queue is larger than $\theta _{T}$, the earliest orders are popped to maintain the length of the sliding window. The procedure is described in Algorithm 1. Since the order stream is measured in order event time, $\theta _{T}$ is maintained by calculating the difference between the physical time stamps of the first and last orders in the queue. Hence, the number of orders in each queue ultimately depends on the underlying frequency of order activity and differs over time (Algorithm 1 is named WASH_TRADE_DETECT because it will involve all the detection subfunctions, which are discussed in the following sections).

#### Algorithm 1 Wash Trade Detection – Pre-Organization
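
The pre-organization step described above can be sketched in Python as follows. This is an illustrative reading of the prose, not the paper's listing: the `push_order` name and the tuple layout `(time, trader, price, signed_volume)` are assumptions, with a positive volume marking a buy and a negative volume a sell.

```python
from collections import deque

def push_order(order, q_buy, q_sell, theta_t):
    """Route one order into the buy or sell queue and trim that queue so the
    time span between its first and last orders does not exceed theta_t."""
    time, _trader, _price, signed_volume = order
    queue = q_buy if signed_volume > 0 else q_sell
    queue.append(order)
    # Pop the earliest orders until the window length is restored.
    while time - queue[0][0] > theta_t:
        queue.popleft()
    return queue

q_buy, q_sell = deque(), deque()
stream = [(0.0, "A", 125.0, 500), (0.5, "B", 125.0, 400),
          (0.6, "C", 125.0, -450), (1.2, "D", 125.1, 300)]
for order in stream:
    push_order(order, q_buy, q_sell, theta_t=1.0)
print([o[0] for o in q_buy], [o[0] for o in q_sell])
```

Because the window is measured in physical time, the number of orders kept in each queue varies with order frequency, as noted in the text.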

The intention of a wash trade, increasing transaction volume, implies that wash trades are usually associated with large orders. Consequently, orders with volumes smaller than a predefined threshold $\theta _{V}$ are ignored, where the threshold can be set according to how strict the detection needs to be. Given the limit order queues $Q_{b}$ and $Q_{s}$, the coarse detection can then be formulated as follows. For a large incoming order, search the opposite order queue for one or multiple potential matching orders, which are characterized by (3)–(5). The result of the coarse detection comprises all order combinations matched with the incoming order. Collusions may exist among those combinations.

Similarly, the fine detection can be formulated as follows. Given the matched order pairs, find certain sets of pairs in which the sum of signed trader ID and signed volume have zero values as the illustrations in (6). Defining coarse detection and fine detection as the function COARSE_DETECT and FINE_DETECT, respectively, the wash trade detection is further designed in Section III-E.

### E. Coarse Detection—Matching Search

The matching relationship of wash trade order pairs is summarized in (3)–(5). In the coarse detection process, the three conditions are checked sequentially to identify potential matches.

The time matching margin $\delta _{t}$ in (3) bounds the tiny interval between the orders in a pair. With the length of the order queue $\theta _{T}$ in Algorithms 1 and 2 set equal to $\delta _{t}$, the coarse detection is designed as illustrated in Fig. 3: given the incoming order ${L}_{k}$, examine the opposite orders in the previous $\delta _{t} (\theta _{T})$ period for potential matched orders, which are determined by the price and volume margins $\delta _{p}$ and $\delta _{v}$. Algorithm 1 is then revised as Algorithm 2, which includes both the COARSE_DETECT and FINE_DETECT functions, where {MP} denotes the matched pairs detected by COARSE_DETECT.

Fig. 3. Coarse detection scheme.

#### Algorithm 2 Wash Trade Detection Algorithm

In financial markets, only the orders following executable price rules [14] match and execute. Therefore, the price margin $\delta _{p}$ in (4) is constrained by the following rules.

1. Rule 1:Sell order matches buy orders with equal or higher prices.
2. Rule 2:Buy order matches sell orders with equal or lower prices.

The example in Table VI, where the #5 buy order price is slightly higher than all previous sell orders, shows Rule 2 of price margin $\delta _{p}$. Considering the price margin $\delta _{p}$, the coarse detection is designed as follows. Given the incoming buy (sell) order ${L}_{k}$, among all executable orders (in terms of the executable limit prices) in the previous $\delta _{t} (\theta _{T})$ period, find the order pairs having the best matching volumes.

Volume matching can be defined as a function VOL_MATCH $( Q^{t,p}, L_{k} )$, where $Q^{t,p}$ is the set of orders remaining after filtering by $\delta _{t}$ and ${ \delta }_{p}$. Given this, COARSE_DETECT ($Q$, $L_{k}$) is defined in Algorithm 3, where $Q$ contains all opposite orders in the previous ${\delta }_{t}$ period and ${ L}_{k}$ is the incoming order. Based on the above discussion and the constraint in (5), the function VOL_MATCH $( Q^{t,p}, L_{k} )$ is defined as follows: given an incoming order $L_{k}$ and a set of matched orders $Q^{t, p}$, find the subsets $S$ of orders from $Q^{t, p}$ such that

\begin{equation*} \left | \left ( \sum \limits _{i\in S} V_{i} \right )- V_{k} \right |\le \delta _{v}. \end{equation*}

The number of limit orders in subset $S$ is $n_{s}$ ($n_{s}$ is smaller than the size of $Q^{t,p}$). In essence, VOL_MATCH is a practical case of a more general problem called the knapsack problem [27]–[29]. The name refers to the problem of filling a knapsack of capacity $W$ using a subset of $m$ items $\{ 1,\ldots ,m \}$, each of which has a weight and a value, such that the total weight of the selected items is less than or equal to $W$ and their total value is maximized. The volume matching problem can be viewed as a simplified form of the knapsack problem: given a capacity $V_{k}$ (the knapsack size) and a set $Q^{t,p}$ of items, each having nonnegative size $V_{i}$, find all possible subsets $S$ of items such that

\begin{equation*} \left | \left ( \sum \limits _{i\in S} V_{i} \right )-V_{k} \right |\le \delta _{v}. \end{equation*}

#### Algorithm 3 Coarse Detection

Due to the similarity of the two problems, the widely used approach solving the knapsack problem, dynamic programming, is employed in VOL_MATCH $( Q^{t,p}, L_{k} )$. The main principles of dynamic programming are that we have to come up with a number of subproblems so that each subproblem can be solved easily from smaller subproblems, and the solution of the original problem can be obtained easily once we know the solutions to all the subproblems [30]. Dynamic programming has been studied thoroughly in optimization problems in [31] and [32].

To solve this special form of the knapsack problem for $N$ limit orders and volume $V_{k}$, we denote the final subset of orders in an optimum solution of the original problem as $S_{N}$ and use the notation OPT $( N, V_{k} )$ to denote the sum of the order volumes of the first $N$ orders in the subset $S$ under the constraint $| \mathbf {OPT}( N, V_{k} )- V_{k} |\le \delta _{v}$. The sums for the first $N-1, N-2,\ldots ,1$ orders can then be represented as OPT $( N-1, V_{k} )$, OPT $( N-2, V_{k} ), \ldots ,$ OPT $( 1, V_{k} )$. To determine OPT $( N, V_{k} )$, we need not only the solution of OPT $( N-1, V_{k} )$ but also OPT $( N-1, V_{k}- V_{N} )$, the best solution for the first $N-1$ orders with the remaining capacity $V_{k}- V_{N}$, for which the constraint becomes $| \mathbf {OPT}( N-1,V_{k}- V_{N} )-( V_{k}- V_{N} ) |\le \delta _{v}$. The recursion can then be summarized as follows: if $L_{N}$ is not one of the orders in the final subset $S_{N}$, we can ignore order $N$ and determine OPT $( N-1,V_{k} )$; however, if $L_{N}$ is one of the orders, we need to seek an optimal solution for the remaining orders, $1,\ldots , N-1$, which is OPT $( N-1,V_{k}- V_{N} )$. Using this set of subproblems, we can express OPT $( N,V_{k} )$ as a simple expression in terms of values from smaller problems. Therefore, the recursion is summarized as two conditions.

1. If $L_{N}\notin S_{N}$, then OPT $( N,V_{k} )= \textrm{OPT}( N-1,V_{k} )$.
2. If $L_{N}\in S_{N}$, then OPT $( N,V_{k} )=V_{N}+ \textrm{OPT}( N-1,V_{k}-V_{N} )$.

This recursive process is reorganized based on the above two conditions to give Algorithm 4. This recursive algorithm can be used by invoking OPT $( N,V_{k} )$ for $N$ limit orders and the capacity ${ V}_{k}$.
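
The two conditions above can be sketched in Python as follows. This is a minimal illustration, not the paper's Algorithm 4: the function name `vol_match`, the memoized helper `opt`, the treatment of $\delta_v$ as a fraction of the target volume, and the sample volumes are all assumptions made for the example.

```python
from functools import lru_cache


def vol_match(volumes, target, margin):
    """Pick orders whose volumes sum as close to `target` as possible
    without exceeding it. Returns the chosen indices, or None when the
    shortfall is larger than `margin` (a fraction of `target`)."""

    @lru_cache(maxsize=None)
    def opt(n, cap):
        # Best volume sum reachable with the first n orders and capacity cap.
        if n == 0 or cap == 0:
            return 0
        skip = opt(n - 1, cap)                 # condition 1: order n not used
        v = volumes[n - 1]
        if v > cap:
            return skip
        take = v + opt(n - 1, cap - v)         # condition 2: order n used
        return max(skip, take)

    best = opt(len(volumes), target)
    if target - best > margin * target:
        return None

    # Walk back through the table to recover which orders were used.
    chosen, cap = [], target
    for n in range(len(volumes), 0, -1):
        if opt(n, cap) != opt(n - 1, cap):
            chosen.append(n - 1)
            cap -= volumes[n - 1]
    return sorted(chosen)


# Hypothetical sell volumes against a 1500-share incoming buy order.
sells = [500, 400, 550, 300]
print(vol_match(sells, 1500, 0.05))   # indices of the matched orders
print(vol_match(sells, 1500, 0.03))   # None: shortfall exceeds the margin
```

The sketch only handles integer volumes and returns a single best subset; enumerating every subset within the margin would need a different traversal.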

### F. Fine Detection—Collusion Search

${S}_{N}$, orders from $Q$ matched with the incoming order $L_{k}$, is the result of the coarse detection. To further detect the potential closed cycle of transactions, the orders in ${S}_{N}$ are represented by (1), where the trader ID and the volumes are affixed with trading direction signs. After the conversion, ${S}_{N}$ is defined as $S_{N}^{c}$, the input of the fine detection algorithm FINE_DETECT. As discussed in Section III-A and (2), the order pairs with potential transaction prices with overlapped price margins are grouped together for potential collusion detection.

Detecting trader collusion is treated as discovering the combinations $C$ from $S_{N}$ such that the sum of the signed traders equals zero, as illustrated in (6). This process can be considered a special case of the previously defined volume matching problem: given a capacity $W=0$ (the knapsack size) and a set of signed trader pairs, each having a value (e.g., $\{ {+A}\}$ and $\{ {-A} \}$), select all possible subsets $C$ whose signed sum equals $W$. The operations on signed trader pairs are defined in Section A and can be implemented by operator overloading. Each such subset $C$ is considered a trading collusion in a wash trade.

Algorithm 5, derived from Algorithm 4, provides the recursive solution for FINE_DETECT $( {S}_{N}^{c} )$.
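
The zero-sum condition on signed traders can be illustrated with a short Python sketch. It is not Algorithm 5: it replaces the recursive search with plain exhaustive enumeration, and it assumes each matched pair is given as a hypothetical `(seller, buyer)` tuple in which the seller counts as $-1$ and the buyer as $+1$. The sign convention and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations


def signed_sum(pairs):
    """Net signed count per trader; an empty result means the sum is zero."""
    net = Counter()
    for seller, buyer in pairs:
        net[seller] -= 1   # assumed convention: selling side is negative
        net[buyer] += 1    # assumed convention: buying side is positive
    return {trader: n for trader, n in net.items() if n != 0}


def find_closed_cycles(pairs):
    """Return index tuples of every non-empty subset whose signed sum is zero."""
    hits = []
    for size in range(1, len(pairs) + 1):
        for combo in combinations(range(len(pairs)), size):
            if not signed_sum(pairs[i] for i in combo):
                hits.append(combo)
    return hits


# A sells to B, B sells to C, C sells back to A; the A-to-D trade is unrelated.
trades = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
print(find_closed_cycles(trades))
```

Exhaustive enumeration grows exponentially with the number of pairs, so this form is only suitable for the small candidate sets left after coarse detection.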

SECTION IV

## EXPERIMENTS AND EVALUATION

Evaluating a detection model usually relies on real data of both normal and abuse cases. However, due to the limited reports on wash trade manipulation and regulatory rules prohibiting the disclosure of illegitimate market data, the availability of the examples of wash trade behaviors in capital markets is far less than the availability of routine normal trading records. Therefore, to evaluate the proposed detection model, it is acceptable to the financial industry that all the characteristic patterns of wash trade examples are reproduced, and then injected into original trading records to generate a mixed data set of normal and abuse cases [33]. Randomly synthesized exploratory manipulation cases can mimic any possibility of wash trade scenarios, i.e., we can generate the matched order at any time with any volume size as well as matching margins. Synthetic exploratory financial data are also accepted in academia for evaluating the proposed model when real market data are hard to collect [15], [16], [34]. In this paper, the experimental evaluation is composed of two parts.

1. Part 1: Experimental evaluation using original trading data sets from the market.
2. Part 2: Experimental evaluation using original trading data sets injected with synthetically generated wash trade scenarios following the analysis in Section II-A.

### B. Determining the Marginal Parameters

As discussed in Section II-A, the matched orders in a wash trade are usually submitted within tiny time intervals $\delta _{t}$ so that the manipulated execution can compete against normal traders who may pick up the orders unintentionally [11], [14]. Consequently, the normal execution time provides a reasonable reference for the time interval $\delta _{t}$, which is otherwise unavailable because of the lack of statistical studies of real wash trade cases.

Usually, the execution time of a limit order is strongly associated with its volume [8], [18], [36]. Therefore, a more reasonable measure of the average execution time of normal limit orders is the volume-weighted average execution time (VWAT), defined as

$$T_{\mathrm{VWAT}}=\frac {\sum \nolimits _{j} ( T_{j}\ast {v}_{j} ) }{\sum \nolimits _{j} {v}_{j} }$$

where $T_{\mathrm{VWAT}}$ is the volume-weighted average execution time, $T_{j}$ is the execution time of order $j$, $v_{j}$ is the volume of order $j$, and $j$ indexes the individual orders [36]. In practice, if the wash trade orders are submitted with time intervals larger than $T_{\mathrm{VWAT}}$, they are easily picked up by other legitimate traders. Accordingly, setting ${\delta }_{t}=T_{\mathrm{VWAT}}$ covers a time period for all possible wash trade activities. The order execution time $T_{j}$ and $T_{\mathrm{VWAT}}$ across the seven stocks in the test data set are calculated and summarized in Table VII.
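
A direct Python rendering of the VWAT definition may help make the weighting concrete. The function name `vwat` and the sample numbers are assumptions for illustration; they are not taken from Table VII.

```python
def vwat(exec_times, volumes):
    """Volume-weighted average execution time: sum(T_j * v_j) / sum(v_j)."""
    if len(exec_times) != len(volumes):
        raise ValueError("exec_times and volumes must have the same length")
    total_volume = sum(volumes)
    if total_volume == 0:
        raise ValueError("total volume must be positive")
    weighted = sum(t * v for t, v in zip(exec_times, volumes))
    return weighted / total_volume


# Hypothetical execution times (seconds) and order volumes (shares).
times = [2.0, 10.0, 4.0]
sizes = [100, 300, 600]
delta_t = vwat(times, sizes)   # used as the time window in the detection
print(delta_t)
```

Large orders dominate the result, which is the point of the weighting: a plain mean of the same times would give each order equal influence regardless of size.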

Table VII VWAT and Average Volume

Theoretically, the wash trade can be carried out by a large number of small orders. However, in practice, the wash trade orders are usually larger than the average volume of the normal trading orders, because a large number of orders can significantly increase the uncertainty of the order executions, which may bring a risk of loss if it does not follow the expected arrangements. Therefore, the average order volume of each stock is selected as the threshold $\theta _{V}$ for the order volume filtering discussed in Section IV-D. The average volume across seven stocks is also calculated and summarized in Table VII.

In addition, the volume matching margin ${\delta }_{v}$ is selected from the percentages 0%, 1%, 2%, 3%, 4%, and 5%, indicating the ratio of unmatched volume (1% refers to identifying orders with 99% matching volumes). In the example in Table VI, the #5 buy order volume (1500 shares) is $\sim 96.7$% matched with all previous sell orders (1450 shares). The price margin $\delta _{p}$ is unconstrained in the detection so that any orders following price matching Rules 1 and 2 are scanned for possible matching pairs under the condition in (5).
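
As a worked check of the margin arithmetic, using only the Table VI figures quoted above (a 1500-share buy order against 1450 shares of earlier sell orders):

$$\frac{1450}{1500}\approx 0.967,\qquad 1-\frac{1450}{1500}=\frac{50}{1500}\approx 3.3\%$$

Assuming a match requires at least $(1-{\delta }_{v})$ of the incoming volume to be covered, this example would be flagged at ${\delta }_{v}=4\%$ and $5\%$ but not at $0\%$ through $3\%$.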

Under the configurations of $\delta _{t}, \theta _{V},{\delta }_{v}$, and $\delta _{p}$, Algorithm 4 performs the following task: given an order $L_{k}$, among all orders with executable prices (unconstrained $\delta _{p}$ but following Rules 1 and 2) and volume not smaller than $\theta _{V}$ that were submitted in the preceding $\delta _{t}$ time period, find the matched orders that execute at least $(1-\delta _{v})$ of the volume of $L_{k}$.
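
The candidate selection described above can be sketched as a simple filter. The `Order` record, the function names, and especially the `crosses` check are assumptions: `crosses` is only a stand-in for the paper's price Rules 1 and 2, which are not reproduced in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    time: float    # submission time in seconds (hypothetical unit)
    side: str      # "buy" or "sell"
    price: float
    volume: int


def crosses(incoming, resting):
    """Stand-in for price Rules 1 and 2: the buy price must reach the sell price."""
    buy, sell = (incoming, resting) if incoming.side == "buy" else (resting, incoming)
    return buy.price >= sell.price


def candidates(history, incoming, delta_t, theta_v):
    """Opposite-side orders within the last delta_t seconds, at least
    theta_v in volume, and executable against the incoming order."""
    return [
        o for o in history
        if o.side != incoming.side
        and 0 <= incoming.time - o.time <= delta_t
        and o.volume >= theta_v
        and crosses(incoming, o)
    ]


history = [
    Order(0.0, "sell", 58.00, 400),   # too old
    Order(5.0, "sell", 58.01, 500),   # kept
    Order(6.0, "buy", 58.00, 600),    # same side as incoming
    Order(7.0, "sell", 58.50, 700),   # price does not cross
    Order(8.0, "sell", 58.02, 100),   # below the volume threshold
]
incoming = Order(9.0, "buy", 58.03, 1500)
print(candidates(history, incoming, delta_t=5.0, theta_v=300))
```

The surviving orders would then be handed to the volume matching step, which decides whether they cover enough of the incoming volume.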

### C. Part 1: Experiments on Original Datasets

In the Part 1 experiment, the wash trade detection algorithm is evaluated on the original seven data sets using the parameters in Section IV-B. The evaluation shows the applicability of the proposed algorithms to real transaction data and also examines the legitimacy of the transactions in the original data sets. Since the original data sets do not contain any reported wash trade manipulation activities, they are assumed to contain only legitimate transactions. Thus, the evaluation measure is the false negative rate, $\textrm{FNR}=\textrm{FN}/(\textrm{FN}+\textrm{TP})$, where a false negative (FN) is a normal case detected as a wash trade and a true positive (TP) is a normal case detected as normal; that is, normal trading is treated as the positive class.

The results of the experiments (the maximum FNR value on each stock data set is highlighted) are shown in Table VIII. In each data set, some transactions are detected as suspicious wash trade actions, and the number of detected actions increases as the volume margin increases. Most of the data sets do not contain any suspicious actions when the volume margin is set to 0%, and the Apple stock shows the highest FNR (1.263%) at the 5% volume margin.

Table VIII Experiment Results (FNR) Across Original Data Sets of Seven Stocks

With careful inspection and consultation with the financial industry experts, we determined that the detected FN cases show very similar features to the wash trade actions although not reported by the regulators. The detected FN cases fall into two formats, as shown in Table IX.

Table IX False Negative Cases of Stock AAPL

### D. Part 2: Experiments on Data Sets With Injected Wash Trade

Testing with synthetic data can mimic any possible wash trade case and can also evaluate the robustness of the proposed algorithms under any wash trade scenario, i.e., random combinations of wash trade activities involving one or multiple traders.

#### 1) Wash Trade Case Generation

The typical wash trade activities are reproduced and injected in each stock data set. The activities are reproduced in two format groups:

1. Group 1: One order matched with a single opposite order, termed single-matching.
2. Group 2: One order matched with multiple opposite orders, termed multimatching.

Each group contains three different sets according to trader numbers in the wash trade collusion: 1) set #1 has examples with one trader in a trading collusion and 2) sets #2 and #3 have two and four traders in a trading collusion. To ensure a comprehensive assessment of the approach, in each set, volume matching margin $\delta _{v}$ is selected as a percentage of 0%, 1%, 2%, 3%, 4%, and 5% indicating the ratio of not matching (1% indicating the orders from two sides are 99% matching). There are ten examples for each combination of the above parameters.

The examples in Tables X and XI show an excerpt of the generated wash trade cases: 1) case #1: two traders with 5% single matched volumes; 2) case #2: four traders with 5% single matched volumes; and 3) case #3: two traders with 5% multiple matched volumes. The volume, time, and matching margin of the synthetic orders are all randomly generated. For example, in case #3 in Table XI, the buy order volume $v_{b}$ in pair 1 is randomly generated (under the condition $v_{b} \ge \theta _{V}$), and all sell orders in pair 1 are also randomly generated under the condition that the volume sum $V_{s}$ of all sell orders satisfies $v_{b}\ast ( 1-{\delta }_{v} )\le V_{s}\le v_{b}$. The time of the orders in pair 2 is also randomly generated, provided that they are much later than the time of pair 1. Similarly, the prices of the orders in each pair are randomly generated following the price matching rules discussed in Section III-E. As in the examples in Table VI, the two order pairs in Table XI have different transaction prices. The buy order in pair #1 in Table XI will be executed with the previous four sell orders at 58, 58.01, 58.02, and 58.03, respectively. Therefore, the generated examples have different transaction prices within transaction margins.

Table X Generated Single Matched Wash Trade Cases (${\delta }_{v}=5$%)
Table XI Generated Multiple Matched Wash Trade Cases (${\delta }_{v}=5$%)

Such random generation of synthetic cases provides the possibility of thorough evaluation of the proposed algorithms using any possible wash trade cases.
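
The volume conditions for a multimatching pair can be sketched as a small generator. It enforces only the two conditions stated above, $v_{b} \ge \theta _{V}$ and $v_{b}\ast ( 1-{\delta }_{v} )\le V_{s}\le v_{b}$. The function name, the upper bound of $3\theta_V$ on the buy volume, and the way the sell volume is split are assumptions; times and prices are left out.

```python
import math
import random


def make_multimatch_pair(theta_v, delta_v, n_sells, rng):
    """Return (buy_volume, sell_volumes) for one synthetic multimatching pair."""
    # Buy volume at or above the threshold (upper bound is an arbitrary choice).
    v_b = rng.randint(theta_v, 3 * theta_v)
    # Total sell volume inside the margin band [v_b * (1 - delta_v), v_b].
    v_s = rng.randint(math.ceil(v_b * (1 - delta_v)), v_b)
    # Split v_s into n_sells positive parts using random cut points.
    cuts = sorted(rng.sample(range(1, v_s), n_sells - 1))
    bounds = [0] + cuts + [v_s]
    sells = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
    return v_b, sells


rng = random.Random(0)   # fixed seed so a run can be repeated
buy, sells = make_multimatch_pair(theta_v=1000, delta_v=0.05, n_sells=4, rng=rng)
print(buy, sells, sum(sells))
```

Passing the random generator in explicitly keeps each synthetic data set reproducible, which matters when the same injected cases are re-run under different margins.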

As discussed before, the models are tested on seven real stocks, each of which contains two groups of injected examples. Each group has three sets (one, two, and four traders), and each set contains six margin configurations. Under each configuration, there are ten examples. There are overall ${7\times 2\times 3\times 6\times 10=2520 }$ different experiments carried out as a robust evaluation plan for the proposed detection model.
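
The experiment count can be checked by enumerating the grid directly. The stock labels below are placeholders, since this section does not list all seven tickers.

```python
from itertools import product

stocks = [f"stock_{i}" for i in range(1, 8)]          # placeholder names
groups = ["single-matching", "multimatching"]
trader_counts = [1, 2, 4]
margins = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]
examples = range(10)

# One tuple per experiment: (stock, group, traders, margin, example index).
plan = list(product(stocks, groups, trader_counts, margins, examples))
print(len(plan))
```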

The generated wash trade orders are then injected into the data of the corresponding stocks, making the test data a mixture of both normal and abuse patterns. The time intervals between different pairs are selected randomly, as in the examples in Tables X and XI. For example, in case #1 in Table X, the time of pair 2 is randomly selected after pair 1 occurs. In addition, the generated orders in each pair are separated by several normal orders from the original data sets to mimic practical cases in the markets. This is a practical approach to simulate how these wash trade scenarios occur in the real world [37].

#### 2) Performance Evaluation Metrics

The performance evaluation of the proposed model is based on two popular statistical measures: 1) sensitivity (SEN) and 2) specificity (SPE). Both are based on the confusion matrix, where a false positive (FP) is a wash trade case detected as normal, a true negative (TN) is a wash trade case detected as a wash trade, and FN and TP are as defined in Section IV-C. The SEN, defined as ${ \textrm {SEN}=\textrm {TP}/(\textrm {TP}+\textrm {FN})}$, represents the rate of correctly detecting normal trading orders (also known as the TP rate), while the SPE, defined as ${ \textrm {SPE}=\textrm {TN}/(\textrm {FP}+\textrm {TN})}$, refers to the rate of correctly detecting wash trade cases (also known as the TN rate).
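
Because normal trading is the positive class in these definitions, it can help to see the three rates computed side by side. The function name and the counts below are made up for illustration and are not results from the paper.

```python
def detection_rates(tp, fn, tn, fp):
    """SEN, SPE, and FNR with normal trading as the positive class.

    tp: normal cases detected as normal
    fn: normal cases detected as wash trades
    tn: wash trade cases detected as wash trades
    fp: wash trade cases detected as normal
    """
    sen = tp / (tp + fn)   # share of normal orders correctly passed
    spe = tn / (fp + tn)   # share of wash trade cases correctly flagged
    fnr = fn / (fn + tp)   # share of normal orders wrongly flagged
    return sen, spe, fnr


# Hypothetical counts: 1000 normal orders and 60 injected wash trade cases.
sen, spe, fnr = detection_rates(tp=990, fn=10, tn=58, fp=2)
print(f"SEN={sen:.3f} SPE={spe:.3f} FNR={fnr:.3f}")
```

Note that SEN and FNR always sum to one, so the Part 1 FNR and the Part 2 SEN describe the same kind of error from opposite directions.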

#### 3) Experimental Results

The experimental evaluations across seven stocks are summarized in Fig. 4, where the average SEN and SPE values across different numbers of traders are illustrated against the margin values.

Fig. 4. Experiment results across seven stock data sets.

From Fig. 4, the SPE values for single matching show that the algorithm detects all single-matching cases, which is the simplest wash trade format and is easy to detect. The SPE values for multimatching vary across the margins and the different stocks, as illustrated in Fig. 4.

The SPE values increase with the margin and approach 100% when the margin is higher than 5%. The result conforms to the design expectation of the detection approach: more possible collusions are detected under larger matching margins. As discussed in Section II-A, mostly matched (for example, 98%) orders might be constructed by smart manipulators to evade inspection. A large margin value compensates for this tactic, and the configurability of the margin increases the practicability of the model in a real trading context.

The SEN values show more volatile results across the margins. In most experiments, the SEN values decrease as the margin increases, indicating that more normal activities are incorrectly detected as wash trade cases. Conversely, the highest SEN value appears at the zero margin value.

From the experimental results, it can be concluded that the proposed approach detects the primary wash trade scenarios effectively and consistently across the selected stocks with SEN values in a range of 97%–100%.

SECTION V

## CONCLUSION

A wash trade activity detection approach is proposed after thoroughly studying the various scenarios of wash trade behaviors. The collusive activities in wash trades are analyzed through a graph of traders in which transactions are represented by directed connections among the vertices. This analysis shows that the basic structure of collusion among multiple traders is a closed cycle of transactions among certain traders. Further studies also show that the limit orders in wash trades are usually submitted quickly with mutually executable prices and matched volumes. According to the analyzed features, the proposed method is then split into steps defined separately in Algorithms 4 and 5.

There are two major innovations in the proposed method as follows.

1. Graph theory has been used to represent and model the collusive relationships of the traders in wash trade activities. The resulting fundamental closed-cycle structure within a trader graph reduces the complexity of detecting collusive networks.
2. The wash trade order detection has been approached as a knapsack problem, which can be solved in two steps by the traditional dynamic programming approaches.

Instead of only detecting the same-priced buy/sell orders, as in the engine-level detection mechanism in CME, the proposed method determines the wash trade activities by considering the suspicious matched orders as well as the collusive groups, based on the trading activities in a certain time period rather than a tiny time interval as in real-time detection. Therefore, the proposed approach best suits overnight detection in the real financial world. However, the rapidly growing trading frequency challenges detection mechanisms, and hence implementing the proposed approach in real time in a computationally efficient way will be the focus of future work.

## Footnotes

This work was supported by the companies and organizations involved in the Northern Ireland Capital Markets Engineering Research Initiative.
