Circuit: A JavaScript Memory Heap-Based Approach for Precisely Detecting Cryptojacking Websites

Cryptojacking is often used by attackers as a means of gaining profits by exploiting users’ resources without their consent, despite the anticipated positive effect of browser-based cryptomining. Previous approaches have attempted to detect cryptojacking websites, but they have the following limitations: (1) they failed to detect several cryptojacking websites either because of their evasion techniques or because they cannot detect JavaScript-based cryptojacking and (2) they yielded several false alarms by focusing only on limited characteristics of cryptojacking, such as counting computer resources. In this paper, we propose CIRCUIT, a precise approach for detecting cryptojacking websites. We primarily focus on the JavaScript memory heap, which is resilient to script code obfuscation and provides information about the objects declared in the script code and their reference relations. We then extract a reference flow that can represent the script code behavior of the website from the JavaScript memory heap. Hence, CIRCUIT determines that a website is running cryptojacking if it contains a reference flow for cryptojacking. In our experiments, we found 1,813 real-world cryptojacking websites among 300K popular websites. Moreover, we provided new insights into cryptojacking by modeling the identified evasion techniques and considering the fact that characteristics of cryptojacking websites now appear on normal websites as well.

The associate editor coordinating the review of this manuscript and approving it for publication was Diana Gratiela Berbecaru.

malicious script code of cryptojacking is automatically executed on the client side when a user visits a cryptojacking website. Hence, detecting cryptojacking websites and filtering them out in the web environment is crucial for protecting user resources. However, the precise detection of cryptojacking websites is complex and prone to errors. As script code obfuscation techniques are frequently applied to cryptojacking websites, it is becoming increasingly challenging to detect cryptojacking based on the static analysis approach. Furthermore, the cryptojacking websites' characteristics (e.g., running numerous threads or consuming high resources of victims' computers) now appear on various normal websites (e.g., live-streaming websites), thereby complicating the precise detection of cryptojacking websites.

sites more precisely than existing approaches by detecting the websites. Among them, CIRCUIT detected 1,813 cryptojacking websites with cryptojacking behaviors, most of which used evasion techniques to avoid cryptojacking detection. CIRCUIT responded flexibly to evasion techniques in four categories based on the evasion techniques modeled in the experiment. Furthermore, by analyzing the distribution of the number of threads of the collected websites, we demonstrated the limitations of the existing resource monitoring and thread-count-based approaches and proved the efficiency of CIRCUIT from the perspective of precise cryptojacking detection (see Section IV).

C. CONTRIBUTIONS
We summarize our contributions below:
• We propose CIRCUIT, a precise approach for detecting cryptojacking websites based on the JavaScript memory heap. CIRCUIT is robust to evasion techniques applied to cryptojacking websites to avoid cryptojacking detection.

• Although evasion techniques were applied to most of the identified cryptojacking websites, CIRCUIT succeeded in detecting 1,813 cryptojacking websites from 300K real-world websites.

• Modeling evasion techniques to avoid cryptojacking detection allows us to provide new insights into cryptojacking, as behaviors previously associated with cryptojacking now appear widely on normal websites.

This section describes the background knowledge related to cryptojacking (Section II-A) and introduces related works on cryptojacking detection (Section II-B).

Cryptocurrency is a digital asset designed to function as a medium of exchange. Cryptocurrency mining (cryptomining) is the process of validating cryptocurrency transactions. To gain cryptocurrencies (e.g., Bitcoin and Ethereum), miners perform Proof-of-Work (PoW), which is a blockchain consensus mechanism. In a nutshell, peers (i.e., miners) in the PoW blockchain network solve complex mathematical problems that demand substantial computational power. At a fixed interval (e.g., 10 minutes for Bitcoin), the peer who wins the race and mines the block receives a reward (i.e., cryptocurrency). Mining is computationally taxing because only the first miner who solves the problem is rewarded. To increase the probability of finding a block, miners combine their computational resources through public mining pools.
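The PoW race described above can be illustrated with a toy sketch. It assumes a simplified puzzle (find a nonce whose SHA-256 digest starts with a given number of zero hex digits); the function names `mine` and `verify`, the block string, and the difficulty value are hypothetical, and real cryptocurrencies use different hashing schemes and target encodings.

```python
import hashlib


def mine(block_data: str, difficulty: int, max_nonce: int = 1_000_000):
    """Brute-force a nonce so that SHA-256(block_data + nonce) starts
    with `difficulty` zero hex digits. Returns (nonce, digest) or None."""
    target_prefix = "0" * difficulty
    for nonce in range(max_nonce):
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
    return None  # no solution found within the search budget


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a claimed solution costs a single hash, unlike finding it."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    found = mine("example block", difficulty=3)
    if found is not None:
        nonce, digest = found
        print(nonce, digest, verify("example block", nonce, 3))
```

Each additional zero digit multiplies the expected number of hash attempts by 16, which is why miners pool resources and why an attacker benefits from borrowing the CPU time of many visitors.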
2) CRYPTOJACKING
Cryptojacking refers to the malicious behavior that intercepts all profits arising from cryptomining by using the visitors' resources in a web environment, without their consent. When visiting a website injected with cryptomining, a user's

hardly realize that they are infected. Figure 1 shows the workflow of the cryptojacking process.

Cryptojacking is executed in the following three steps:

If an asynchronous function is executed (e.g., a callback function), the JavaScript engine calls the web API, which is provided by the browser. The web API stores an asynchronously executed function in the task queue. Thereafter, the event loop [15] checks the status of the call stack and task queue, and when the call stack becomes empty, the first callback of the task queue is put into the call stack and executed.

JavaScript has become one of the most popular languages [9], [10], [33], and the cryptojacking that leverages it has also been on the rise recently [5]. In JavaScript, web workers enable multi-threaded processing. Previously, JavaScript only supported a single-thread process, meaning that JavaScript could only process one task at a time. Therefore, when a task was performed, the following task waited until the previous task was completed. If websites had heavy tasks that a single thread could not handle, they became unresponsive due to the overhead. To address this problem, a web worker [30], [31] was introduced to support a multi-thread process in JavaScript. As cryptomining requires a lot of resources to recursively check the validity of several blocks connected to a cryptocurrency network (i.e., a heavy task), it is indispensable that browser-based cryptomining is implemented through a multi-thread process. Consequently, the appearance of web workers has had a significant influence on making cryptojacking more active.

In JavaScript, data types belong to two categories: primitive value and reference value [6].

All data types, except wrapper objects, are contained in the reference variables (e.g., array, object, and function).

As an example of these two data types, Listing 1 presents the difference between primitive and reference values.

access data is referred to as a reference in JavaScript.

To understand code reuse in JavaScript, we introduce the concept of prototype-based programming language in JavaScript.

Listing 2 presents an instance of the JavaScript code used to describe the prototype chain, and Table 1 lists the prototype chains for the corresponding code. As the basic type of JavaScript is the object, all elements, such as functions and arrays, are linked to a top-level object, Object.prototype. The top-level object has null as its prototype; therefore, the prototype chain ends.

A resource monitoring approach is based on the fact that cryptojacking is a resource-intensive task [38], [40]. This method detects a website as a cryptojacking website if the computer resources (e.g., CPU usage) exceed a predetermined threshold when visiting the website. In particular, this approach has been highlighted as a new detection mechanism because it is not affected by script code obfuscation and is more convenient than a blacklisting-based approach requiring continuous management of blacklists.

As cryptojacking requires continuous mining, a thread with a separate execution space is created to proceed with mining. Unlike a normal website, the number of threads on a cryptojacking website is proportional to profitability [38], [41]. Consequently, several approaches have found a difference in the number of threads between cryptojacking and normal websites, and proposed methods that can be utilized for cryptojacking detection [41]. This approach detects cryptojacking more flexibly than blacklisting-based or resource-monitoring-based approaches.
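The threshold-based heuristics described above (resource monitoring and thread counting) can be sketched as follows. This is a minimal illustration, not the implementation of any cited tool: the thresholds, the `SiteObservation` type, and the two example sites are hypothetical values chosen to show how such rules can misfire.

```python
from dataclasses import dataclass


@dataclass
class SiteObservation:
    name: str
    worker_count: int   # number of web workers observed
    cpu_usage: float    # CPU usage in percent while visiting
    is_mining: bool     # ground truth, known only in this toy example


def flag_by_threshold(site: SiteObservation,
                      max_workers: int = 4,
                      max_cpu: float = 50.0) -> bool:
    """Flag a site when either hypothetical threshold is exceeded."""
    return site.worker_count > max_workers or site.cpu_usage > max_cpu


# A benign site with many workers and a miner that throttles itself.
streaming = SiteObservation("live-streaming site", 8, 60.0, is_mining=False)
throttled = SiteObservation("throttled miner", 1, 20.0, is_mining=True)

if __name__ == "__main__":
    for site in (streaming, throttled):
        print(site.name, "flagged:", flag_by_threshold(site),
              "actually mining:", site.is_mining)
```

In this toy setting the benign site is flagged and the throttled miner is missed, which mirrors the false alarms and evasion problems that motivate inspecting what each thread actually does.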

Wasm is a binary instruction format that can run in mod-

Blacklisting-based approaches have two main limitations.

As this approach is solely dependent on the stored keywords, keywords related to cryptojacking must be periodically col-

In P1, CIRCUIT first generates a heap graph that shows the behavior of the script code running on the website to detect cryptojacking, even if its script codes are obfuscated. CIRCUIT then extracts reference flows, which refer to the reference relations between objects in JavaScript. As the reference flows can denote the call flow of objects, we decided that the reference flows would represent cryptojacking behaviors. Therefore, CIRCUIT stores the reference flows of known cryptojacking websites as cryptojacking signatures. In P2, CIRCUIT compares the reference flows of the target websites with the signatures. If the reference flow of the target website resembles that of cryptojacking websites, CIRCUIT identifies the target website as a cryptojacking website.

Key Idea: CIRCUIT utilizes the fact that script code obfuscation does not directly affect the information stored in memory, and web threads are stored in the memory area as objects. Thus, CIRCUIT remains robust even when script codes are obfuscated beyond recognition, and it can analyze a website by classifying its web threads individually. If a mining-related thread is discovered on a website, the website is identified as a cryptojacking site.

To precisely detect cryptojacking sites, we leveraged two key observations as follows:

2) Distinguishable behaviors of cryptojacking. To gain benefits, cryptojacking should perform its own mining behaviors, distinguishable from normal websites, e.g., joining a mining pool → mining cryptocurrency → sending rewards to attackers.

These two observations provide the following intuition: since cryptojacking is utilized in a third-party library form (i.e., cryptojacking families), the JavaScript call stack and memory heap are comparable among websites using the same cryptojacking [41]. Furthermore, each cryptojacking contains its behavior; therefore, we can use the behavior as the signature of cryptojacking and detect cryptojacking websites by analyzing whether a particular website contains the same or similar behaviors of cryptojacking.

their reference relations by taking heap snapshots using the JavaScript engine. For instance, the V8 JavaScript engine [29] provides data in JSON format, and object and reference information can be obtained by parsing the corresponding JSON. Next, as described in Section II-A6, if objects with reference relations are connected, a heap graph is constructed. In the running example (Listing 3), "foo" exists in the string node because it belongs to the wrapper object as a string type of data. As the variable "key" refers to the memory address where the value of "foo" is stored, it is converted into an edge that connects "Foo" to "foo". The variable "a", created by the constructor function of class "Foo", is a value that has the memory address for the created "Foo" object, and therefore "a" is converted into an edge that connects the node indicating the web page itself and the "Foo" node. Thus, to access the value "foo" from a web page, we first access the "Foo" node through the "a" edge, which has the memory address value of "Foo", and then access the "foo" node through the "key" edge, which has the memory address of "foo". Figure 4 depicts the overall flow where Listing 3 is converted into a heap graph.
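As a rough sketch of the parsing step described above, the following code turns a V8-style heap snapshot (already loaded as a Python dictionary) into a list of nodes and an adjacency map. It assumes the flat layout used by V8 `.heapsnapshot` files, where `snapshot.meta` declares the field order and `to_node` is an offset into the flat `nodes` array; the function name `build_heap_graph` and the hand-built `TOY_SNAPSHOT` (which mimics the page → "a" → Foo → "key" → "foo" example) are hypothetical and far smaller than a real snapshot.

```python
def build_heap_graph(snapshot: dict):
    """Return (nodes, adjacency) from a V8-style heap snapshot dict.

    nodes:     list of {"type": str, "name": str}
    adjacency: {node_index: [(edge_label, target_node_index), ...]}
    """
    meta = snapshot["snapshot"]["meta"]
    node_fields, edge_fields = meta["node_fields"], meta["edge_fields"]
    node_types = meta["node_types"][node_fields.index("type")]
    edge_types = meta["edge_types"][edge_fields.index("type")]
    strings = snapshot["strings"]
    raw_nodes, raw_edges = snapshot["nodes"], snapshot["edges"]
    n_stride, e_stride = len(node_fields), len(edge_fields)
    n_type = node_fields.index("type")
    n_name = node_fields.index("name")
    n_edge_count = node_fields.index("edge_count")
    e_type = edge_fields.index("type")
    e_name = edge_fields.index("name_or_index")
    e_to = edge_fields.index("to_node")

    nodes, adjacency = [], {}
    edge_pos = 0  # edges are stored in the same order as their owner nodes
    for base in range(0, len(raw_nodes), n_stride):
        index = base // n_stride
        nodes.append({
            "type": node_types[raw_nodes[base + n_type]],
            "name": strings[raw_nodes[base + n_name]],
        })
        outgoing = []
        for _ in range(raw_nodes[base + n_edge_count]):
            etype = edge_types[raw_edges[edge_pos + e_type]]
            raw_label = raw_edges[edge_pos + e_name]
            # Element and hidden edges carry a numeric index, not a string id.
            label = raw_label if etype in ("element", "hidden") else strings[raw_label]
            target = raw_edges[edge_pos + e_to] // n_stride
            outgoing.append((str(label), target))
            edge_pos += e_stride
        adjacency[index] = outgoing
    return nodes, adjacency


# Hand-built toy snapshot (assumption): page --a--> Foo --key--> "foo"
TOY_SNAPSHOT = {
    "snapshot": {"meta": {
        "node_fields": ["type", "name", "id", "self_size", "edge_count"],
        "node_types": [["object", "string"], "string", "number", "number", "number"],
        "edge_fields": ["type", "name_or_index", "to_node"],
        "edge_types": [["property", "element"], "string_or_number", "node"],
    }},
    "strings": ["page", "Foo", "foo", "a", "key"],
    "nodes": [0, 0, 1, 0, 1,   0, 1, 2, 0, 1,   1, 2, 3, 0, 0],
    "edges": [0, 3, 5,   0, 4, 10],
}

if __name__ == "__main__":
    nodes, adjacency = build_heap_graph(TOY_SNAPSHOT)
    print(nodes)
    print(adjacency)
```

Reading the field order from `meta` rather than hard-coding offsets keeps the sketch tolerant of snapshots that include additional per-node fields.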

The generated heap graph can express the reference relations between the objects declared on the website; therefore, we can trace every chain of objects that must be traversed to access a particular object. Consequently, the heap graph can identify and display declared variables or objects, even though the script code of a website is obfuscated.
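To make the idea of tracing the chain of references to a particular object concrete, here is a small breadth-first search over a heap graph stored as an adjacency map. The graph literal mirrors the page → "a" → Foo → "key" → "foo" running example; the function name `access_path` and the data layout are assumptions made for this sketch.

```python
from collections import deque


def access_path(adjacency, names, start, target_name):
    """Return the list of (edge_label, node_name) hops leading from
    `start` to the first node named `target_name`, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if names[node] == target_name:
            return path
        for label, nxt in adjacency.get(node, []):
            if nxt not in seen:  # heap graphs may contain cycles
                seen.add(nxt)
                queue.append((nxt, path + [(label, names[nxt])]))
    return None


# Toy heap graph (assumption): node 0 is the page, 1 is Foo, 2 is "foo".
NAMES = ["page", "Foo", "foo"]
ADJACENCY = {0: [("a", 1)], 1: [("key", 2)], 2: []}

if __name__ == "__main__":
    print(access_path(ADJACENCY, NAMES, 0, "foo"))
    # [('a', 'Foo'), ('key', 'foo')]
```

The returned hops depend only on objects and references present in memory, not on how the source text was written or renamed.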

CIRCUIT extracts reference flows from the generated heap graph. Reference flows are defined as the reference relations between objects in JavaScript, which denote the call flows of objects. We first reduce the search space by focusing on the existence of a multi-thread. As previously explained in Section II-A4, running a multi-thread is an essential property for cryptojacking. Consequently, to determine whether a website runs multiple threads, we confirm whether a web worker exists in the heap graph of the website. In general, if a website runs multi-thread, the WebWorker object is contained in the memory heap, as shown in Figure 5. Accordingly, CIRCUIT first finds the WebWorker node in the heap graph to determine whether the website runs multi-thread, and thereafter CIRCUIT attempts to extract reference flows from the heap graph.

The cryptomining script code has three areas: the head, body, and tail. The head is a script code area for importing cryptojacking-related resources (e.g., objects and variables) with an external server link. The body is a code area that declares the necessary functions and objects before the mining operation is executed on a cryptojacking website. Finally, the tail is a code area where an object is created for mining on the client side, and the mining is executed.

particularly the mining operations performed on the body, remain identifiable in the memory heap, we can use this information to detect cryptojacking websites, irrespective of code obfuscation.

Therefore, we collect the cryptomining script code provided by the cryptojacking vendors. To extract reference flows from the collected cryptomining script code, we create an arbitrary website on an internal web server and embed the collected script code in it. We then implement a cryptomining website using the collected cryptomining script code by referring to the provided usage document and storing the heap information of the JavaScript engine created when the website is executed. Subsequently, we generate the heap graph from the JavaScript memory heap and then extract the reference flows from each web worker. The extracted reference flows for each vendor are indexed by the name of each vendor. Figure 6 shows examples of the extracted reference flows from seven known cryptojacking websites.

Finally, CIRCUIT detects cryptojacking websites using extracted cryptomining reference flows. To confirm that a target website contains cryptojacking, we extract all reference flows from the target website and compare every extracted reference flow to the indexed cryptomining reference flows. Here, we employ an edit distance algorithm [17] and calculate the edit distance between all the reference flows obtained from the target website and all the indexed cryptomining reference flows. If any pair shows an edit distance below the predefined threshold (we set 5 as the threshold; see Section IV-A), CIRCUIT identifies the target website as a cryptojacking website. The algorithm that detects cryptojacking websites is presented in Algorithm 1.

the detected cryptojacking websites. We ran CIRCUIT on a machine with Ubuntu 18.04 LTS, 3.8 GHz AMD Ryzen processor, 32 GB RAM, and 1 TB SSD.
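The matching step can be sketched as below. This is a simplified stand-in, not the paper's Algorithm 1: it assumes each reference flow has been flattened into a sequence of node labels and uses the classic Levenshtein distance in place of the edit distance of [17]. The vendor name, the flow labels, and the function names are hypothetical; only the threshold of 5 and the "below the threshold" rule come from the text.

```python
def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def detect(target_flows, signatures, threshold=5):
    """Return (vendor, distance) for the first target flow whose distance
    to a signature is below `threshold`, or None if nothing matches."""
    for flow in target_flows:
        for vendor, signature in signatures.items():
            distance = edit_distance(flow, signature)
            if distance < threshold:
                return vendor, distance
    return None


# Hypothetical signature and flows, for illustration only.
SIGNATURES = {"vendorA": ["Worker", "connectPool", "hashJob", "submitShare"]}
OBFUSCATED = ["Worker", "_0x1a", "hashJob", "submitShare"]
BENIGN = ["Worker", "fetchSegment", "demux", "decodeVideo",
          "renderFrame", "updateBuffer", "reportStats", "scheduleNext"]

if __name__ == "__main__":
    print(detect([BENIGN, OBFUSCATED], SIGNATURES))
    print(detect([BENIGN], SIGNATURES))
```

Because every web worker contributes its own flow, a benign worker on the same page does not mask a mining worker, and a single renamed label only adds one to the distance.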

Dataset Collection: For the experiment, we collected real-world websites as our dataset. Specifically, we decided to collect popular websites that have a greater impact on many users, and then confirmed the existence of cryptojacking websites. We collected 300,000 websites listed in Amazon's Alexa top website service [7] and Majestic [28], which provide the world's most popular website list for free, and then gathered top websites in both lists to confirm the distribution of cryptojacking in the overall Internet environment. Furthermore, to identify the website service field where cryptojacking is distributed, we also collected an additional Alexa category top service [8] that indexes websites by category. We collected a list of 6,000 websites, consisting of the 500 most popular websites in each of 12 categories. Therefore, we collected 306,000 websites as our dataset to evaluate CIRCUIT (see Table 2).

Memory Heap Collection: We developed a crawler that stores the memory heap area of a visited website using the remote interface [13] and puppeteer [27] functions of the Chrome browser [12]. This crawler visited the collected 306,000 websites, and after waiting for the website content to finish loading (i.e., load event), it extracted a snapshot of the memory heap area of the JavaScript engine. Here, if the connection time of the website exceeds 30,000 ms or the website cannot be accessed from the domain name system (DNS) server, the crawler ignores the website. As a result, our crawler collected memory heap areas from 204,773 websites to evaluate CIRCUIT, and the results are summarized in Table 3.

cryptojacking behaviors (see Figure 6). Thereafter, from the 204,733 heap graphs generated for common websites (Table 3)

between two graphs as an integer greater than or equal to zero; if the distance is zero, the two input graphs are the same. Hence, we set the threshold to 5 (defined in Section III-C) and determined two graphs (i.e., two reference flows) with an edit distance below 5 as similar. We decided that the target website that contains a similar reference flow to cryptojacking signatures was the cryptojacking website.

Since all detected websites contain a reference flow similar to that of cryptojacking websites, the detected websites contain the cryptojacking behaviors, either potentially or directly. Manually inspecting all the detected websites is an error-prone and burdensome task, and thus, we randomly selected 100 websites (6%) and manually checked whether they performed cryptojacking. To verify our results, as most cryptojacking websites leverage evasion techniques to hide cryptojacking behaviors, we checked the CPU usage of websites, an evaluation method that was used in the existing approaches [35], [38], [42]; since we have already confirmed that the websites detected by CIRCUIT contain cryptojacking signatures, we decided that it was valid to verify them by further investigating the CPU usage. As a result, all the 100 selected websites exhibited over 55% CPU usage; 25 out of the 100 websites showed over 90% CPU usage. The CPU usage of the verified websites was significantly higher than that of the normal websites; the normal websites exhibited below 1% CPU usage on average. This result affirmed that CIRCUIT successfully detected malicious websites that were actually running cryptojacking behaviors.

The main advantage of CIRCUIT is that it reports fewer false positives. In existing approaches (e.g., Outguard [41]), for example, if the number of threads on a website is greater than the threshold, or if the resource consumption is higher than the threshold, all such websites are determined to be cryptojacking websites. Although these websites may use the resources of visitors, some of them ask for the consent of the visitor, and most of them have a lower influence on visitors than cryptojacking websites in terms of resource consumption. Thus, we can argue that our result is more precise and compact because CIRCUIT detects only cryptojacking websites that clearly contain the cryptojacking behavior.

As cryptojacking websites were blocked by the emergence of several applications, such as Dr.Mine [16] and MinerBlock [25], attackers started hiding the mining script code to avoid cryptojacking detection. Therefore, we gathered the evasion techniques found in our experiment and summarized them as the following four evasion models (E1 to E4). Figure 8 shows the heap graphs for each evasion technique.

Obfuscation refers to obfuscating and compressing the cryptojacking script code on a website, or hiding notable keywords in the script code using the CharCode or eval function. This is one of the representative evasion techniques used to avoid cryptojacking detection, which makes it difficult to detect

FIGURE 8. Illustrations of heap graphs that change by various evasion techniques. In (c), the same reference flow (i.e., thread #0) as in (a) remains identical even if the evasion technique is applied; thus, the edit distance between the two reference flows is zero. For (b) and (d), changes occurred one by one at the node and edge of the mining reference flow, respectively, but the edit distance between mining thread #0 in (a) and mining thread #0 in both (b) and (d) is exceedingly small (the measured edit distance is 2). Note that the evasion technique for modifying the external server link does not affect the original heap graph.

Listing 5. Example code for obfuscating the script code.

mining threads. This technique does not significantly change the mining script code, but it is an option that is often utilized to bypass the detection method based on resource monitoring. For instance, attackers can leverage this technique by adding the following simple option (i.e., throttle) to their script code:

Here, detection methods based on resource monitoring and thread counts may fail to detect cryptojacking websites. However, even if the number of mining threads decreases, the behavior of existing reference flows is maintained (e.g., mining thread #0 in Figure 8 (c)); therefore, CIRCUIT can precisely detect these kinds of cryptojacking websites.

Cryptojacking websites bypass detection by embedding separate cryptojacking code on the website, for example through an iframe, allowing cryptomining without a specific script code. In addition, by applying obfuscation to the embedded cryptojacking code, cryptojacking detection becomes more difficult. The sample code is presented in Listing 8.

However, as shown in Figure 8 (d), only the start node of the reference flow is replaced with another object, and there is no change in the internal behavior. Therefore, CIRCUIT can detect cryptojacking websites even if this evasion technique is applied.

As previously explained in Section II-B, some of the recent approaches to detect cryptojacking have focused on the fact that cryptojacking websites run several threads. AI learning using this indicator effectively detects cryptojacking websites, but several normal websites using multiple web workers have also been mistakenly detected as cryptojacking websites. Therefore, we checked the number of web workers on these websites.

Figure 10 shows the generated heap graph focusing on the identified reference flow after injecting the cryptomining code of CoinIMP into the "057.ua" website, which uses a web worker intentionally. In the heap graph, a web worker created using Google's reCAPTCHA and a web worker for cryptomining exist simultaneously, together with seven other web workers, as shown in Figure 10. In this example, the existing resource monitoring-based approach or thread count-based approach determines that this website runs cryptojacking even before the cryptomining code is inserted. In addition, if we obfuscate the cryptomining code and insert it into a website, blacklisting-based approaches fail to detect this website as a cryptojacking website.

By contrast, since CIRCUIT considers an individual reference flow for each web worker, it can detect only web workers related to cryptojacking, even in a complex structure. When similarity was measured based on the reference flow of CoinIMP, the reference flow of the web worker used in Google's reCAPTCHA showed an edit distance of 11.0, whereas the injected cryptomining reference flow showed an edit distance of 2.0. This is not a characteristic unique to Google reCAPTCHA. For instance, when we measured the similarity between reference flows of "Video.js" [23], "hls.js" [18], and "vectortaillay.js" [11], which are generally executed by various web workers, and the reference flows of CoinIMP, the edit distances were obtained as 13.0, 18.0, and 32.0, respectively. In conclusion, CIRCUIT can precisely detect only web workers related to cryptomining, even on websites with multiple web workers.

Handling relatively heavy tasks in a web environment was challenging before the introduction of web workers. The distinction between cryptojacking and normal websites became ambiguous after introducing web workers; hence, methods for detecting cryptojacking websites are required. In addition, cryptojacking websites attempt to avoid detection through various evasion techniques. Therefore, we focused on how to flexibly cope with technologies to avoid detection and how to precisely detect cryptojacking websites. If the memory area allocated to the website is used, the detection ability will not be affected unless the evasion technique directly affects memory. CIRCUIT reduced false positives in cryptojacking detection and showed robust results compared with the existing detection methods. In addition, the analysis results of the evasion techniques and distribution of web workers in the overall web environment proved the necessity and efficiency of approaching memory rather than simply depending on the script code, resource consumption monitoring, or the number of threads. The detection method using this memory area can flexibly cope with detection bypass technologies, which hinder cryptojacking detection, and will become an important insight for detection methods focusing on accuracy.

Increasing cryptocurrency values have led to an increase in cryptojacking, which utilizes mining maliciously. Therefore, we propose CIRCUIT, a precise approach for detecting cryptojacking websites based on the JavaScript memory heap. We define a reference flow, which can represent script code behavior for each thread on a website, and utilize the reference flow to detect websites with cryptojacking behaviors. CIRCUIT successfully detected 1,813 cryptojacking websites from 300K real-world websites. We demonstrated the efficacy of CIRCUIT by (1) precisely detecting cryptojacking websites using evasion techniques and (2) clearly distinguishing normal websites with similar characteristics to cryptojacking websites. In addition, the model of evasion techniques that we discovered and the distribution of web workers within a website can provide new insights for cryptojacking detection.

1 https://www.acs.org/
2 https://www.chestnet.org/