EstiDroid: Estimate API Calls of Android Applications Using Static Analysis Technology

Tracking the API calls of an Android application (app) is valuable for understanding the app's runtime behavior, for example to detect security damage, sensitive information leakage, energy consumption and occupation of system resources. However, existing methods track the API calls of a target app by launching and manipulating the app in a real or simulated operating environment. The entire process is time consuming, which makes it inefficient for practical systems that perform batch analysis of a large number of apps. To speed up API call tracking, in this paper we propose a static analysis method, called EstiDroid, which estimates the API calls of Android apps by statically analyzing the apps without actually running them. EstiDroid is composed of a static analyzer and an estimation algorithm. To analyze a target app, EstiDroid first obtains several types of static information from the app's .APK file via the static analyzer; the estimation algorithm then establishes the estimation model for the app based on the static information. Finally, according to the model, the proportion of each API's calls in the total number of calls is estimated. In experiments, 300 apps were tested both via EstiDroid and by manual operation on smartphones. The results show that EstiDroid consumed only 49242 ms on average, in contrast to manual testing, and that it reached 84.06% average similarity and 90.74% maximum similarity compared with the API calls tracked in real environments.


I. INTRODUCTION
Android has become the most widely used mobile operating system (OS) for smartphones. The market share of Android in smartphone markets reached 85.1% in 2018 and is still growing [1]. Millions of Android applications (apps) have been developed and published in Android app markets. The number of available apps in the Google Play Store was most recently placed at 2.7 million in July 2019 [2]. However, the prosperity of Android apps also brings a series of challenges, such as security damage from malicious apps, sensitive information leakage from normal apps, excessive energy consumption of low-quality apps and intentional occupation of system resources by rogue apps. Thus, analyzing an Android app before delivering it to the public is a necessity for Android app management, in order to ensure the benignity and optimality of the app.
The associate editor coordinating the review of this manuscript and approving it for publication was Porfirio Tramontana.
Currently, static and dynamic methods are the two mainstream technologies for Android app analysis. The former mainly adopt Android Virtual Machine bytecode analysis, which converts the .APK file of a target app into an intermediate representation and then generates the app's control flow graph, which reveals the static information flows of the app. The latter conduct real-time tracking: they actually run and monitor the app on a smartphone or an Android simulator, and the information flows are tracked during the running period of the app.
The Android APIs, written in Java, act as the interfaces used by Android apps to communicate with the Android Framework. These APIs are vital to Android apps since they provide apps with system resources, information communications and life maintenance.
Tracking the API calls of an Android app has significant value for deeply understanding the target app's running behaviors. API calls can be tracked by dynamic methods through launching and manipulating the app on an Android smartphone or a simulator. However, the entire process is time consuming, which makes it inefficient for practical systems that perform batch analysis of a large number of apps. Static methods, in contrast, analyze the .APK file of the target app. The process is fast, but the result only shows the app's control flow, which reveals the relations among API calls. It cannot give any information about the number of API calls, which reflects how frequently each API appears in the execution flows of the app.
Aiming at boosting the speed of API calls tracking, in this paper, we propose an analysis method, called EstiDroid, to estimate API calls of Android apps by statically analyzing the apps without actually running them. It's an approach to high-speed API calls tracking through estimation based on static analysis technology.
EstiDroid consists of a static analyzer and an estimation algorithm. To analyze a target app, (a) the static analyzer is used to obtain several types of static information from the app's .APK file, including page layouts, manifest and intermediate representation; (b) the estimation algorithm is employed to establish the estimation model for the app based on the static information. Establishing the estimation model includes constructing entity description models, composing entity relationship graph and computing access intensities of entities. Then, the estimation algorithm estimates the proportion of each API's calls in the total number of calls, through traversing all entities in the entity relationship graph.
Experiments were conducted to evaluate the performance of EstiDroid. We picked 300 apps from Android markets, then manually ran each of them on smartphones. The API calls generated in the running period of each app were tracked using DroidInjector [3], a pre-installed dynamic API call tracking tool that can track API calls during the running period of the app without modifying the Android OS. Then, we employed EstiDroid to estimate the API calls of these apps. The estimated API calls via EstiDroid reached 84.06% similarity on average and 90.74% similarity at maximum in comparison with the API calls tracked via manual testing, whereas EstiDroid consumed only 49242 ms on average (vs. 48 hours of manual testing). The experiment results demonstrate the high efficiency of EstiDroid in estimating the API calls of Android apps.
The rest of this paper is organized as follows: Section 2 discusses the related works. Section 3 presents the architecture of EstiDroid, including descriptions of the static analyzer and the estimation algorithm. Section 4 shows the experiment results of EstiDroid, and comparisons with manual testing and automatic testing. Section 5 concludes our work and introduces our future work briefly.

II. RELATED WORKS
In recent years, several representative Android app analysis technologies have been proposed. As mentioned above, they are categorized into two types: static and dynamic methods.

A. STATIC METHODS
Android Virtual Machine bytecode analysis, which includes control-flow and data-flow analysis, is the main technology used by static methods. Control-flow analysis helps identify the possible execution paths of the target app. Data-flow analysis helps predict the possible values of variables at a given location in the execution of the target app. In order to facilitate deep analysis, an intra-procedural or inter-procedural flow graph can be generated. FlowDroid [4] provides precise static tracking through parsing the converted intermediate representations of the target app. The Android component lifecycle is modeled according to the call graph, which is augmented with multiple dummy methods to identify lifecycle phases. ComDroid [5], AmanDroid [6], R-Droid [7], IccTA [8], DroidRA [9] and HornDroid [10] try to improve the static analyzer to detect implicit data flows across components among Android apps. LeakMiner [11] extracts Java bytecode and metadata from the .APK file of the target app, based on which the call graph is then generated. LeakMiner is less context-aware since it does not mark lifecycle phases, so it may suffer from low precision and false positives. TrustDroid [12] carries out detailed data flow tracking by converting Java bytecode to a tree structure and then generating the call graph of the target app. TrustDroid can run on either a server or a smartphone, but unfortunately it does not consider the Android component lifecycle either. Androguard [13] is an open-source static analysis tool that can disassemble and decompile Android apps for reverse engineering. Androguard's Normalized Compression Distance approach can find similarities and differences in code between two apps, which can be used to detect repackaging. DroidMOSS [14] is a prototype that detects app repackaging using semantic file features. It extracts the DEX opcode sequence from a target app, then generates a signature from it using fuzzy hashing.
However, the above three systems have to treat libraries as black boxes, since it is very hard to decompile their code. Tracking of inter-component communications is missing as well.

B. DYNAMIC METHODS
Bouncer [15] is a virtual-machine-based dynamic analysis platform, which is used officially by Google to assess the security problems of apps uploaded by third-party developers. Bouncer runs an app to check for malicious behaviors and compares them with those of previously analyzed malicious apps. TaintDroid [16] is a system-level dynamic tracking system for Android. Sensitive information is tagged so that it can be tracked from tainted sources to sinks. The target app under analysis is executed in an emulated environment to perform taint analysis and API monitoring. Many systems [17]-[21] are based on TaintDroid to conduct further analysis. Kynoid [22] is based on TaintDroid, and it implements a middleware between apps and data in the Android system to provide runtime security policy enforcement for apps accessing shared data. Andromaly [23] is a light-weight dynamic analysis tool which performs real-time monitoring to collect various system metrics, including CPU usage, amount of network data, number of active processes and battery usage. However, Andromaly cannot monitor API calls during the running period of the target app. DroidTrace [24] proposed an implementation of a ptrace-based dynamic tracking system, which can monitor selected Linux system calls invoked during the running period of the target app. DroidInjector [3] is a dynamic tracking tool which can monitor API calls in the Android Virtual Machine Runtime, providing more fine-grained analysis results than Linux system call monitoring. It uses multiple technologies, such as Linux ptrace and JNI conversion, to execute context-aware, flow-aware and library-aware API call tracking for the target app.

C. METHODS FOR TRACKING API CALLS
Dynamic methods provide real or simulated environments where apps can be installed, executed and operated. The Android OS in these environments is modified so that an API call is tracked once the target app calls the API. The time consumption of the entire process is high, because most of the time is spent operating the app, where user/system inputs are carried out through Graphical User Interface (GUI) operations and system event triggers. A human tester has to manually operate the app for a period long enough to ensure that most of the possible inputs for the app are executed. In an automatic test, test tools like Robotium [25] and Monkeyrunner [26] replace the human tester to carry out the inputs of the target app. However, the process still takes a long time, since the test tools merely change how the inputs are supplied; running and operating the app remain unchanged.
Existing static methods can give the target app's control flow, which only reveals the relations among API calls. These methods are fast since the result is generated by analyzing the .APK file of the app without actually running the app, but they are not able to provide any information about the number of API calls. The number of calls of a certain API essentially reflects the frequency with which the API appears in the execution flow of the app. It is an important criterion in Android app analysis. For example, we can evaluate security problems of the target app by observing abnormal numbers of API calls. If the energy consumption of each API is known in advance, we can also predict the energy consumption of the target app according to the frequency of each API.

III. ARCHITECTURE OF EstiDroid
How can the API calls of Android apps, especially the number of calls of each API, be obtained in an efficient way which consumes only a very short time? EstiDroid is an approach to high-speed API call tracking through estimation based on static analysis technology. EstiDroid does not need a smartphone or a simulator environment to operate the target app. Instead, it uses static analysis to obtain several types of static information from the .APK file of the app, and then estimates API calls according to the estimation model established from the static information. Thus, tracking API calls via EstiDroid is much faster than with dynamic methods, while the API calls estimated by EstiDroid keep high similarity with those tracked when the app runs in an actual environment.
In this section, the architecture of EstiDroid is presented. First, we describe the steps used by EstiDroid to estimate the API calls of a target Android app. Then, we explain in detail the two components of EstiDroid: the static analyzer and the estimation algorithm.

A. STEPS OF ESTIMATING API CALLS
EstiDroid contains two components: a static analyzer and an estimation algorithm. The output from the static analyzer is used by the estimation algorithm.
As shown in Fig. 1, there are mainly two steps to analyze a target app:
1) The static analyzer carries out XML file parsing and Android Virtual Machine bytecode analysis for the .APK file of the target app. The extracted XML files and converted code files generated by the static analyzer are then used to output several types of static information, including page layouts, manifest and intermediate representation. Properties of widgets, components and entry functions, and relations among API calls of the app can be obtained from the static information.
2) The estimation algorithm establishes the estimation model for the target app through constructing entity description models, composing entity relationship graph, computing access intensities of entities and estimating the proportion of each API's calls in the total number of calls. The entity description model of an entity is a structure which contains necessary attributes of the entity. The entity relationship graph, which consists of entity description models, expresses relations among the entities. The access intensity of an entity is the weight reflecting the probability of the entity being accessed by the execution flow of the app. The execution flow is impacted by the user's and system's common input. To estimate API calls, the estimation algorithm traverses all entities in the entity relationship graph. The result of estimated API calls can be computed according to access intensities during the traversing. When the traversing ends, the estimation is completed.

B. STATIC ANALYZER
The static analyzer is composed of an XML parser and a Java decompiler, as shown in Fig. 2.

1) XML PARSER
The XML parser is implemented based on Apktool [27], a tool for reverse engineering of Android .APK files. The XML parser uses Apktool to extract AndroidManifest.xml and layout files in 'res' folder from the .APK file of the target app.
Android components used in the app are declared in AndroidManifest.xml, thus, the XML parser can get the name list of these components, including Activitys, Services, Applications and statically registered BroadcastReceivers. The XML parser can find the start point of the app through analyzing the 'intent-filter' labels defined in AndroidManifest.xml, and it can also find the Services that can be launched remotely through analyzing the tag 'android:process=":remote"' in the declaration of each Service in AndroidManifest.xml.
The page layout for each Activity in the app is declared in xml files in the 'res' folder of the app. The XML parser first reads the xml files in the 'values' sub-folder, in order to get the 'name-ID' correspondence for each layout and widget. Then, the XML parser obtains the necessary properties, including type, location and size, of the widgets in the page of each Activity through traversing these xml layout files.
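The manifest scan described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the manifest has already been decoded to plain-text XML (for example by Apktool), and the function name `parse_manifest`, the returned dictionary keys and the sample manifest are all hypothetical.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"

def attr(elem, name):
    """Read an android:-prefixed attribute, or None if it is absent."""
    return elem.get(f"{{{ANDROID_NS}}}{name}")

def parse_manifest(xml_text):
    """Collect component names, the start Activity and remote Services."""
    info = {"application": None, "activities": [], "services": [],
            "receivers": [], "launcher": None, "remote_services": []}
    app = ET.fromstring(xml_text).find("application")
    if app is None:
        return info
    info["application"] = attr(app, "name")
    for act in app.findall("activity"):
        name = attr(act, "name")
        info["activities"].append(name)
        # The start point is the Activity whose intent-filter declares
        # the MAIN action together with the LAUNCHER category.
        for flt in act.findall("intent-filter"):
            actions = {attr(a, "name") for a in flt.findall("action")}
            cats = {attr(c, "name") for c in flt.findall("category")}
            if ("android.intent.action.MAIN" in actions
                    and "android.intent.category.LAUNCHER" in cats):
                info["launcher"] = name
    for svc in app.findall("service"):
        name = attr(svc, "name")
        info["services"].append(name)
        # Following the text, a Service tagged android:process=":remote"
        # is treated as remotely launchable.
        if attr(svc, "process") == ":remote":
            info["remote_services"].append(name)
    info["receivers"] = [attr(r, "name") for r in app.findall("receiver")]
    return info

# Hand-written sample manifest, for illustration only.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
  <application android:name=".DemoApp">
    <activity android:name=".MainActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
        <category android:name="android.intent.category.LAUNCHER"/>
      </intent-filter>
    </activity>
    <activity android:name=".DetailActivity"/>
    <service android:name=".SyncService" android:process=":remote"/>
    <service android:name=".LocalService"/>
    <receiver android:name=".BootReceiver"/>
  </application>
</manifest>"""

info = parse_manifest(SAMPLE)
print(info["launcher"], info["remote_services"])
```

Only statically registered BroadcastReceivers appear in the manifest, so dynamically registered ones still have to be found in the decompiled code.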

2) JAVA DECOMPILER
The Java decompiler is implemented based on Soot [28]. Soot is a code analysis tool which converts the Android Virtual Machine bytecode of the target app into an intermediate representation called Jimple [28]. The Java decompiler runs Soot after the app's .APK file has been extracted by Apktool.
Firstly, the Java decompiler traverses Jimple files, and finds out Jimple files corresponding to each Activity, Service and Application in the name list obtained by the XML parser.
Then, the Java decompiler traverses the contents of these files, and (a) finds all lifecycle functions for each Activity, Service and Application via scanning keywords, such as 'onCreate', 'onResume', 'onStop', etc.; (b) finds all BroadcastReceivers (including statically registered BroadcastReceivers) and their accepted broadcasts (that is, the names of the broadcasts) by scanning the keyword 'BroadcastReceiver' and corresponding listener functions like 'setXXXListener'; (c) finds each widget and its event handlers in each Activity through scanning the functions 'setContentView' and 'findViewById', referring to the correspondence between the widget's name and ID, and marking listener functions like 'setXXXListener'.
Additionally, in parts (a)-(c), the Java decompiler also establishes the execution flow graph for every entry function, including lifecycle functions of Activitys, Services and Applications, and event handler functions of BroadcastReceivers and widgets. The execution flow graph for an entry function is a representation, using graph notation, of all paths that might be traversed starting from the entry function during the app's execution. The APIs called in each path are marked, the start points and end points of loops using 'for', 'while' and 'do while' are marked, and the start points and end points of branches in conditional judgments using 'if-else' and 'switch-case' are also marked.
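The keyword scanning in parts (a) and (c) can be illustrated with a small text scan. The sketch below is only an approximation under stated assumptions: the Jimple-like snippet is hand-written, the regular expressions and the helper name `scan_entry_functions` are hypothetical, and a real analyzer would work on Soot's in-memory representation rather than on raw text.

```python
import re

# A subset of lifecycle function names used as scanning keywords.
LIFECYCLE = {"onCreate", "onStart", "onResume", "onPause", "onStop", "onDestroy"}

# Matches a method header such as "protected void onCreate(android.os.Bundle)".
METHOD_RE = re.compile(r"^\s*(?:[\w.$\[\]]+\s+)+(\w+)\(([^)]*)\)\s*$")
# Matches listener registrations of the form setXXXListener.
LISTENER_RE = re.compile(r"\bset\w+Listener\b")

def scan_entry_functions(jimple_text):
    """Return (lifecycle functions declared, listener setters invoked)."""
    lifecycle, listeners = [], []
    for line in jimple_text.splitlines():
        m = METHOD_RE.match(line)
        if m and m.group(1) in LIFECYCLE:
            lifecycle.append(m.group(1))
        listeners.extend(LISTENER_RE.findall(line))
    return lifecycle, listeners

# Hand-written Jimple-like text, for illustration only.
SAMPLE = """\
public class com.example.demo.MainActivity extends android.app.Activity
{
    protected void onCreate(android.os.Bundle)
    {
        virtualinvoke $r2.<android.widget.Button: void setOnClickListener(android.view.View$OnClickListener)>($r3);
    }

    protected void onResume()
    {
    }
}
"""

print(scan_entry_functions(SAMPLE))
```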
The flowchart of the static analyzer is shown in Fig. 3. Finally, properties of widgets, components and entry functions, and relations among API calls are all obtained through running the static analyzer. The above static information is then delivered to the estimation algorithm.
The whole process of the static analyzer analyzing a target app is very fast since the time is mainly consumed by extracting the .APK file, converting original code into Jimple representation, traversing the Jimple files and scanning texts in these files. All of them only require computing resources. Thus, the speed of the process can be further boosted if a more powerful CPU is employed.

C. ESTIMATION ALGORITHM
The estimation algorithm exploits the static information generated by the static analyzer to estimate API calls for the target app.
As shown in the flowchart in Fig. 4, the process of the estimation algorithm consists of 4 steps: constructing entity description models, composing the entity relationship graph, computing access intensities, and estimating API calls.

1) CONSTRUCT ENTITY DESCRIPTION MODELS
We employ the term entity to uniformly describe the Android structures used in the estimation algorithm, including Activity, Application, Service, BroadcastReceiver, widget, entry function and API. The entity description model of an entity is a collection which contains the entity's attributes needed by the estimation algorithm.
Entity description models of different entities are described in TABLE 1, where attributes and their corresponding explanations are illustrated as well.

1.1) Entity Description Model for an Activity
The entity description model for an Activity a contains 7 attributes: name, η, W, E, A, S and B.

1.2) Entity Description Model for an Application
The entity description model for an Application p contains 2 attributes: name and E.

1.3) Entity Description Model for a Service
The entity description model for a Service s contains 4 attributes: name, η, type and E. Note that, the value of type identifies whether the Service can be launched remotely, or locally only.

1.4) Entity Description Model for a BroadcastReceiver
The entity description model for a BroadcastReceiver r contains 4 attributes: name, η, broadcasts and e.

1.5) Entity Description Model for a Widget
The entity description model for a widget contains 10 attributes: name, η, type, size, location, E, W, A, S, and B. Note that type expresses the type that the current widget belongs to, such as Button, ListView, EditView, CheckBox, ImageView, etc.; size denotes the size (height × width) of the current widget in the page layout; location expresses the location of the current widget in the page layout. As shown in Fig. 5, the value of location is selected from {left_top, middle_top, right_top, center, bottom}. The location of a widget is decided by its position relative to the other widgets in the page layout.
Note that, in 1.1)-1.5), an element a in A represents a successive Activity that the current entity can jump to; an element s in S represents a successive Service that the current entity can launch; an element b in B represents a broadcast that the current entity can send.
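As a rough illustration, the five models above could be held in plain data classes. The sketch below assumes Python dataclasses; the attribute names follow the text (with `eta` standing for η), while the concrete field types, such as storing entry functions and jump targets as name strings, are assumptions made for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Widget:
    name: str
    type: str                       # e.g. "Button", "ListView"
    size: Tuple[int, int]           # (height, width) in the page layout
    location: str                   # left_top, middle_top, right_top, center or bottom
    eta: float = 0.0                # access intensity
    E: List[str] = field(default_factory=list)        # event handler functions
    W: List["Widget"] = field(default_factory=list)   # widgets
    A: List[str] = field(default_factory=list)        # Activitys it can jump to
    S: List[str] = field(default_factory=list)        # Services it can launch
    B: List[str] = field(default_factory=list)        # broadcasts it can send

@dataclass
class Activity:
    name: str
    eta: float = 0.0
    W: List[Widget] = field(default_factory=list)
    E: List[str] = field(default_factory=list)        # lifecycle functions
    A: List[str] = field(default_factory=list)
    S: List[str] = field(default_factory=list)
    B: List[str] = field(default_factory=list)

@dataclass
class Application:
    name: str
    E: List[str] = field(default_factory=list)

@dataclass
class Service:
    name: str
    type: str                       # "remote" or "local" (assumed encoding)
    eta: float = 0.0
    E: List[str] = field(default_factory=list)

@dataclass
class BroadcastReceiver:
    name: str
    eta: float = 0.0
    broadcasts: List[str] = field(default_factory=list)  # accepted broadcasts
    e: Optional[str] = None                               # its single entry function

# Usage: a page with one button that jumps to another Activity.
main = Activity("MainActivity")
ok = Widget("ok_button", "Button", (48, 120), "bottom", A=["DetailActivity"])
main.W.append(ok)
```

Using `default_factory` gives every entity its own empty sets, so adding a jump to one Activity does not leak into another.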

1.6) Entity Description Model for an Entry Function
The entity description model for an entry function contains 2 attributes: name and F. Note that, the set F contains all APIs called in the execution flow graph (generated by the Java decompiler of the static analyzer) starting from the entry function.

1.7) Entity Description Model for an API
The entity description model for an API contains 2 attributes: name and q.
We use q to express the frequency of an API appearing in the execution flow graph (generated by the Java decompiler of the static analyzer), namely, the frequency of calls of the API in the execution flow when the entry function is called once.
For an API f, if it is in a loop, f.q will increase by the count number of the loop, since f will be called multiple times in the loop. If the count number of the loop is unknown, we assign a constant value for simplicity. If f is in a branch of a conditional judgement, f.q will increase based on the number of branches of the conditional judgement. In the running period of the app, the API will be called probabilistically because only one of the branches is taken by the execution flow. For simplicity, we assume a uniform probability distribution among all branches, so f.q will increase by 1/(number of branches). Note that, for an API in nested loops, nested conditional judgements, or combined loops and conditional judgements, the total increased frequency is the product of the increased frequencies generated at each loop or conditional judgement in the nested or combined structure.
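The frequency rules above can be sketched as a recursive traversal in which a multiplier is carried down through loops and branches. This is an illustrative reading rather than the paper's Algorithm 1: the nested-tuple encoding of the flow graph, the function names and the constant `UNKNOWN_LOOP_COUNT` (whose value the text does not state) are assumptions.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed constant for loops whose count is unknown; the value is illustrative.
UNKNOWN_LOOP_COUNT = 10

def accumulate(node, delta, q):
    """Add the current multiplier delta to q for every API reached from node.

    node is one of:
      ("api", name)          a single API call
      ("seq", [nodes])       statements executed one after another
      ("loop", count, body)  a loop; count may be None when unknown
      ("branch", [arms])     a conditional with one node per branch
    """
    kind = node[0]
    if kind == "api":
        q[node[1]] += delta
    elif kind == "seq":
        for child in node[1]:
            accumulate(child, delta, q)
    elif kind == "loop":
        count = node[1] if node[1] is not None else UNKNOWN_LOOP_COUNT
        accumulate(node[2], delta * count, q)       # a loop multiplies the frequency
    elif kind == "branch":
        arms = node[1]
        for arm in arms:
            accumulate(arm, delta / len(arms), q)   # each branch is equally likely
    else:
        raise ValueError(f"unknown node kind: {kind}")

def api_frequencies(flow):
    """Frequency q of each API when the entry function is called once."""
    q = defaultdict(Fraction)
    accumulate(flow, Fraction(1), q)
    return dict(q)

# One call outside any structure, then a 3-iteration loop that always calls
# "read" and calls "write" in one arm of an if-else.
flow = ("seq", [
    ("api", "getDeviceId"),
    ("loop", 3, ("seq", [
        ("api", "read"),
        ("branch", [("api", "write"), ("seq", [])]),
    ])),
])
print(api_frequencies(flow))
```

Here "write" sits inside both the loop and the if-else, so its frequency is the product 3 × 1/2.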
Configuring each API in the F of an entry function e is completed through traversing the execution flow graph (generated by the Java decompiler of the static analyzer) of e.
The detailed process is given in Algorithm 1.
In Algorithm 1, we denote δ as the frequency cache, which temporarily caches the value of the frequency for the current API. When the algorithm enters a loop or a branch of a conditional judgement, δ accumulates the increased frequency generated by the loop or branch; when the algorithm exits from the loop or branch, δ removes the corresponding increased frequency. When the algorithm meets an API, δ is added to the frequency of the API.
In order to construct entity description models, the estimation algorithm scans the static information generated by the static analyzer, creates the entity description model for each entity, assigns name for all entities, assigns type, size and location for all widgets, assigns type for all Services, and assigns accepted broadcasts into B for all BroadcastReceivers. Besides, the estimation algorithm builds the affiliations between each Activity/Application/Service and its entry functions, between each BroadcastReceiver and its entry function, between each Activity and its widgets, and between each entry function and its APIs.
The detailed process is given in Algorithm 2.

Algorithm 2 Process of Constructing Entity Description Models
Input: static information generated from the static analyzer
Output: entity description models
for each Activity, Application, Service, BroadcastReceiver, widget and entry function do
    create the entity description model for current entity;
    name <= the entity's name;
    if the entity is a widget then
        type <= the type of the widget;
        size <= the proportional size of the widget;
        location <= the relative position of the widget;
        add current widget into W in the entity of Activity it belongs to;
    else if the entity is a Service then
        type <= the Service's type, which is identified by XML Parser;
    else if the entity is a BroadcastReceiver then
        add the names of all broadcasts accepted by the BroadcastReceiver into B;
    else if the entity is an entry function then
        configure F according to Algorithm 1;
        add current entry function into E (for Activity, Application, Service and widget) or e (for BroadcastReceiver) in the entity that the entry function belongs to;
    end if
end for

2) COMPOSE ENTITY RELATIONSHIP GRAPH
The entity relationship graph is a graph-like structure where the nodes are the entity description models of the app, and a link between two nodes represents the relation between the two corresponding entities, including jumps of Activitys, launchings of Services and sending of broadcasts. For example, a link from Activity $a_1$ to $a_2$ means $a_1$ can jump to $a_2$; a link from $a_1$ to Service $s_1$ means $a_1$ can launch $s_1$; a link from $a_1$ to broadcast $b_1$ means $a_1$ can send $b_1$.
In order to compose the entity relationship graph for the app, the estimation algorithm scans the entity description models and the static information generated by the static analyzer. It links relevant models by adding elements into the sets A, S and B.
The detailed process is given in Algorithm 3.
In Android, jumping to an Activity is completed by calling API 'startActivity' or 'startActivityForResult', where the destination Activity is designated in the parameters. So we can find relations of Activity jumps through scanning the above two functions and their parameters. Similarly, launching a Service relies on calling API 'startService' or 'bindService', and sending a broadcast relies on calling 'sendBroadcast' or 'sendOrderedBroadcast'. The relations of Service launching, and relations between broadcasters and BroadcastReceivers can be found by scanning these APIs and their parameters.
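The link-building step can be sketched as a lookup from the six APIs named above to the set they populate. The sketch assumes that each call site has already been reduced to a (source entity, API name, resolved target) triple; resolving the target from the call's parameters is itself an analysis step that is not shown, and the function name `compose_links` is hypothetical.

```python
# Which relation set each API contributes to: A (Activity jumps),
# S (Service launches) or B (broadcasts sent).
LINK_APIS = {
    "startActivity": "A",
    "startActivityForResult": "A",
    "startService": "S",
    "bindService": "S",
    "sendBroadcast": "B",
    "sendOrderedBroadcast": "B",
}

def compose_links(calls):
    """Build {source: {"A": set, "S": set, "B": set}} from call triples.

    calls is an iterable of (source_entity, api_name, target) where target
    is the resolved destination name, or None when it could not be resolved.
    """
    graph = {}
    for source, api, target in calls:
        kind = LINK_APIS.get(api)
        if kind is None or target is None:
            continue  # not a linking API, or destination unknown
        sets = graph.setdefault(source, {"A": set(), "S": set(), "B": set()})
        sets[kind].add(target)
    return graph

# Hypothetical call sites extracted from a small app.
calls = [
    ("MainActivity", "startActivity", "DetailActivity"),
    ("MainActivity.ok_button", "startService", "SyncService"),
    ("MainActivity", "sendBroadcast", "com.example.REFRESH"),
    ("MainActivity", "getDeviceId", None),
]
graph = compose_links(calls)
print(graph["MainActivity"])
```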

3) COMPUTE ACCESS INTENSITIES
The access intensity of an entity, including widget, Activity, Service, BroadcastReceiver, denotes its weight which measures the probability of the entity being accessed by the execution flow caused by the app's common input.
The common input of an app represents the user's normal operations and the system's normal event triggers during the running period of the app. It reflects the statistical result of common user behaviors and system activities. If an entity has high access intensity, the execution flow of the common input accesses the entity with high probability, that is, the APIs in the entity will be called more frequently; if an entity has low access intensity, the execution flow of the common input accesses the entity with low probability, that is, the APIs in the entity will be called less frequently.

a: ACCESS INTENSITY OF WIDGET
Considering the usage habits of users, we let the access intensity of a widget be decided by its type, location and size. A user is more likely to operate a widget with a specific type, a hotspot location and a big size; thus, such a widget should have a higher access intensity, and otherwise it should be assigned a low value.

i) TYPE OF WIDGET
We consider the fact that the type of a widget impacts the frequency of the user's operations on it. For example, Button widgets get more clicks than ImageView widgets in common usage. We statically allocate a weight factor for each type of widget, as partially shown in TABLE 2. We denote by $t^{(t)}_k$ the weight factor of the type of widget $w_k$.

ii) LOCATION OF WIDGET
As shown in Fig. 5, on the layout of the app, the widget at left_top is usually in charge of returning to the last page; the widgets at middle_top are basically the label of the current page, a search bar, etc.; the widget at right_top can be a menu, settings or other functionalities; the widgets at center are accessed by most of the activities in the current page; frequently used widgets are deployed at bottom, since this area is the nearest to the user's fingers. We set a weight factor for each location of widget, as shown in TABLE 3. We denote by $t^{(l)}_k$ the weight factor of the location of widget $w_k$.

iii) SIZE OF WIDGET
The size of a widget impacts how likely the widget is to be accessed by the user. Generally, the larger the widget is, the more attractive it becomes, and so the more frequently the user accesses it. We denote the size of widget $w_k$ by $t^{(s)}_k$, the value of which is the area of $w_k$, that is, height × width.

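iv) COMPUTATION OF $w_k.\eta$
For widget $w_k$ in Activity $a_i$, its intensity $w_k.\eta$ is computed by averaging the normalized weight factors of its type, location and size:
$$w_k.\eta = \alpha \frac{t^{(t)}_k}{\sum_{w_j \in a_i.W} t^{(t)}_j} + \beta \frac{t^{(l)}_k}{\sum_{w_j \in a_i.W} t^{(l)}_j} + \gamma \frac{t^{(s)}_k}{\sum_{w_j \in a_i.W} t^{(s)}_j} \tag{1}$$
where $\sum_{w_j \in a_i.W} t^{(t)}_j$ is the sum of the weight factors of the types of all widgets in $a_i$, and likewise $\sum_{w_j \in a_i.W} t^{(l)}_j$ and $\sum_{w_j \in a_i.W} t^{(s)}_j$ for location and size. α, β and γ are balance factors that tune the proportion of the weights of type, location and size, with α, β, γ ∈ [0, 1] and α + β + γ = 1.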
The computation of access intensities of widgets is done by traversing each widget of each Activity in entity relationship graph. The process is described in Algorithm 4.
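A minimal sketch of this computation for the widgets of one Activity is given below. It treats the intensity as the α, β, γ weighted sum of the normalized type, location and size factors described above. The weight tables and balance factors are placeholders, since the actual values of TABLE 2 and TABLE 3 are not reproduced here, and the function name is hypothetical.

```python
def widget_intensities(widgets, type_weight, location_weight,
                       alpha, beta, gamma):
    """Access intensity of every widget in one Activity.

    widgets: list of dicts with keys name, type, location, height, width.
    type_weight / location_weight: weight factor tables (placeholders here).
    """
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    t_type = [type_weight[w["type"]] for w in widgets]
    t_loc = [location_weight[w["location"]] for w in widgets]
    t_size = [w["height"] * w["width"] for w in widgets]   # area of the widget
    sum_type, sum_loc, sum_size = sum(t_type), sum(t_loc), sum(t_size)
    return {
        w["name"]: (alpha * t_type[k] / sum_type
                    + beta * t_loc[k] / sum_loc
                    + gamma * t_size[k] / sum_size)
        for k, w in enumerate(widgets)
    }

# Placeholder weight factors, chosen only for illustration.
TYPE_WEIGHT = {"Button": 3, "ImageView": 1}
LOCATION_WEIGHT = {"bottom": 3, "center": 2}

widgets = [
    {"name": "ok", "type": "Button", "location": "bottom",
     "height": 100, "width": 200},
    {"name": "banner", "type": "ImageView", "location": "center",
     "height": 300, "width": 200},
]
eta = widget_intensities(widgets, TYPE_WEIGHT, LOCATION_WEIGHT,
                         alpha=0.4, beta=0.3, gamma=0.3)
print(eta)
```

Because each factor is normalized over the widgets of the Activity and the balance factors sum to 1, the intensities of the widgets in one Activity sum to 1.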

b: ACCESS INTENSITY OF ACTIVITY
In an Android app, an Activity manages a page of the app. When a user operates the app, a page can jump to another one according to the user's commands, such as tapping the screen or pressing the buttons of the smartphone. The jumps among the Activitys of the app actually form a network where Activitys are nodes and jumps are links. A link connecting Activity $a_1$ and $a_2$ indicates that $a_1$ can jump to $a_2$, and $a_2$ can return to $a_1$ through the return operation.
In order to abstract user operations on Activitys of the target app, we propose a PageRank-based algorithm to compute the access intensities of Activitys. The PageRank algorithm [29] is used by Google Search to rank websites in their search engine results, whereas, here we modify the PageRank algorithm to evaluate the importance of Activitys in the target app.
Our PageRank-based algorithm works by evaluating the jumps from or to an Activity to determine an estimation in regard to how important the Activity is. The underlying assumption is that more important Activitys are likely to have more jumps from other Activitys.
A jump from Activity $a_1$ to $a_2$ is triggered by calling the function 'startActivity' or 'startActivityForResult', which can be included in the event handler functions of the widgets in $a_1$ and in the lifecycle functions of $a_1$. As mentioned above, all jumps starting from $a_1$ are recorded in $a_1.A$ and in the A of every widget belonging to $a_1$. Additionally, if $a_1$ can jump to $a_2$, $a_2$ can also return to $a_1$ when the user presses the 'back' button.
Thus, we have 3 groups of Activity jumps related to a_i: (a) jumps triggered by widgets, (b) return jumps triggered by the 'back' button, and (c) jumps triggered by lifecycle functions. For each jump in group (a), jumps^(w)_i contains each widget that can jump to a_i and the Activity that the widget belongs to. The elements in jumps^(w)_i are ordered pairs. For example, if Activity a_1 can jump to a_i via its widget w_1, then we have <w_1, a_1> ∈ jumps^(w)_i. For each jump in group (b), jumps^(r)_i includes each destination Activity that a_i can jump to, and the total number of jumps from other Activitys to the destination Activity. For example, if a_i can jump to a_1, and a_1 has c_1 jumps from other Activitys to it, then we have <a_1, c_1> ∈ jumps^(r)_i. For each jump in group (c), jumps^(l)_i consists of all Activitys that can jump to a_i via lifecycle functions. For example, if a_1 can jump to a_i by calling onCreate, then we have a_1 ∈ jumps^(l)_i.
Therefore, we design the intensity a_i.η to be composed of η^(w)_i, η^(r)_i and η^(l)_i, which are the intensities from the above 3 groups, respectively. a_i.η is computed from these three components, where δ (0 < δ < 1) is a proportion factor that represents the proportion of the user's return operations among all of the user's operations on a_i.
η^(w)_i is computed by summing up the proportional intensities of all Activitys that can jump to a_i via their widgets. In its formula, |w_z.A| is the number of elements in w_z.A, thus w_x.η / (Σ_{w_z ∈ a_y.W} (|w_z.A| · w_z.η)) is the proportion of the intensity of widget w_x in the sum of the intensities of a_y's widgets.
η^(r)_i is computed by accumulating the averaged intensity of each Activity that a_i can jump to, where the factor 1/c_x shows that the probability of a return operation is averaged over all jumps to the destination Activity.
η^(l)_i is computed by accumulating the intensity of each Activity that can jump to a_i through lifecycle functions.
In order to guarantee the convergence of our PageRank-based algorithm, if an Activity has no jumps from itself to other Activitys, it contributes all of its intensity, evenly divided, to the Activitys that can jump to it according to Formula (4). This rule replaces the random surfing rule for sink nodes in the standard PageRank algorithm, since in Android a user cannot switch to an arbitrary page from the current page.
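The three components and their combination can be written out as follows. This is a sketch reconstructed from the verbal definitions above, not a copy of the paper's numbered formulas: the sums for η^(w)_i, η^(r)_i and η^(l)_i follow the descriptions directly, while the way δ weights the return component against the other two in the first line is an assumption.

```latex
\begin{align*}
a_i.\eta &= (1-\delta)\,\bigl(\eta^{(w)}_i + \eta^{(l)}_i\bigr) + \delta\,\eta^{(r)}_i
  && \text{(assumed combination)} \\
\eta^{(w)}_i &= \sum_{\langle w_x,\, a_y\rangle \in jumps^{(w)}_i}
  a_y.\eta \cdot \frac{w_x.\eta}{\sum_{w_z \in a_y.W} |w_z.A| \cdot w_z.\eta} \\
\eta^{(r)}_i &= \sum_{\langle a_x,\, c_x\rangle \in jumps^{(r)}_i} \frac{a_x.\eta}{c_x} \\
\eta^{(l)}_i &= \sum_{a_x \in jumps^{(l)}_i} a_x.\eta
\end{align*}
```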
Algorithm 5 gives the whole process of computing the access intensities of Activitys. First, the algorithm initializes all Activitys; then the access intensity of each Activity is computed iteratively. In each iteration, the gap between each Activity's access intensity in the current iteration and in the previous iteration is measured. The access intensities become stable as the number of iterations increases. The iteration terminates when the sum of the gaps of all Activitys is less than or equal to a preset threshold, at which point the algorithm is considered to have converged. Finally, the access intensity of each Activity is normalized.
Algorithm 5 Process of Computing Access Intensities of Activitys
Input: entity relationship graph
Output: entity relationship graph where η of all Activitys are computed
for each Activity a_i in entity relationship graph do
    a_i.η <= 1; // set initial value as 1
    assign elements for jumps^(w)_i, jumps …
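The iteration described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name, the dictionary-based graph layout and the default values of `delta`, `eps` and `max_iter` are hypothetical; the weighting of forward and return jumps by δ is assumed; and the per-iteration rescaling and the averaging of old and new values are damping steps added here so that the sketch also settles on small bipartite jump graphs.

```python
def activity_intensities(activities, widgets, lifecycle,
                         delta=0.3, eps=1e-9, max_iter=1000):
    """Iteratively compute a normalized access intensity per Activity.

    activities: list of Activity names
    widgets:    {activity: {widget: (widget_eta, [target activities])}}
    lifecycle:  {activity: [target activities reached from lifecycle functions]}
    """
    # out[a]: every Activity that a can jump to (widgets + lifecycle functions)
    out = {a: [] for a in activities}
    for a in activities:
        for _, targets in widgets.get(a, {}).values():
            out[a].extend(targets)
        out[a].extend(lifecycle.get(a, []))
    # c[x]: number of jumps leading into Activity x
    c = {a: 0 for a in activities}
    for a in activities:
        for t in out[a]:
            c[t] += 1
    # denom[a]: sum over a's widgets of |w.A| * w.eta
    denom = {a: sum(len(t) * e for e, t in widgets.get(a, {}).values())
             for a in activities}

    n = len(activities)
    eta = {a: 1.0 for a in activities}          # initial value 1
    for _ in range(max_iter):
        new = {a: 0.0 for a in activities}
        for y in activities:
            if denom[y] > 0:                    # forward jumps via widgets
                for e, targets in widgets.get(y, {}).values():
                    for i in targets:
                        new[i] += (1 - delta) * eta[y] * e / denom[y]
            for i in lifecycle.get(y, []):      # forward jumps via lifecycle
                new[i] += (1 - delta) * eta[y]
            for x in dict.fromkeys(out[y]):     # return jumps from destinations
                back = eta[x] / c[x]
                # a sink (no outgoing jumps) hands back all of its intensity
                new[y] += back if not out[x] else delta * back
        total = sum(new.values())
        if total == 0:                          # no jumps at all: keep uniform
            break
        gap = 0.0
        for a in activities:
            damped = 0.5 * eta[a] + 0.5 * new[a] * n / total
            gap += abs(damped - eta[a])
            eta[a] = damped
        if gap <= eps:                          # sum of gaps below threshold
            break
    s = sum(eta.values())
    return {a: v / s for a, v in eta.items()}   # final normalization


if __name__ == "__main__":
    demo = activity_intensities(
        ["Main", "Detail", "Settings"],
        {"Main": {"btn_detail": (0.5, ["Detail"]),
                  "btn_settings": (0.5, ["Settings"])}},
        {},
    )
    print(demo)
```

In the small demo graph, the entry page ends up with a higher intensity than either leaf page, because both leaves are sinks that return all of their intensity to it.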

c: ACCESS INTENSITY OF SERVICE
A Service in the target app can be launched (a) from inside the app, by calling the function 'startService' or 'bindService' in an execution flow, which we call local launching; or (b) from outside the app, through system events sent by the system or other apps, which we call remote launching. For each Service, the kind of launching is recorded in the type of the Service.
As aforementioned, the S of each Activity and widget contains the Services that the entity can launch, all of which belong to local launching.
For a Service s_l of the app, if s_l.type == local, which means s_l can only be launched locally, then its access intensity s_l.η is inherited from the access intensities of all Activitys (including their widgets) that can launch the Service.
s_l can be remotely launched if s_l.type == remote. In this case, the access intensity s_l.η consists of two parts: (a) the access intensity inherited from its launcher Activitys (including their widgets); (b) the access intensity caused by remote launching. In order to model remote launching, we define a total access intensity ρ representing the total probability that Services are launched remotely. Thus, s_l's access intensity caused by remote launching is derived from ρ. In summary, the whole process of computing access intensities of Services is described in Algorithm 6.
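A minimal sketch of the two-part Service intensity follows. It is not Algorithm 6 itself: the function name, the dictionary layout and the default `rho` are hypothetical, the inheritance rule (a_i.η for an Activity, a_i.η · w_k.η for a widget) is borrowed from the BroadcastReceiver case, and splitting ρ evenly among the remote Services is an assumption.

```python
def service_intensities(activities, services, rho=0.1):
    """Compute s.eta for every Service.

    activities: {name: {"eta": float, "S": [services launched by the Activity],
                        "widgets": {w: {"eta": float, "S": [services]}}}}
    services:   {name: "local" or "remote"}
    rho:        total access intensity of remote launching (hypothetical value)
    """
    eta = {s: 0.0 for s in services}
    # local launching: inherit from launcher Activitys and their widgets
    for a in activities.values():
        for s in a.get("S", []):
            eta[s] += a["eta"]
        for w in a.get("widgets", {}).values():
            for s in w.get("S", []):
                eta[s] += a["eta"] * w["eta"]
    # remote launching: ASSUMPTION - rho is shared equally by remote Services
    remote = [s for s, kind in services.items() if kind == "remote"]
    for s in remote:
        eta[s] += rho / len(remote)
    return eta


if __name__ == "__main__":
    acts = {"Main": {"eta": 0.6, "S": ["SyncService"],
                     "widgets": {"btn_play": {"eta": 0.5, "S": ["PlayService"]}}}}
    print(service_intensities(acts, {"SyncService": "local",
                                     "PlayService": "remote"}))
```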

d: ACCESS INTENSITY OF BroadcastReceiver
Similarly, a BroadcastReceiver in the target app can receive broadcasts sent (a) from inside the app, by calling 'sendBroadcast' or 'sendOrderedBroadcast', which we call local broadcasting; or (b) from outside the app, through the system or other apps sending broadcasts, which we call remote broadcasting.
As aforementioned, the B of each Activity and widget contains the broadcasts that the entity can send, all of which belong to local broadcasting.
Remote broadcasting mainly comes from system broadcasts, such as SCREEN_ON_ACTION, SCREEN_OFF_ACTION, CALL_ACTION, ANSWER_ACTION, DATA_CONNECTION_STATE_CHANGED_ACTION, SERVICE_STATE_CHANGED_ACTION, SIGNAL_STRENGTH_CHANGED_ACTION, etc. If a BroadcastReceiver in the app receives system broadcasts, its B contains the names of the corresponding system broadcasts. We define τ as the total probability of the generation of system broadcasts.
Thus, for a BroadcastReceiver r_m, its access intensity r_m.η includes two parts: (a) local broadcasting: the access intensity inherited from the access intensities of the Activitys and widgets that can send broadcasts to r_m; (b) remote broadcasting: τ multiplied by the access intensities of the system broadcasts that r_m receives. We define the access intensity of each system broadcast to represent the probability that the system generates the broadcast, as partially shown in TABLE 4. The whole process of computing access intensities of BroadcastReceivers is described in Algorithm 7.
…
        for each BroadcastReceiver r_m whose r_m.B contains b_n do
            r_m.η <= r_m.η + a_i.η; // inherit the access intensity of the Activity that sends b_n
        end for
    end for
    for each widget w_k in a_i.W do
        for each broadcast b_n in w_k.B do
            for each BroadcastReceiver r_m whose r_m.B contains b_n do
                r_m.η <= r_m.η + a_i.η · w_k.η; // inherit the access intensity of the widget that sends b_n
            end for
        end for
    end for
end for
for each BroadcastReceiver r_m in entity relationship graph do
    for each system broadcast b_n in r_m.B do
        r_m.η <= r_m.η + τ · access intensity of b_n;
    end for
end for
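The loops of Algorithm 7 translate fairly directly into Python. The sketch below assumes a hypothetical dictionary layout for the entity relationship graph, assumes that the outer loops iterate over every Activity a_i and every broadcast in a_i.B, and uses made-up values for τ and for the system broadcast intensities (the real ones are in TABLE 4).

```python
def receiver_intensities(activities, receivers, system_broadcasts, tau=0.2):
    """Compute r.eta for every BroadcastReceiver.

    activities:        {name: {"eta": float, "B": [broadcasts sent],
                               "widgets": {w: {"eta": float, "B": [broadcasts]}}}}
    receivers:         {name: [broadcast names the receiver listens to]}  (r.B)
    system_broadcasts: {broadcast name: access intensity}  (stand-in for TABLE 4)
    tau:               total probability of system broadcasts (hypothetical value)
    """
    eta = {r: 0.0 for r in receivers}
    for a in activities.values():
        # local broadcasts sent from the Activity itself
        for b in a.get("B", []):
            for r, listened in receivers.items():
                if b in listened:
                    eta[r] += a["eta"]
        # local broadcasts sent from the Activity's widgets
        for w in a.get("widgets", {}).values():
            for b in w.get("B", []):
                for r, listened in receivers.items():
                    if b in listened:
                        eta[r] += a["eta"] * w["eta"]
    # remote broadcasting: system broadcasts weighted by tau
    for r, listened in receivers.items():
        for b in listened:
            if b in system_broadcasts:
                eta[r] += tau * system_broadcasts[b]
    return eta


if __name__ == "__main__":
    acts = {"Main": {"eta": 0.5, "B": ["com.example.REFRESH"],
                     "widgets": {"btn": {"eta": 0.4,
                                         "B": ["com.example.REFRESH"]}}}}
    recv = {"RefreshReceiver": ["com.example.REFRESH"],
            "ScreenReceiver": ["SCREEN_ON_ACTION"]}
    print(receiver_intensities(acts, recv, {"SCREEN_ON_ACTION": 0.3}))
```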

4) ESTIMATE API CALLS
For a certain API f_y, we define N(f_y) as the estimated number of calls of f_y. Note that f_y may exist in multiple execution flows starting from different entry functions of different entities. Initially, the estimated numbers of calls of all APIs are set to 0. Then, each Activity, widget, Service and BroadcastReceiver in the entity relationship graph is traversed, and the estimated number of calls of each API in each entry function is accumulated. If f_y is in the execution flow of a lifecycle function of Activity a_i, the accumulated number of API calls is a_i.η · f_y.q; if f_y is in the execution flow of an event handler function of widget w_j in Activity a_i, the accumulated number of API calls is a_i.η · w_j.η · f_y.q; if f_y is in the execution flow of a lifecycle function of Service s_k, the accumulated number of API calls is s_k.η · f_y.q; if f_y is in the execution flow of the event handler function of BroadcastReceiver r_l, the accumulated number of API calls is r_l.η · f_y.q.
The whole process of estimating number of API calls is described in Algorithm 8.
In particular, the Application of the target app is launched only once during the running period of the app. Thus, the estimated API calls in the Application can be obtained directly.
For an API f_y in a lifecycle function of the Application, its accumulated number of API calls is f_y.q, which represents the actual number of calls of f_y in the Application, so no scaling up or down is required. Note that the estimated number of calls of API f_y is a virtual value, which is used to compute the proportion of f_y's calls in the total number of API calls. It does not represent the actual number of calls of f_y in a real environment.
We define N(·) as the sum of the estimated numbers of calls of all APIs, and ξ_{f_y} as the proportion of f_y's calls in the total number of API calls. Then, ξ_{f_y} can be formulated as
ξ_{f_y} = N(f_y) / N(·)
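The accumulation and the final proportion can be sketched together in a few lines of Python. The function name and the input layout (one `(scale, frequencies)` pair per entry function) are hypothetical, and the numbers in the demo are made up; the scale factors follow the rules stated above, with 1.0 used for the Application.

```python
from collections import defaultdict


def estimate_api_proportions(flows):
    """Return {api: proportion of its calls in the total number of calls}.

    flows: iterable of (scale, {api_name: q}) pairs, one per entry function,
           where q is the API's frequency in that execution flow and scale is
             a_i.eta            for a lifecycle function of Activity a_i,
             a_i.eta * w_j.eta  for an event handler of widget w_j in a_i,
             s_k.eta            for a lifecycle function of Service s_k,
             r_l.eta            for the handler of BroadcastReceiver r_l,
             1.0                for a lifecycle function of the Application.
    """
    n = defaultdict(float)              # N(f_y), starts at 0 for every API
    for scale, freq in flows:
        for api, q in freq.items():
            n[api] += scale * q         # accumulate the scaled frequency
    total = sum(n.values())             # N(.)
    if total == 0:
        return {}
    return {api: calls / total for api, calls in n.items()}


if __name__ == "__main__":
    flows = [
        (0.5, {"getDeviceId": 2, "openConnection": 1}),   # Activity lifecycle
        (0.5 * 0.4, {"openConnection": 5}),                # widget handler
        (1.0, {"getDeviceId": 1}),                         # Application
    ]
    print(estimate_api_proportions(flows))
```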

IV. EXPERIMENTS AND EVALUATIONS
The prototype of EstiDroid is implemented in Java (J2SE 1.8), with Apktool (v2.2.4) and Soot (v2.5) integrated. The prototype runs on a Dell PowerEdge R720 server with a 2.8 GHz Intel Xeon E5-2680 CPU and 96 GB of 1866 MHz RAM, running Ubuntu 12.04 with Linux kernel 3.8.0. To evaluate the performance of EstiDroid and the accuracy of its estimation, we choose 300 apps from the Wandoujia [30] and COOLAPK [31] Android markets. (a) Each app is first deployed on 10 Nexus 6 smartphones with DroidInjector [3] installed. DroidInjector is a dynamic API call tracking tool that can track API calls during the running period of the target app without modifying the Android OS. For each target app, 10 testers (the first 5 authors of this paper and 5 invited Android users) manually operate it for a period of time. The generated API calls are tracked continuously, and they are stored by DroidInjector when the running time reaches 1 hour, 6 hours, 24 hours and 48 hours, respectively. (b) Then, we estimate the API calls of all target apps by running EstiDroid on the server. Parallelism is not used, in order to provide a fair environment for evaluating the time consumption of EstiDroid. The parameters of EstiDroid are listed in TABLE 5. For each app, besides obtaining the estimated API calls, we also measure the time consumption of the static analyzer and of the estimation algorithm.
Finally, for each app, the tracked API calls and the estimated API calls are all transformed into proportions in the total number of API calls, for further evaluations.
We measure the similarity between the estimated API calls and the tracked API calls to evaluate the accuracy of EstiDroid. Cosine similarity is adopted, which measures the similarity between two non-zero vectors of an inner product space as the cosine of the angle between them [32]. In our scenario, each API represents a dimension of the multi-dimensional space, and the proportion of the API's calls is the value on that dimension. Thus, the estimated API calls and the tracked API calls are two vectors in the space. Because the proportions of API calls are all non-negative, the value range of the cosine similarity is [0, 1]. If the estimated API calls are very close to the tracked API calls, the angle between the two vectors is very small and the similarity tends to 1; otherwise it tends to 0.
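As an illustration of this metric, the following sketch computes the cosine similarity between two proportion vectors stored as dictionaries keyed by API name. The function name and the dictionary representation are choices made for this example; an API missing from one side is treated as having proportion 0 on that dimension.

```python
import math


def cosine_similarity(estimated, tracked):
    """Cosine of the angle between two {api: proportion} vectors."""
    apis = set(estimated) | set(tracked)      # one dimension per API
    dot = sum(estimated.get(a, 0.0) * tracked.get(a, 0.0) for a in apis)
    norm_e = math.sqrt(sum(v * v for v in estimated.values()))
    norm_t = math.sqrt(sum(v * v for v in tracked.values()))
    if norm_e == 0 or norm_t == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_e * norm_t)


if __name__ == "__main__":
    est = {"getDeviceId": 0.5, "openConnection": 0.5}
    trk = {"getDeviceId": 0.6, "openConnection": 0.3, "sendTextMessage": 0.1}
    print(cosine_similarity(est, trk))
```

Because only the angle matters, the result is the same whether the inputs are raw call counts or proportions of the total.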
The experiment results for the 300 apps show that, vs. 48 hours of manual testing, EstiDroid reaches 84.06% average similarity between estimated and tracked API calls, with minimum and maximum similarities of 77.02% and 90.74%, respectively, and a standard deviation of 0.034727691; vs. 1 hour of manual testing, the apps have 37.96% average similarity, 28 … The standard deviation of the similarities for the 300 apps indicates that the performance of EstiDroid is quite stable. The average time consumptions of the estimation algorithm and the static analyzer are 2584ms and 46658ms, respectively. Thus, the total average time consumption of EstiDroid is 2584ms + 46658ms = 49242ms. The experiment results demonstrate that EstiDroid can largely reduce the time needed by manual API call tracking, while keeping a high and stable similarity with the testing results in actual environments.
It can be found that the similarity rises as the time period of manual testing increases, as shown in Fig. 6. This is because the statistical characteristics of the running behavior of an app are built up over a long running period of the app. An operation period that is too short cannot adequately reflect the app's running behavior. Some features of the app may not be activated during the short period, so the cosine similarities vs. 1 hour and 6 hours of manual testing are low. As the manual testing period increases, more and more potential API calls are triggered, and the statistical characteristics of the running behavior of the app are expressed more and more completely. The increase of the cosine similarity slows down as the operation period increases, especially from 24 hours to 48 hours, because the API calls have already been triggered sufficiently and have become stable.
Due to the page limitation of our paper, we give the detailed results of 82 apps out of the 300 apps. The results are shown in TABLE 6 and TABLE 7.

V. CONCLUSION
In this paper, we proposed EstiDroid, a static analysis method which estimates API calls of Android apps by statically analyzing the apps without actually running them. EstiDroid contains a static analyzer and an estimation algorithm. It is an approach to high-speed API call tracking through estimation based on static analysis.
When analyzing a target app, the static analyzer uses Apktool and Soot to extract the .APK file and convert it into Jimple files. Several types of static information are output by the XML parser and the Java decompiler of the static analyzer. Then, according to the static information, the estimation algorithm first constructs entity description models for all entities, then composes the entity relationship graph of the app. The access intensities of widgets, Activitys, Services, and BroadcastReceivers are computed using different mechanisms, including a PageRank-based algorithm. Finally, the estimated numbers of API calls are computed based on the access intensities and the frequency of each API.
In experiments, we tested 300 apps picked from Android app markets. The API calls are collected by manually operating each app on smartphones for 1 hour, 6 hours, 12 hours, 24 hours and 48 hours, and are estimated by EstiDroid for each app. Multiple metrics are evaluated to assess the performance of EstiDroid. The experiment results show that EstiDroid consumed only 49242ms on average, and reached 84.06% average and 90.74% maximum similarity with the tracked API calls of apps running in a real environment (vs. 48 hours of manual testing).
Our future work aims at increasing the accuracy of EstiDroid's estimation and decreasing EstiDroid's time consumption. We plan to further improve EstiDroid using some new technologies. If possible, we would like to apply EstiDroid to malware detection systems, user behavior mining systems, and energy consumption prediction systems, in order to boost the efficiency of these systems.