Static Analysis for Improved Modularity of Procedural Web Application Programming Interfaces

Despite their rapid growth, the utilisation of application programming interfaces (APIs) poses challenges for companies under pressure to yield productive systems integration. APIs of larger systems tend to be large, complex and have reduced modularity and quality, which makes them cumbersome to comprehend and use. These challenges can be addressed by static API analysis that focuses on studying API code itself and deriving business entities and dependencies from operational signatures. However, existing techniques for static analysis of APIs face the challenges in deriving a sufficient coverage of business entity relationship types from implementation-oriented API operational signatures carrying limited semantic insights. The paper aims to address such problems by supporting static analysis techniques for APIs that improve their modularity. Our approach adopts an object-oriented paradigm where the concept of “object” is exemplified by the notion of business entity. It systematically applies interface analysis methods and techniques for eliciting knowledge of business entities and their attributes, for deriving the temporal order of calling operations across multiple business entities, and for learning and extracting various ways of invoking a service via APIs. The approach is implemented as an open-source tool and applied to a group of widely-deployed services in practice for validation. The research contributes to identifying key aspects of both the structure and behaviour of APIs, which will lead to building a simplified but comprehensive interface (presentation) layer to assist service users in understanding complex and overloaded interfaces as well as to facilitate efficient and effective service integration.


I. INTRODUCTION
Web-based application programming interfaces (APIs) are critical for enabling organisations to open up software applications through partner ecosystems and the Internet. APIs provide operational signatures to create, read, update and delete data, related to business (or logical) entities, managed in software applications, without revealing the software implementation that supports the operations. Their descriptions, captured through widely supported interface description languages such as Web Services Definition Language (WSDL) and Representational State Transfer (REST) API, combined with the encapsulated nature of the operations they expose, promote flexible ways of accessing and composing software applications. The significance of APIs can be seen through their support in Internet platforms, e.g. Facebook, Twitter, YouTube, Amazon, eBay and Google Cloud Platform, as well as enterprise systems prevalent in corporate sectors, e.g. SAP Business Hub and Oracle PeopleSoft APIs. Moreover, central API repositories, such as Programmable Web with around 19,000 APIs, are fuelling strategic interest in APIs, as evidenced by the notion of the API Economy [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Zhangbing Zhou.
Despite their rapid growth, the utilisation of APIs poses challenges for companies under pressure to yield productive systems integration [2]. Their specifications are essentially technical and the user documentation is targeted at programmers, if available at all [3]. APIs of larger systems, especially, tend to be large, complex and have reduced modularity and quality, given many maintenance cycles of systems. This makes them cumbersome to comprehend and use. As examples, WSDL-based APIs of SAP and Oracle enterprise resource planning (ERP) systems have up to three hundred parameters per operation and multiple levels of nesting while FedEx's shipment service API has around 1000 parameters and 9 levels of nesting [4].
Techniques have been proposed to automatically analyse APIs, reverse-engineer structural and behavioural properties, and make recommendations for improvement. Many techniques have focused on dynamic analysis, which concerns the mining of executed API interactions recorded in systems logs (such as send/receive interactions on operations and the data payload). The service deployment data recorded in systems logs can also be used to analyse non-functional requirements of APIs. One aspect of dynamic API analysis is deriving message formats of APIs [5], given that API documentation may be missing, incomplete or inaccurate regarding specific usage details. Another aspect is deriving message correlations in service interactions (e.g., a set of sales orders is related to a delivery order and a set of delivery orders is related to an invoice). In [6], message correlations are based on causally related request-response interactions, through which different messages are passed (e.g., sales order and delivery order). Other aspects involve discovery of service interaction processes [7], [8] based on operation sequences involving correlated messages, i.e., discovery of service orchestration/choreography. Finally, the business entities inherent in message types of API services and their dependencies can be inferred via analysis of log data [9].
More recently, techniques have been developed for static analysis of APIs. The static API analysis techniques target procedural APIs and WSDL specifically. This is because this style of API is the oldest and most difficult to comprehend, use, and improve for quality. Procedural APIs have parameters which are based on simple and complex (nested) data types, and lack data structure abstractions. As such, they pose the greatest challenge for static API analysis techniques, especially for the large sized operational signatures (in the realm of hundreds of parameters). Existing techniques focus on analysing API code itself and deriving business entities and dependencies from operational signatures. Specifically, they use heuristics for parameter cohesion [10] to identify business entities, entity co-location in operations, and operation input and output dependencies to derive business entity relationships [11]. This knowledge can be used to assess problems of operation overloading and to make recommendations for improved modularity, where operations concern individual entities only. This is beneficial not only for improving flexible API access and composition, but also for evolving older, procedure-oriented WSDL APIs into more contemporary document-oriented WSDL or REST APIs. Extracted structural and behavioural properties from code can further improve the productivity of semantic tagging of APIs based on ontologies, to improve their search and access. In addition, static API analysis can help contextualise dynamic analysis of execution data, e.g., for improved insights into operational dependencies, based on business entity relationships, and data exchange analysis. To date, the challenge of static API analysis can be summarised as deriving a sufficient coverage of business entity relationship types from implementation-oriented API operational signatures carrying limited semantic insights.
This paper provides a state-of-the-art exposition of static API analysis. It aims to address the complexity of APIs by proposing a systematic approach that supports static analysis of API specifications for improved modularity. The approach builds on the notion of business entity and systematically applies interface analysis methods and techniques for eliciting knowledge of business entities and their attributes, for deriving the temporal order of calling operations across multiple business entities, and for learning and extracting various ways of invoking a service via APIs. It is implemented and applied to a group of widely-deployed services in practice for validation. The research contributes to identifying key aspects of both the structure and behaviour of APIs, which will lead to building a simplified but comprehensive interface (presentation) layer to assist service users in understanding complex and overloaded interfaces as well as to facilitate efficient and effective service integration.
The research presented in this paper draws upon and extends the techniques developed in our previous contributions [12]-[14]. An earlier version of our method for extracting business entities from a service interface specification was proposed in [12], and our initial findings in deriving the potential temporal order of operations that may be carried out by different business entities were reported in [13]. In [14], we presented a service variant analysis technique which can be used to compute different ways of invoking a service based on the service's business entity model, and no further extension is made to this technique in our current study. However, given that this technique is an important part of the overall systematic approach that we propose for static analysis of Web APIs, a short introduction to the technique published in [14] is included in the paper. Furthermore, having a complete proposal of our systematic approach allows us to present an overall tool implementation of the approach and its application to several services that are widely deployed in practice.
It is also worth noting that our approach focuses on using WSDL as a specific procedural API language. The reason for choosing WSDL is twofold. Firstly, it is the most prominent API description language and was designed as an independent (API) definition language inspired by the CORBA IDL specification. It maps into different languages such as Java, RFC API, PHP (SoapClient). Secondly, previous static API analysis techniques focused on procedural APIs and WSDL specifically (e.g., [15]-[19]). Another reason that our techniques have not supported REST API is that it is not a procedural language, as the data types are reflected through resources. As such, REST API fosters modular design of APIs. There are other issues that can arise through REST APIs, such as consistency of resource structure and decomposition, addressed through research into REST API anti-patterns by Palma et al. [20]. However, the problem that we have specifically addressed is extracting entities from overloaded API operation signatures with parameters that relate to multiple entities or multiple variants of the same entity. Such overloading is less of a concern for REST APIs given that HTTP CRUD operations (e.g., POST, GET, PUT, DELETE) manipulate data through resource abstractions.
The rest of the paper is structured as follows. Section II elaborates the research motivation via an exemplar scenario and states the research problem. Section III presents our approach consisting of three key building blocks, namely structural interface analysis, behavioural interface synthesis, and service variants analysis. Section IV discusses evaluation of the approach via the implementation of a prototypical tool and the experiment results of applying the tool to a group of widely-deployed services in practice. Section V reviews existing studies on API analysis. Section VI concludes the paper and outlines the future work.

II. PROBLEM STATEMENT
Let us start with a motivating example about a shipping service. A manufacturing company called Smith Brothers (a fictional name) wished to incorporate a shipment service into its web service enabled systems so that it can ship goods and manage shipping orders through its systems. The company identified a number of shipment service providers such as FedEx, UPS and DHL, which all offered web service interfaces to their users, but these interfaces were very complex. For instance, the FedEx Open Shipping service API specification written in WSDL has 7727 lines and more than 1000 parameters (see Listing 1 for a fraction of this specification). Many of these parameters are of a complex data type and hierarchically structured. What FedEx has provided is just a 645-page PDF document, which explains to programmers what each parameter means.
The APIs of enterprise services, such as those from SAP, FedEx, and Amazon, are often complex and overloaded due to inherited complexity from legacy systems. Most service providers, especially enterprise system vendors, simply migrate their legacy systems to services with heavy operational signatures wrapped in WSDL specifications [21]. The aforementioned FedEx Open Shipping service specification is a typical example. These XML-based documents usually contain thousands of lines of code that attempt to describe the input and output messages of each operation offered by a service.
Often the APIs of enterprise services are a result of a direct migration from legacy systems that have a large number of input parameters catering for various needs and different [...] Also, the Procure-to-Pay service bundle describes a WSDL service for creating purchase orders. The WSDL has operations with around 300 parameters and five levels of nesting. The document describes several business entities in the different operations of that service. Another widely used set of services is from Amazon, which reflects similar operation sizes, nesting and existence of business entities. These legacy ERP fields discussed in [22] have been directly turned into input parameters of web services that are related to the creation of a Purchase Order. This approach, referred to as a “super-service approach” in [5], results in a single instance that provides all service capabilities required by all users, while at the same time it yields many different variants, i.e., multiple ways of invoking a service. Furthermore, empirical research has shown that direct migration approaches result in low quality service interfaces due to the fact that new systems components often reuse existing systems components with automatically generated WSDLs, which are hard for developers to comprehend [23]. Despite the complexity of APIs, service users lack guidance about the service capabilities and the valid ways of invoking services. Hence, service users often find it hard to understand what the services offer and how to invoke these services. In reality, consuming and integrating enterprise services usually requires manual effort and reliance on service providers or domain specialists to provide insights into their APIs [24]. As a result, service integration incurs significant lead times and costly maintenance, and its productivity in the context of dynamic service growth on the scale of the Internet is restricted.

III. APPROACH
This research aims at addressing the complexity of APIs via static analysis. Our approach adopts an object-oriented paradigm where “objects” are exemplified by business artefacts. This section presents the approach.

A. RATIONALE
A traditional WSDL API specification (or API specification for short) defines operations and their input and output parameters using XML code. In fact, each operation is associated with one or more business artefacts, but these are not specified in the XML code of WSDL API specifications. Considering the example of the FedEx shipment service in Section II, the “createOpenShipment” operation is associated with the business artefact “ShippingOrder”, which is not specified in the FedEx Open Shipping service API specification. Business artefacts convey what a service offers in an object-oriented manner. Compared to hundreds of lines of XML code, an API specification, if defined using business artefacts associated with operations and their parameters, will be much easier for service users to read, thus leading to a better understanding of the service capabilities.
Hence, we introduce the notion of business entity to refer to a business artefact mentioned above. More precisely, a business entity is a business-related object that is created and evolves as a result of a service invocation. The advantages of applying the idea of business entity are threefold.
Firstly, business entities represent the explicit knowledge concerning a business operational goal [3]. Again, taking the FedEx Open Shipping service as an example, “ShippingOrder” is a business entity and creating a shipping order is an operational goal associated with this business entity. Secondly, business entities often are not standalone but relate to each other, and the relations between business entities present useful semantic information. For instance, “ShippingOrder” and “Track” are two correlated business entities that can be derived from the FedEx Shipping service, and this informs the service users that they can track a 'shipping order' using the 'track' operation. Finally, business entities and their relations can be specified using business entity(-based data) models. Compared to XML coding in the existing API specifications, a graphical representation of a business entity model will provide service users with an articulated view of the internal structure of the service and insights into what a service offers, and hence it helps improve the comprehensibility of APIs.

B. OVERVIEW OF THE APPROACH
Our approach builds on the notion of business entity, and as shown in Figure 1, it mainly consists of three stages. The first stage is fundamental, in which an API specification is analysed to extract the business entities and their relations. This stage focuses on structural interface analysis, which takes an API specification as an input and yields a business entity model as the output. Figure 1 (a) depicts an abstract example illustrating the main idea of this stage. Assume an API specification $s$ contains an operation $op_1$, which has a set of 13 input parameters at different levels of nesting. The first step is to map each complex parameter of an operation to a business entity and to map each nested parameter of that complex parameter to an attribute of the corresponding business entity. For example, parameter $p_1$ is mapped to business entity $A$, and the nested parameters $p_2$ to $p_5$ (of $p_1$) are mapped to the attributes $a_0$ to $a_3$ of $A$, where $a_0$ is the key attribute. In the second step, the structural relations (e.g. nesting relation) between the complex parameters are used to inform the relations between the derived business entities. For example, the fact that parameter $p_5$ is nested in $p_1$ implies that business entity $A$ contains entity $B$. Following this two-step derivation method, a business entity model representing API specification $s$ can be obtained which comprises four business entities ($A$ to $D$) and their relations. Such a model presents a simplified and modular representation of the complex API of a service, along with the contextual insights into what the service offers. The design of this method for structural interface analysis is presented in Section III-C.
In the second stage, our focus proceeds to the derivation of the behavioural interface, which is concerned with the temporal order of invoking operations across multiple business entities. Given the business entity model extracted from an API specification in the above stage, the business entities and their relations specified in the model can be used to synthesise a behavioural interface for the API. Despite the fact that an API often has a large number of operations, these operations are designed to create, read, update or delete business entities, and thus can be categorised into CRUD operations. Assume that business entity $B$ is contained in (i.e. part of) business entity $A$ and two operations $op_1$ and $op_2$ are used to create $A$ and $B$, respectively. As for the behavioural interface, operation $op_2$ must be invoked before $op_1$ to ensure that $B$ is created before $A$ can be created. Otherwise, without specifying such an order in invoking operations, it may be possible to end up in a deadlock scenario. For example, if $op_2$ is to be invoked after $op_1$, then $op_1$ can never be completed but waits for entity $B$ to become available (which however will not happen unless $op_2$ is invoked). Figure 1 (b) depicts an abstract behavioural interface presented in a graphical modelling notation (an abstract representation of a formal modelling language known as Petri nets [25]). The design of behavioural interface synthesis, including the behavioural interface model, is presented in Section III-D.
In the third stage, our attention turns to the analysis of overloaded operational signatures and their combinations resulting in service variants, i.e. various ways of invoking a service. The goal is to derive from API specifications all valid service variants and capture them via a so-called subtyping relation between business entities. Subtypes of business entities are prevalent in enterprise systems and they may be arbitrarily nested in a type inheritance (or specialisation) hierarchy, leading to complex structures. Given an API specification, service variants are essentially a set of possible combinations of input parameters of operations, which can be transformed into subtypes of business entities. Hence, by extending a business entity model (obtained from the first stage) with subtyping relations, it can be used to specify service variants. Figure 1 (c) depicts an abstract example of a business entity model with subtyping, where business entity $A$ has three subtypes $A_1$, $A_2$, and $A_3$, and furthermore, entity $A_2$ has two subtypes $A_{21}$ and $A_{22}$. The details of service variants analysis are discussed in Section III-E.
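The two-step derivation in the abstract example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: the `Param` class, the `derive_model` function and the `entity_names` lookup (which stands in for the mapping from complex parameters to business entities) are hypothetical names introduced here, and only two of the four entities of Figure 1 (a) are modelled.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Param:
    """A parameter; it is treated as complex when it has nested parameters."""
    name: str
    children: list[Param] = field(default_factory=list)

    @property
    def is_complex(self) -> bool:
        return bool(self.children)


def derive_model(root: Param, entity_names: dict[str, str]):
    """Step 1: map complex parameters to entities and nested parameters to
    attributes. Step 2: turn parameter nesting into entity containment."""
    attributes: dict[str, list[str]] = {}
    contains: list[tuple[str, str]] = []

    def visit(p: Param) -> None:
        entity = entity_names[p.name]          # hypothetical, pre-computed mapping
        attributes[entity] = []
        for child in p.children:
            attributes[entity].append(child.name)   # nested parameter -> attribute
            if child.is_complex:
                contains.append((entity, entity_names[child.name]))
                visit(child)

    visit(root)
    return attributes, contains


# p1 is complex and maps to A; p5 is nested in p1, is itself complex, and maps to B.
p1 = Param("p1", [Param("p2"), Param("p3"), Param("p4"),
                  Param("p5", [Param("p6"), Param("p7")])])
attributes, contains = derive_model(p1, {"p1": "A", "p5": "B"})
```

Here `contains` records that A contains B because `p5` is nested in `p1`, mirroring the reasoning in the paragraph above.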
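To make the idea of "combinations of input parameters" concrete, the following sketch enumerates candidate variants of a single entity by combining its mandatory attributes with every subset of its optional ones. It is only an illustration under assumptions: the function name `candidate_variants` and the attribute names are hypothetical, and the step that decides which candidates are valid (the actual subject of Section III-E) is not shown.

```python
from itertools import combinations


def candidate_variants(mandatory, optional):
    """Enumerate candidate parameter combinations: all mandatory
    attributes plus each subset of the optional attributes."""
    variants = []
    for k in range(len(optional) + 1):
        for chosen in combinations(optional, k):
            variants.append(frozenset(mandatory) | frozenset(chosen))
    return variants


# Hypothetical entity A with key attribute a0 and two optional attributes.
variants = candidate_variants(["a0"], ["a1", "a2"])
```

Each resulting set could then be read as a candidate subtype of the entity; with n optional attributes there are 2^n candidates, which is why deriving only the valid ones matters for large signatures.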

C. STRUCTURAL INTERFACE ANALYSIS
We propose a structural interface analysis method, which can be used to systematically extract from an API specification the ('hidden') business entities and their relations to form a business entity model. Below, we formally define the concepts of API specification, business entity, business entity relations, and business entity model. The definitions serve as a necessary preliminary for the design of our method and algorithm to derive business entity models from API specifications. Also, the FedEx Open Shipping service WSDL specification shown in Listing 1 is used to illustrate the formal definitions.
Definition 1 (API Specification): An API specification $s$ is a tuple $(OP, P, \kappa, \gamma, \xi_P, \lambda_P)$. $OP$ is the set of operations and $P$ is the set of parameters. For all $p \in P$ and $op \in OP$, $\kappa : P \times OP \to \{input, output, na\}$ indicates if $p$ is an input or output parameter of operation $op$, or $p$ is not associated ($na$) with $op$. $\gamma : P \to \{primitive, complex\}$ specifies whether each $p \in P$ is a primitive or complex parameter. $P_C = P|_{\gamma(p)=complex}$ is the set of complex parameters in $P$. $\xi_P \subseteq P_C \times P$ specifies the direct nesting relations between parameters. $\xi_P$ is intransitive (i.e. $\forall (p,p') \in \xi_P\ \neg\exists p'' \in P\ (p\,\xi_P\,p'' \wedge p''\,\xi_P\,p')$) and irreflexive (i.e. a parameter is not nested in itself). $\lambda_P : \xi_P \to \{mandatory, optional\}$ indicates for each $(p, p') \in \xi_P$ whether $p'$ is a mandatory or optional element nested in $p$.
Remark: In a WSDL specification, the attribute value of minOccurs of a parameter $p'$ within its parent parameter $p$ indicates whether the nesting relation between $p$ and $p'$ is mandatory (minOccurs>0) or optional (minOccurs=0).
Example: In Listing 1, the OpenShipService specification has only one operation createOpenShipment, and this operation has only one input parameter, CreateOpenShipmentRequest. The type of this input parameter is CreateOpenShipmentRequest. It is a complex parameter and has two nested parameters: Index and RequestedShipment, both being optional (minOccurs="0"). RequestedShipment is also a complex parameter which further contains 14 nested parameters.
Definition 2 (Business Entity): Let $E$ be a set of business entities. For each $e \in E$, $N_e$ is the name of $e$, $key_e$ the unique identifier of $e$, and $A_e$ the finite set of attributes associated with $e$. Given an API specification $s$, $f : P_C \to E$ captures the mapping from a complex parameter $p \in P_C$ to a business entity $e \in E$, and $\forall p \in P_C\ \forall p' \in P_C\ ((p, p') \in \xi_P \Rightarrow f(p) \neq f(p'))$ (i.e. two nested parameters cannot be mapped to the same business entity). $\xi_E \subseteq E \times E$ specifies the direct nesting relations between business entities, and $\forall (e,e') \in \xi_E\ \exists (p,p') \in \xi_P\ (e = f(p) \wedge e' = f(p'))$ (i.e. the nesting relations between business entities are informed by the nesting relations between the corresponding parameters in $s$). $\lambda_E : \xi_E \to \{mandatory, optional\}$ indicates, for each $(e, e') \in \xi_E$, whether $e'$ is a mandatory or optional element of $e$, and $\lambda_E(e, e') = \lambda_P(p, p')$ if $e = f(p)$ and $e' = f(p')$.
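As an illustration of how the nesting relation and its mandatory/optional labelling could be read off a schema, the sketch below parses a small XML Schema fragment with Python's standard `xml.etree.ElementTree`. The fragment is hypothetical and only loosely modelled on the example above (the element `ExampleField` is invented for illustration), `nesting_relations` is a name introduced here, and the sketch keys relations by type name rather than by parameter, which is a simplification.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment; not copied from the FedEx specification.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="CreateOpenShipmentRequest">
    <xs:sequence>
      <xs:element name="Index" type="xs:string" minOccurs="0"/>
      <xs:element name="RequestedShipment" type="RequestedShipment" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="RequestedShipment">
    <xs:sequence>
      <xs:element name="ExampleField" type="xs:string" minOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def nesting_relations(schema_xml: str) -> dict:
    """Return {(parent, nested): 'mandatory' | 'optional'} for each
    element declared inside a named complex type."""
    root = ET.fromstring(schema_xml.strip())
    relations = {}
    for ctype in root.iter(f"{XS}complexType"):
        parent = ctype.get("name")
        for elem in ctype.iter(f"{XS}element"):
            # XML Schema defaults minOccurs to 1 when the attribute is absent.
            min_occurs = int(elem.get("minOccurs", "1"))
            label = "optional" if min_occurs == 0 else "mandatory"
            relations[(parent, elem.get("name"))] = label
    return relations


relations = nesting_relations(SCHEMA)
```

The keys of `relations` play the role of the direct nesting relation and the values the role of the mandatory/optional labelling from Definition 1.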
Example: Assume that the RequestedShipment parameter is mapped to business entity ShippingOrder. As aforementioned, RequestedShipment contains 14 parameters, and accordingly ShippingOrder has 14 attributes. Assume that the RequestedShipment further contains RequestedShipper which is mapped to business entity Shipper. Then, Shipper is nested in ShippingOrder because RequestedShipper is a nesting parameter of RequestedShipment.
Definition 3 (Domination, adapted from [10]): Given an API specification $s$ and a set of business entities $E$, for two business entities $e, e'$ in $E$ where $\exists p \in P_C\ \exists p' \in P_C$ s.t. $e = f(p)$ and $e' = f(p')$ (i.e. both $e$ and $e'$ are derived from $s$), $e$ dominates $e'$, denoted as $e \to e'$, if the two conditions described in the following remark hold.
Remark: Domination is defined between business entities and is derived from how the corresponding parameters associate with each other in an API specification. Assume business entity $e$ is mapped from parameter $p$ and $e'$ from $p'$. If every operation in the service interface specification that has $p'$ as an input parameter must also have $p$ as an input parameter, whereas at least one operation that has $p$ as an input parameter does not need to have $p'$ as an input parameter, then the corresponding business entity $e$ dominates $e'$. Domination is defined to assist in the definitions of the Exclusive and Inclusive Containment relations below.
Example: As aforementioned, the RequestedShipment parameter is mapped to business entity ShippingOrder. Also, consider that the RequestedPackageLineItem parameter is mapped to business entity ShipmentLineItem. Assume that every operation (e.g., modifyPackageInOpenShipmentRequest) that requires ShipmentLineItem also needs ShippingOrder, while there is at least one operation (e.g., createOpenShipment) that requires ShippingOrder but does not need ShipmentLineItem (e.g., because a ShippingOrder can be created without a ShipmentLineItem). Then, ShippingOrder dominates ShipmentLineItem.
Definition 4 (Exclusive and Inclusive Containment): Given an API specification $s$, a set of business entities $E$ and their nesting relations $\xi_E$, $E_s = \{e \in E \mid \exists p \in P_C\ (e = f(p))\}$ is the set of business entities derived from $s$, and $\xi_{E_s}$, the restriction of $\xi_E$ to the entities in $E_s$, specifies the entity nesting relations derived from $s$. Then, $\omega_s = \{(e, e') \in \xi_{E_s} \mid e \to e' \wedge \neg\exists e'' \in E_s\ (e'' \neq e \wedge e'' \to e')\}$ defines the exclusive containment relations between business entities in $E_s$, and $\varphi_s = \xi_{E_s} \setminus \omega_s$ specifies the inclusive containment relations between them.
Example: Assume that the ShipmentLineItem entity is nested in the ShippingOrder entity and the latter is the only business entity that dominates the former. Then, the ShippingOrder entity exclusively contains ShipmentLineItem. Next, as aforementioned, the Shipper entity is nested in ShippingOrder, and also assume that the latter does not dominate the former. Then, the ShippingOrder entity inclusively contains Shipper. Furthermore, if it is mandatory that Shipper is nested in ShippingOrder, then the relationship between the two entities is mandatory inclusive containment.
Definition 5 (Business Entity Model): A business entity model $m$ derived from an API specification $s$ is a tuple $(E_s, \xi_{E_s}, \omega_s, \varphi_s, \lambda_E)$. It consists of the set of derived business entities $E_s$, their nesting relations $\xi_{E_s}$, which are further divided into exclusive containment relations $\omega_s$ and inclusive containment relations $\varphi_s$, and $\lambda_E$ specifying the mandatory or optional attribute of a nesting/containment relation.
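Read against Definition 1, the two conditions stated in the remark can be written formally as below. This is a reconstruction from the wording of the remark, offered as a sketch; it is not necessarily the authors' exact notation.

```latex
e \rightarrow e' \iff
\big(\forall op \in OP:\ \kappa(p', op) = \mathit{input} \Rightarrow \kappa(p, op) = \mathit{input}\big)
\ \wedge\
\big(\exists op \in OP:\ \kappa(p, op) = \mathit{input} \wedge \kappa(p', op) \neq \mathit{input}\big),
\quad \text{where } e = f(p),\ e' = f(p').
```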
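The interplay of domination and the two containment types can be illustrated with a short Python sketch. It is a simplified reading of Definitions 3 and 4 under assumptions: operations are represented directly by the set of business entities they require as input, the function names `dominates` and `classify` are introduced here, and the operation `validateShipper` is hypothetical, added only so that ShippingOrder does not dominate Shipper.

```python
def dominates(e, e2, inputs):
    """True if every operation requiring e2 also requires e, while at
    least one operation requiring e does not require e2."""
    ops_with_e = [op for op, ents in inputs.items() if e in ents]
    ops_with_e2 = [op for op, ents in inputs.items() if e2 in ents]
    # Assumption: an entity used by no operation is not dominated by anyone.
    if not ops_with_e2:
        return False
    every_e2_needs_e = all(e in inputs[op] for op in ops_with_e2)
    some_e_without_e2 = any(e2 not in inputs[op] for op in ops_with_e)
    return every_e2_needs_e and some_e_without_e2


def classify(nesting, entities, inputs):
    """Split nesting pairs into exclusive and inclusive containment."""
    exclusive = set()
    for parent, child in nesting:
        other_dominators = [x for x in entities
                            if x not in (parent, child)
                            and dominates(x, child, inputs)]
        if dominates(parent, child, inputs) and not other_dominators:
            exclusive.add((parent, child))
    return exclusive, set(nesting) - exclusive


inputs = {
    "createOpenShipment": {"ShippingOrder", "Shipper"},
    "modifyPackageInOpenShipmentRequest": {"ShippingOrder", "ShipmentLineItem"},
    "validateShipper": {"Shipper"},  # hypothetical operation
}
entities = ["ShippingOrder", "Shipper", "ShipmentLineItem"]
nesting = {("ShippingOrder", "ShipmentLineItem"), ("ShippingOrder", "Shipper")}
exclusive, inclusive = classify(nesting, entities, inputs)
```

With this data, ShippingOrder exclusively contains ShipmentLineItem and inclusively contains Shipper, matching the two examples in the text.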
Next, we propose a three-step method (see Algorithm 1) to derive business entity models for (complex) APIs specified in interface description languages such as WSDL.
The first step (lines 4-16) is to map parameters to business entities and their attributes using semantic matching techniques. A key task is the derivation of business entities from complex parameters. It is carried out by searching in a repository of business entities ($R$) for one ($e$) that semantically matches a given complex parameter ($p$), where the repository $R$ is a collection of pre-identified business entities based on domain-specific knowledge. Users can designate an ontology for a particular context at design time. This ontology is stored in $R$, and the complex parameter is checked against the repository to determine if there is a matching entry in it. A number of existing semantic matching approaches with tool support (e.g., COMA++, SimMetrics) can be applied. To measure the semantic similarity between a parameter and an entry in the predefined ontology, this research adopts COMA++, a tool that applies several different semantic matching algorithms and provides an interactive and iterative match process in which users can decide whether to confirm or reject a proposed match based on matching results. The matching operation takes two schemas as inputs, and produces a mapping between elements of these two resources. The tool uses a variety of measures to calculate the similarity between two schema elements or ontology concepts. The similarity confidence is measured by a floating-point number between 0 and 1, where 0 denotes entirely different (strong dissimilarity) and 1 denotes largely similar (strong similarity).
In our algorithm, this search process is captured by function SemanticMatch($p$, $R$) (line 5), which either returns a business entity that semantically matches $p$ or an empty element (null) when no match can be found. Next, if a matching entity $e$ is retrieved, the mapping between $e$ and the corresponding parameter $p$ is recorded (line 11), and all the parameters $p'$ nested in $p$ are mapped to the attributes of $e$ (lines 12-14). Mapping a parameter to an attribute is captured by function ConvertToAttr($p'$) (line 13).
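The first step can be sketched as follows. Note the substitution: instead of COMA++, the sketch uses plain string similarity from Python's standard `difflib` as a stand-in matcher, so it captures the control flow of the step (match, record the mapping, convert nested parameters to attributes) rather than real semantic matching. The function names, the threshold of 0.6, the repository contents and the lower-camel-case attribute naming are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def semantic_match(param_name, repository, threshold=0.6):
    """Stand-in for SemanticMatch(p, R): return the best-scoring entity
    whose similarity reaches the threshold, or None if there is none."""
    best, best_score = None, 0.0
    for entity in repository:
        score = SequenceMatcher(None, param_name.lower(), entity.lower()).ratio()
        if score > best_score:
            best, best_score = entity, score
    return best if best_score >= threshold else None


def convert_to_attr(param_name):
    """Stand-in for ConvertToAttr(p'): here, just lower-camel-case the name."""
    return param_name[0].lower() + param_name[1:]


def map_parameters(complex_params, repository):
    """complex_params maps each complex parameter to its nested parameter names."""
    mapping, attributes = {}, {}
    for p, nested in complex_params.items():
        e = semantic_match(p, repository)
        if e is None:
            continue                     # no matching entity in the repository
        mapping[p] = e                   # record the mapping between p and e
        attributes.setdefault(e, []).extend(convert_to_attr(n) for n in nested)
    return mapping, attributes


repository = ["Shipment", "Shipper"]     # hypothetical pre-identified entities
complex_params = {"RequestedShipment": ["RequestedShipper", "ExampleField"]}
mapping, attributes = map_parameters(complex_params, repository)
```

A real matcher would also use synonyms, data types and structure, which is why an interactive confirm/reject step, as offered by the tool described above, remains useful.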

D. BEHAVIOURAL INTERFACE SYNTHESIS
The business entity model derived from an API specification via structural interface analysis can be used to synthesise service behavioural interfaces, that is, the temporal ordering of operations across multiple business entities. We propose a three-phase method for behavioural interface synthesis and an overview of the method is shown in Figure 2.

1) CATEGORISING CRUD OPERATIONS
At first, the operations associated with the business entities are analysed and categorised into CRUD (i.e. create, read, update, and delete) operations. To launch an instance of business entity $e$, a create operation is invoked requiring attributes of $e$ as input parameters, and upon the invocation it returns a reference to $e$ (referred to as $key(e)$). To retrieve an instance of $e$, a read operation is invoked requiring $key(e)$ as an input parameter, and upon the invocation it returns values of attributes of $e$. Similarly, an update operation is for updating an instance of $e$, of which the invocation requires $key(e)$ and the new values of the relevant attributes of $e$; and a delete operation is for deleting an instance of $e$.
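The input/output patterns described above suggest a simple signature-based categorisation, sketched below. This is an assumption-laden illustration rather than the authors' procedure: the function name `categorise` is introduced here, parameters are represented as plain sets of names, and a delete operation is assumed to take only the key as input, which the text does not state explicitly.

```python
def categorise(inputs, outputs, key, attrs):
    """Guess the CRUD category of an operation on one entity from which of
    the entity's key and attributes appear among its inputs and outputs."""
    key_in = key in inputs
    attrs_in = bool(attrs & inputs)
    attrs_out = bool(attrs & outputs)
    if not key_in and attrs_in and key in outputs:
        return "create"      # attributes in, key out
    if key_in and attrs_out:
        return "read"        # key in, attribute values out
    if key_in and attrs_in:
        return "update"      # key plus new attribute values in
    if key_in:
        return "delete"      # assumption: key only
    return "unknown"


# Hypothetical key and attribute names for a ShippingOrder entity.
key = "orderId"
attrs = {"shipper", "recipient"}
create = categorise({"shipper", "recipient"}, {"orderId"}, key, attrs)
read = categorise({"orderId"}, {"shipper", "recipient"}, key, attrs)
update = categorise({"orderId", "recipient"}, set(), key, attrs)
delete = categorise({"orderId"}, set(), key, attrs)
```

Operations whose signatures fit none of the four patterns are reported as "unknown" and would need manual inspection.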

2) GENERATING MODEL FOR CREATE OPERATION
The second phase focuses on generating behavioural models for create operations. These models represent the temporal order of the operations invoked for the creation of a new instance of a business entity, as derived from business entity relations. Here are some examples of the derivation rules. An exclusive containment relation between business entities A and B indicates that A has an exclusive ownership of B. As a result, an instance of B should be launched either as part of or after creating an instance of A. If B is a mandatory part of A (i.e. mandatory exclusive containment), an instance of B must be launched upon or after creation of an instance of A. Next, an inclusive containment relation between B and C indicates that B has an inclusive ownership of C while C has its own independent existence, meaning that launching an instance of C does not necessarily rely on the existence of B.
Let us revisit the excerpt of FedEx Open Shipping service's WSDL specification in Listing 1 (Section II). A Shipping Order exclusively contains Package Line Item(s) and it is a mandatory containment. Hence, a Package Line Item should only be created either as part of creating a Shipping Order or after a Shipping Order is created. Next, a Shipping Order inclusively contains a Shipper and it is mandatory, so a Shipper may exist before the creation of a Shipping Order. A Shipping Order also inclusively contains a Shipping Label, which is optional, and thereby a Shipping Order and a Shipping Label may be created independently.
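The derivation rules illustrated by this example can be sketched as a check on candidate creation sequences. The sketch is a simplification under stated assumptions: the `Containment` record and `is_valid_creation_order` are hypothetical names, "as part of or after" is modelled as "later in the list", and inclusive containment is treated as imposing no ordering constraint.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Containment:
    """Hypothetical record of one containment relation: owner contains part."""
    owner: str
    part: str
    exclusive: bool
    mandatory: bool


# The three relations described in the text for the Shipping Order example.
FEDEX_RELATIONS = [
    Containment("ShippingOrder", "PackageLineItem", exclusive=True, mandatory=True),
    Containment("ShippingOrder", "Shipper", exclusive=False, mandatory=True),
    Containment("ShippingOrder", "ShippingLabel", exclusive=False, mandatory=False),
]


def is_valid_creation_order(order: List[str], relations: List[Containment]) -> bool:
    """Check a sequence of entity creations against the containment rules."""
    position = {entity: i for i, entity in enumerate(order)}
    for rel in relations:
        if not rel.exclusive:
            continue  # inclusive containment: the part exists independently
        if rel.part in position:
            # An exclusively contained part needs its owner created first.
            if rel.owner not in position or position[rel.owner] > position[rel.part]:
                return False
        elif rel.mandatory and rel.owner in position:
            # A mandatory exclusive part must be created once the owner exists.
            return False
    return True
```

For instance, creating a Shipper, then a Shipping Order, then a Package Line Item passes the check, whereas creating a Package Line Item before its Shipping Order does not.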
Algorithm 2 specifies the derivation rules for generating a behavioural interface model for create operations in general. We first define the notion of a behavioural interface model. As specified in Definition 6, such a model is defined as a Petri net, which is a mathematical modelling language for precisely describing the behaviour of distributed systems involving choice, iteration, and particularly concurrent executions. A Petri net consists of places (Q), transitions (T), and flows (F). Transitions are used to model tasks or actions whose execution often changes the state of a system. Places represent pre-conditions required for a task or action to occur as well as post-conditions upon the occurrence of the task or action. Flows capture the directed execution order from places to transitions and vice versa. A Petri net is mathematically defined and also offers a graphical notation (e.g., places are drawn as circles, transitions as rectangles, and flows as directed arcs). Readers interested in Petri nets can find more details in [25]. Algorithm 2 consists of a number of linear operations and invokes, in sequential order, three other algorithms (lines 10, 20 and 22), each of which generates the part of the overall behavioural interface model that can be derived from the corresponding containment relations in the business entity model. Its run-time complexity is calculated by taking into account the complexity of each of those three algorithms. The two algorithms on lines 10 and 22 deal with inclusive containment relations and each has a complexity of O(|ϕ|); the algorithm on line 20 handles exclusive containment relations and has a complexity of O(|ω|). Therefore, the complexity of Algorithm 2 is O(2|ϕ| + |ω|).
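To make the Petri net notions concrete, the following minimal sketch encodes places, transitions and flows together with the standard enabling and firing rule. It is not Algorithm 2 or Definition 6; the class, the place names (`q0`, `q1`, `q2`) and the transition names (`create_A`, `create_B`) are hypothetical, and the small net simply encodes "create A, then create B".

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Marking = Dict[str, int]  # number of tokens per place


@dataclass
class PetriNet:
    places: Set[str]
    transitions: Set[str]
    flows: Set[Tuple[str, str]]  # arcs place -> transition or transition -> place

    def preset(self, transition: str) -> Set[str]:
        """Input places (pre-conditions) of a transition."""
        return {src for (src, dst) in self.flows if dst == transition}

    def postset(self, transition: str) -> Set[str]:
        """Output places (post-conditions) of a transition."""
        return {dst for (src, dst) in self.flows if src == transition}

    def enabled(self, transition: str, marking: Marking) -> bool:
        """A transition is enabled when every input place holds a token."""
        return all(marking.get(p, 0) > 0 for p in self.preset(transition))

    def fire(self, transition: str, marking: Marking) -> Marking:
        """Consume one token per input place, produce one per output place."""
        if not self.enabled(transition, marking):
            raise ValueError(f"{transition} is not enabled")
        after = dict(marking)
        for p in self.preset(transition):
            after[p] -= 1
        for p in self.postset(transition):
            after[p] = after.get(p, 0) + 1
        return after


# Hypothetical net: B (exclusively contained in A) is created after A.
NET = PetriNet(
    places={"q0", "q1", "q2"},
    transitions={"create_A", "create_B"},
    flows={("q0", "create_A"), ("create_A", "q1"),
           ("q1", "create_B"), ("create_B", "q2")},
)
INITIAL: Marking = {"q0": 1}
```

With the initial marking, `create_B` is not enabled until `create_A` has fired, which is how the net expresses the temporal order of the two create operations.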
For an abstract demonstration of Algorithm 2, Figure 3 (a) depicts a business entity model comprising e1, the main business entity; e2, exclusively contained in e1 (mandatory); e4, inclusively contained in e2 (mandatory); and e5, inclusively contained in e2 (optional). Figure 3 (b) shows the corresponding behavioural interface model generated by Algorithm 2 for creating the business entities in Figure 3 (a). Note that silent transitions are used to capture those operations or actions that are not the focus of our study, and they are needed in a behavioural interface model for specifying the overall execution behaviour.

3) SYNTHESISING LIFECYCLE FOR ALL OPERATIONS
Finally, to capture the behaviour of invoking the relevant operations, an overall behavioural interface model can be synthesised representing the life cycle of business entities and the associated operations. With CRUD operations, the notion of state can be introduced, and a business entity generally has four states in its life cycle: ready, created, updated and deleted. A lifecycle model specifies the possible ways that a business entity can evolve from an initial state to a final state. Among these states, ready is defined as the very initial state of a business entity, indicating that the entity is ready to be created. A business entity can be created if and only if it is in the ready state, and it may be updated or deleted only after it is created. Accordingly, the behavioural interface derivation method yields two forms of models: the business entity creation model and the lifecycle model. These models, presented as Petri nets, capture the sequences of operations related to the manipulation of business entities (such as the steps of creating a shipment order or the life cycle of operating on a shipment order), and thus they can be used to inform service users about the invocation sequences that they should adhere to when invoking a service via an API.
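The four-state life cycle can be illustrated as a small transition table. This sketch is a plain state machine rather than the Petri-net lifecycle model produced by the method; the names `TRANSITIONS`, `next_state` and `replay` are hypothetical, and the treatment of read (allowed once the entity exists, without changing its state) is an assumption not stated in the text.

```python
from typing import Dict, Iterable, Tuple

# (current state, operation) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("ready", "create"): "created",
    ("created", "update"): "updated",
    ("updated", "update"): "updated",
    ("created", "delete"): "deleted",
    ("updated", "delete"): "deleted",
}


def next_state(state: str, operation: str) -> str:
    """Return the state reached by invoking operation in the given state."""
    if operation == "read":
        # Assumption: read needs an existing entity and leaves the state as is.
        if state in ("created", "updated"):
            return state
        raise ValueError(f"cannot read in state {state!r}")
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"cannot {operation} in state {state!r}") from None


def replay(operations: Iterable[str], state: str = "ready") -> str:
    """Replay a sequence of CRUD operations from the initial ready state."""
    for operation in operations:
        state = next_state(state, operation)
    return state
```

A sequence such as create, update, delete ends in the deleted state, whereas a sequence starting with update is rejected because the entity is still in the ready state.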

E. SERVICE VARIANTS ANALYSIS
The business entity model derived from an API specification via structural interface analysis can also be used to derive service variants, that is, various ways of invoking a service. By introducing a subtyping relation between business entities, the problem of deriving service variants can be related to finding subsets of parameters corresponding to business entity subtypes in API operations. We proposed an efficient technique for traversing parameter sets and finding valid subtype invocations, using a Monte Carlo method [26], based on likelihood-free Bayesian sampling. The technique exploits close proximity of parameters in each operation to determine the most likely next parameter to find for a subset based on a previous parameter probabilistic tree search. We herein give a short introduction to this service variant analysis technique proposed in our previous publication [14], where readers interested in the technique can find more relevant details. Figure 4 depicts an overview of our service variant analysis technique (using an abstract example). A service variant is a combination of input parameters that are accepted when invoking an operation. Given a list of input parameters of an operation and a known service variant (e.g. op1(p1, p2, p4)), the method first initialises a tree with a minimum number of leaves. A node of the tree stores not only an input parameter but also the probability of the parameter being a successor of another. With the initial tree, the method searches for other service variants. The key action of the search is to identify the likeliest successor through a Monte Carlo search that employs Bayesian updates and Importance Sampling [26].
The search process takes as input the current parameter being processed, the current path (consisting of a number of parameters traversed along the tree), the current tree node, and a transition kernel variance. It recursively draws a single random variable (i.e. a potential succeeding parameter) based on probability distributions over the current node's child parameters. The search terminates when it reaches the last parameter, and the path drawn from the search is tested thereafter. The test of a path is done via invoking the corresponding operation with the parameters on the path, e.g. invoking op1 with the sequence of parameters p1, p2, p3, p8, p9 shown in Figure 4. If the combination of parameters is accepted, the search then recursively updates the probabilities associated with each of the parameters along the path. If it is not accepted, the entire attempt is ignored and the algorithm proceeds to the next search.
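The sample, test and update loop described above can be illustrated with a deliberately simplified sketch. It is not the likelihood-free Bayesian sampler with importance sampling of [14] and [26]: successor probabilities are replaced by plain reinforcement weights, there is no transition kernel variance, and the names `sample_path`, `search` and the `accepts` callback (standing in for invoking the operation) are all hypothetical.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

START, END = "<start>", "<end>"
Weights = Dict[Tuple[str, str], float]


def sample_path(params: List[str], weights: Weights, rng: random.Random) -> List[str]:
    """Draw one candidate variant by walking left to right over params.

    At each step the successor is drawn among the later parameters (or END)
    in proportion to its current weight; unseen pairs have weight 1.0.
    """
    path: List[str] = []
    current = START
    while True:
        start_index = 0 if current == START else params.index(current) + 1
        options = params[start_index:] + [END]
        option_weights = [weights.get((current, o), 1.0) for o in options]
        chosen = rng.choices(options, weights=option_weights, k=1)[0]
        if chosen == END:
            return path
        path.append(chosen)
        current = chosen


def search(params: List[str], known_variant: List[str],
           accepts: Callable[[List[str]], bool],
           iterations: int, seed: int = 0) -> Set[FrozenSet[str]]:
    """Look for further accepted parameter combinations, given one known variant."""
    rng = random.Random(seed)
    weights: Weights = {}

    def reinforce(path: List[str]) -> None:
        # Raise the weight of every successor pair along an accepted path.
        previous = START
        for parameter in list(path) + [END]:
            weights[(previous, parameter)] = weights.get((previous, parameter), 1.0) + 1.0
            previous = parameter

    reinforce(known_variant)  # initialise from the known variant
    found = {frozenset(known_variant)}
    for _ in range(iterations):
        path = sample_path(params, weights, rng)
        if path and accepts(path):
            reinforce(path)
            found.add(frozenset(path))
        # A rejected path is ignored and the next search starts.
    return found
```

In an experiment setting, `accepts` would wrap a call to the (simulated) service; here it can be any predicate over a parameter list, which keeps the sketch testable without a server.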
Once a service variant is derived from the above search process, it is then transformed to a subtype of a business entity in the business entity model obtained from structural interface analysis. Recall that a service variant is specified as a set of parameters. The transformation is mainly carried out in three steps: firstly, to search for a business entity e in the business entity model such that each parameter of a service variant v can be mapped to an attribute of business entity e; secondly, to create a subtype entity e_s of e; and thirdly, to map all the parameters of v to the attributes of e_s. Let us consider the example depicted in Figure 1 (a) and (b). In Figure 1 (a), four business entities are derived from operation op1 of API specification s. Business entity A has five attributes a0 to a4, among which a3 and a4 are mapped from complex parameters p5 and p11, respectively. Parameters p5 and p11 are further mapped to business entities B and D, and in principle the attributes of these two entities are also attributes of entity A. Similarly, the attributes of entity C are also attributes of entities B and A. As a result, business entity A has in total 12 attributes a0 to a11 (corresponding to parameters p2 to p13 of operation op1). Next, we assume that the following five service variants have been obtained: v1 = {p2, p3, p7}, v2 = {p3, p4, p7, p9, p10}, v3 = {p4, p6, p12, p13}, v4 = {p3, p4, p9}, and v5 = {p7, p10}. These service variants are used to form the subtype entities shown in Figure 1 (b). Let us start with variant v1. Three parameters p2, p3 and p7 are mapped to three attributes a0, a1 and a6, respectively. These attributes are a subset of the attributes of entity A, and thus form subtype entity A1 (of A). Similarly, v2 is used to form A2, v3 to form A3, and v4 and v5 to form A21 and A22 (subtypes of A2), respectively.
We also specify relations between entity subtypes as inspired by subtype exclusion and exhaustion constraints in Object Role Modelling [27]. As shown in Figure 1 (b), a collectively exhaustive relation between three subtype entities A1, A2 and A3 indicates that all the attributes of business entity A can be obtained as the union of the attributes of its three subtypes. An exclusive relation between subtypes A1 and A3 means that the two entities do not share any common attribute. Likewise, an exclusive and collectively exhaustive relation holds between subtypes A21 and A22.
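The subtype hierarchy and the two relations can be reproduced on the five example variants with ordinary set operations. The sketch below works directly on parameter sets; nesting a variant under the smallest other variant that strictly contains it is our reading of how A21 and A22 end up as subtypes of A2, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# The five service variants from the running example, as parameter sets.
VARIANTS: Dict[str, FrozenSet[str]] = {
    "v1": frozenset({"p2", "p3", "p7"}),
    "v2": frozenset({"p3", "p4", "p7", "p9", "p10"}),
    "v3": frozenset({"p4", "p6", "p12", "p13"}),
    "v4": frozenset({"p3", "p4", "p9"}),
    "v5": frozenset({"p7", "p10"}),
}


def parent_of(name: str, variants: Dict[str, FrozenSet[str]]) -> Optional[str]:
    """Smallest other variant strictly containing this one.

    None means the variant forms a direct subtype of the root entity.
    Assumption: subtype nesting follows strict set inclusion.
    """
    supersets = [other for other, params in variants.items()
                 if other != name and variants[name] < params]
    return min(supersets, key=lambda other: len(variants[other]), default=None)


def exclusive(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """Two subtypes are exclusive when they share no attribute."""
    return a.isdisjoint(b)


def collectively_exhaustive(parent: FrozenSet[str],
                            subtypes: Iterable[FrozenSet[str]]) -> bool:
    """Subtypes are collectively exhaustive when their union equals the parent."""
    return frozenset().union(*subtypes) == parent
```

On this data, v4 and v5 are both nested under v2 (matching A21 and A22 under A2), v1 and v3 are exclusive, and v4 and v5 are exclusive and collectively exhaustive with respect to v2.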

IV. EVALUATION
This section focuses on demonstration and validation of the approach presented in the previous section. A prototypical implementation of our approach, known as the Service Interface Analyser, has been developed (using Java) and released as an open-source tool. 7 The experiments discussed in this section were performed on QUT HPC. 8

A. TOOL STRUCTURE
The Service Interface Analyser is divided into three modules as shown in Figure 5. Below, we discuss these three modules; the details of each of the components in the tool's structure can be found on the Service Interface Analyser's page on GitHub.

7 Back end: github.com/fuguowei/ServiceIntegrationAccelerator; front end: https://github.com/fuguowei/SIAFrontEnd.
8 The Queensland University of Technology high-performance computing lab, see http://www.itservices.qut.edu.au/researchteaching/hpc/.
The structural interface analysis module takes API WSDL specifications as input, together with the knowledge of business entity semantics (e.g. based on the input from domain experts), and yields business entity models (BE models for short). Existing API specifications (which are often complex and overloaded) can be retrieved from the API specification repository. The output business entity models capture simplified representations of complex structural interfaces by deriving business entities and their relations, and are stored in the BE model repository.
Next, the behavioural interface synthesis module takes the above BE model as a key input and generates valid sequences of operations. The results are presented as behavioural interface models involving entity creation and ultimately business entity life cycle (BE lifecycle for short). The BE model and BE lifecycle model constitute a simplified presentation layer that renders the APIs aligned with business entities.
In addition, API WSDL specifications and BE models are also input to the service variant analysis module for deriving business entity subtyping relations and also service variants which are stored in the service variant repository.

B. VALIDATION OF STRUCTURAL INTERFACE ANALYSIS
The source data for the experiments of structural interface analysis were taken from three categories: Internet Services (IS), Software-as-a-Service (SaaS), and Enterprise Services (ES), with the complexity of their APIs increasing from IS through SaaS to ES. Altogether, 13 widely-deployed services were drawn from xmethods (Weather Forecast, Find People, and MailBoxValidator), Amazon (S3, EC2, Advertising, and Mechanical), and FedEx. In total, 272 operations, 12962 input parameters, and 29700 output parameters were analysed by the Service Interface Analyser. Domain experts were then asked to examine the analysis results, identify false positives, and make any necessary adjustments.
According to the results in Table 1, Internet Services usually have only a few operations with a handful of parameters. For example, the Weather Forecast service has two operations: ''GetCitiesByCountry(Country)'' and ''GetForecastByCity(City, Country)''. Although the Service Interface Analyser can pick up and present the Internet services' parameters for providing guidance on the structural interface of these services, service users do not benefit significantly from the analysis results because these APIs are simple.
As Table 1 shows, the APIs of the services in the SaaS category present intermediate complexity. The number of operations provided by the four Amazon Web services ranges from 9 to 157, and the average number of input parameters is between 4 and 24. There are around 3 business entities derived per operation. It may appear that service users can cope with this type of service, as the number of input parameters for some operations is not very large, but the number of operations is quite significant and service users may find it difficult to understand the temporal order among these operations. Hence, having a proper structural analysis is essential to derive such order. Finally, the category of Enterprise Services contains the most complex APIs, which usually have operations with a large number of input and output parameters. Hence, it is important to reduce the complexity so that service users can understand the APIs. The experiment results of the six FedEx services shown in Table 1 reveal that the corresponding complex API specifications have been provided with simplified representations, which demonstrates that the Service Interface Analyser works effectively for enterprise services.
For example, the Open Shipping service has 22 operations and the average number of the input parameters is 309 and that of the output parameters is 575. After the structural interface analysis, on average, 11 entities per operation were derived. One of the FedEx Open Shipping service's operations, ''createOpenShipment'', has 1336 input parameters and 596 output parameters. By analysing these parameters, 16 key business entities and their relationships were derived (see Figure 6). This dramatically reduces the complexity as users can now readily understand the interfaces by looking at these business entities and their relationships.

C. VALIDATION OF BEHAVIOURAL INTERFACE SYNTHESIS
A total of 9 services drawn from Amazon and FedEx (as those used for structural interface analysis) are used as the source data for the experiments of behavioural interface synthesis. Table 2 lists the results from the experiments.
In the SaaS category, the number of operations provided by the three Amazon web services ranges from 9 to 44. Based on the business entity models generated and the operations provided by these services, 3, 2 and 9 behavioural models were generated for the creation of business entities involved in the Amazon S3, Advertising, and Mechanical services, respectively. The same number of life cycle models for these entities were also derived.
Taking the Amazon S3 service as an example, Figure 7 (a) depicts a Bucket-centric business entity model. The produced behavioural interface model for the Bucket's creation is shown in Figure 7 (b). In this model, the transition ''CreateBucket'' has been identified as the one that creates an instance of Bucket. Also, entity BucketLoggingStatus is exclusively contained (as mandatory) in entity Bucket, meaning an instance of BucketLoggingStatus has to be instantiated after the creation of a Bucket instance. ''SetBucketLoggingStatus'' has been identified as the transition that creates an instance of BucketLoggingStatus, so this operation is called after ''CreateBucket'' as shown in Figure 7 (b).

TABLE 2. Behavioural interface synthesis experiment results of 9 selected services specified in the following measures: operations each service provides (N), business entities (N_BE), behavioural models for entity creation (N_BM), and lifecycle models (N_LC), the time taken (in milliseconds) for generating these models for each service (T). The behavioural models for entity creation and lifecycle are detailed with number of places, transitions, and flows (denoted by P/T/F).

FIGURE 6. A screenshot of the structural interface analysis output of the FedEx Open Shipping service generated by the Service Integration Accelerator, where each dot represents a business entity and the lines between dots represent the relation between business entities.
An enterprise service usually involves numerous business entities and operations. The statistics for the six FedEx services in Table 2 show the number of behavioural interface models generated. For example, by analysing the 22 operations provided by the FedEx Open Shipping service, 4 behavioural models for the creation of 4 business entities (ShippingOrder, ShipmentLineItem, PendingShipment, and Consolidation) were derived. Correspondingly, 4 life cycle models were created for these business entities. The validation of these behavioural interface models was performed by invoking the services with the sequences derived, and the results show that the temporal sequences revealed in the models are valid and match with what is described in the FedEx OpenShipping reference. 17

D. VALIDATION OF SERVICE VARIANTS ANALYSIS
A challenge for our method is to analyse services and their APIs in the category of Enterprise Services, where the average number of input parameters is around 200. When designing the experiments for service variants analysis, we have simulated services with 20, 50 and 100 parameters, respectively, and with structural complexities comparable to the services analysed. In measuring the performance boost in our service variants analysis method, we have compared it against a brute-force search for problem sizes of 20, 50 and 100 parameters, respectively.
Brute-force search (a.k.a. exhaustive search) is a very general problem-solving technique that exhausts all possibilities in order to reach a solution. In the context of deriving service variants, this method searches all possible service variants in order to identify valid ones. This approach is prohibitive and impractical, especially when the number of parameters is as large as what enterprise services have, because the search space is so enormous that the method simply cannot enumerate all possible parameter combinations.

17 https://images.fedex.com/templates/components/apps/wpor/secure/downloads/pdf/201507/FedEx_WebServices__DevelopersGuide_v2015.pdf
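To give a sense of scale for the brute-force baseline, the following sketch counts candidate parameter combinations under the simplifying assumption that every parameter is optional and every non-empty subset is a candidate variant. The function name is hypothetical, and real operations with required parameters would have smaller spaces.

```python
def brute_force_candidates(n_parameters: int) -> int:
    """Number of non-empty subsets of n optional parameters (2^n - 1)."""
    return 2 ** n_parameters - 1


# Problem sizes used in the experiments.
for n in (20, 50, 100):
    print(n, brute_force_candidates(n))
```

Under this assumption the space grows from roughly 10^6 candidates for 20 parameters to roughly 10^15 for 50 and 10^30 for 100, which is consistent with brute force remaining feasible only for the smallest problem size.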
In the simulated servers, variants were generated at random, so that we could determine the success rates of recovering those variants with our method. In each experiment, the server generated sets of twenty service variants of different lengths and deviations from one another. Experiments in the problem stage of 20 parameters involved variants selected at random of lengths 5, 8, 11, 14, and 17; in the problem stage of 50 parameters, the lengths of variants were 10, 15, 20, 25, 30, 35, and 40; and for the problem stage of 100 parameters, the lengths were 10, 20, 30, 40, 50, 60, 70, and 80. To establish statistical confidence, two hundred experiments were conducted for each problem stage, and the experiments ran for six days. As depicted in Figure 8, the variant analysis method proposed in this study fared worse than the brute-force one when the total number of parameters is 20. On average, the variant analysis method was able to identify from approximately 35% to 46% of a total of 20 valid variants among the 5 sets given (see the blue line in Figure 8). There is a standard deviation for each set. The maximum percentage picked up by the method is 90%, given 5 and 8 known input parameters, and the minimum is 10%, given 17 known input parameters. Such differences between the results obtained using the brute-force and Monte Carlo methods are due to the fact that the more parameters a variant has, the more difficult it is for the sampling method to identify the variant. The red line in Figure 8 presents the performance of the brute-force method given the first test case, where the method was able to derive the majority of valid service variants, whereas the Monte Carlo sampling could identify only approximately 40 per cent of them. This is because the search space is still within the reach of the capability of the brute-force method.
However, when the number of parameters reached 50, 100, or greater, the brute-force approach became ineffective, while the Monte Carlo search method was more effective by contrast. This can be seen in the performance comparison in Figures 9 and 10. The brute-force method failed to identify anything when the length of the given path was greater than 20 for a total number of 50 parameters (see the red line in Figure 9). In all these experiments, the hit rate of the Monte Carlo sampling method is greater than that of the brute-force approach, meaning that the proposed sampling method is more likely to pick up a valid variant. The results showed that the Monte Carlo sampling could find variants in search spaces previously thought to be prohibitive.
In addition, we also evaluated the Monte Carlo-based method on a simulated FedEx shipment service. 18 While this service involved seven operations, the server only simulated the core ''processShipment'' operation. This operation involved 1053 input parameters and 565 output parameters, from which we derived 34 business entities using the structural interface analysis method (described in Section III-C). From these, a cohesive set of 43 core parameters within the context of the shipment service was selected to demonstrate our method. The search method derived 11 of 20 valid combinations and took 8602 minutes in total given a known combination. An example of service variants of the simulated FedEx Shipment service derived from our approach and presented in the form of business entity subtypes is illustrated in Figure 11.

18 www.fedex.com/templates/components/apps/wpor/secure/downloads/xml/Aug13/advanced/ShipService_v13.xml

V. RELATED WORK
API analysis is significant, as seen through many commercial products, e.g. Microsoft BizTalk Mapper, 19 Stylus Studio XML Mapping Tools, 20 and SAP XI Mapping, 21 and has been the subject of ongoing research over many years. It was motivated originally by the challenges of systems interoperability through Web services, with a large number of contributions relating to the utilisation of semantic ontologies on Web service specifications to address goals of service discovery [28], adaptation [29] and composition [30]. Both structural and behavioural aspects have been covered in Semantic Web service techniques, with ontologies capturing domain-based entity types and relationships. Moreover, techniques have been proposed to exploit entity subtypes underpinning service variants. For instance, Stollberg and Muth [4] addressed this in the context of service variants in the widely used SAP's Enterprise Resource Planning system while Tosic et al. [31] proposed a generalised language, Web Service Offerings Language (WSOL), for service variants. WSOL was specifically applied to annotate different variants and versions of a service, which supports service discovery applications.

19 https://docs.microsoft.com/en-us/biztalk/core/creating-maps-using-biztalk-mapper
20 http://www.stylusstudio.com/press/2005-02-08-sleepycat.html
21 https://wiki.scn.sap.com/wiki/display/XI/Mapping+Concepts+in+SAP+XI
Semantically annotated APIs also make it possible to support service behavioural interface derivation [32]. This form of API typically includes preconditions and postconditions, which define a set of requirements and restrictions such as ''must have an existing account with this company'', and ''only US customers can be served'' or ''a new purchase order will be created''. The key limitation of this body of semantic service analysis techniques is the dependency on manual user effort to design and annotate API specifications, with upgrade costs incurred as ontologies inevitably evolve. To reduce the degree of manual effort, information retrieval techniques have been proposed for ontology conception derivation [33] and schema matching between APIs [34].
Another prominent approach to API analysis involves dynamic analysis (data mining) techniques, which focus on the analysis of API usage data through systems logs. Of these, a number support the derivation of non-functional properties such as averages and variances of call frequencies, data transfer sizes, return times, probability of secondary dependencies and other measures [9]. Other techniques focus on functional aspects captured through recorded service interactions, including derivation of message types [5] and message correlation [6]. The availability of API usage data also makes it possible to derive sequences of service interactions and, thus, the temporal order of service operation invocations, yielding behavioural models of services not typically available in API specifications [7], [8]. Nonetheless, dynamic analysis techniques face the practical limitation that not all possible cases and conditions of execution are covered in a log [35].
In recent years, static API analysis techniques have been developed to analyse operational signatures only (and no other parts of systems implementation such as source code and execution logs). These techniques have been used to assist developers in improving API structures and supporting API translation to new languages. Static analysis of more contemporary REST APIs has focused on structural inconsistencies and other issues concerning resources, e.g., anti-patterns construed on the basis of inconsistencies of resource hierarchies [20]. The focus of static analysis techniques for older-style procedural service interfaces, seen through WSDL APIs, has been on analysing various properties and problems in operational signatures for improving operational cohesion and coupling. Earlier approaches pointed to the need for heuristic search to overcome the combinatorial problems of brute-force analysis of operations [36], [37]. Meanwhile, Bertolino et al. [38] analysed API operations to derive their dependencies, based on input and output parameter dependencies of the operations using type matching heuristics. This is used to derive the behavioural protocol of the API. The extracted entity models are limited to invocation dependencies and lack structural relationships between entities. Kumaran et al. [10] formalised information entities based on the domination theory (utilising co-location heuristics of parameters in operations) and used this to derive strict containment relationships between entities. Using extracted domination graphs, behavioural models (i.e., state machines) are derived for transitive closures of entities. The only relationship type studied by Eshuis and Van Gorp [39] is strict containment. The work of [15] was one of the first to extract entities from procedural API operation signatures (WSDL based), using a natural language processing technique.
Relatively modular operational signatures were used, through which hierarchical relationships between entities were derived, reflecting the hierarchical resource relationships encountered in REST APIs. This reflected the wider goal of the approach, for WSDL to REST API translation. Other text mining approaches for WSDL specifications focus on detection of anti-patterns by Mateos et al. [16] and Hirsch et al. [17], and quantification of WSDL specification readability by De Renzis et al. [18]. Furthermore, a number of metrics were proposed to demonstrate the quality of service interfaces in the context of legacy systems modernization, through the work of Mateos et al. [19] which concerned COBOL to SOA migration. However, all these approaches insufficiently treated the problem of overloaded signatures and the challenge of automated service remodularisation.
Service interface remodularization was first addressed by Athanasopoulos and Kontogiannis [15] and Ouni et al. [40], based on structural similarity measures of operations in an interface, e.g., similarity of message types used in operations and input/output dependencies of operations. Both approaches analyse the structural similarity of operations and apply optimization techniques to reason about operation splitting. Athanasopoulos and Kontogiannis [15] focussed on dependencies within operations, i.e. operation cohesion, to iteratively split a service interface using a greedy algorithm. Ouni et al. [40] focussed on operation coupling, i.e. dependencies across operations. Both approaches are based on traditional algorithms of greedy search and graph partitioning to address this problem. Boukharata et al. [41] extended the work of Ouni et al. [40] to determine sequential similarity (input/output dependency), communication similarity (message types) and semantic similarity (data types related to domain concepts). These extracted similarity measures are used through multi-objective optimization (using NSGA-II) to find an optimal modularization of operations, reaching the best trade-off between minimizing coupling, maximizing cohesion, and minimizing interface modifications.
Our paper extends upon the approach of Kumaran et al. [10] and derives a comprehensive entity model by comparison. Specifically, we extend the application of domination theory to operation parameter co-location analysis to derive different entity relationship types, i.e. strict containment, weak containment and basic associations, each with mandatory and optional cardinalities. These allow a more refined understanding of API operation structure, allowing recommendations to be made for operation restructure as published in [12], [42]. Our contribution to intra-operational structural analysis, based on probabilistic tree search of parameter space, has led to a novel technique for service variants derivation from API operations and therefore variant-based service restructure recommendations [14]. Using refined relationship insights of entity models, we have also developed entity behavioural models, focussed currently on entity existence operations (i.e. create and delete operations) [13]. For example, if one of an operation's input parameters depends on at least one of another operation's output parameters, determined through the dependency captured through corresponding entities in the entity model, then that operation should be invoked after the other one.
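The ordering rule in the example above can be sketched as a topological sort over input/output dependencies. This is a simplification: dependencies are detected here by direct name matching between one operation's outputs and another's inputs rather than through the entity model, and the operation and parameter names are hypothetical. The sketch relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# operation name -> (input parameter names, output parameter names)
Signature = Tuple[Set[str], Set[str]]


def invocation_order(operations: Dict[str, Signature]) -> List[str]:
    """Order operations so that producers of a value precede its consumers.

    Raises graphlib.CycleError if the dependencies are cyclic.
    """
    sorter: TopologicalSorter = TopologicalSorter()
    for name, (inputs, _outputs) in operations.items():
        # Operations whose outputs feed at least one input of this operation.
        producers = [other for other, (_ins, outs) in operations.items()
                     if other != name and inputs & outs]
        sorter.add(name, *producers)
    return list(sorter.static_order())


# Hypothetical operations, listed in an arbitrary order.
OPERATIONS: Dict[str, Signature] = {
    "addLineItem": ({"orderId"}, {"lineItemId"}),
    "createOrder": ({"recipient"}, {"orderId"}),
    "printLabel": ({"orderId", "lineItemId"}, {"label"}),
}
```

Here `addLineItem` consumes the `orderId` returned by `createOrder`, so it is placed after it, and `printLabel` follows both.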
We also note there have been contributions to systems analysis using static and dynamic analysis techniques which relate to APIs. For example, automata learning has proven effective in constructing behavioural interfaces of event reactive systems [43]-[45]. This method actively interrogates target systems with queries, observes behavioural models produced in response to the queries, and learns these models using machine learning algorithms. It is important to handle the data dependencies between invocations, so analysis of data parameters and data flows for the derivation of behavioural models can be complemented by the utilisation of automata learning [45]-[48]. For instance, the work of Bertolino et al. [38] has been complemented by active automata learning [45] to improve the accuracy of behavioural models derived. By contrast, our present paper has focussed on reliance on API code for static analysis to extract, as best as possible, structural and behavioural properties which can support API restructure. This, coincidentally, reflects the reality that APIs are typically decoupled from, and more publicly available than, software systems code.

VI. CONCLUSION
Despite the fact that Web-based APIs are complex and overloaded, there is a lack of sufficient knowledge about the structural composition as well as invocation sequences of these interfaces. The research reported in this paper has presented for the first time a systematic approach to yield a simplified and insightful presentation of these complex interfaces without requiring their comprehensive semantics. The approach is composed of three building blocks, which are structural interface analysis, behavioural interface analysis, and service variants analysis.
Future work on structural interface analysis can proceed as follows. In this paper, the concepts of business entities and their containment relationships have been introduced and formalised into a business entity model. Multiplicity, specifying the number of instances of one business entity allowed in a containment relationship with another business entity, leads to iteration (i.e. allowing the creation of multiple instances) in a service behavioural interface. This is to be studied in future work. In addition, the idea of structural interface analysis involving the derivation of business entity models presented in this study can be applied to RESTful service interfaces, which is worthy of future investigation.
With regard to behavioural interface synthesis, a method for deriving state-based behavioural interfaces from a given business entity model has been proposed in the paper. The introduction of states into service behavioural interfaces enables flexible service interactions. The notion of states enables a declarative mechanism for interaction needs, without prescribing which services or which order of interactions should be taken. For future directions, this opens up the possibility of dynamically determined execution of interactions, such as the interactions relevant to advancing states, the interactions involved in fulfilling interaction progress, and the interleaving of interactions across different services. Advanced operations, such as cancellations back to previous states and replacements with new providers going forward, also become possible.
Finally, in service variants analysis, a Monte Carlo-based sampling method has been developed to search for service variants. The primary significance of this method is that, as shown through experiments, good results can be produced even in a very large search space. This is in stark contrast to conventional methods such as a brute-force method, which cannot derive any variants given a large search space. Another prominent feature of the method is that, compared with existing studies, it requires minimal human intervention and input (only a known path, i.e., an acceptable service variant), and it can automatically identify other valid service variants. While the Monte Carlo sampling method has sensible performance results, importance sampling is currently the only variance optimisation. Optimising this method by introducing additional mechanisms, such as the Markov Blanket [49], remains a future research objective.
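The general idea of sampling for variants from one known path can be illustrated with a minimal sketch. It is not the method of [14] and does not implement importance sampling: the parameter names, the validity rules in `is_valid`, and the probabilities `keep_prob` and `add_prob` are all hypothetical. A service variant is modelled as a set of parameters, and candidates are drawn at random with a bias towards the parameters of the known variant.

```python
import random

# Hypothetical parameter space of one overloaded operation.
PARAMS = ["shipper", "recipient", "package", "pickup", "insurance", "customs"]

def is_valid(variant):
    """Hypothetical validity oracle standing in for a real service call."""
    required = {"shipper", "recipient", "package"}
    if not required <= variant:
        return False
    # Hypothetical rule: customs data is only accepted together with insurance.
    return "customs" not in variant or "insurance" in variant

def sample_variants(known, n_samples, keep_prob=0.8, add_prob=0.3, seed=0):
    """Randomly sample parameter subsets, biased towards the known variant."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    found = {frozenset(known)}
    for _ in range(n_samples):
        candidate = frozenset(
            p for p in PARAMS
            if rng.random() < (keep_prob if p in known else add_prob)
        )
        if is_valid(candidate):
            found.add(candidate)
    return found

known_path = {"shipper", "recipient", "package"}
variants = sample_variants(known_path, n_samples=500)
for v in sorted(variants, key=lambda s: (len(s), sorted(s))):
    print(sorted(v))
```

The toy space has only 64 subsets, so exhaustive enumeration would work equally well here. Sampling becomes relevant when the number of parameters makes enumeration infeasible, which is the situation the conclusion describes.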
ALISTAIR BARROS is currently a Professor and the Head of Service Sciences Research, School of Information Systems, Queensland University of Technology (QUT). He has worked extensively with SAP, government departments, and various industry partners. His research interests include the development of novel business design utilising services in different industries, service-based IT architecture, and the technical analysis of systems for identifying and validating new designs.
CHUN OUYANG received the Ph.D. degree. She is currently a Senior Lecturer with the School of Information Systems, QUT. Her Ph.D. research interests include system modeling and verification using formal techniques. In the last decade, she has developed strong research interests in and contributed to the areas of business process modeling and analysis, process execution, and process mining. Her recent research interests include predictive analytics, process automation in the cloud, and microservices.
FUGUO WEI received the Ph.D. degree from QUT. He is currently a Service Integration Specialist and is enthusiastic about all aspects of application integration. His Ph.D. research interests include API analysis and service integration. He is currently with the Super Retail Group, Strathpine, QLD, Australia, and the Queensland University of Technology, Brisbane, QLD, Australia. He has undertaken various roles in both academia and industry, including software developer (senior Java developer and certified Mulesoft developer), consultant, and tech lead. His research interests are Web API analysis, service integration and composition, and microservices architecture. VOLUME 8, 2020