
2008 International Conference on Autonomic Computing (ICAC '08)

Date: 2-6 June 2008

  • [Front cover]

    Page(s): C1
  • [Title page i]

    Page(s): i
  • [Title page iii]

    Page(s): iii
  • [Copyright notice]

    Page(s): iv
  • Table of contents

    Page(s): v - vii
  • Message from General Chairs

    Page(s): viii
  • Steering Committee

    Page(s): ix
  • Technical Program Committee

    Page(s): x
  • Power and Performance Management of Virtualized Computing Environments Via Lookahead Control

    Page(s): 3 - 12

    There is growing incentive to reduce the power consumed by large-scale data centers that host online services such as banking, retail commerce, and gaming. Virtualization is a promising approach to consolidating multiple online services onto a smaller number of computing resources. A virtualized server environment allows computing resources to be shared among multiple performance-isolated platforms called virtual machines. By dynamically provisioning virtual machines, consolidating the workload, and turning servers on and off as needed, data center operators can maintain the desired quality-of-service (QoS) while achieving higher server utilization and energy efficiency. We implement and validate a dynamic resource provisioning framework for virtualized server environments wherein the provisioning problem is posed as one of sequential optimization under uncertainty and solved using a lookahead control scheme. The proposed approach accounts for the switching costs incurred while provisioning virtual machines and explicitly encodes the corresponding risk in the optimization problem. Experiments using the Trade6 enterprise application show that a server cluster managed by the controller conserves, on average, 26% of the power required by a system without dynamic control while still maintaining QoS goals.
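The lookahead control scheme can be illustrated with a toy receding-horizon loop. This is a hypothetical sketch, not the paper's controller: the power, switching-cost, and capacity constants are invented, and the search is brute force over a short horizon.

```python
# Toy receding-horizon VM provisioning: pick the first action of the plan
# that minimizes power plus switching cost while meeting demand (QoS).
# All constants and cost models below are illustrative assumptions.
from itertools import product

POWER_PER_VM = 100.0      # watts per active VM (assumed)
SWITCH_COST = 50.0        # penalty for provisioning/retiring one VM (assumed)
CAPACITY_PER_VM = 40.0    # requests/sec one VM serves within QoS (assumed)

def plan_cost(current_vms, plan, demand_forecast):
    """Total cost of a VM plan over the horizon; inf if QoS is violated."""
    cost, prev = 0.0, current_vms
    for vms, demand in zip(plan, demand_forecast):
        if vms * CAPACITY_PER_VM < demand:      # QoS violation
            return float("inf")
        cost += vms * POWER_PER_VM              # power drawn this step
        cost += abs(vms - prev) * SWITCH_COST   # switching penalty
        prev = vms
    return cost

def lookahead_control(current_vms, demand_forecast, max_vms=8):
    """Return the next VM count: first step of the cheapest feasible plan."""
    horizon = len(demand_forecast)
    best = min(product(range(max_vms + 1), repeat=horizon),
               key=lambda plan: plan_cost(current_vms, plan, demand_forecast))
    return best[0]
```

Encoding the switching penalty in `plan_cost` is what makes the controller keep the current allocation when a demand spike is short-lived, mirroring the risk-aware behavior the abstract describes.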

  • PQR: Predicting Query Execution Times for Autonomous Workload Management

    Page(s): 13 - 22

    Modern enterprise data warehouses have complex workloads that are notoriously difficult to manage. One of the key pieces to managing workloads is an estimate of how long a query will take to execute. An accurate estimate of this query execution time is critical to self-managing enterprise-class data warehouses. In this paper we study the problem of predicting the execution time of a query on a loaded data warehouse with a dynamically changing workload. We use a machine learning approach that takes the query plan, combines it with the observed load vector of the system, and uses the new vector to predict the execution time of the query. The predictions are made as time ranges. We validate our solution using real databases and real workloads. We show experimentally that our machine learning approach works well. This technology is slated for incorporation into a commercial, enterprise-class DBMS.
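A drastically simplified illustration of the prediction idea (not the paper's PQR algorithm): combine plan features with the observed load vector into one feature vector and return the time range of the nearest historical observation. The feature encoding and range buckets below are assumptions.

```python
# Nearest-neighbour sketch of range-valued query-time prediction.
# TIME_RANGES and the feature layout are invented for illustration.
import math

TIME_RANGES = [(0, 1), (1, 10), (10, 60), (60, float("inf"))]  # seconds

def bucket(seconds):
    """Map an observed execution time to its range index."""
    for i, (lo, hi) in enumerate(TIME_RANGES):
        if lo <= seconds < hi:
            return i

def predict_range(history, plan_features, load_vector):
    """history: list of (feature_vector, observed_seconds).
    Returns the (lo, hi) time range of the closest past observation."""
    query = plan_features + load_vector       # combined feature vector
    _, secs = min(history, key=lambda h: math.dist(h[0], query))
    return TIME_RANGES[bucket(secs)]
```

Predicting a range rather than a point value, as the abstract notes, is what makes such estimates usable for admission control even when exact times are noisy.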

  • Generating Adaptation Policies for Multi-tier Applications in Consolidated Server Environments

    Page(s): 23 - 32

    Creating good adaptation policies is critical to building complex autonomic systems since it is such policies that define the system configuration used in any given situation. While online approaches based on control theory and rule-based expert systems are possible solutions, each has its disadvantages. Here, a hybrid approach is described that uses modeling and optimization offline to generate suitable configurations, which are then encoded as policies that are used at runtime. The approach is demonstrated on the problem of providing dynamic management in virtualized consolidated server environments that host multiple multi-tier applications. Contributions include layered queuing models for Xen-based virtual machine environments, a novel optimization technique that uses a combination of bin packing and gradient search, and experimental results that show that automatic offline policy generation is viable and can be accurate even with modest computational effort.

  • Semantic-Driven Model Composition for Accurate Anomaly Diagnosis

    Page(s): 35 - 44

    In this paper, we introduce a semantic-driven approach to system modeling for improving the accuracy of anomaly diagnosis. Our framework composes heterogeneous families of models, including generic statistical models, and resource-specific models into a belief network, i.e., Bayesian network. Given a set of models which sense the behavior of various system components, the key idea is to incorporate expert knowledge about the system structure and dependencies within this structure, as meta-correlations across components and models. Our approach is flexible, easily extensible and does not put undue burden on the system administrator. Expert beliefs about the system hierarchy, relationships and known problems can guide learning, but do not need to be fully specified. The system dynamically evolves its beliefs about anomalies over time. We evaluate our prototype implementation on a dynamic content site running the TPC-W industry-standard e-commerce benchmark. We sketch a system structure and train our belief network using automatic fault injection. We demonstrate that our technique provides accurate problem diagnosis in cases of single and multiple faults. We also show that our semantic-driven modeling approach effectively finds the component containing the root cause of injected anomalies, and avoids false alarms for normal changes in environment or workload.

  • Guided Problem Diagnosis through Active Learning

    Page(s): 45 - 54

    There is widespread interest today in developing tools that can diagnose the cause of a system failure accurately and efficiently based on monitoring data collected from the system. Over time, the system monitoring data will contain two types of failure data: (i) annotated failure data L, which is monitoring data collected from failure states of the system, where the cause of failure has been diagnosed and attached as annotations with the data; and (ii) unannotated failure data U. Previous work on wholly- or partially-automated diagnosis focused on L or U in isolation. In this paper, we argue that it is important to consider both L and U together to improve the overall accuracy of diagnosis; and in particular, to proactively move instances from U to L. However, such movement requires manual diagnosis effort from system administrators. Since manual diagnosis is expensive and time-consuming, we propose an algorithm to make the best use of manual effort while maximizing the benefit gained from newly diagnosed instances. We report an experimental evaluation of our algorithm using data from a variety of failures - both single failures and multiple correlated failures - injected in a testbed, as well as with synthetic data.
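The "move instances from U to L" step resembles uncertainty sampling from active learning. A minimal sketch (the paper's actual selection criterion may differ) picks the unannotated failure instance whose current top prediction is least confident, so manual diagnosis effort goes where it helps most:

```python
# Uncertainty-sampling sketch: choose which unannotated failure instance
# to send to an administrator for manual diagnosis. predict_proba is any
# model that returns the probability of the most likely failure cause.
def most_uncertain(unannotated, predict_proba):
    """Return the instance whose top prediction is least confident."""
    return min(unannotated, key=predict_proba)
```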

  • Clustering Analysis for the Management of Self-Monitoring Device Networks

    Page(s): 55 - 64

    The increasing computing and communication capabilities of multi-function devices (MFDs) have enabled networks of such devices to provide value-added services. This has placed stringent QoS requirements on the operations of these device networks. This paper investigates how the computational capabilities of the devices in the network can be harnessed to achieve self-monitoring and QoS management. Specifically, the paper investigates the application of clustering analysis for detecting anomalies and trends in events generated during device operation, and presents a novel decentralized cluster and anomaly detection algorithm. The paper also describes how the algorithm can be implemented within a device overlay network, and demonstrates its performance and utility using simulated as well as real workloads.
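As a toy illustration of cluster-based anomaly detection (far simpler than the decentralized algorithm the paper presents), one can flag an event whose distance to the cluster centroid exceeds a multiple of the mean distance; the threshold factor below is an assumption:

```python
# Distance-to-centroid anomaly check over a cluster of event vectors.
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def is_anomaly(points, event, factor=2.0):
    """Flag event if it lies beyond factor * mean distance from the centroid."""
    c = centroid(points)
    mean_dist = sum(math.dist(p, c) for p in points) / len(points)
    return math.dist(event, c) > factor * mean_dist
```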

  • Automatic Configuration of an Autonomic Controller: An Experimental Study with Zero-Configuration Policies

    Page(s): 67 - 76

    Autonomic control managers can remove the need for manual system configuration in order to achieve good performance and efficient resource utilization. However, simple controllers based on reconfiguration actions tied to thresholds, or 'if-then' rules, themselves need to be configured and tuned in order to adapt the controller behavior to the expected workload characteristics. In this paper we present an experimental study of zero-configuration policies that can be automatically tuned based on analytical models of the system under control. In particular, we have designed and implemented a threshold-free self-configuration policy for a distributed workflow execution engine and compared it with a standard PID controller. The experimental results included in the paper show that, using such a policy, the controller can tune itself in addition to reconfiguring the distributed engine, and that the proposed policy outperforms simpler policies that require manual and error-prone tuning of their parameters.
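For context, the standard PID controller used as the comparison baseline has this general discrete form; the gains, setpoint, and time step are illustrative, not values from the paper, and they are exactly the parameters a zero-configuration policy would avoid hand-tuning:

```python
# Minimal discrete PID controller: control = Kp*e + Ki*sum(e) + Kd*de/dt.
class PID:
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt=1.0):
        """Return the control signal for the latest measurement."""
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```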

  • Dealing with Quality Tradeoffs during Service Selection

    Page(s): 77 - 86

    In a service-oriented system (SoS), service requests define tasks to execute and quality of service (QoS) criteria to optimize. A service request is submitted to an automated service selector in the SoS, which allocates tasks to those services that, together, can "best" satisfy the given QoS criteria. When the selector cannot simultaneously optimize the given QoS criteria, users need to specify priorities over the said criteria. Accounting for users' QoS priorities is therefore necessary during service selection. Once specified by the requester, quality properties are used by the selector to guide autonomic optimization of the service selection process. We outline and test a selection approach that accommodates priorities and that is based on available multi-criteria decision-making techniques.
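One simple multi-criteria technique of the kind such a selector could build on is a weighted sum of normalized QoS scores, with the user's priorities as weights. The paper's actual method may differ, and the criterion names below are hypothetical:

```python
# Weighted-sum service selection: each candidate has normalized QoS
# scores in [0, 1]; priorities weight the criteria per the user's request.
def select_service(candidates, priorities):
    """candidates: {name: {criterion: score}}. Returns the best name."""
    def utility(name):
        return sum(priorities.get(c, 0.0) * s
                   for c, s in candidates[name].items())
    return max(candidates, key=utility)
```

Shifting weight between criteria changes the winner, which is precisely the tradeoff behavior the abstract says the selector must expose to requesters.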

  • Digital Evolution of Behavioral Models for Autonomic Systems

    Page(s): 87 - 96

    We describe an automated method for generating models of an autonomic system. Specifically, we generate UML state diagrams for a set of interacting objects, including the extension of existing state diagrams to support new behavior. The approach is based on digital evolution, a form of evolutionary computation that enables a designer to explore an enormous solution space for complex problems. In our application of this technology, an evolving population of digital organisms is subjected to natural selection, where organisms are rewarded for generating state diagrams that support key scenarios and satisfy critical properties as specified by the developer. To achieve this capability, we extended the Avida digital evolution platform to enable state diagram generation, and integrated Avida with third-party software engineering tools, e.g., the Spin model checker, to assess the generated state diagrams. To illustrate this approach, we successfully applied it to the generation of state diagrams describing the autonomous navigation behavior of a humanoid robot.

  • An Adaptive Middleware for Supporting Time-Critical Event Response

    Page(s): 99 - 108

    There are many applications where a timely response to an important event is needed. Often such a response can require significant computation and possibly communication, and it can be very challenging to complete it within the time-frame in which the response is needed. At the same time, there may be desirable application-specific flexibility in the computation. This paper presents the design, implementation, and evaluation of a middleware that can support such applications. Each of the services in our target applications can have one or more service parameters, which can be modified, within pre-specified ranges, by the middleware. The middleware enables time-critical event handling to achieve the maximum benefit, as per the user-defined benefit function, while satisfying the time constraint. Our middleware is also based on the existing Grid infrastructure and Service-Oriented Architecture (SOA) concepts. We have evaluated our middleware and its support for adaptation using a volume rendering application and a Great Lake forecasting application. The evaluation shows that our adaptation is effective and has a very low overhead.

  • Tracking Transaction Footprints for Non-intrusive End-to-End Monitoring

    Page(s): 109 - 118

    Existing transaction monitoring solutions are either platform-specific or rely on instrumentation techniques, which limit their applicability. Consequently, transaction monitoring in enterprise environments often involves the manual collation of information spread across a variety of infrastructure elements and applications, and is a time-consuming and labor-intensive task. To facilitate self-governance in enterprise environments, we present an online, non-intrusive and platform-agnostic solution for transaction monitoring, where the only inputs required are (a) system log files in which footprints left by ongoing transaction instances are recorded, and (b) a model of the transaction, in terms of the valid sequences of steps that a transaction instance may execute and the expected footprint patterns at each step. Given these, our solution generates a dynamic execution profile of ongoing transaction instances that allows their status to be tracked at individual and aggregate levels, even when transaction footprints do not necessarily carry correlating identifiers as injected through instrumentation. We describe our monitoring architecture and algorithms, results from an empirical study, ongoing work on run-time transaction model validation and directions for future research.

  • The Design of a New Context-Aware Policy Model for Autonomic Networking

    Page(s): 119 - 128

    This paper describes a new version of the DEN-ng context model, and how this model in conjunction with the DEN-ng policy model can be used for more effective and flexible context management. Both are part of the FOCALE autonomic network architecture. Context selects policies, which select roles that can be used, which in turn define allowed functionality for that particular context.

  • Multi-Level Intrusion Detection System (ML-IDS)

    Page(s): 131 - 140

    As the deployment of network-centric systems increases, network attacks are proportionally increasing in intensity as well as complexity. Attack detection techniques can be broadly classified as being signature-based, classification-based, or anomaly-based. In this paper we present a multi-level intrusion detection system (ML-IDS) that uses autonomic computing to automate the control and management of ML-IDS. This automation allows ML-IDS to detect network attacks and proactively protect against them. ML-IDS inspects and analyzes network traffic using three levels of granularity (traffic flow, packet header, and payload), and employs an efficient fusion decision algorithm to improve the overall detection rate and minimize the occurrence of false alarms. We have individually evaluated each of our approaches against a wide range of network attacks, and then compared the results of these approaches with the results of the combined decision fusion algorithm.

  • Automating ITSM Incident Management Process

    Page(s): 141 - 150

    Service desks are used by customers to report IT issues in enterprise systems. Most of these service requests are resolved by level-1 personnel (service desk attendants) by providing information or quick-fix solutions to customers. For each service request, level-1 personnel identify important keywords and check whether the incoming request is similar to any historic incident. Otherwise, an incident ticket is created and, with other related information, forwarded to the incident's subject matter expert (SME). The incident management process is used for managing the life cycle of all incidents. An organization spends significant resources to keep its IT resources incident-free, so timely resolution of incoming incidents is required to attain that objective. Currently, the incident management process is largely manual, error prone and time consuming. In this paper, we use information integration techniques and machine learning to automate various processes in the incident management workflow. We give a method for correlating an incoming incident with configuration items (CIs) stored in a configuration management database (CMDB). Such a correlation can be used for correctly routing the incident to SMEs, incident investigation and root cause analysis. In our technique, we discover relevant CIs by exploiting the structured and unstructured information available in the incident ticket. We present an efficient algorithm that yields more than a 70% improvement in the accuracy of identifying the failing component by efficiently browsing relationships among CIs.

  • Anatomy of a Real-Time Intrusion Prevention System

    Page(s): 151 - 160

    Host intrusion prevention systems for both servers and end-hosts must address the dual challenges of accuracy and performance. Researchers have mostly focused on addressing the former challenge, suggesting solutions based either on exploit-based penetration detection or anomaly-based misbehavior detection, yet stopping short of comprehensive solutions that leverage the merits of both approaches. The second challenge, however, is rarely addressed; doing so comprehensively is important since these systems can introduce substantial overhead and cause system slowdown, more so when the system load is high. We present Rootsense, a holistic and real-time intrusion prevention system that combines the merits of misbehavior-based and anomaly-based detection. Four principles govern the design and implementation of Rootsense. First, Rootsense audits events within different subsystems of the host operating system and correlates them to comprehensively capture the global system state. Second, Rootsense restricts the detection domain to root compromises only; doing so reduces run-time overhead and increases detection accuracy (root behavior is more easily modeled than user behavior). Third, Rootsense adopts a dual approach to intrusion detection - a root penetration detector detects activities that exploit system vulnerabilities to penetrate the security perimeter, and a root misbehavior detector tracks misbehavior by root processes. Fourth, Rootsense is designed to be configurable for overhead management, allowing the system administrator to tune the overhead characteristics of the intrusion prevention system that affect foreground task performance. A Linux implementation of Rootsense is analyzed for both accuracy and performance, using several real-world exploits and a range of end-host and server benchmarks.

  • Just-in-Time Server Provisioning Using Virtual Machine Standby and Request Prediction

    Page(s): 163 - 171

    Server provisioning is a practical technique for reconfiguring a shared server and improving resource utilization of servers in datacenters and enterprise systems. In complex systems, however, the long server provisioning process impedes prompt solutions to system problems. This paper proposes a technique to shorten the provisioning processing time after the occurrence of a provisioning request by speculatively executing provisioning on a standby virtual machine. To start the provisioning execution in advance, a method for predicting the provisioning request is required. This paper presents a prediction model based on logistic regression using system performance metrics. In an evaluation using actual performance data from enterprise systems, for 50% of the server provisioning requests, the provisioning processing time after the request is shortened by over 10 minutes using the 20-minute look-ahead request prediction model.
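The prediction side can be sketched as a plain logistic-regression scorer over performance metrics: the model emits the probability that a provisioning request will arrive within the lookahead window, and standby provisioning starts when it crosses a threshold. The weights, bias, and threshold below are invented for illustration, not the paper's fitted model.

```python
# Logistic-regression scorer: predict whether a provisioning request
# will arrive within the lookahead window, given current system metrics.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_request(metrics, weights, bias, threshold=0.5):
    """metrics/weights: equal-length vectors. True if a request is predicted."""
    z = bias + sum(w * x for w, x in zip(weights, metrics))
    return sigmoid(z) >= threshold
```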

  • 1000 Islands: Integrated Capacity and Workload Management for the Next Generation Data Center

    Page(s): 172 - 181

    Recent advances in hardware and software virtualization offer unprecedented management capabilities for the mapping of virtual resources to physical resources. It is highly desirable to further create a "service hosting abstraction" that allows application owners to focus on service level objectives (SLOs) for their applications. This calls for a resource management solution that achieves the SLOs for many applications in response to changing data center conditions and hides the complexity from both application owners and data center operators. In this paper, we describe an automated capacity and workload management system that integrates multiple resource controllers at three different scopes and time scales. Simulation and experimental results confirm that such an integrated solution ensures efficient and effective use of data center resources while reducing service level violations for high priority applications.
