13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2005

Date: 27-29 September 2005

Results 1-25 of 78
  • 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems

    Publication Year: 2005
  • 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems - Title Page

    Publication Year: 2005 , Page(s): i - iii
  • 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems - Copyright Page

    Publication Year: 2005 , Page(s): iv
  • 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems - Table of contents

    Publication Year: 2005 , Page(s): v - x
  • Message from the General Chair

    Publication Year: 2005 , Page(s): xi
  • Message from the Program Chairs

    Publication Year: 2005 , Page(s): xii
  • Committees

    Publication Year: 2005 , Page(s): xiii
  • Referees

    Publication Year: 2005 , Page(s): xiv
  • The challenge of complexity and scale

    Publication Year: 2005

    Summary form only given. Systems built from commodity processors dominate high-performance computing, with systems containing thousands of processors now being deployed. Node counts for multi-teraflop systems are growing to tens of thousands, proposed petaflop systems are likely to contain hundreds of thousands of nodes, and a tsunami of new experimental and computational data is arriving. Although the mean time before failure (MTBF) for the individual components (i.e., processors, disks, memories, power supplies, fans and networks) is high, at such scales component failures become routine. In contrast to parallel systems, distributed software for networks, whether transport protocols or Web/Grid services, is designed to be resilient to component failures. Our thesis is that these "two worlds" of software, distributed systems and parallel systems, must converge. In this paper, we describe possible approaches for the design and effective use of large-scale systems, ranging from intelligent hardware monitoring and adaptation, through low-overhead recovery schemes, statistical sampling and differential scheduling, to alternative models of system software, including evolutionary adaptation.

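    The reliability arithmetic behind this argument can be made explicit. A minimal sketch, assuming independent, exponentially distributed component failures (an illustrative model, not one the abstract commits to):

        # With independent exponential failure times, failure rates add, so
        # the aggregate MTBF is the per-component MTBF divided by the
        # component count (illustrative assumption, not the paper's model).
        def system_mtbf(component_mtbf_hours: float, n_components: int) -> float:
            return component_mtbf_hours / n_components

        # 100,000 nodes at a generous 1,000,000-hour per-node MTBF still
        # yield a failure somewhere in the system every 10 hours on average.
        print(system_mtbf(1_000_000, 100_000))  # -> 10.0
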
  • Towards pairing Java applications on SMT processors

    Publication Year: 2005 , Page(s): 7 - 14
    Cited by:  Papers (1)

    This paper investigates various issues of pairing Java applications for multithreaded execution on Intel's hyper-threading Pentium 4 processor. We first quantify the overall performance of multiprogrammed Java applications using a metric called combined speedup. Using the performance counters provided by the Pentium 4, we then quantitatively evaluate the performance of the underlying micro-architecture components and their implications for the combined speedup. A statistical model is proposed to analyze the collected data. This novel approach reveals that the trace cache is the major factor determining pairing performance. In particular, we find that the trace cache miss rates of Java applications can be used to predict the combined speedups. Three new scheduling strategies are proposed based on these observations and then evaluated. The experimental results show that the proposed strategies outperform the conventional round-robin scheduling scheme. Overall, our best strategy reduces execution time by 10.5% relative to serial execution, compared with the 5.92% reduction achieved by round-robin scheduling. The improvement should become even more significant on future SMT processors.

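    The abstract leaves combined speedup undefined; a common formulation for a multiprogrammed pair, used here only as an assumed stand-in for the paper's metric, sums each application's speedup relative to running alone:

        def combined_speedup(solo_times, paired_times):
            # Sum of per-application speedups when the two programs are
            # co-scheduled versus each running alone (assumed definition,
            # not necessarily the paper's exact formula).
            return sum(s / p for s, p in zip(solo_times, paired_times))

        # Two Java applications that each slow down 1.6x when paired on an
        # SMT core still achieve a combined speedup above 1.0:
        print(combined_speedup([100.0, 80.0], [160.0, 128.0]))  # -> 1.25
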
  • Workload characterization of bioinformatics applications

    Publication Year: 2005 , Page(s): 15 - 22
    Cited by:  Papers (5)

    The exponential growth in the amount of genomic information has spurred growing interest in large-scale analysis of genetic data. Bioinformatics applications represent an increasingly important class of workloads. However, very few results on the behavior of these applications running on state-of-the-art microprocessors and systems have been published. This paper proposes a suite of widely used bioinformatics applications and studies the execution characteristics of these benchmarks on a representative architecture, the Intel Pentium 4. To understand the impact and implications of bioinformatics workloads on microprocessor design, we contrast the characteristics of bioinformatics workloads with those of the widely used SPEC 2000 integer benchmarks.

  • Improved estimation for software multiplexing of performance counters

    Publication Year: 2005 , Page(s): 23 - 32
    Cited by:  Papers (2)

    On-chip performance counters are gaining popularity as an analysis and validation tool. Most contemporary processors have between two and six physical counters that can monitor an equal number of unique events simultaneously at fixed sampling periods. Through multiplexing and estimation, an even greater number of unique events can be monitored in a single program execution. When a program is sampled in multiplexed mode using round-robin scheduling of a specified event set, the number of events that are physically counted during each sampling period is limited by the number of counters that can be simultaneously accessed. During each period, the remaining events of the multiplexed event set are not monitored, and their counts must be estimated. Our work quantifies the estimation error of the event counts in multiplexed mode and finds that as many as 42% of sampled intervals are estimated with an error greater than 10%. We propose new estimation algorithms that improve accuracy by up to 40%.

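    The baseline estimator whose error the paper quantifies is commonly a linear extrapolation: scale each event's observed count by the fraction of the run during which its counter was actually scheduled. A sketch of that baseline (the paper's improved algorithms are not reproduced here):

        def estimate_total(observed_count, periods_counted, total_periods):
            # Assume the event rate seen while the counter was scheduled
            # holds for the whole run; this is the assumption that produces
            # the >10% errors the paper measures in 42% of intervals.
            return observed_count * (total_periods / periods_counted)

        # An event observed 1,500 times during 2 of 8 sampling periods is
        # extrapolated to 6,000 occurrences for the full execution.
        print(estimate_total(1_500, 2, 8))  # -> 6000.0
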
  • Understanding patterns of TCP connection usage with statistical clustering

    Publication Year: 2005 , Page(s): 35 - 44
    Cited by:  Papers (4)

    We describe a new methodology for understanding how applications use TCP to exchange data. The method is useful for characterizing TCP workloads and for synthetic traffic generation. Given a packet header trace, the method automatically constructs a source-level model of the applications using TCP in a network, without any a priori knowledge of which applications are actually present. From this source-level model, statistical feature vectors can be defined for each TCP connection in the trace. Hierarchical cluster analysis can then be performed to identify connections that are statistically homogeneous and likely to exert similar demands on a network. We apply the method to packet header traces taken from the UNC and Abilene networks and show how classes of similar connections can be automatically detected and modeled.

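    A toy version of the clustering step, using SciPy's hierarchical clustering over made-up two-feature connection vectors (the paper's source-level feature set is richer than this sketch assumes):

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        # Hypothetical per-connection features: total bytes carried and the
        # number of request/response exchanges.
        features = np.array([
            [5_000.0, 3],   # small request/response connections
            [4_800.0, 2],
            [2.0e6,   1],   # bulk transfers
            [1.8e6,   1],
        ])

        tree = linkage(features, method="ward")          # build the hierarchy
        labels = fcluster(tree, t=2, criterion="maxclust")  # cut into 2 groups
        print(labels)  # the two connection classes fall into separate clusters
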
  • An empirical model of TCP performance

    Publication Year: 2005 , Page(s): 45 - 54

    We propose a model of TCP performance that captures the behavior of a set of network paths with diverse characteristics. The model uses more parameters than other models, but we show that each feature of the model is important for at least some paths. We show that the model is sufficient to describe the datasets we collected with acceptable accuracy. Finally, we show that the model's parameters can be estimated using simple, application-level measurements.

  • High short-term bit-rates from TCP flows

    Publication Year: 2005 , Page(s): 55 - 64

    Micro-bursts from TCP flows are investigated. The chip-rate is introduced and used to quantify the short-term bit-rate of TCP flows. This paper examines packets with chip-rates above the 90th percentile, over time scales ranging from 244 μs to 125 ms. Two issues are addressed: the impact and the causes of the micro-bursts. It is found that packets with a high chip-rate experience an elevated probability of burst losses. For example, the probability of a burst loss is up to 10 times larger for packets sent in micro-bursts. Furthermore, in some settings, these packets experience a higher loss rate in general. It is also found that micro-bursts cause an increase in queuing delay. The causes of these micro-bursts are investigated. One finding is that at short time scales, ACK clocking, which should reduce micro-bursts, is not functioning correctly. For example, in some cases, most of the packets contained in micro-bursts are ACKed at a rate that is less than half of the data rate.

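    A simple way to measure short-term bit-rate at the paper's smallest time scale is to bin packets into fixed 244 μs windows and compute each window's rate. The chip-rate metric is the paper's own definition, so treat this per-window rate as a simplified stand-in:

        from collections import defaultdict

        def short_term_bitrates(packets, window_s=244e-6):
            # packets: iterable of (timestamp_seconds, size_bytes).
            bits = defaultdict(int)
            for ts, size in packets:
                bits[int(ts / window_s)] += size * 8
            return [b / window_s for _, b in sorted(bits.items())]

        # Three back-to-back 1500-byte packets inside one window imply a
        # short-term rate near 147 Mbit/s even if the flow's average is modest.
        print(max(short_term_bitrates([(0.0, 1500), (1e-4, 1500), (2e-4, 1500)])))
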
  • Statistical simulation of multithreaded architectures

    Publication Year: 2005 , Page(s): 67 - 74
    Cited by:  Papers (3)

    Detailed, cycle-accurate processor simulation is an integral component of the design and study of computer architectures. However, as the detail of simulation and the size of processor designs increase, simulation times grow exponentially, and it becomes increasingly necessary to find fast, efficient simulation techniques that still ensure accurate results. At the same time, multithreaded multi-core designs are increasingly common and require more extensive experimental evaluation for a number of reasons, including higher system complexity, the interaction of multiple co-scheduled application threads, and workload selection. Although several effective simulation techniques exist for single-threaded architectures, these techniques have not been effectively applied to the simulation of multithreaded and multi-core architecture models. Moreover, multithreaded processor simulation introduces unique challenges in all simulation stages. This work introduces systematic extensions of commonly used statistical simulation techniques to multithreaded systems. The contributions of this work include tailoring simulation fast-forwarding to individual threads, characterizing the effects of cache warming on application threads, and analyzing the primary issues in efficient multithreaded simulation.

  • Accurate modeling of aggressive speculation in modern microprocessor architectures

    Publication Year: 2005 , Page(s): 75 - 84
    Cited by:  Papers (1)

    Computer architects utilize cycle simulators to evaluate microprocessor chip design tradeoffs and estimate performance metrics. Traditionally, cycle simulators are either trace-driven or execution-driven. In this paper, we describe ValueSim, a software layer that is interposed between a cycle simulator and either a functional simulator or a value-enhanced trace. By writing to the ValueSim API, the cycle simulator can run in either trace-driven or execution-driven mode, allowing it to exploit the advantages of both approaches. The ValueSim API allows a cycle simulator to accurately model a complete range of aggressive speculative mechanisms developed by computer architects, even in trace-driven mode. Using ValueSim, we illustrate, for three key commercial applications, the significant underestimation of off-chip bandwidth, queuing delays, and cache pollution that results when modern speculative mechanisms are not accurately modeled, highlighting the importance of modeling these mechanisms accurately in chip multiprocessor designs.

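    The interposition idea can be sketched as a single instruction-source interface that hides whether instructions come from a value-enhanced trace or a live functional simulator. All names below are hypothetical illustrations, not the ValueSim API:

        from abc import ABC, abstractmethod

        class InstructionSource(ABC):
            # The cycle simulator pulls instructions through this one
            # interface, so the same timing model runs in either mode.
            @abstractmethod
            def next_instruction(self): ...

        class TraceSource(InstructionSource):
            def __init__(self, trace):
                self._it = iter(trace)
            def next_instruction(self):
                return next(self._it, None)   # replay recorded instructions

        class ExecutionSource(InstructionSource):
            def __init__(self, functional_sim):
                self._sim = functional_sim
            def next_instruction(self):
                return self._sim.step()       # execute on the functional model
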
  • Cycle accurate memory modeling: a case-study in validation

    Publication Year: 2005 , Page(s): 85 - 94
    Cited by:  Papers (3)

    Simulation is an integral tool in performance analysis; however, without some knowledge of a simulator's underlying accuracy and limitations, the results may prove wrong or misleading. Timing validation is one aspect of development that is easy to overlook, typically because no comparison target exists at the time the simulator is written. This paper discusses the design and validation of an accurate timing model for an UltraSPARC IIICu-based system. An existing functional simulator was augmented with a cycle-accurate model of the memory hierarchy of a reference system. Key features of the model include the use of a 'bridge' for the processor/memory system interface, the use of event windows between the simulated backplane and processors, the implementation of pipelined transactions, and the extension of the processor run loop to support them. The modeling of the store buffer and prefetch mechanisms proved both challenging and important for the model's accuracy. Using a combination of documentation, microbenchmarks, and comparisons of the NAS parallel benchmarks between the simulator and a real machine, it was possible to uncover several undocumented architectural artifacts and validate the simulator to a reasonable degree. Hardware performance counters and timing information were used to identify the sources of discrepancies. Surprisingly, the overhead of introducing the model was within a factor of two of the original functional simulator.

  • Performance analysis of TCP/AQM under denial-of-service attacks

    Publication Year: 2005 , Page(s): 97 - 104
    Cited by:  Papers (4)

    The interaction between TCP and various active queue management (AQM) algorithms has been extensively analyzed over the last few years. However, such analysis usually assumes that routers and TCP flows are not under network attack. In this paper, we investigate how the performance of TCP flows is affected by denial-of-service (DoS) attacks under drop tail and various AQM schemes. In particular, we consider two types of DoS attacks: the traditional flooding-based DoS (FDDoS) attacks and the recently proposed pulsing DoS (PDoS) attacks. Both analytical and simulation results show that PDoS attacks are more effective than FDDoS attacks at the same average attack rate. Moreover, drop tail surprisingly outperforms the RED-like AQMs when the router is under a PDoS attack, whereas the RED-like AQMs perform better under a severe FDDoS attack. On the other hand, the Adaptive Virtual Queue algorithm retains a higher TCP throughput during PDoS attacks than the RED-like AQMs.

  • Improving OSPF dynamics on a broadcast LAN

    Publication Year: 2005 , Page(s): 105 - 114

    In this paper, we analyze OSPF's interface state machine and propose modifications in order to reduce the time/processing requirements of the leader election process in a broadcast LAN environment. The proposed modifications are based on dynamic adjustment of the wait time duration rather than using a static value.

  • Performance of routing protocols in very large-scale mobile wireless ad hoc networks

    Publication Year: 2005 , Page(s): 115 - 122
    Cited by:  Papers (5)

    As wireless devices become more and more popular, ad hoc networks grow both in the number of nodes and in the complexity of communication among them. However, due to the limitations of simulation technologies, it is either impossible or very hard to investigate the scalability of ad hoc routing protocols in very large-scale wireless networks. In this paper, we conduct a comprehensive simulation study of the performance of an on-demand routing protocol at very large scale, with as many as 50,000 nodes in the network. We address the scalability analysis across various network sizes, traffic loads, and mobility levels. The reasons for packet loss are analyzed and categorized at each layer. Based on these observations, we optimize the parameter selection and probe the scalability limits of the on-demand routing protocol for wireless ad hoc networks.

  • Disk infant mortality in large storage systems

    Publication Year: 2005 , Page(s): 125 - 134
    Cited by:  Papers (10)

    As disk drives have dropped in price relative to tape, the desire for the convenience and speed of online access to large data repositories has led to the deployment of petabyte-scale disk farms with thousands of disks. Unfortunately, the very large size of these repositories renders them vulnerable to previously rare failure modes, such as multiple, unrelated disk failures leading to data loss. While some business models, such as free email services, may be able to tolerate occasional data loss, others, including premium online services and storage of simulation results at a national laboratory, cannot. This paper describes the effect of infant mortality on the long-term failure rates of systems that must preserve their data for decades. Our failure models incorporate the well-known "bathtub curve," which reflects the higher failure rates of new disk drives, a lower, constant failure rate during the remainder of the design life span, and increased failure rates as components wear out. Large systems are also vulnerable to the "cohort effect" that occurs when many disks are simultaneously replaced by new ones. These more accurate disk models and simulations yield predictions of system lifetimes that are more pessimistic than those of existing models that assume a constant disk failure rate. Thus, larger system scale requires designers to take disk infant mortality into account.

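    The shape of the failure model can be sketched with a piecewise hazard rate and a cohort simulation. Every number below is an illustrative assumption, not a parameter fitted by the paper:

        import random

        def bathtub_rate(age_h):
            # Failures per hour: elevated infant mortality decaying over the
            # first ~1,500 hours, a constant design-life rate, and a rising
            # wear-out term past 40,000 hours (all values illustrative).
            rate = 2e-6
            if age_h < 1_500:
                rate += 4e-5 * (1 - age_h / 1_500)
            if age_h > 40_000:
                rate += 1e-9 * (age_h - 40_000)
            return rate

        def fails_within(horizon_h, step_h=10.0):
            age = 0.0
            while age < horizon_h:
                if random.random() < bathtub_rate(age) * step_h:
                    return True
                age += step_h
            return False

        # A cohort of 1,000 disks deployed simultaneously (the setup behind
        # the paper's "cohort effect"): count first-year failures.
        print(sum(fails_within(8_760) for _ in range(1_000)))
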
  • Storage performance virtualization via throughput and latency control

    Publication Year: 2005 , Page(s): 135 - 142
    Cited by:  Papers (3)  |  Patents (1)

    I/O consolidation is a growing trend in production environments due to the increasing complexity of tuning and managing storage systems. A consequence of this trend is the need to serve multiple users/workloads simultaneously. It is imperative to ensure that these users are insulated from each other by virtualization in order to meet any service level objective (SLO). This paper presents a 2-level scheduling framework that can be built on top of an existing storage utility. The framework uses a low-level feedback-driven request scheduler, called AVATAR, that is intended to meet the latency bounds determined by the SLO. The load imposed on AVATAR is regulated by a high-level rate controller, called SARC, which insulates the users from each other. In addition, SARC is work-conserving and tries to distribute any spare bandwidth in the storage system fairly among the users. This framework naturally decouples rate and latency allocation. Using extensive I/O traces and a detailed storage simulator, we demonstrate that this 2-level framework can simultaneously meet the latency and throughput requirements imposed by an SLO, without requiring extensive knowledge of the underlying storage system.

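    The high-level throttling role can be illustrated with a plain per-user token bucket. The actual SARC controller is feedback-driven and work-conserving (it redistributes spare bandwidth), which this minimal sketch deliberately omits:

        import time

        class TokenBucket:
            # Per-user admission throttle: requests are forwarded to the
            # low-level scheduler only while tokens remain (a hypothetical
            # stand-in, not the paper's SARC algorithm).
            def __init__(self, rate_iops, burst):
                self.rate, self.capacity = rate_iops, burst
                self.tokens, self.last = float(burst), time.monotonic()

            def admit(self):
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1.0:
                    self.tokens -= 1.0
                    return True    # forward the I/O to the request scheduler
                return False       # hold back: this user is at its rate limit
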
  • A unified multiple-level cache for high performance storage systems

    Publication Year: 2005 , Page(s): 143 - 150
    Cited by:  Papers (3)

    Multi-level cache hierarchies are widely used in high-performance storage systems to improve I/O performance. However, traditional cache management algorithms are not well suited to such cache organizations. Recently proposed multi-level cache replacement algorithms based on aggressive exclusive caching work well with single-client or multiple-client, low-correlated workloads, but suffer serious performance degradation under multiple-client, high-correlated workloads. In this paper, we propose a new cache management algorithm that handles multi-level buffer caches by forming a unified cache (uCache), which combines exclusive caching in L2 storage caches with cooperative client caching. We also propose a new local replacement algorithm, frequency-based eviction-reference (FBER), based on our study of access patterns in exclusive caches. Our simulation results show that uCache increases the cumulative cache hit ratio dramatically. Compared to other popular cache algorithms, such as LRU, it improves I/O response time by up to 46% for low-correlated workloads and 53% for high-correlated workloads.

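    The abstract does not define FBER; as a baseline for what a frequency-driven policy looks like, here is a plain least-frequently-used eviction sketch (an illustration only, not the paper's algorithm):

        class LFUCache:
            def __init__(self, capacity):
                self.capacity = capacity
                self.freq = {}          # block id -> access count

            def access(self, block):
                # On a miss at capacity, evict the least-frequently-used
                # block before admitting the new one.
                if block not in self.freq and len(self.freq) >= self.capacity:
                    victim = min(self.freq, key=self.freq.get)
                    del self.freq[victim]
                self.freq[block] = self.freq.get(block, 0) + 1

        cache = LFUCache(2)
        for b in ["a", "a", "b", "c"]:   # "b" (count 1) is evicted, not "a"
            cache.access(b)
        print(sorted(cache.freq))        # -> ['a', 'c']
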
  • A traffic characterization procedure for multimedia applications in converged networks

    Publication Year: 2005 , Page(s): 153 - 160
    Cited by:  Papers (1)

    This work presents a traffic characterization procedure for traffic engineering (TE) in converged networks. An analytical model focused on self-similar aggregated traffic characterization is proposed, which considers QoS restrictions on delay metrics. The model, together with evolutionary techniques, is used to optimize the link capacity assignment task in network planning with multi-service applications. The results show that the traffic characterization model is reliable as part of cost optimization in converged networks.
