
IEEE Transactions on Computers

Issue 11 • November 2005


  • [Front cover]

    Page(s): c1
  • [Inside front cover]

    Page(s): c2
  • The VPC trace-compression algorithms

    Page(s): 1329 - 1344

    Execution traces, which are used to study and analyze program behavior, are often so large that they must be stored in compressed form. This paper describes the design and implementation of four value prediction-based compression (VPC) algorithms for traces that record the PC as well as other information about executed instructions. VPC1 directly compresses traces using value predictors, VPC2 adds a second compression stage, and VPC3 uses value predictors to convert traces into streams that can be compressed better and more quickly than the original traces. VPC4 introduces further algorithmic enhancements and is automatically synthesized. Of the 55 SPECcpu2000 traces we evaluate, VPC4 compresses 36 better, decompresses 26 faster, and compresses 53 faster than BZIP2, MACHE, PDATS II, SBC, and SEQUITUR. It delivers the highest geometric-mean compression rate, decompression speed, and compression speed because of the predictors' simplicity and their ability to exploit local value locality; most other compression algorithms can exploit only global value locality.

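    As a hedged illustration of the value-prediction idea behind VPC (not the published VPC1-VPC4 algorithms), the Python sketch below uses a per-PC last-value predictor to split a (PC, value) trace into a hit-flag stream and a miss-value stream; both streams are far more regular than the raw trace, so a general-purpose back end such as bzip2 compresses them better and faster. The function names and trace format are illustrative assumptions.

        def split_trace(trace):
            """trace: iterable of (pc, value) pairs, e.g. effective addresses.
            Returns a hit-flag stream and a stream of mispredicted values."""
            last_value = {}              # per-PC last-value predictor table
            flags, misses = [], []
            for pc, value in trace:
                if last_value.get(pc) == value:
                    flags.append(1)      # predictor hit: one flag bit suffices
                else:
                    flags.append(0)
                    misses.append(value) # predictor miss: emit the raw value
                last_value[pc] = value
            return flags, misses

        def rebuild_trace(pcs, flags, misses):
            """Lossless reconstruction, given the PC stream kept alongside."""
            last_value, out, it = {}, [], iter(misses)
            for pc, hit in zip(pcs, flags):
                value = last_value[pc] if hit else next(it)
                last_value[pc] = value
                out.append((pc, value))
            return out
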
  • A configurable statistical lossless compression core based on variable order Markov modeling and arithmetic coding

    Page(s): 1345 - 1359

    This paper presents a practical hardware realization of variable order Markov modeling over multisymbol alphabets combined with arithmetic coding for universal lossless data compression. Statistical coding algorithms of this type have long been regarded as able to deliver compression ratios very close to the information content of the source data, but their high computational complexity has limited their practical application in embedded environments such as mobile computing and wireless communications. A hardware-amenable algorithm named PPMH, based on these principles, is developed, and its architecture and implementation are detailed. This lossless compression core offers solutions to the computational issues in both the modeling and the coding stages and delivers high compression efficiency and throughput. The configurability features of the core allow efficient use of the embedded SRAM present in modern FPGA technologies, where memory resources range from a few kilobits to several megabits per device family. The core has been targeted to the Altera Stratix FPGA family, and its performance, coding efficiency, and complexity have been measured for different memory configurations.

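    As a rough, hedged sketch of the statistical-modeling half of a PPM-style coder (not the PPMH algorithm or its hardware architecture), the Python snippet below maintains an adaptive order-1 context model with an escape to an order-0 model and reports the ideal code length, -log2(p) per symbol, that an arithmetic coder would approach. All names and the escape-probability choice are illustrative assumptions.

        import math
        from collections import defaultdict

        def ideal_code_length(data, alphabet_size=256):
            """Bits an arithmetic coder would need under a toy order-1 model."""
            order1 = defaultdict(lambda: defaultdict(int))  # context -> symbol counts
            order0 = defaultdict(int)
            bits, prev = 0.0, None
            for sym in data:
                ctx = order1[prev]
                total = sum(ctx.values())
                if prev is not None and ctx[sym] > 0:
                    p = ctx[sym] / (total + 1)      # +1 reserves escape probability
                else:
                    p_escape = 1.0 / (total + 1) if prev is not None else 1.0
                    p = p_escape * (order0[sym] + 1) / (sum(order0.values()) + alphabet_size)
                bits += -math.log2(p)
                ctx[sym] += 1                       # update both models
                order0[sym] += 1
                prev = sym
            return bits

        data = b"abracadabra abracadabra"
        print(ideal_code_length(data) / 8, "bytes (model) vs", len(data), "bytes raw")
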
  • Improving computer architecture simulation methodology by adding statistical rigor

    Page(s): 1360 - 1373

    Due to cost, time, and flexibility constraints, computer architects use simulators to explore the design space when developing new processors and to evaluate the performance of potential enhancements. Despite this dependence on simulators, however, statistically rigorous simulation methodologies are typically not used in computer architecture research. A formal methodology can provide a sound basis for drawing conclusions from simulation results and, consequently, can increase the architect's confidence in them. This paper demonstrates the application of a rigorous statistical technique to the setup and analysis phases of the simulation process. Specifically, we apply a Plackett and Burman design to: 1) identify key processor parameters, 2) classify benchmarks based on how they affect the processor, and 3) analyze the effect of processor enhancements. Our results showed that, of the 41 user-configurable parameters in SimpleScalar, only 10 had a significant effect on execution time; of those 10, the number of reorder buffer entries and the L2 cache latency were by far the two most significant. Our results also showed that instruction precomputation, a value-reuse-like microarchitectural technique, primarily improves the processor's performance by relieving integer ALU contention.

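    A small, hedged sketch of the kind of two-level screening design the paper uses: the 12-run Plackett-Burman matrix built from its standard generator row, with main effects estimated from a made-up response function standing in for a simulator run. The response function and factor assignment are assumptions for illustration, not the SimpleScalar parameters studied in the paper.

        import numpy as np

        GEN = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]   # standard 12-run generator row

        def pb12():
            rows = [GEN[-i:] + GEN[:-i] for i in range(11)]  # 11 cyclic shifts
            rows.append([-1] * 11)                           # final all-minus run
            return np.array(rows)                            # 12 runs x 11 factors

        def simulate(levels):
            # Hypothetical response: execution time driven mainly by factors 0 and 3.
            rng = np.random.default_rng(abs(hash(tuple(levels))) % 2**32)
            return 100 - 8 * levels[0] - 5 * levels[3] + rng.normal(0, 1)

        design = pb12()
        y = np.array([simulate(run) for run in design])
        effects = design.T @ y / (len(y) / 2)                # main effect per factor
        for j, e in enumerate(effects):
            print(f"factor {j:2d}: effect {e:+6.2f}")
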
  • Design and properties of a new pseudorandom generator based on a filtered FCSR automaton

    Page(s): 1374 - 1383

    Feedback-with-carry shift registers (FCSRs) were introduced by Goresky and Klapper in 1993. They are similar to the classical linear feedback shift registers (LFSRs) used in many pseudorandom generators; the main difference is that the elementary additions are not additions modulo 2 but additions with propagation of carries. The main obstacle to using an FCSR automaton directly is that the generated sequences are predictable. To remove this weakness of FCSR-based generators, we propose filtering the state of the FCSR with a linear function. This method is effective because the FCSR transition itself is not a linear operation. The paper presents an extensive study of FCSR automata, a security analysis of our generator (covering linear and 2-adic cryptanalysis, algebraic attacks, correlation attacks, etc.), and a practical example of parameters for designing such a generator. An important feature of this generator is that it is simple and efficient in both hardware and software implementations.

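    The toy Python sketch below shows a Fibonacci-style FCSR with carry memory whose state is filtered by XOR-ing a subset of its cells, to give a concrete feel for the construction; the register size, taps, and filter are made-up illustrative values, not the parameters or the exact automaton proposed in the paper.

        def filtered_fcsr(init_bits, taps, filter_mask, nbits):
            """init_bits: 0/1 register cells, oldest first.
            taps[i] multiplies the i-th most recent cell.
            filter_mask: cells XOR-ed together to form each output bit."""
            state = list(init_bits)
            mem = 0                                       # carry memory
            out = []
            for _ in range(nbits):
                # feedback with carry: integer sum of tapped cells plus memory
                sigma = mem + sum(t & c for t, c in zip(taps, reversed(state)))
                new_bit, mem = sigma & 1, sigma >> 1
                state = state[1:] + [new_bit]             # shift the register
                # linear (XOR) filter over the state hides the 2-adic structure
                out.append(sum(b for b, f in zip(state, filter_mask) if f) & 1)
            return out

        bits = filtered_fcsr([1, 0, 1, 1, 0, 1, 1, 0],
                             taps=[1, 0, 1, 1, 0, 0, 1, 1],
                             filter_mask=[1, 0, 0, 1, 1, 0, 1, 0], nbits=32)
        print("".join(map(str, bits)))
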
  • Fair bandwidth sharing in distributed systems: a game-theoretic approach

    Page(s): 1384 - 1393

    Fair sharing of bandwidth remains an unresolved issue for distributed systems. In this paper, the users of a distributed LAN are modeled as selfish users that independently choose their individual strategies. With these selfish users, the contention-based distributed medium access scenario is modeled as a complete-information, noncooperative game, designated the "access game". A novel MAC strategy based on p-persistent CSMA is presented to achieve fairness in the access game. It is proven that the access game has an infinite number of Nash equilibria, but that they do not result in fairness. Therefore, it may be beneficial for the selfish users to adhere to a set of constraints that yield fairness in a noncooperative fashion. This leads to the formulation of a constrained access game with fairness represented as a set of algebraic constraints. It is proven that the solution of the constrained game, the constrained Nash equilibrium, is unique. Further, it is shown that, in addition to achieving fairness, this solution also optimizes the throughput. Finally, these results are extended to a more realistic incomplete-information scenario by approximating it as a complete-information scenario through information gathering and dissemination.

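    As a hedged toy illustration of why per-station transmission probabilities govern both fairness and throughput (not the constrained game or its equilibrium from the paper), the Python snippet below simulates a slotted p-persistent access model in which a slot succeeds only when exactly one station transmits. The probabilities and slot count are arbitrary.

        import random

        def simulate(probs, slots=200_000, seed=1):
            """Per-station success rate under p-persistent slotted access."""
            rng = random.Random(seed)
            wins = [0] * len(probs)
            for _ in range(slots):
                txs = [i for i, p in enumerate(probs) if rng.random() < p]
                if len(txs) == 1:                    # exactly one transmitter: success
                    wins[txs[0]] += 1
            return [round(w / slots, 3) for w in wins]

        print(simulate([0.2, 0.2, 0.2, 0.2]))        # symmetric strategies: fair shares
        print(simulate([0.8, 0.2, 0.2, 0.2]))        # an aggressive station grabs more
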
  • Network interface data caching

    Page(s): 1394 - 1408

    Network interface data caching reduces local interconnect traffic on network servers by caching frequently requested content on a programmable network interface. The operating system on the host CPU determines which data to store in the cache and for which packets it should use data from the cache. To facilitate data reuse across multiple packets and connections, the cache stores only application-level response content (such as HTTP data), with application-level and networking headers generated by the host CPU. On a prototype uniprocessor Web server, network interface data caching reduces PCI traffic by 12-61 percent for six Web workloads; this traffic reduction improves peak throughput for three of the workloads by 6-36 percent.

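    A minimal, hedged sketch of the host-side decision described above: response bodies already present in the network interface cache are referenced by key instead of being re-sent over the local interconnect, while headers are always generated by the host. The class names, the LRU policy, and the 8-byte reference size are illustrative assumptions, not details of the paper's prototype.

        from collections import OrderedDict

        class NicCache:
            """LRU cache of response-body lengths, keyed by content identifier."""
            def __init__(self, capacity_bytes):
                self.capacity, self.used = capacity_bytes, 0
                self.entries = OrderedDict()            # key -> payload length

            def lookup(self, key):
                if key in self.entries:
                    self.entries.move_to_end(key)       # refresh LRU position
                    return True
                return False

            def insert(self, key, length):
                while self.used + length > self.capacity and self.entries:
                    _, evicted = self.entries.popitem(last=False)
                    self.used -= evicted
                if length <= self.capacity:
                    self.entries[key] = length
                    self.used += length

        def send_response(cache, key, payload, bus):
            """bus collects what actually crosses the local interconnect."""
            headers = f"HTTP/1.1 200 OK\r\nContent-Length: {len(payload)}\r\n\r\n"
            if cache.lookup(key):
                bus.append(("headers+reference", len(headers) + 8))   # tiny cache reference
            else:
                bus.append(("headers+body", len(headers) + len(payload)))
                cache.insert(key, len(payload))
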
  • A fault-tolerant protocol for energy-efficient permutation routing in wireless networks

    Page(s): 1409 - 1421

    A wireless network (WN) is a distributed system in which each node is a small handheld commodity device called a station. Wireless sensor networks have received increasing interest in recent years due to their use for monitoring and data collection in a wide variety of environments, such as remote geographic locations, industrial plants, toxic sites, or even office buildings. Two of the most important issues for a WN are its energy constraints and its potential for developing faults. A station is usually powered by a battery that cannot be recharged while on a mission, so any protocol run by a WN should be energy-efficient. Moreover, some of the stations deployed in a WN may not work perfectly, so any protocol designed for a WN should work well even when some stations are faulty. The permutation routing problem is an abstraction of many routing problems in a wireless network: each of the p stations in the network is the sender and recipient of n/p packets, and the task is to route the packets to their correct destinations. We consider permutation routing in a single-hop wireless network, where each station is within the transmission range of all other stations, and design a protocol that is both energy-efficient and fault-tolerant. We present both theoretical estimates and extensive simulation results to show that our protocol is efficient in terms of energy expenditure at each node even when some of the nodes are faulty. Moreover, we show that our protocol is also efficient for the unbalanced permutation routing problem, in which each station is the sender and recipient of an unequal number of packets.

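    As a hedged, naive baseline (not the paper's fault-tolerant protocol), the sketch below schedules one single-hop slot per packet and counts, per station, the slots in which it must be awake as sender or recipient; minimizing this awake time is what an energy-efficient permutation routing protocol aims for. The station count, packet list, and energy model are illustrative assumptions.

        def naive_schedule(packets):
            """packets: list of (sender, recipient) pairs; one slot per packet.
            Returns awake-slot counts per station (a crude energy measure)."""
            awake = {}
            for slot, (src, dst) in enumerate(packets):
                awake.setdefault(src, set()).add(slot)
                awake.setdefault(dst, set()).add(slot)
            return {station: len(slots) for station, slots in sorted(awake.items())}

        # 4 stations, 2 packets each; every station is also the recipient of 2 packets
        packets = [(0, 2), (0, 3), (1, 0), (1, 2), (2, 1), (2, 3), (3, 0), (3, 1)]
        print(naive_schedule(packets))
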
  • Bounded-latency content distribution feasibility and evaluation

    Page(s): 1422 - 1437

    This paper investigates the performance of a content distribution network designed to provide bounded content access latency. Content can be divided into multiple classes with different configurable per-class delay bounds. The network uses a simple distributed algorithm to dynamically select subsets of its proxy servers for different classes such that a global per-class delay bound on content access is achieved. The content distribution algorithm is implemented and tested on PlanetLab, a worldwide distributed Internet testbed. Evaluation results demonstrate that, despite Internet delay variability, subsecond delay bounds (of 200-500 ms) can be guaranteed with very high probability at only a moderate content replication cost. The distribution algorithm achieves a four- to fivefold reduction in the number of response-time violations compared to prior content distribution approaches that attempt to minimize average latency. To the authors' knowledge, this paper presents the first wide-area performance evaluation of an algorithm designed to bound maximum content access latency, as opposed to optimizing an average performance metric.

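    The hedged sketch below illustrates the flavor of the proxy-selection problem with a centralized greedy set-cover heuristic: pick proxies until every client sees some replica within the delay bound. The paper's algorithm is distributed and dynamic; the delay matrix and bound here are made-up values.

        def select_proxies(delay, bound):
            """delay[p][c]: measured delay from proxy p to client c (ms)."""
            clients = set(range(len(delay[0])))
            uncovered, chosen = set(clients), []
            while uncovered:
                # proxy covering the most still-uncovered clients within the bound
                best = max(range(len(delay)),
                           key=lambda p: len([c for c in uncovered if delay[p][c] <= bound]))
                newly = {c for c in uncovered if delay[best][c] <= bound}
                if not newly:
                    raise ValueError("bound infeasible with the given proxies")
                chosen.append(best)
                uncovered -= newly
            return chosen

        delay = [[120, 310, 90, 400],     # proxy 0
                 [200, 150, 260, 180],    # proxy 1
                 [450, 220, 130, 170]]    # proxy 2
        print(select_proxies(delay, bound=200))   # proxies needed for a 200 ms class
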
  • Causality-based predicate detection across space and time

    Page(s): 1438 - 1453

    This paper presents event stream-based online algorithms that fuse the data reported from processes to detect causality-based predicates of interest. The proposed algorithms have two key features. 1) They are based on logical time, which is useful for detecting "cause and effect" relationships in an execution. 2) They detect properties that can be specified using predicates under a rich palette of time modalities. Specifically, for a conjunctive predicate φ, the algorithms can detect the exact fine-grained time modalities between each pair of intervals, one interval at each process, with low space, time, and message complexities. The main idea behind the algorithms is that any "cause and effect" interaction can be decomposed into a collection of interactions between pairs of system components. The detection algorithms, which leverage this pairwise interaction among processes, incur a low overhead and are, hence, highly scalable. The paper then shows how the algorithms can handle mobility in mobile ad hoc networks.

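    The snippet below sketches the logical-time machinery such detection builds on, namely comparing vector timestamps to decide whether one interval endpoint causally precedes another ("happened-before") or whether the two are concurrent; it is not the paper's modality-detection algorithm, and the example timestamps are made up.

        def happened_before(u, v):
            """u, v: vector timestamps (lists of per-process event counts)."""
            return all(a <= b for a, b in zip(u, v)) and u != v

        def concurrent(u, v):
            return not happened_before(u, v) and not happened_before(v, u)

        # interval X on process 0 ends at [3, 1, 0]; interval Y on process 1
        # starts at [3, 4, 0]: X's end happened before Y's start, so X can be
        # a "cause" of Y under the conjunctive predicate being monitored.
        print(happened_before([3, 1, 0], [3, 4, 0]))   # True
        print(concurrent([2, 0, 1], [1, 3, 0]))        # True
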
  • A new minimal average weight representation for left-to-right point multiplication methods

    Page(s): 1454 - 1459

    This paper introduces a new radix-2 representation with the same average weight as the width-w nonadjacent form (w-NAF). In both w-NAF and the proposed representation, each nonzero digit is an odd integer with absolute value less than M. For w-NAF, however, M must be of the form 2^(w-1), while for the proposed representation it can be any positive integer. The proposed integer representation therefore makes efficient use of whatever memory is available, which is attractive for devices with limited memory. Another advantage over w-NAF is that the proposed representation can be obtained by scanning the bits from left to right, which is also useful for memory-constrained devices because it can reduce both the time and space complexity of fast point multiplication techniques.

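    For reference, the classical right-to-left width-w NAF that the new representation is compared against can be computed as below; each nonzero digit is odd with absolute value below 2^(w-1), and on average one digit in w+1 is nonzero. This is the standard recoding, not the left-to-right algorithm proposed in the paper.

        def wnaf(n, w):
            digits = []
            while n > 0:
                if n & 1:                       # odd: take the signed residue mod 2^w
                    d = n % (1 << w)
                    if d >= (1 << (w - 1)):
                        d -= 1 << w             # map into (-2^(w-1), 2^(w-1))
                    n -= d
                else:
                    d = 0
                digits.append(d)
                n >>= 1
            return digits                       # least significant digit first

        print(wnaf(0b110111011, 3))   # 443 -> [3, 0, 0, -1, 0, 0, -1, 0, 0, 1]
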
  • An exact stochastic analysis of priority-driven periodic real-time systems and its approximations

    Page(s): 1460 - 1466

    This paper describes a stochastic analysis framework that computes the response time distribution and the deadline miss probability of individual tasks, even for systems with a maximum utilization greater than one. The framework applies uniformly to fixed-priority and dynamic-priority systems and can handle tasks with arbitrary relative deadlines and execution time distributions.

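    A hedged sketch of the basic machinery such frameworks rely on: convolving discrete execution-time distributions and reading a deadline-miss probability off the resulting distribution. The real analysis interleaves these convolutions with the priority-driven schedule and task releases; the job set, deadline, and independence assumption below are purely illustrative.

        from collections import defaultdict

        def convolve(pa, pb):
            """pa, pb: dicts mapping execution time -> probability."""
            out = defaultdict(float)
            for a, p in pa.items():
                for b, q in pb.items():
                    out[a + b] += p * q
            return dict(out)

        def miss_probability(dists, deadline):
            total = {0: 1.0}
            for d in dists:
                total = convolve(total, d)
            return sum(p for t, p in total.items() if t > deadline)

        # three independent jobs that must all finish within 10 time units
        jobs = [{2: 0.7, 4: 0.3}, {3: 0.6, 5: 0.4}, {1: 0.5, 3: 0.5}]
        print(miss_probability(jobs, deadline=10))
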
  • Diagnosis of multiple hold-time and setup-time faults in scan chains

    Page(s): 1467 - 1472

    This paper presents a diagnosis technique to locate hold-time (HT) faults and setup-time (ST) faults in scan chains. The technique achieves deterministic diagnosis results by applying thermometer scan input (TSI) patterns, which contain only one rising or one falling transition. With TSI patterns, the diagnosis patterns can easily be generated by existing single stuck-at fault test pattern generators with few modifications. Beyond the first fault, the technique diagnoses the remaining faults by applying thermometer scan input with padding (TSIP) patterns. For the benchmark circuits (up to 6.6K scan cells), experiments show that the diagnosis resolution is no worse than 15, even in the presence of multiple faults in a scan chain.

  • TC Information for authors

    Page(s): c3
  • [Back cover]

    Page(s): c4

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.


Meet Our Editors

Editor-in-Chief
Albert Y. Zomaya
School of Information Technologies
Building J12
The University of Sydney
Sydney, NSW 2006, Australia
http://www.cs.usyd.edu.au/~zomaya
albert.zomaya@sydney.edu.au