
IEEE Transactions on Computers

Issue 6 • June 2012

Displaying Results 1 - 18 of 18
  • [Front cover]

    Publication Year: 2012, Page(s): c1
    PDF (93 KB)
    Freely Available from IEEE
  • [Inside front cover]

    Publication Year: 2012, Page(s): c2
    PDF (169 KB)
    Freely Available from IEEE
  • ST-CDP: Snapshots in TRAP for Continuous Data Protection

    Publication Year: 2012, Page(s): 753 - 766
    PDF (1628 KB) | HTML

    Continuous Data Protection (CDP) has become increasingly important as digitization continues. This paper presents a new architecture and an implementation of CDP in the Linux kernel. The new architecture takes advantage of both traditional snapshot technology and the recent Timely Recovery to Any Point-in-time (TRAP) architecture. The idea is to periodically insert snapshots within the parity logs of changed data blocks in order to ensure fast and reliable data recovery in case of failures. A mathematical model is developed as a guide for designers to determine when and how to insert snapshots to optimize performance in terms of space usage and recovery time. Based on the mathematical model, we have designed and implemented a CDP module in the Linux system. Our implementation is at the block level, as a device driver capable of recovering data to any point in time in case of various failures. Extensive experiments show that the implementation is fairly robust, and numerical results demonstrate that it is efficient.

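    The abstract's cost model is not given in detail here, so the following is only a minimal sketch of the kind of trade-off involved: snapshots inserted every k parity-log entries shorten recovery (fewer log entries to replay) at the cost of extra space. The function names and all parameter values (replay_per_entry_s, snapshot_kb, and so on) are hypothetical, not taken from the paper.

```python
# Illustrative trade-off for choosing how often to insert snapshots into a
# parity log (hypothetical parameters; not the model from the paper).

def expected_recovery_time(k, replay_per_entry_s=0.02, snapshot_load_s=1.5):
    """With a snapshot every k log entries, a failure lands (on average)
    halfway through an interval, so about k/2 entries must be replayed."""
    return snapshot_load_s + (k / 2) * replay_per_entry_s

def space_overhead(k, num_entries, entry_kb=4, snapshot_kb=512):
    """Total parity-log size plus the snapshots inserted every k entries."""
    num_snapshots = num_entries // k
    return num_entries * entry_kb + num_snapshots * snapshot_kb

if __name__ == "__main__":
    for k in (64, 256, 1024, 4096):
        t = expected_recovery_time(k)
        s = space_overhead(k, num_entries=100_000)
        print(f"k={k:5d}  recovery≈{t:6.2f}s  space≈{s/1024:8.1f} MB")
```
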
  • The Reliability Wall for Exascale Supercomputing

    Publication Year: 2012, Page(s): 767 - 779
    Cited by:  Papers (5)
    PDF (1357 KB) | HTML

    Reliability is a key challenge to be understood to turn the vision of exascale supercomputing into reality. Inevitably, large-scale supercomputing systems, especially those at the peta/exascale levels, must tolerate failures by incorporating fault-tolerance mechanisms to improve their reliability and availability. As the benefits of fault-tolerance mechanisms rarely come without associated time and/or capital costs, reliability will limit the scalability of parallel applications. This paper introduces for the first time the concept of the "Reliability Wall" to highlight the significance of achieving scalable performance in peta/exascale supercomputing with fault tolerance. We quantify the effects of reliability on scalability by proposing a reliability speedup, defining quantitatively the reliability wall, giving an existence theorem for the reliability wall, and categorizing a given system according to the time overhead incurred by fault tolerance. We also generalize these results into a general reliability speedup/wall framework by considering not only speedup but also costup. We analyze and extrapolate the existence of the reliability wall using two representative supercomputers, Intrepid and ASCI White, both employing checkpointing for fault tolerance, and have also studied the general reliability wall using Intrepid. These case studies provide insights on how to mitigate reliability-wall effects in system design and through hardware/software optimizations in peta/exascale supercomputing.

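    The paper's formal reliability speedup and wall definitions are not reproduced in the abstract; the sketch below shows one plausible way to estimate a reliability-aware speedup for a checkpointed, perfectly parallel job, using Young's classical approximation for the checkpoint interval. The function reliability_speedup and every numeric parameter are assumptions for illustration, not the paper's model.

```python
import math

# Hedged sketch: reliability-aware speedup under checkpointing.
# Assumes a perfectly parallel workload and Young's approximation for the
# checkpoint interval; all parameters are illustrative, not from the paper.

def reliability_speedup(n, t1_hours, mttf_node_hours, ckpt_minutes):
    """Speedup of an n-node run once checkpoint/restart overhead is charged."""
    t_parallel = t1_hours / n                    # ideal parallel time
    mttf_system = mttf_node_hours / n            # system MTTF shrinks with n
    delta = ckpt_minutes / 60.0                  # checkpoint cost (hours)
    tau = math.sqrt(2 * delta * mttf_system)     # Young's checkpoint interval
    # Rough fraction of wall-clock time lost to checkpoints and to work
    # re-executed after failures.
    overhead = delta / tau + tau / (2 * mttf_system)
    if overhead >= 1.0:
        return 0.0                               # fault tolerance swamps useful work
    t_with_ft = t_parallel / (1 - overhead)
    return t1_hours / t_with_ft

if __name__ == "__main__":
    prev = 0.0
    for n in (1_000, 10_000, 100_000, 1_000_000):
        s = reliability_speedup(n, t1_hours=1e6,
                                mttf_node_hours=5 * 365 * 24,
                                ckpt_minutes=10)
        trend = "still rising" if s > prev else "wall reached"
        print(f"n={n:>9,}  speedup≈{s:12,.0f}  ({trend})")
        prev = s
```
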
  • On Polynomial Multiplication in Chebyshev Basis

    Publication Year: 2012, Page(s): 780 - 789
    Cited by:  Papers (1)
    PDF (748 KB) | HTML

    In a recent paper, Lima, Panario, and Wang provided a new method for multiplying polynomials expressed in the Chebyshev basis that reduces the total number of multiplications for small-degree polynomials. Although their method uses Karatsuba's multiplication, a quadratic number of operations is still needed. In this paper, we extend their result by providing a complete reduction to polynomial multiplication in the monomial basis, which therefore offers many subquadratic methods. Our reduction scheme does not rely on basis conversions, and we demonstrate that it is efficient in practice. Finally, we show a linear-time equivalence between the polynomial multiplication problem under the monomial basis and under the Chebyshev basis.

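    The paper's subquadratic reduction to monomial-basis multiplication is not spelled out in the abstract, so it is not reproduced here; the sketch below only shows the classical quadratic baseline it improves on, which follows directly from the identity 2·T_m(x)·T_n(x) = T_{m+n}(x) + T_{|m-n|}(x). The function name cheb_mul is illustrative.

```python
# Direct (quadratic) multiplication of polynomials given by their Chebyshev
# coefficients, using 2*T_m*T_n = T_{m+n} + T_{|m-n|}.  This is the baseline
# the paper improves on; the subquadratic reduction itself is not shown.

def cheb_mul(a, b):
    """a[i], b[j] are coefficients of T_i, T_j; returns coefficients of a*b."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += 0.5 * ai * bj
            c[abs(i - j)] += 0.5 * ai * bj
    return c

if __name__ == "__main__":
    # (T0 + 2*T1) * (3*T1) = 3*T1 + 6*T1*T1 = 3*T0 + 3*T1 + 3*T2
    print(cheb_mul([1.0, 2.0], [0.0, 3.0]))   # -> [3.0, 3.0, 3.0]
```
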
  • Radix-2 Multioperand and Multiformat Streaming Online Addition

    Publication Year: 2012, Page(s): 790 - 803
    Cited by:  Papers (1)
    PDF (1911 KB) | HTML

    In this paper, we present multioperand radix-2 online addition using different data representations (signed-digit, two's complement, and carry-save), in particular for cases in which operands with different representations are added. We use the previously defined online full adder (olFA) as a component to build different multioperand online architectures. To merge data with different representations, an inner conversion of the data is performed, eliminating any conversion stage and its time penalty. We propose a technique to build multioperand trees efficiently and give six practical rules for dealing with different kinds of data in the same adder. For the addition of a stream of data, we determine the minimum number of separation cycles required to isolate two successive computations and propose a novel hardware technique that completely eliminates the separation cycles, resulting in the maximum possible throughput.

  • vCUDA: GPU-Accelerated High-Performance Computing in Virtual Machines

    Publication Year: 2012, Page(s): 804 - 816
    Cited by:  Papers (19)
    PDF (1247 KB) | HTML

    This paper describes vCUDA, a general-purpose graphics processing unit (GPGPU) computing solution for virtual machines (VMs). vCUDA allows applications executing within VMs to leverage hardware acceleration, which can be beneficial to the performance of a class of high-performance computing (HPC) applications. The key insights in our design include API call interception and redirection and a dedicated RPC system for VMs. With API interception and redirection, Compute Unified Device Architecture (CUDA) applications in VMs can access a graphics hardware device and achieve high computing performance in a transparent way. In the current study, vCUDA achieved near-native performance with the dedicated RPC system. We carried out a detailed analysis of the performance of our framework. Using a number of unmodified official examples from the CUDA SDK and third-party applications in the evaluation, we observed that CUDA applications running with vCUDA exhibited a very low performance penalty in comparison with the native environment, thereby demonstrating the viability of the vCUDA architecture.

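    vCUDA's actual interception library and RPC protocol are not described here in enough detail to reproduce, so the sketch below only illustrates the general pattern of API call interception and redirection: a guest-side proxy forwards opaque calls over Python's standard xmlrpc to a host-side server that owns the real device. DeviceServer, GuestRuntime, and remote_call are hypothetical names, not part of vCUDA.

```python
# Sketch of API-call interception and RPC redirection in the spirit of vCUDA
# (hypothetical names and protocol; not the paper's implementation).
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client

# --- Host side: owns the real accelerator and executes forwarded calls. ---
class DeviceServer:
    def launch(self, kernel_name, args):
        # A real host stub would hand this request to the native runtime.
        print(f"[host] executing {kernel_name}{tuple(args)}")
        return "ok"

def run_host(port=8000):
    server = SimpleXMLRPCServer(("localhost", port), allow_none=True)
    server.register_instance(DeviceServer())
    server.serve_forever()

# --- Guest side: an intercepted API call becomes an RPC to the host. ---
class GuestRuntime:
    def __init__(self, host="localhost", port=8000):
        self._proxy = xmlrpc.client.ServerProxy(f"http://{host}:{port}")

    def remote_call(self, kernel_name, *args):
        # In vCUDA this redirection is transparent to the guest application.
        return self._proxy.launch(kernel_name, list(args))

# Usage (host and guest would normally run in different domains):
#   run_host()                                        # privileged/host side
#   GuestRuntime().remote_call("vector_add", 1024)    # inside the VM
```
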
  • Locality-Sensitive Bloom Filter for Approximate Membership Query

    Publication Year: 2012, Page(s): 817 - 830
    Cited by:  Papers (6)
    PDF (1118 KB) | HTML

    In many network applications, Bloom filters are used to support exact-match membership queries, thanks to their randomized, space-efficient data structure and small probability of false answers. In this paper, we extend the standard Bloom filter to the Locality-Sensitive Bloom Filter (LSBF) to provide an Approximate Membership Query (AMQ) service. We achieve this by replacing uniform and independent hash functions with locality-sensitive hash functions. This replacement makes the storage in LSBF locality sensitive, while LSBF remains space efficient and query responsive by retaining the Bloom filter design. In the design of the LSBF structure, we propose a bit vector to reduce False Positives (FP). The bit vector can verify multiple attributes belonging to one member. We also use an active overflowed scheme to significantly decrease False Negatives (FN). Rigorous theoretical analysis (e.g., of FP, FN, and space overhead) shows that the design of LSBF is space compact and can provide accurate responses to approximate membership queries. We have implemented LSBF in a real distributed system to perform extensive experiments using real-world traces. Experimental results show that LSBF, compared with a baseline approach and other state-of-the-art work in the literature (SmartStore and LSB-tree), takes less time to respond to AMQs and consumes much less storage space.

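    As a rough illustration of the replacement described in the abstract, the sketch below builds a Bloom-filter-style bit array whose k hash functions are random-hyperplane locality-sensitive hashes (for cosine similarity), so that similar vectors tend to map to the same bits. The verification bit vector and the active overflowed scheme from the paper are omitted, and all parameters are illustrative.

```python
import random

# Toy locality-sensitive Bloom filter: standard Bloom filter layout, but the
# k hash functions are random-hyperplane LSH functions, so near-identical
# vectors tend to set/probe the same bits.  The paper's verification bit
# vector and active overflowed scheme are not modeled here.

class LSBF:
    def __init__(self, dim, m=4096, k=6, bits_per_hash=12, seed=0):
        rng = random.Random(seed)
        self.m, self.k = m, k
        self.bits = [False] * m
        # Each hash function is the sign pattern of `bits_per_hash` random
        # hyperplanes, folded into a bucket index.
        self.planes = [[[rng.gauss(0, 1) for _ in range(dim)]
                        for _ in range(bits_per_hash)] for _ in range(k)]

    def _bucket(self, planes, x):
        code = 0
        for p in planes:
            code = (code << 1) | (sum(pi * xi for pi, xi in zip(p, x)) >= 0)
        return code % self.m

    def insert(self, x):
        for planes in self.planes:
            self.bits[self._bucket(planes, x)] = True

    def query(self, x):
        return all(self.bits[self._bucket(planes, x)] for planes in self.planes)

if __name__ == "__main__":
    f = LSBF(dim=8)
    member = [0.9, -0.2, 0.4, 0.1, -0.7, 0.3, 0.5, -0.1]
    f.insert(member)
    close = [v + 0.01 for v in member]    # near-duplicate: likely reported present
    far = [-v for v in member]            # opposite direction: likely absent
    print(f.query(close), f.query(far))
```
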
  • A Parallel Hardware Architecture for Real-Time Object Detection with Support Vector Machines

    Publication Year: 2012, Page(s): 831 - 842
    Cited by:  Papers (8)
    Multimedia
    PDF (802 KB) | HTML

    Object detection applications are often associated with real-time performance constraints that stem from the embedded environments in which they are typically deployed. Consequently, researchers have proposed dedicated hardware architectures utilizing a variety of classification algorithms targeting object detection. The Support Vector Machine (SVM) is among the most popular classification algorithms used in object detection, yielding high accuracy rates. However, existing SVM hardware implementations attempting to speed up SVM classification have either targeted only simple applications or SVM training. As such, few proposed hardware architectures are generic enough to be used in a variety of object detection applications. Hence, this paper presents a parallel array architecture for SVM-based object detection, in an attempt to show the advantages and performance benefits that stem from a dedicated hardware solution. The proposed hardware architecture provides parallel processing, resource sharing among the processing units, and efficient memory management. Furthermore, the size of the array is scalable to the hardware demands and can also handle a variety of applications, such as multiclass classification problems. A prototype of the proposed architecture was implemented on an FPGA platform and evaluated using three popular detection applications, demonstrating real-time performance (40-122 fps for a variety of applications).

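    The parallel array itself cannot be captured in a few lines; the sketch below only shows the per-window computation that such an architecture evaluates many times per frame under the standard SVM formulation: a kernel evaluation against each support vector followed by a weighted sum. The kernel choice, data, and function names are assumptions for illustration.

```python
import math

# The per-window SVM decision that a parallel array would evaluate repeatedly:
# f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b.  Data below is illustrative.

def rbf_kernel(u, v, gamma=0.1):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(window, support_vectors, alphas, labels, bias):
    score = sum(a * y * rbf_kernel(sv, window)
                for sv, a, y in zip(support_vectors, alphas, labels))
    return score + bias

if __name__ == "__main__":
    svs = [[0.2, 0.8, 0.1], [0.9, 0.1, 0.4]]      # toy support vectors
    alphas, labels, bias = [0.7, 0.5], [+1, -1], -0.05
    window = [0.25, 0.75, 0.15]                   # one candidate image window
    is_object = svm_decision(window, svs, alphas, labels, bias) > 0
    print("object detected" if is_object else "background")
```
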
  • Model Checking Prioritized Timed Systems

    Publication Year: 2012, Page(s): 843 - 856
    Cited by:  Papers (1)
    PDF (1745 KB)

    Real-time systems modeled by timed automata are often symbolically verified using Difference Bound Matrix (DBM) and Binary Decision Diagram (BDD) operations. When designing concurrent real-time systems with two or more processes sharing a resource, priorities are often used to schedule processes and to resolve conflicting resource requests. Concurrent real-time systems can thus be modeled by timed automata with priorities. However, model checking timed automata with priorities requires the DBM subtraction operation, whose result may not be convex, i.e., DBMs are not closed under subtraction. Thus, a partition of the resulting DBM is required. In this work, we propose Prioritized Timed Automata (PTA) and resolve the issues related to the model checking of PTA. Two algorithms are proposed: an optimal DBM subtraction algorithm that produces the minimal number of DBM partitions, and a DBM merging algorithm that reduces the number of DBM partitions after a series of DBM subtractions. Application examples show the advantages of the proposed method in terms of support for the efficient verification of prioritized timed systems.

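    The paper's optimal subtraction and merging algorithms are not given in the abstract; the sketch below only illustrates why DBM subtraction breaks convexity, using the textbook decomposition of A \ B into one piece per constraint of B that cuts A. Strict versus non-strict bounds are deliberately collapsed to plain numbers, which a real model checker cannot do; all names are illustrative.

```python
import copy

# Minimal DBM sketch: D[i][j] bounds the clock difference x_i - x_j (clock
# x_0 == 0); an unconstrained entry would be float("inf").  Strictness of
# bounds is ignored here, which a real timed-automata tool cannot afford.

def canonical(d):
    """Tighten all bounds (Floyd-Warshall closure)."""
    n = len(d)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def is_empty(d):
    return any(d[i][i] < 0 for i in range(len(d)))

def subtract(a, b):
    """Cover A \\ B by one DBM per constraint of B that actually cuts A."""
    pieces, n = [], len(a)
    for i in range(n):
        for j in range(n):
            if i != j and b[i][j] < a[i][j]:
                piece = copy.deepcopy(a)
                # negate the constraint  x_i - x_j <= b[i][j]
                piece[j][i] = min(piece[j][i], -b[i][j])
                piece = canonical(piece)
                if not is_empty(piece):
                    pieces.append(piece)
    return pieces

if __name__ == "__main__":
    A = canonical([[0, 0], [10, 0]])    # one clock x1 with 0 <= x1 <= 10
    B = canonical([[0, -3], [5, 0]])    # 3 <= x1 <= 5
    for piece in subtract(A, B):
        print(piece)    # two convex pieces, roughly x1 <= 3 and 5 <= x1 <= 10
```
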
  • NoC-Based Hardware Accelerator for Breakpoint Phylogeny

    Publication Year: 2012, Page(s): 857 - 869
    Cited by:  Papers (1)
    PDF (1434 KB) | HTML

    Maximum Parsimony phylogenetic tree reconstruction is based on finding the breakpoint median for a given set of species, represented by a bounded edge-weight graph model. This reduces the breakpoint median problem to solving multiple instances of the Traveling Salesman Problem (TSP), a classical NP-complete problem in graph theory. Exponential-time algorithms that apply efficient runtime heuristics, such as branch-and-bound, to dynamically prune the search space are used to solve TSP. In this paper, we present the design and performance evaluation of a network-on-chip (NoC)-based implementation for solving TSP under the bounded edge-weight model, as used in the computation of breakpoint phylogeny. Our approach takes advantage of fine-grain parallelism across multiple processing elements (PEs) and uses an efficient NoC architecture for inter-PE communication. To accelerate the application in hardware, our PE design optimizes a particular lower-bound calculation that typically tends to be the serial bottleneck in computing a TSP solution. We also explore two representative NoC architectures, mesh and quad-tree, and show that the latter is more energy efficient for this application domain. Experimental results show that this new implementation achieves speedups of up to three orders of magnitude over state-of-the-art multithreaded software implementations.

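    The abstract notes that a lower-bound calculation is the serial bottleneck the PE design optimizes, but does not say which bound; the sketch below uses a standard TSP tour lower bound (half the sum, over all vertices, of each vertex's two cheapest incident edges) purely to illustrate the kind of computation involved. It is not claimed to be the bound used in the paper.

```python
# Classic branch-and-bound style lower bound for TSP on a complete weighted
# graph: every vertex of a tour uses exactly two incident edges, so the sum
# of each vertex's two cheapest incident edges, divided by 2, bounds any tour.
# Illustrative only; the paper's bounded edge-weight bound may differ.

def tour_lower_bound(w):
    """w is a symmetric matrix of edge weights (w[i][i] ignored)."""
    n = len(w)
    total = 0
    for i in range(n):
        incident = sorted(w[i][j] for j in range(n) if j != i)
        total += incident[0] + incident[1]
    return total / 2

if __name__ == "__main__":
    w = [[0, 2, 9, 10],
         [2, 0, 6, 4],
         [9, 6, 0, 3],
         [10, 4, 3, 0]]
    print(tour_lower_bound(w))   # 16.5, below the optimal tour cost of 18
```
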
  • Automated Generation of Performance and Dependability Models for the Assessment of Wireless Sensor Networks

    Publication Year: 2012, Page(s): 870 - 884
    Cited by:  Papers (8)
    Multimedia
    PDF (2375 KB) | HTML

    Wireless Sensor Networks (WSNs) are widely recognized as a promising solution for building next-generation monitoring systems. Their industrial uptake is, however, still compromised by the low level of trust in their performance and dependability. Whereas analytical models represent a valid means to assess nonfunctional properties via simulation, their wide use is still limited by the complexity and dynamicity of WSNs, which lead to unaffordable modeling costs. To reduce this gap between research achievements and industrial development, this paper presents a framework for the assessment of WSNs based on the automated generation of analytical models. The framework hides modeling details, and it allows designers to focus on simulation results to drive their design choices. Models are generated starting from a high-level specification of the system and from a preliminary characterization of its fault-free behavior, using behavioral simulators. The benefits of the framework are shown in the context of two case studies, based on the wireless monitoring of civil structures.

  • GPS-Free, Boundary-Recognition-Free, and Reliable Double-Ruling-Based Information Brokerage Scheme in Wireless Sensor Networks

    Publication Year: 2012, Page(s): 885 - 898
    Multimedia
    PDF (1867 KB) | HTML

    We study information brokerage schemes in wireless sensor networks, which allow consumers to obtain data from producers by replicating and retrieving data in a certain set of sensors, and propose a novel information brokerage scheme, termed RDRIB. Unlike existing information brokerage schemes, RDRIB guarantees successful data retrieval without using any boundary detection algorithm or the geographic location information acquired by the Global Positioning System (GPS). In RDRIB, the double-ruling technique is used to replicate and retrieve the data within a constructed virtual boundary, and simulations show that RDRIB performs well in terms of replication memory overhead, replication message overhead, retrieval message overhead, retrieval latency, and construction message overhead.

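    RDRIB's virtual-boundary construction is beyond a short example, so the sketch below only conveys the double-ruling intuition on an abstract grid of sensors: a producer replicates along its column, a consumer queries along its row, and the two paths must intersect. The grid coordinates are a stand-in for the GPS-free structure the paper actually builds; names and sizes are illustrative.

```python
# Double-ruling intuition on a virtual grid: replication along one ruling
# (a column) and retrieval along a crossing ruling (a row) always intersect,
# so a query succeeds without flooding.  The grid is a stand-in for RDRIB's
# GPS-free virtual boundary, which this sketch does not construct.

GRID = 16                                   # 16x16 virtual grid of sensors
storage = {}                                # node (row, col) -> set of data keys

def replicate(producer_row, producer_col, key):
    for row in range(GRID):                 # walk the producer's column
        storage.setdefault((row, producer_col), set()).add(key)

def retrieve(consumer_row, key):
    for col in range(GRID):                 # walk the consumer's row
        if key in storage.get((consumer_row, col), set()):
            return (consumer_row, col)      # rendezvous node holding the data
    return None

if __name__ == "__main__":
    replicate(3, 11, "temperature@sensor42")
    print(retrieve(9, "temperature@sensor42"))   # -> (9, 11)
```
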
  • Concatenation of Functional Test Subsequences for Improved Fault Coverage and Reduced Test Length

    Publication Year: 2012, Page(s): 899 - 904
    PDF (570 KB) | HTML

    Functional test sequences have several advantages over structural tests when they are applied at speed. A large pool of functional test sequences may be available for a circuit due to the application of a simulation-based design verification process. This paper describes a versatile procedure that uses a pool of functional test sequences as a basis for forming a single compact functional test sequence that achieves the same or higher gate-level fault coverage than the given pool. The procedure extracts test subsequences from the test sequences in the pool and concatenates them to form a single test sequence. It also employs an enhanced static test compaction process aimed at improving the fault coverage in addition to reducing the test sequence length.

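    The paper's extraction and compaction procedure relies on fault simulation that cannot be reproduced here; the sketch below abstracts each candidate subsequence to the set of faults it detects plus its length in cycles, and greedily concatenates subsequences by best new-coverage-per-cycle. This is only a set-cover-style caricature of the idea, with made-up fault sets.

```python
# Set-cover-style caricature of concatenating functional test subsequences:
# each candidate is (name, length_in_cycles, set_of_detected_faults); pick the
# subsequence with the best new-faults-per-cycle ratio until nothing improves.
# The fault sets below are made up; a real flow gets them from fault simulation.

def concatenate(candidates):
    covered, schedule, total_len = set(), [], 0
    while True:
        best, best_gain = None, 0.0
        for name, length, faults in candidates:
            new = len(faults - covered)
            if length > 0 and new / length > best_gain:
                best, best_gain = (name, length, faults), new / length
        if best is None:
            return schedule, covered, total_len
        name, length, faults = best
        schedule.append(name)
        covered |= faults
        total_len += length

if __name__ == "__main__":
    pool = [("seqA[120:180]", 60, {1, 2, 3, 4}),
            ("seqB[0:40]",    40, {3, 4, 5}),
            ("seqC[10:25]",   15, {5, 6}),
            ("seqD[200:260]", 60, {1, 2})]
    order, faults, cycles = concatenate(pool)
    print(order, len(faults), cycles)
```
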
  • Fast Identification of Undetectable Transition Faults under Functional Broadside Tests

    Publication Year: 2012, Page(s): 905 - 910
    PDF (506 KB) | HTML

    This paper describes a fast procedure for identifying undetectable transition faults under functional broadside tests. By using reachable states as scan-in states, functional broadside tests avoid overtesting that may occur when scan-based tests are used for detecting delay faults. The proposed procedure is based only on logic simulation, and does not perform test generation of any type. In one of its variations, the procedure uses logic simulation of fully unspecified primary input vectors starting from a known initial state in order to identify a superset of broadside tests that covers all the functional broadside tests. It then uses this superset to identify undetectable transition faults. The procedure identifies large numbers of undetectable transition faults in certain benchmark circuits.

  • A Note on Diagnosability of Large Fault Sets on Star Graphs

    Publication Year: 2012, Page(s): 911 - 912
    Cited by:  Papers (1)
    PDF (139 KB) | HTML

    The diagnosability of a system refers to the maximum number of faulty vertices that the system can identify. Somani et al. [2] proposed a generalized measure to increase the degree of diagnosability of hypercubes and star graphs. This paper provides counterexamples to their results on the diagnosability of star graphs.

  • [Inside back cover]

    Publication Year: 2012, Page(s): c3
    PDF (169 KB)
    Freely Available from IEEE
  • [Back cover]

    Publication Year: 2012, Page(s): c4
    PDF (93 KB)
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org