Computers, IEEE Transactions on

Issue 11 • Date Nov. 2011

  • [Front cover]

    Page(s): c1
    Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    Freely Available from IEEE
  • Multithreading in Java: Performance and Scalability on Multicore Systems

    Page(s): 1521 - 1534

    This paper studies the performance and scalability of multithreaded Java programs on multicore systems. First, we examine how benchmark performance scales with various numbers of processor cores and application threads. Second, by correlating low-level hardware performance data with Java Virtual Machine (JVM) threads and system components, we present detailed analyses of performance and scalability, including hardware stall events and memory system latencies. Third, we detail memory resource usage to identify potential bottlenecks. Finally, we propose JVM tuning techniques to alleviate the bottlenecks and improve performance and scalability. The study reveals several key findings. First, lock contention usually imposes a strong limit on scalability. Second, in terms of memory access latency, most memory stalls are produced by L2 cache misses and cache-to-cache transfers. Finally, the overhead of minor garbage collections can be an important factor in throughput reduction. Based on these findings, we examine appropriate JVM tuning techniques. We observe that using a parallel garbage collector and an appropriate ratio of young to old generation can reduce the overhead of minor collections and improve garbage collection efficiency. Moreover, thread-local allocation buffers can improve cache utilization, which in turn improves performance significantly.
  • Variable Latency Goldschmidt Algorithm Based on a New Rounding Method and a Remainder Estimate

    Page(s): 1535 - 1546

    A new variable latency Goldschmidt algorithm is presented. The algorithm is based on a new rounding method for division, square root, and their reciprocals that avoids the conventional remainder calculation in most cases and improves on previous proposals. The rounding decision is taken by checking the least significant bits of the output of the last Goldschmidt iteration without any other transformation. This helps to reduce the number of cases that need the calculation of the remainder. Additionally, we avoid the calculation of the remainder for most of those cases by using a remainder estimate that can be easily obtained from the Goldschmidt iteration. The calculation of the estimate is much simpler and less time consuming than the calculation of the remainder, and this contributes to reducing the number of cases that need a large latency. The combination of both techniques allows us to define a variable latency algorithm that needs to compute the remainder in just nine percent of the total number of cases for reciprocal and division and in 12 percent for square root and square root reciprocal.
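
For readers unfamiliar with the underlying iteration, the following is a minimal sketch of textbook Goldschmidt division together with the conventional remainder that the abstract says the new rounding method usually avoids. It is not the paper's algorithm: the function names `goldschmidt_divide` and `remainder`, the pre-scaling of the divisor into [1/2, 1), and the use of exact rational arithmetic are all assumptions made for illustration.

```python
from fractions import Fraction


def goldschmidt_divide(n, d, iterations=5):
    """Approximate n / d by Goldschmidt iteration (illustrative sketch).

    Assumes d has been pre-scaled into [1/2, 1). Each step multiplies the
    numerator and denominator by f = 2 - d, which drives d toward 1 so that
    the numerator converges to the quotient.
    """
    n = Fraction(n)
    d = Fraction(d)
    if not (Fraction(1, 2) <= d < 1):
        raise ValueError("d must be pre-scaled into [1/2, 1)")
    for _ in range(iterations):
        f = 2 - d  # correction factor
        n *= f
        d *= f
    return n


def remainder(n, d, q):
    """Conventional remainder n - q * d, used to decide the final rounding."""
    return Fraction(n) - Fraction(q) * Fraction(d)
```

The error 1 - d squares at every step, so the number of correct bits roughly doubles per iteration. The remainder computation costs an extra multiplication and subtraction after the last iteration, which is the latency the variable latency scheme tries to skip.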
  • State-Retentive Power Gating of Register Files in Multicore Processors Featuring Multithreaded In-Order Cores

    Page(s): 1547 - 1560

    In this work, we investigate state-retentive power gating of register files for leakage reduction in multicore processors supporting multithreading. In an in-order core, when a thread is blocked by a memory stall, the corresponding register file can be placed in a low-leakage state through power gating. When the memory stall is resolved, the register file is reactivated so that it can be accessed again. Because the contents of the register file are retained, rather than lost and then restored on wakeup, this is referred to as state-retentive power gating of register files. While state-retentive power gating in single cores has been studied in the literature, this work investigates it for multicore architectures for the first time. We propose specific techniques to implement state-retentive power gating for three different multicore processor configurations based on the multithreading model: 1) coarse-grained multithreading, 2) fine-grained multithreading, and 3) simultaneous multithreading. The proposed techniques can be implemented as design extensions within the control units of the in-order cores. Each technique uses two different low-leakage modes: one with low leakage savings and low wake-up latency, and one with high leakage savings and high wake-up latency. The overhead due to wake-up latency is completely avoided in two techniques, while in the third it is hidden for the most part, either by overlapping the wake-up process with the thread context switching latency or by executing instructions from other threads that are ready for execution. The proposed techniques were evaluated through simulations with multiprogrammed workloads composed of SPEC 2000 integer benchmarks. Experimental results show that in an 8-core processor executing 64 threads, the average leakage savings were 42 percent for coarse-grained multithreading and between seven and eight percent for fine-grained and simultaneous multithreading.
  • RRR: Rapid Ring Recovery Submillisecond Decentralized Recovery for Ethernet Ring

    Page(s): 1561 - 1570

    Ethernet is the indisputable de facto technology for local area networks due to its simplicity, low cost, and wide-scale adoption. In recent times, Ethernet has entered new networking areas, such as Metro Area Network (MAN) and Industrial Area Network (IAN), where specialized protocols dominate the market. In addition to its well-known advantages, Ethernet acts as a common platform to integrate multiple protocols. However, Ethernet falls short of the stringent resilience requirements mandated by applications in MAN and IAN, despite progress made by the community on additional standardization. We describe a new approach for swift failure detection and recovery in Ethernet ring topologies called Rapid Ring Recovery (RRR). RRR is based on the novel usage of multiple virtual rings. Our implementation, which augments an off-the-shelf Ethernet switch, shows that RRR reconverges after a fault in 294 microseconds while sustaining the loss of only eight large frames at 95 percent traffic load.
  • Succinct Greedy Geometric Routing Using Hyperbolic Geometry

    Page(s): 1571 - 1580

    We describe a method for performing greedy geometric routing for any n-vertex simple connected graph G in the hyperbolic plane, so that a message M between any pair of vertices may be routed by having each vertex that receives M pass it to a neighbor that is closer to M's destination. Our algorithm produces succinct embeddings, where vertex positions are represented using O(log n) bits and distance comparisons may be performed efficiently using these representations. These properties are useful, for example, for routing in sensor networks, where storage and bandwidth are limited.
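
The forwarding rule in this abstract can be sketched in a few lines. The example below assumes that vertex positions are given as complex numbers in the Poincaré disk model of the hyperbolic plane and that each vertex forwards to the neighbor closest to the destination. Both choices, along with the names `poincare_distance` and `greedy_route`, are illustrative assumptions. The paper's embedding construction and its O(log n)-bit coordinate representation are not modeled here.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points (complex) in the open unit disk."""
    num = 2 * abs(u - v) ** 2
    den = (1 - abs(u) ** 2) * (1 - abs(v) ** 2)
    return math.acosh(1 + num / den)


def greedy_route(graph, pos, src, dst, dist=poincare_distance):
    """Forward greedily from src to dst.

    graph maps each vertex to a list of neighbors; pos maps each vertex to
    its coordinates. Returns the path as a list, or None if some vertex has
    no neighbor strictly closer to dst (greedy routing gets stuck).
    """
    path = [src]
    cur = src
    while cur != dst:
        best = min(graph[cur], key=lambda w: dist(pos[w], pos[dst]), default=None)
        if best is None or dist(pos[best], pos[dst]) >= dist(pos[cur], pos[dst]):
            return None  # no neighbor makes progress toward dst
        path.append(best)
        cur = best
    return path
```

A greedy embedding is one in which this loop never returns None for any source and destination. An arbitrary placement of vertices gives no such guarantee, which is why the choice of embedding matters.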
  • Pipelined Statistical Cipher Feedback: A New Mode for High-Speed Self-Synchronizing Stream Encryption

    Page(s): 1581 - 1595

    In this paper, we introduce a new block cipher mode of operation aimed at providing high-speed hardware-based self-synchronizing stream encryption. The proposed mode is a modification of statistical cipher feedback (SCFB) mode and is designed to be implemented using pipeline architectures for the block cipher. We refer to the mode as pipelined SCFB mode or PSCFB. We consider the implementation characteristics and show that PSCFB is able to achieve speeds that are very close to those of pipelined block cipher implementations configured for counter mode. Such speeds are achieved with modest latency through the system and a small amount of memory required for the system queues, with a provable guarantee of no queue overflow. Further, we examine the characteristics of PSCFB mode in response to bit errors and synchronization losses in the communication channel. Specifically, we show that the error propagation factor is modest and comparable to conventional SCFB and that synchronization recovery delay is reasonable given the expectation that synchronization loss is infrequent. Given the high efficiency and good communication characteristics of the mode, it is concluded that PSCFB is an excellent choice for high-speed network applications requiring stream-oriented encryption with self-synchronizing capabilities.
  • String Searching Engine for Virus Scanning

    Page(s): 1596 - 1609

    A memory-efficient hardware string searching engine for antivirus applications is presented. The proposed QSV method is based on quick sampling of the input stream against fixed-length pattern prefixes, and on-demand verification of variable-length pattern suffixes. Patterns handled by the QSV method are required to have at least 16 bytes and to possess distinct 16-byte prefixes. The latter requirement can be fulfilled by a preprocessing procedure. The search engine uses the pipelined Aho-Corasick (P-AC) architecture developed by the first author to process short patterns of 4 to 15 bytes and a small number of exception cases. Our design was evaluated using the ClamAV virus database having 82,888 strings with a total size that exceeds 8 Mbyte. In terms of byte count, 99.3 percent of the pattern set is handled by the QSV method and 0.7 percent of the pattern set is handled by P-AC. A pattern with a distinct 16-byte prefix occupies at most three lookup table entries in QSV. The overall memory cost of our system is about 1.4 Mbyte, i.e., 1.4 bits per character of the ClamAV pattern set. The proposed method is memory-based; hence, updates to the pattern set can be accommodated by modifying the contents of the lookup tables without reconfiguring the hardware circuits.
  • A Dead-End Free Topology Maintenance Protocol for Geographic Forwarding in Wireless Sensor Networks

    Page(s): 1610 - 1621

    Minimizing energy consumption is a fundamental requirement when deploying wireless sensor networks. Accordingly, various topology control protocols have been proposed, which aim to conserve energy by turning off unnecessary sensors while simultaneously preserving a constant level of routing fidelity. However, although these protocols can generally be integrated with any routing scheme, few of them take specific account of the issues that arise when they are integrated with geographic routing mechanisms. Of these issues, the dead-end situation is a particular concern. The dead-end phenomenon (also known as the "local maximum" problem) poses major difficulties when performing geographic forwarding in wireless sensor networks, since whenever a packet encounters a dead end, additional overhead is incurred to forward the packet to the destination via an alternative route. This paper presents a distributed dead-end free topology maintenance protocol, designated as DFTM, for the construction of dead-end free networks using a minimum number of active nodes. The performance of DFTM is compared with that of the conventional topology maintenance schemes GAF and Span in a series of numerical simulations conducted using the ns2 simulator. The evaluation results reveal that DFTM significantly reduced the number of active nodes required in the network and thus prolonged the overall network lifetime. DFTM also successfully constructed a dead-end free topology in most of the simulated scenarios. Additionally, even when the locations of the sensors were not precisely known, DFTM still ensured that only very few dead-end events occurred during packet forwarding.
  • Autonomous Passive Localization Algorithm for Road Sensor Networks

    Page(s): 1622 - 1637

    Road networks are one of the important surveillance areas in military scenarios. In these road networks, sensors will be sparsely deployed (hundreds of meters apart) for cost-effective deployment. This makes existing localization solutions based on ranging ineffective. To address this issue, this paper introduces a novel approach based on passive vehicular traffic measurement, called Autonomous Passive Localization (APL). Our work is inspired by the fact that vehicles move along routes with a known map. Using binary vehicle-detection time stamps, we can obtain distance estimates between any pair of sensors on roadways to construct a virtual graph composed of sensor identifications (i.e., vertices) and distance estimates (i.e., edges). The virtual graph is then matched with the topology of the road map in order to identify where sensors are located on roadways. We evaluate our design outdoors on Minnesota roadways and show that our distance estimate method works well despite traffic noise. In addition, we show that our localization scheme is effective in a road network with 18 intersections, where we found no location matching error, even with a maximum sensor time synchronization error of 0.07 sec and a vehicle speed deviation of 10 km/h.
  • Toward Efficient Task Management in Wireless Sensor Networks

    Page(s): 1638 - 1651

    In numerous applications of wireless sensor networks (WSN), the reliability of the data collected by sensors is cast as specific QoS requirements expressed in terms of the minimum number of sensors needed to perform various tasks. Designing a long-lived sensor network with reliable performance has always been challenging due to the modest nonrenewable energy budget of individual sensors. In such a context, energy-unaware task management protocols may result in uneven expenditure of sensor energy by assigning uneven workloads to sensors. This, in turn, often translates into reduced sensor density around those heavily loaded sensors and may, eventually, lead to the creation of energy holes that partition the network into disconnected islands. To avoid these problems and to promote network longevity, we propose two energy-aware task management protocols: our first protocol is centralized, while the second one is fully distributed. The proposed protocols assign tasks to sensors based on their remaining energy so that energy expenditure among neighboring sensors is almost even. We compare the reliable lifetime of the network achieved by assigning tasks to sensors using the proposed protocols against optimal task assignment and also against energy-unaware protocols. Extensive simulation results have revealed that the performance of the proposed protocols is very close to that of the optimal task assignment. Furthermore, our simulation has shown that the proposed protocols can increase the functional longevity of the network by about 16 percent.
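
The idea of assigning tasks by remaining energy can be illustrated with a simple centralized greedy rule: for a task that needs at least k sensors, pick the k candidates with the most energy left and charge each a fixed cost. This is only a sketch under stated assumptions and not either of the paper's protocols. The function name `assign_task`, the fixed per-task `cost`, and the tie-breaking by sensor identifier are hypothetical.

```python
def assign_task(energy, candidates, k, cost):
    """Assign one task to the k candidate sensors with the most remaining energy.

    energy maps sensor id -> remaining energy and is updated in place.
    Returns the chosen sensor ids, or None if fewer than k candidates can
    afford the task (the QoS requirement cannot be met).
    """
    eligible = [s for s in candidates if energy[s] >= cost]
    if len(eligible) < k:
        return None
    # Highest remaining energy first; ties broken by sensor id for determinism.
    chosen = sorted(eligible, key=lambda s: (-energy[s], s))[:k]
    for s in chosen:
        energy[s] -= cost
    return chosen
```

Calling this repeatedly on the same neighborhood rotates the workload: sensors that just served a task drop in the ranking, so energy levels among neighbors stay close to even.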
  • Computational Vector-Magnitude-Based Range Determination for Scientific Abstract Data Types

    Page(s): 1652 - 1663

    As interest mounts in using hardware accelerators to speed up numerical scientific calculations, automation tool support is required to aid designers in mapping applications to custom hardware. One key step in designing this custom hardware is bit-width allocation, where the known art faces challenges when dealing with applications from the scientific computing domain, thus motivating the use of computational methods based on Satisfiability-Modulo Theory. Many real-life applications are, however, specified in terms of vectors and matrices that are large enough to make expansion into scalar equations infeasible. The proposed vector-magnitude method and its extension via block vectors enable computational methods to be leveraged in tackling calculations of practically relevant complexity. Application to case studies confirms that, through a more compact computational instance, search efficiency is improved, leading to tighter bounds and thus smaller bit-widths.
  • New Transactions on Computers Essential Sets Available

    Page(s): 1664
    Freely Available from IEEE
  • [Inside back cover]

    Page(s): c3
    Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Albert Y. Zomaya
School of Information Technologies
Building J12
The University of Sydney
Sydney, NSW 2006, Australia
http://www.cs.usyd.edu.au/~zomaya
albert.zomaya@sydney.edu.au