
IEEE Transactions on Computers

Issue 5 • May 1, 2015

  • Accurate and Efficient Estimation of Logic Circuits Reliability Bounds

    Publication Year: 2015 , Page(s): 1217 - 1229

    As the sizes of CMOS devices rapidly scale deep into the nanometer range, the manufacture of nanocircuits will become extremely complex and will inevitably introduce more defects, including more transient faults that appear during operation. For this reason, accurately calculating the reliability of future designs will be extremely critical for nanocircuit designers as they investigate design alternatives to optimize the tradeoffs between area-power-delay and reliability. However, accurate calculation of the reliability of large and highly connected circuits is complex and very time consuming. This paper presents a complete solution for estimating logic circuit reliability bounds with high accuracy in reasonable time, even for very large and complex circuits. The solution combines a novel criticality scoring algorithm to rank the reliability of individual input vectors with a heuristic search to find the input vector having the lowest reliability. The solution scales well with circuit size, and is independent of the interconnect complexity or the logic depth. Extensive computational results show that the speed of our method is orders of magnitude faster than exact solutions provided by Bayesian network exact inferences, while maintaining identical or sufficiently close accuracy.
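
    The search component of such a method can be pictured with a short sketch: greedily flip input bits toward lower circuit reliability, restarting from random vectors. Everything below is hypothetical scaffolding; in particular the evaluator is a stand-in, and the paper's criticality scoring algorithm is not reproduced here.

        import random

        def circuit_reliability(vector):
            # Stand-in evaluator: in the paper this value would come from a
            # gate-level reliability model of the circuit under analysis.
            key = sum(bit << i for i, bit in enumerate(vector))
            return 0.9 + 0.1 * random.Random(key).random()

        def find_worst_vector(n_inputs, restarts=20):
            """Greedy bit-flip search for the input vector of lowest reliability."""
            best_vec, best_rel = None, 1.0
            for _ in range(restarts):
                vec = [random.randint(0, 1) for _ in range(n_inputs)]
                improved = True
                while improved:
                    improved = False
                    for i in range(n_inputs):          # try flipping each input bit
                        cand = vec[:i] + [vec[i] ^ 1] + vec[i + 1:]
                        if circuit_reliability(cand) < circuit_reliability(vec):
                            vec, improved = cand, True
                rel = circuit_reliability(vec)
                if rel < best_rel:
                    best_vec, best_rel = vec, rel
            return best_vec, best_rel

        print(find_worst_vector(8))

    The lowest-reliability vector found this way gives the lower reliability bound while avoiding the exponential cost of exact Bayesian inference.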

  • Address Scrambling and Data Inversion Techniques for Yield Enhancement of NROM-Based ROMs

    Publication Year: 2015 , Page(s): 1230 - 1240

    Address scrambling and data inversion techniques are proposed for yield enhancement of NROM-based ROMs in this paper. Besides using the conventional fault replacement techniques, fault masking effects are also exploited to further improve the fabrication yield and reduce the amount of extra spare rows/columns. That is, we consider the logical effects of physical defects when the customer's code is to be programmed. A novel test and repair flow is also proposed. Based on the proposed techniques, possibilities of fault masking can be maximized. A row/column scrambling control word and a control column are used for the control of the scrambling techniques and the data inversion technique, respectively. The problem of determining the control word can be modeled with a bipartite graph. The proposed test and repair techniques can be easily incorporated into ROM BIST architectures, which makes them practical to integrate into current design flows. According to experimental results, the fabrication yield can be improved significantly. Moreover, the incurred hardware overhead and timing penalty are almost negligible.

  • A Deadline-Floor Inheritance Protocol for EDF Scheduled Embedded Real-Time Systems with Resource Sharing

    Publication Year: 2015 , Page(s): 1241 - 1253

    Earliest Deadline First (EDF) is the most widely studied optimal dynamic scheduling algorithm for uniprocessor real-time systems. For realistic programs, tasks must be allowed to exchange data and use other forms of resources that must be accessed under mutual exclusion. With EDF scheduled systems, access to such resources is usually controlled by the use of Baker's Stack Resource Protocol (SRP). In this paper we propose an alternative scheme based on deadline inheritance. Shared resources are assigned a relative deadline equal to the minimum (floor) of the relative deadlines of all tasks that use the resource. On entry to the resource, a task's current absolute deadline is subject to an immediate reduction to reflect the resource's deadline floor. On exit, the original deadline for the task is restored. We show that the worst-case behaviour of the new protocol (termed the Deadline Floor inheritance Protocol, DFP) is the same as that of SRP; indeed it leads to the same blocking term in the scheduling analysis. We argue that the new scheme is nevertheless more intuitive and removes the need to support preemption levels, and we demonstrate that it can be implemented more efficiently.
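
    The protocol rule is compact enough to state in code. A minimal sketch, with invented class names, of the enter/exit deadline manipulation; a real implementation would live inside the EDF scheduler and re-sort the ready queue on each change.

        class Resource:
            def __init__(self, deadline_floor):
                # Floor = minimum relative deadline among all tasks using it.
                self.deadline_floor = deadline_floor

        class Task:
            def __init__(self, abs_deadline):
                self.abs_deadline = abs_deadline
                self._saved = []                 # stack, for nested resources

            def enter(self, resource, now):
                # On entry: immediately shorten the absolute deadline to the
                # resource's floor (never lengthen it).
                self._saved.append(self.abs_deadline)
                self.abs_deadline = min(self.abs_deadline,
                                        now + resource.deadline_floor)

            def exit(self, resource):
                # On exit: restore the task's original deadline.
                self.abs_deadline = self._saved.pop()

    Under EDF, a task inside the resource then runs with a deadline at least as urgent as that of any task that could contend for it, which is what yields the same blocking term as SRP without tracking preemption levels.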

    Open Access
  • A Hardware Scheduler Based on Task Queues for FPGA-Based Embedded Real-Time Systems

    Publication Year: 2015 , Page(s): 1254 - 1267

    A hardware scheduler is developed to improve the real-time performance of soft-core processor based computing systems. A hardware scheduler typically accelerates system performance at the cost of increased hardware resources, inflexibility and integration difficulty. However, the reprogrammability of FPGA-based systems removes the problems of inflexibility and integration difficulty. This paper introduces a new task-queue architecture to better support practical task controls and maintain good resource scaling. The scheduler can be configured to support various algorithms such as time-sliced priority scheduling, Earliest Deadline First and Least Slack Time. The hardware scheduler reduces scheduling overhead by more than 1,000 clock cycles and raises the system utilization bound by up to 19.2 percent. Scheduling jitter is reduced from hundreds of clock cycles in software to just two or three cycles for most operations. The additional resource cost is no more than 17 percent of a typical soft-core system for a small-scale embedded application.
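
    In software terms, the queue discipline such a scheduler implements is a priority queue keyed by deadline; a minimal EDF model follows. The paper's contribution is doing this in FPGA logic, where the heap is plausibly replaced by parallel comparators, which is what brings jitter down to a few cycles.

        import heapq

        class EDFQueue:
            """Ready queue that always dispatches the task with the
            earliest absolute deadline (Earliest Deadline First)."""
            def __init__(self):
                self._heap = []

            def push(self, deadline, task):
                heapq.heappush(self._heap, (deadline, task))

            def pop(self):
                return heapq.heappop(self._heap)[1]

        q = EDFQueue()
        q.push(30, "logger"); q.push(10, "sensor"); q.push(20, "control")
        print(q.pop(), q.pop(), q.pop())   # sensor control logger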

  • An Analytical Framework for Evaluating the Error Characteristics of Approximate Adders

    Publication Year: 2015 , Page(s): 1268 - 1281

    Approximate adders have been considered as a potential alternative for error-tolerant applications to trade off some accuracy for gains in other circuit-based metrics, such as power, area and delay. Existing approximate adder designs have shown substantial advantages in improving many of these operational features. However, the error characteristics of the approximate adders still remain an issue that is not well understood. A simulation-based method requires both programming effort and time-consuming execution for evaluating the effect of errors, and it becomes particularly expensive when dealing with various sizes and types of approximate adders. In this paper, a framework based on analytical models is proposed for evaluating the error characteristics of approximate adders. Error features such as the error rate and the mean error distance are obtained using this framework without developing functional models of the approximate adders for time-consuming simulation. As an example, the estimation of peak signal-to-noise ratios (PSNRs) in image processing is considered to show a potential application of the proposed analysis. This analytical framework provides an efficient method to evaluate various designs of approximate adders against different figures of merit in error-tolerant applications.
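
    For small adders, the error features named above can even be checked exhaustively, which makes a useful baseline for any analytical model. A sketch using a generic lower-part-OR approximate adder; this design is assumed for illustration and is not one from the paper.

        from itertools import product

        def approx_add(a, b, k=4):
            """Approximate adder: exact addition on the upper bits,
            bitwise OR (no carries) on the k lower bits."""
            mask = (1 << k) - 1
            low = (a & mask) | (b & mask)
            high = ((a >> k) + (b >> k)) << k
            return high | low

        def error_metrics(n=8, k=4):
            errors, total_ed = 0, 0
            for a, b in product(range(1 << n), repeat=2):
                exact, approx = a + b, approx_add(a, b, k)
                if approx != exact:
                    errors += 1
                    total_ed += abs(approx - exact)
            cases = (1 << n) ** 2
            return errors / cases, total_ed / cases  # error rate, mean error distance

        print(error_metrics())

    The point of an analytical framework is to deliver these numbers without the 2^(2n) enumeration, which becomes infeasible for realistic operand widths.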

  • A Promise of Realizable, Ultra-Scalable Communications at Nano-Scale: A Multi-Modal Nano-Machine Architecture

    Publication Year: 2015 , Page(s): 1282 - 1295

    Wireless networks of nano-nodes will play a critical role in future medical, quality control, environmental monitoring and military applications. Nano-nodes are invisible or marginally visible to the human eye, ranging in size from approximately 100 μm down to a few nanometers. Nano-networking poses unique challenges, requiring ground-breaking solutions. First, the nano-scale imposes severe restrictions on the computational and communication capabilities of the nodes. Second, nano-nodes are not accessible for programming, configuration and debugging in the classical sense. Thus, a nano-network should be self-configuring, resilient and adaptive to environmental changes. Finally, all nano-networking protocols should be ultra-scalable, since a typical nano-network may comprise billions of nodes. The study contributes a novel paradigm for data dissemination in networking nano-machines, addressing these unique challenges. Relying on innovative analytical results on lattice algebra and nature-inspired processes, a novel data dissemination method is proposed. The nano-nodes exploit their environmental feedback and adaptively mature into network backbone nodes or remain simple network users. Such a process can be implemented as an ultra-scalable, low-complexity, multi-modal nano-node architecture (physical layer), providing efficient networking and application services at the same time. Requiring only existing manufacturing technology, the proposed architecture constitutes the first candidate solution for realizable nano-networking.

  • Architecture Support for Task Out-of-Order Execution in MPSoCs

    Publication Year: 2015 , Page(s): 1296 - 1310
    Cited by:  Papers (2)

    Multi-processor system on chip (MPSoC) has been widely applied in embedded systems in the past decades. However, heterogeneous instruction set architectures (ISAs), programming interfaces and software tool chains make it challenging to efficiently design and implement rapid prototypes for diverse applications. To solve this problem, this paper proposes novel high-level architecture support for automatic out-of-order (OoO) task execution on FPGA-based heterogeneous MPSoCs. The architecture support is composed of a hierarchical middleware with an automatic task-level OoO parallel execution engine. Incorporated with a hierarchical OoO layer model, the middleware is able to identify the parallel regions and generate the source code automatically. Besides, a runtime middleware, Task-Scoreboarding, analyzes the inter-task data dependencies and automatically schedules and dispatches the tasks with parameter renaming techniques. The middleware has been verified by a prototype built on an FPGA platform. Examples and a JPEG case study demonstrate that our model can largely ease the burden of programmers as well as uncover the task-level parallelism.
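
    The scoreboarding idea at the heart of such an engine can be sketched with read/write sets; the hazard check and the dictionary layout below are illustrative, not the paper's interfaces.

        def can_issue(task, in_flight):
            """A task may start out of order only if it has no data hazard
            with any task still executing (scoreboard-style check)."""
            for other in in_flight:
                if task["reads"] & other["writes"]:    # RAW dependence
                    return False
                if task["writes"] & other["reads"]:    # WAR dependence
                    return False
                if task["writes"] & other["writes"]:   # WAW dependence
                    return False
            return True

        t1 = {"reads": {"a"}, "writes": {"b"}}
        t2 = {"reads": {"b"}, "writes": {"c"}}   # RAW on b: must wait for t1
        t3 = {"reads": {"a"}, "writes": {"d"}}   # independent: may run early
        print(can_issue(t2, [t1]), can_issue(t3, [t1]))   # False True

    Parameter renaming, as mentioned in the abstract, removes the WAR and WAW cases by giving each writer a fresh buffer, leaving only true (RAW) dependencies to serialize tasks.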

    Open Access
  • A Stochastic Model for Estimating the Power Consumption of a Processor

    Publication Year: 2015 , Page(s): 1311 - 1322

    Quantitatively estimating the relationship between the workload and the corresponding power consumption of a multicore processor is an essential step towards achieving energy-proportional computing. Most existing and proposed approaches use Performance Monitoring Counters (hardware monitoring counters) for this task. In this paper we propose a complementary approach that employs the statistics of CPU utilization (workload) only. Hence, we model the workload and the power consumption of a multicore processor as random variables and exploit the monotonicity property of their distribution functions to establish a quantitative relationship between the random variables. We show that for a single-core processor the relationship is best approximated by a quadratic function, whereas for a dual-core processor the relationship is best approximated by a linear function. We demonstrate the plausibility of our approach by estimating the power consumption of both custom-made and standard benchmarks (namely, the SPEC power benchmark and the Apache benchmarking tool) for Intel and AMD processors.
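
    The claimed utilization-power relationship is easy to test on measured samples by comparing a linear against a quadratic least-squares fit. A sketch with synthetic data; real samples would come from a power meter while a load generator sweeps CPU utilization.

        import numpy as np

        util = np.linspace(0, 100, 21)               # CPU utilization (%)
        power = (40 + 0.6 * util + 0.002 * util**2   # synthetic "measurements"
                 + np.random.normal(0, 0.5, util.size))

        for degree in (1, 2):                        # linear vs. quadratic model
            coeffs = np.polyfit(util, power, degree)
            mse = float(((power - np.polyval(coeffs, util)) ** 2).mean())
            print(degree, coeffs.round(4), mse)

    Whichever degree gives the lower residual error is the better approximation; the paper reports a quadratic fit for a single-core and a linear fit for a dual-core processor.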

    Open Access
  • A Unified Framework for Line-Like Skeleton Extraction in 2D/3D Sensor Networks

    Publication Year: 2015 , Page(s): 1323 - 1335

    In sensor networks, skeleton extraction has emerged as an appealing approach to support many applications such as load-balanced routing and location-free segmentation. While significant advances have been made for 2D cases, skeleton extraction for 3D sensor networks has not yet been thoroughly studied. In this paper, we conduct the first work on a unified framework providing a connectivity-based and distributed solution for line-like skeleton extraction in both 2D and 3D sensor networks. Its practical merits are threefold: 1) it has linear time/message complexity; 2) it provides reasonable skeleton results even when the network has low node density; 3) the obtained skeletons are robust to shape variations, node densities, boundary noise and the communication radio model. In addition, to confirm the effectiveness of the line-like skeleton, a 3D routing scheme is derived based on the extracted skeleton, which achieves balanced traffic load, guaranteed delivery, as well as a low stretch factor.

  • CloudGenius: A Hybrid Decision Support Method for Automating the Migration of Web Application Clusters to Public Clouds

    Publication Year: 2015 , Page(s): 1336 - 1348
    Multimedia

    With the increase in cloud service providers, and the increasing number of compute services offered, a migration of information systems to the cloud demands selecting the best mix of compute services and virtual machine (VM) images from an abundance of possibilities. Therefore, a migration process for web applications has to automate evaluation and, in doing so, ensure that Quality of Service (QoS) requirements are met, while satisfying conflicting selection criteria like throughput and cost. When selecting compute services for multiple connected software components, web application engineers must consider heterogeneous sets of criteria and complex dependencies across multiple layers, which is impossible to resolve manually. The previously proposed CloudGenius framework has proven its capability to support migrations of single-component web applications. In this paper, we address the additional complexity of facilitating migration support for multi-component web applications. In particular, we present an evolutionary migration process for web application clusters distributed over multiple locations, and clearly identify the most important criteria relevant to the selection problem. Moreover, we present a multi-criteria selection algorithm based on the Analytic Hierarchy Process (AHP). Because the solution space grows exponentially, we developed a Genetic Algorithm (GA)-based approach to cope with computational complexities in a growing cloud market. Furthermore, a use case example demonstrates CloudGenius's applicability. To conduct experiments, we implemented CumulusGenius, a prototype of the selection algorithm and the GA deployable on Hadoop clusters. Experiments with CumulusGenius give insights into time complexities and the quality of the GA.
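
    The AHP step turns pairwise judgments between criteria into weights. A minimal sketch using the geometric-mean approximation of the principal eigenvector; the criteria names and numbers are invented.

        import numpy as np

        # Saaty-scale pairwise comparisons of three criteria,
        # e.g. cost vs. throughput vs. latency (reciprocal matrix).
        A = np.array([[1.0, 3.0, 5.0],
                      [1/3, 1.0, 2.0],
                      [1/5, 1/2, 1.0]])

        gm = A.prod(axis=1) ** (1.0 / A.shape[0])  # geometric mean of each row
        weights = gm / gm.sum()                    # normalized criteria weights
        print(weights.round(3))                    # approx. [0.648 0.23 0.122]

    Candidate VM images and compute services are then scored against each criterion and ranked by the weighted sum; the GA takes over when the combination space across multiple components grows too large to enumerate.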

  • Design and Implementation of a Journaling File System for Phase-Change Memory

    Publication Year: 2015 , Page(s): 1349 - 1360

    Journaling file systems are widely used in modern computer systems as they provide high reliability at reasonable cost. However, existing journaling file systems are not efficient for emerging PCM (phase-change memory) storage because they are optimized for hard disks. Specifically, the large amount of data that they write during journaling seriously degrades the performance of PCM storage, which has a long write latency. In this paper, we present a new journaling file system for PCM, called Shortcut-JFS, that reduces write traffic to PCM by more than half compared to existing journaling file systems running on block I/O interfaces. To do this, we devise two novel schemes that can be used under byte-addressable I/O interfaces: 1) differential logging, which journals only the modified part of a block, and 2) in-place checkpointing, which eliminates the overhead of block copying. We implement Shortcut-JFS on Linux 2.6.32 and measure its performance against those of existing journaling and log-structured file systems. The results show that the performance improvement of Shortcut-JFS over Ext4 and LFS is 54 and 96 percent, respectively, on average.
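
    Differential logging is simple to picture on byte-addressable storage: journal only the byte runs of a block that actually changed. A sketch, with the record format invented for illustration:

        def diff_ranges(old: bytes, new: bytes):
            """Yield (offset, changed_bytes) runs instead of a whole block."""
            i, n = 0, len(old)
            while i < n:
                if old[i] != new[i]:
                    j = i
                    while j < n and old[j] != new[j]:
                        j += 1
                    yield i, new[i:j]
                    i = j
                else:
                    i += 1

        old = bytes(4096)
        new = bytearray(old)
        new[100:104] = b"ABCD"                   # a 4-byte in-place update
        print(list(diff_ranges(old, bytes(new))))
        # [(100, b'ABCD')] -> 4 bytes journaled instead of 4096

    On a block device the whole 4 KB block would be written twice (journal plus home location); byte-addressable PCM makes the 4-byte record meaningful, which is where the write-traffic reduction comes from.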

  • End-to-End Communication Delay Analysis in Industrial Wireless Networks

    Publication Year: 2015 , Page(s): 1361 - 1374
    Cited by:  Papers (1)

    WirelessHART is a new standard specifically designed for real-time and reliable communication between sensor and actuator devices for industrial process monitoring and control applications. End-to-end communication delay analysis for WirelessHART networks is required to determine the schedulability of real-time data flows from sensors to actuators for acceptance testing or workload adjustment in response to network dynamics. In this paper, we consider a network model based on WirelessHART, and map the scheduling of real-time periodic data flows in the network to real-time multiprocessor scheduling. We then exploit the response time analysis for multiprocessor scheduling and propose a novel method for delay analysis that establishes an upper bound on the end-to-end communication delay of each real-time flow in the network. Simulation studies based on both random topologies and real network topologies of a 74-node physical wireless sensor network testbed demonstrate that our analysis provides safe and reasonably tight upper bounds on the end-to-end delays of real-time flows, and hence enables effective schedulability tests for WirelessHART networks.
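
    Such analyses build on the classic response-time recurrence from scheduling theory; the uniprocessor form below conveys the fixed-point flavor (the paper's bound for multi-hop WirelessHART flows is more involved).

        import math

        def response_time(C, T, i, bound=10_000):
            """Iterate R = C_i + sum_j ceil(R / T_j) * C_j over all
            higher-priority tasks j < i until R stops changing."""
            R = C[i]
            while True:
                nxt = C[i] + sum(math.ceil(R / T[j]) * C[j] for j in range(i))
                if nxt == R:
                    return R          # fixed point: worst-case response time
                if nxt > bound:
                    return None       # diverged: deemed unschedulable
                R = nxt

        C, T = [1, 2, 3], [4, 8, 16]  # execution times, periods (priority order)
        print([response_time(C, T, i) for i in range(3)])   # [1, 3, 7]

    A flow passes the schedulability test when its computed response time does not exceed its end-to-end deadline.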

  • MUCH: Multithreaded Content-Based File Chunking

    Publication Year: 2015 , Page(s): 1375 - 1388
    Multimedia

    In this work, we developed a novel multithreaded variable-size chunking method, MUCH, which exploits the multicore architecture of modern microprocessors. Legacy single-threaded variable-size chunking leaves much to be desired in exploiting the bandwidth of state-of-the-art storage devices. MUCH guarantees chunking invariability: the result of chunking does not change regardless of the degree of multithreading or the segment size. This is achieved by inter- and intra-segment coalescing at the master thread and Dual Mode Chunking at the client threads. We developed an elaborate performance model to determine the optimal multithreading degree and segment size. MUCH is implemented in a prototype deduplication system. By fully exploiting the available CPU cores (quad-core), we achieved up to a 4× increase in chunking performance (MB/s). MUCH successfully addresses the performance of file chunking, one of the bottlenecks in modern deduplication systems, by parallelizing the chunking operation while guaranteeing chunking invariability.
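
    Chunking invariability hinges on the cut condition depending only on local content, so boundaries come out identical no matter how the file is split among threads. A generic content-defined chunking sketch with a simple polynomial rolling hash; this is not the paper's implementation.

        import os

        MASK = (1 << 13) - 1              # ~8 KiB expected average chunk
        WINDOW, PRIME, MOD = 48, 31, 1 << 32

        def chunk_boundaries(data: bytes, min_size=2048, max_size=65536):
            bounds, start, h = [], 0, 0
            pw = pow(PRIME, WINDOW - 1, MOD)
            for i, b in enumerate(data):
                if i - start >= WINDOW:
                    h = (h - data[i - WINDOW] * pw) % MOD   # drop oldest byte
                h = (h * PRIME + b) % MOD                   # roll in new byte
                size = i - start + 1
                if size >= max_size or (size >= min_size and (h & MASK) == MASK):
                    bounds.append(i + 1)                    # content-defined cut
                    start, h = i + 1, 0
            if start < len(data):
                bounds.append(len(data))
            return bounds

        print(chunk_boundaries(os.urandom(1 << 20))[:5])

    Each worker thread can chunk its own segment independently, and the master only has to reconcile the region around each segment border, which is what the inter/intra-segment coalescing performs.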

  • PowerTracer: Tracing Requests in Multi-Tier Services to Reduce Energy Inefficiency

    Publication Year: 2015 , Page(s): 1389 - 1401

    As energy has become one of the key operating costs in running a data center and power waste commonly exists, it is essential to reduce energy inefficiency inside data centers. In this paper, we develop an innovative framework, called PowerTracer, for diagnosing energy inefficiency and saving power. Inside the framework, we first present a resource tracing method based on request tracing in multi-tier services treated as black boxes. Then, we propose a generalized methodology for applying a request tracing approach to energy inefficiency diagnosis and power saving in multi-tier service systems. With insights into the service performance and resource consumption of individual requests, we develop (1) a bottleneck diagnosis tool that pinpoints the root causes of energy inefficiency, and (2) a power saving method that enables dynamic voltage and frequency scaling (DVFS) with online request tracing. We implement a prototype of PowerTracer, and conduct extensive experiments to validate its effectiveness. Our tool analyzes several state-of-the-practice and state-of-the-art DVFS control policies and uncovers existing energy inefficiencies. Meanwhile, the experimental results demonstrate that PowerTracer outperforms its peers in power saving.

  • Reliability-Aware Speedup Models for Parallel Applications with Coordinated Checkpointing/Restart

    Publication Year: 2015 , Page(s): 1402 - 1415

    Speedup models are powerful analytical tools for evaluating and predicting the performance of parallel applications. Unfortunately, well-known speedup models like Amdahl's law and Gustafson's law do not take reliability into consideration and therefore cannot accurately account for application performance in the presence of failures. In this study, we enhance Amdahl's law and Gustafson's law by considering the impact of failures and the effect of coordinated checkpointing/restart. Unlike existing analytical studies relying on the exponential failure distribution alone, in this work we consider both exponential and Weibull failure distributions in the construction of our reliability-aware speedup models. The derived reliability-aware models are validated through trace-based simulations under a variety of parameter settings. Our trace-based simulations demonstrate that these models can effectively quantify the failure impact on application speedup. Moreover, we present two case studies to illustrate the use of these reliability-aware speedup models.
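
    The shape of such a model is easy to convey with a first-order approximation: a Daly-style estimate under exponential failures, with assumed parameters, not the paper's exact derivation.

        def expected_runtime(work, n, serial_frac, mtbf_node,
                             ckpt_interval, ckpt_cost, restart_cost):
            """Expected wall time with coordinated checkpoint/restart."""
            t_solve = (serial_frac + (1 - serial_frac) / n) * work  # Amdahl's law
            mtbf_sys = mtbf_node / n           # failures arrive n times faster
            overhead = 1 + ckpt_cost / ckpt_interval    # checkpoint write overhead
            # each failure costs about half an interval of rework plus a restart
            loss = (ckpt_interval / 2 + restart_cost) / mtbf_sys
            return t_solve * overhead / (1 - loss)

        work = 10_000.0                        # failure-free serial time (s)
        for n in (1, 16, 256):
            t = expected_runtime(work, n, serial_frac=0.02, mtbf_node=1e6,
                                 ckpt_interval=600, ckpt_cost=30, restart_cost=60)
            print(n, round(work / t, 2))       # reliability-aware speedup

    Speedup saturates earlier than plain Amdahl's law predicts because the system-level failure rate grows with the node count; Weibull failures, as studied in the paper, change the rework term since the failure rate is no longer constant.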

  • Robot Coordination for Energy-Balanced Matching and Sequence Dispatch of Robots to Events

    Publication Year: 2015 , Page(s): 1416 - 1428

    Given a set of events and a set of robots, the dispatch problem is to allocate one robot for each event to visit it. In a single round, each robot may be allowed to visit only one event (matching dispatch), or several events in a sequence (sequence dispatch). In a distributed setting, each event is discovered by a sensor and reported to a robot. Here, we present novel algorithms aimed at overcoming the shortcomings of several existing solutions. We propose a pairwise-distance-based matching algorithm (PDM) to eliminate long edges by pairwise exchanges between matching pairs. Our sequence dispatch algorithm (SQD) iteratively finds the closest event-robot pair, includes the event in the dispatch schedule of the selected robot and updates its position accordingly. When event-robot distances are multiplied by robot resistance (the inverse of the remaining energy), the corresponding energy-balanced variants are obtained. We also present generalizations which handle multiple visits and timing constraints. Our localized algorithm MAD is based on an information mesh infrastructure and local auctions within the robot network for obtaining the optimal dispatch schedule for each robot. The simulations conducted confirm the advantages of our algorithms over existing solutions in terms of average robot-event distance and lifetime.
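
    The sequence-dispatch loop is a compact greedy procedure; a sketch of it and of the energy-balanced weighting follows, with names and coordinates invented.

        import math

        def sequence_dispatch(robots, events, energy=None):
            """Repeatedly assign the closest (resistance-weighted) event-robot
            pair, append it to that robot's schedule, and move the robot."""
            pos = dict(robots)                          # current robot positions
            energy = energy or {r: 1.0 for r in robots}
            schedule = {r: [] for r in robots}
            pending = list(events)
            while pending:
                # resistance = 1 / remaining energy, so weight = dist / energy
                r, e = min(((r, e) for r in pos for e in pending),
                           key=lambda p: math.dist(pos[p[0]], p[1]) / energy[p[0]])
                schedule[r].append(e)
                pos[r] = e                              # robot moves to the event
                pending.remove(e)
            return schedule

        robots = {"r1": (0, 0), "r2": (10, 10)}
        events = [(1, 1), (2, 0), (9, 9)]
        print(sequence_dispatch(robots, events))
        # {'r1': [(1, 1), (2, 0)], 'r2': [(9, 9)]}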

  • Software Implementation of an Attribute-Based Encryption Scheme

    Publication Year: 2015 , Page(s): 1429 - 1441
    Multimedia

    A ciphertext-policy attribute-based encryption protocol uses bilinear pairings to provide access control mechanisms, where the set of a user's attributes is specified by means of a linear secret sharing scheme. In this paper we present the design of a software cryptographic library that achieves record timings for the computation of a 126-bit security level attribute-based encryption scheme. We developed all the required auxiliary building blocks and compared the computational weight that each of them adds to the overall performance of this protocol. In particular, our single-pairing and multi-pairing implementations achieve state-of-the-art time performance at the 126-bit security level.

  • Statistical Performance Comparisons of Computers

    Publication Year: 2015 , Page(s): 1442 - 1455

    As a fundamental task in computer architecture research, performance comparison has been continuously hampered by the variability of computer performance. In traditional performance comparisons, the impact of performance variability is usually ignored (i.e., the means of performance observations are compared regardless of the variability), or in a few cases directly addressed with t-statistics without checking the number and normality of performance observations. In this paper, we formulate performance comparison as a statistical task, and empirically illustrate why and how common practices can lead to incorrect comparisons. We propose a non-parametric hierarchical performance testing (HPT) framework for performance comparison, which is significantly more practical than standard t-statistics because it does not require collecting a large number of performance observations in order to achieve a normal distribution of the sample mean. In particular, the proposed HPT can facilitate quantitative performance comparison, in which the performance speedup of one computer over another is statistically evaluated. Compared with the HPT, a common practice which uses geometric mean performance scores to estimate the performance speedup has errors of 8.0 to 56.3 percent on SPEC CPU2006 or SPEC MPI2007, which demonstrates the necessity of using appropriate statistical techniques. The HPT framework has been implemented as open-source software and integrated in the PARSEC 3.0 benchmark suite.
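
    The non-parametric flavor of such a comparison can be illustrated with a rank-based test on repeated benchmark runs: a plain Wilcoxon rank-sum via SciPy, whereas the HPT framework itself is hierarchical and more elaborate.

        from scipy.stats import mannwhitneyu

        # Execution times (s) of one benchmark, several runs per machine;
        # smaller is better. Numbers are synthetic.
        machine_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]
        machine_b = [13.0, 12.7, 13.4, 12.9, 13.1, 12.8]

        # One-sided rank-sum test: are A's times stochastically smaller?
        stat, p = mannwhitneyu(machine_a, machine_b, alternative="less")
        print(stat, p)   # small p: "A is faster" holds at the chosen significance level

    Rank-based tests need no normality assumption, which is exactly why they remain valid for the small numbers of performance observations that t-statistics mishandle.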

  • Time-Series Pattern Based Effective Noise Generation for Privacy Protection on Cloud

    Publication Year: 2015 , Page(s): 1456 - 1469

    Cloud computing is an open and promising computing paradigm in which customers can deploy and utilize IT services in a pay-as-you-go fashion while saving huge capital investments in their own IT infrastructure. Due to the openness and virtualization, various malicious service providers may exist in these cloud environments, and some of them may record service data from a customer and then collectively deduce the customer's private information without permission. Therefore, from the perspective of cloud customers, it is essential to take technical actions to protect their privacy at the client side. Noise obfuscation is an effective approach in this regard that utilizes noise data. For instance, noise service requests can be generated and injected into real customer service requests so that malicious service providers would not be able to distinguish which requests are real ones if these requests' occurrence probabilities are about the same, and consequently the related customer privacy can be protected. Existing representative noise generation strategies, however, have not considered possible fluctuations of occurrence probabilities. In this case, the probability fluctuation cannot be concealed by existing noise generation strategies, and it poses a serious risk to the customer's privacy. To address this probability-fluctuation privacy risk, we systematically develop a novel time-series pattern based noise generation strategy for privacy protection on cloud. First, we analyze this privacy risk and present a novel cluster-based algorithm to generate time intervals dynamically. Then, based on these time intervals, we investigate the corresponding probability fluctuations and propose a novel time-series pattern based forecasting algorithm. Lastly, based on the forecasting algorithm, our noise generation strategy is presented to withstand the probability-fluctuation privacy risk. The simulation evaluation demonstrates that our strategy can significantly improve the effectiveness of such cloud privacy protection against the probability-fluctuation privacy risk.

  • TSAC: Enforcing Isolation of Virtual Machines in Clouds

    Publication Year: 2015 , Page(s): 1470 - 1482

    Virtualization plays a vital role in building the infrastructure of clouds, and isolation is considered one of its important features. However, we demonstrate with practical measurements that there exist two kinds of isolation problems in current virtualized systems, due to cache interference in a multi-core processor: one virtual machine can degrade the performance, or obtain the load information, of another virtual machine running on the same physical machine. We then present a time-sensitive contention management approach (TSAC) for allocating resources dynamically in the virtual machine monitor, in which virtual machines are controlled to share some physical resources (e.g., CPU or page color) in a dynamic manner, in order to enforce isolation between the virtual machines without sacrificing the performance of the virtualized system. We have implemented a working prototype based on Xen and evaluated it with experiments; the experimental results show that TSAC can significantly improve the isolation of virtualization. Specifically, compared to the default Xen, TSAC improves the performance of the victim virtual machine by up to about 78 percent, and performs well in blocking its cache-based load-information leakage.
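
    One concrete lever for such cache-aware resource control is page coloring: the color of a physical page is the overlap between its page-number bits and the cache set-index bits. A minimal sketch under an assumed cache geometry:

        PAGE_SHIFT = 12                     # 4 KiB pages

        def page_color(phys_addr, cache=2 << 20, ways=16, line=64):
            """Pages of different colors map to disjoint last-level
            cache sets and therefore cannot interfere."""
            sets = cache // (ways * line)            # e.g. 2048 sets
            index_bits = sets.bit_length() - 1       # set-index width
            offset_bits = line.bit_length() - 1      # cache-line offset width
            colors = 1 << max(0, index_bits + offset_bits - PAGE_SHIFT)
            return (phys_addr >> PAGE_SHIFT) & (colors - 1)

        print(page_color(0x12345000))       # -> 5 with the defaults above

    Giving each virtual machine pages of disjoint colors partitions the shared cache, which both removes the performance interference and closes the cache-based load-information side channel the paper measures.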

  • ViPZonE: Hardware Power Variability-Aware Virtual Memory Management for Energy Savings

    Publication Year: 2015 , Page(s): 1483 - 1496

    Hardware variability is predicted to increase dramatically over the coming years as a consequence of continued technology scaling. In this paper, we apply the Underdesigned and Opportunistic Computing (UnO) paradigm by exposing system-level power variability to software to improve energy efficiency. We present ViPZonE, a memory management solution that, in conjunction with application annotations, opportunistically performs memory allocations to reduce DRAM energy. ViPZonE consists of a physical address space with DIMM-aware zones, a modified page allocation routine, and a new virtual memory system call for dynamic allocations from userspace. We implemented ViPZonE in the Linux kernel with GLIBC API support, running on a real x86-64 testbed with significant access power variation in its DDR3 DIMMs. We demonstrate that on our testbed, ViPZonE can save up to 27.80 percent of memory energy, with no more than 4.80 percent performance degradation across a set of PARSEC benchmarks tested, with respect to the baseline Linux software. Furthermore, through a hypothetical “what-if” extension, we predict that in future non-volatile memory systems which consume almost no idle power, ViPZonE could yield even greater benefits, demonstrating the ability to exploit memory hardware variability through opportunistic software.

  • Low Delay Single Symbol Error Correction Codes Based on Reed Solomon Codes

    Publication Year: 2015 , Page(s): 1497 - 1501

    To avoid data corruption, error correction codes (ECCs) are widely used to protect memories. ECCs introduce a delay penalty in accessing the data, as encoding or decoding has to be performed; this limits the use of ECCs in high-speed memories and has led to the use of simple codes such as single error correction double error detection (SEC-DED) codes. However, as technology scales, multiple cell upsets (MCUs) become more common and limit the use of SEC-DED codes unless they are combined with interleaving. A similar issue occurs in some types of memories like DRAM that are typically grouped in modules composed of several devices. In those modules, protection against a device failure rather than isolated bit errors is also desirable. In such cases, one option is to use more advanced ECCs that can correct multiple bit errors; the main challenge is that those codes should minimize the delay and area penalty. Among the codes that have been considered for memory protection are Reed-Solomon (RS) codes. These codes are based on non-binary symbols and can therefore correct multiple bit errors. In this paper, single symbol error correction codes based on Reed-Solomon codes that can be implemented with low delay are proposed and evaluated. The results show that they can be implemented with a substantially lower delay than traditional single error correction RS codes.
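
    The symbol-level correction idea behind RS codes fits in a few lines over a prime field. A toy single-symbol-error-correcting code over GF(257), purely illustrative: real RS codes work over GF(2^m), and this is not the paper's low-delay construction.

        P = 257                                  # prime field GF(257)

        def encode(data):
            """Append two check symbols: c0 = sum(d_i), c1 = sum((i+1) * d_i)."""
            c0 = sum(data) % P
            c1 = sum((i + 1) * d for i, d in enumerate(data)) % P
            return list(data) + [c0, c1]

        def correct(word):
            """Locate and fix any single corrupted data symbol."""
            data, c0, c1 = word[:-2], word[-2], word[-1]
            s0 = (sum(data) - c0) % P                            # error magnitude
            s1 = (sum((i + 1) * d for i, d in enumerate(data)) - c1) % P
            if s0 == 0:
                return data              # clean (or the error hit check symbol c1)
            pos = (s1 * pow(s0, P - 2, P)) % P       # 1-based location = s1 / s0
            if 1 <= pos <= len(data):
                data[pos - 1] = (data[pos - 1] - s0) % P
            return data                  # pos == 0 would mean the error hit c0

        word = encode([10, 20, 30, 40])
        word[2] = (word[2] + 99) % P             # corrupt one whole symbol
        print(correct(list(word)))               # [10, 20, 30, 40]

    Because correction operates on symbols, any burst of bit errors confined to one symbol (e.g., an MCU, or one failed device in a module) is repaired in a single step.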

  • Reviewing High-Radix Signed-Digit Adders

    Publication Year: 2015 , Page(s): 1502 - 1505

    Higher radix values of the form β = 2^r have been employed traditionally for recoding of multipliers, and for determining quotient and root digits in iterative division and square root algorithms, usually only for quite moderate values of r, like 2 or 3. For fast additions, in particular for the accumulation of many terms, redundant representations are generally employed, most often binary carry-save or borrow-save, but a number of publications have suggested recoding the addends into a higher radix. It is shown that there are no speed advantages in doing so if the radix is a power of 2; on the contrary, there are significant savings in using standard 4-to-2 adders, even saving half of the operations in multi-operand addition.
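
    The 4-to-2 advantage is easy to demonstrate: two chained carry-save (3-to-2) steps reduce four addends to a redundant sum/carry pair in constant logic depth, independent of word length. A bitwise sketch on Python integers:

        def csa(a, b, c):
            """Carry-save (3-to-2) step: sum bits plus shifted majority
            carries, with no carry propagation across the word."""
            return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1

        def four_to_two(a, b, c, d):
            """4-to-2 compressor: four addends become two, in O(1) depth."""
            s1, c1 = csa(a, b, c)
            return csa(s1, c1, d)

        s, c = four_to_two(11, 22, 33, 44)
        print(s + c, 11 + 22 + 33 + 44)   # 110 110

    Accumulating many operands this way defers the single slow carry-propagate addition to the very end, which is the saving the paper quantifies against higher-radix recodings.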


Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.


Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org