
Computers, IEEE Transactions on

Issue 6 • June 1, 2015

Displaying Results 1 - 24 of 24
  • State of the Journal

    Publication Year: 2015, Page(s): 1506 - 1508
    PDF (199 KB) | HTML
    Freely Available from IEEE
  • Accelerating Fully Homomorphic Encryption in Hardware

    Publication Year: 2015, Page(s): 1509 - 1521
    PDF (856 KB) | HTML

    We present a custom hardware architecture for realizing the Gentry-Halevi fully homomorphic encryption (FHE) scheme; this contribution is the first full realization of FHE in hardware. The architecture features an optimized multi-million-bit multiplier based on the Schönhage-Strassen multiplication algorithm. Moreover, a number of optimizations, including spectral techniques and a precomputation strategy, are used to significantly improve the performance of the overall design. When synthesized using 90 nm technology, the presented architecture realizes the encryption, decryption, and recryption operations in 18.1 ms, 16.1 ms, and 3.1 s, respectively, and occupies a footprint of less than 30 million gates.

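    To make the core building block concrete, below is a minimal Python sketch of FFT-based large-integer multiplication, the idea underlying the Schönhage-Strassen algorithm named above. The paper's hardware uses a number-theoretic variant at multi-million-bit scale; this floating-point toy is illustrative only and assumes inputs small enough that rounding cannot flip a digit.

        # FFT-based digit-vector multiplication: convolve the digit sequences
        # in the frequency domain, then propagate carries.
        import numpy as np

        def fft_multiply(a_digits, b_digits, base=10):
            # a_digits, b_digits: least-significant-first digit lists
            n = 1
            while n < len(a_digits) + len(b_digits):
                n *= 2
            fa = np.fft.fft(a_digits, n)
            fb = np.fft.fft(b_digits, n)
            raw = np.rint(np.fft.ifft(fa * fb).real).astype(np.int64)
            out, carry = [], 0
            for c in raw:                      # carry propagation
                carry += int(c)
                out.append(carry % base)
                carry //= base
            while carry:
                out.append(carry % base)
                carry //= base
            while len(out) > 1 and out[-1] == 0:
                out.pop()                      # strip leading zeros
            return out

        print(fft_multiply([3, 2, 1], [6, 5, 4]))  # 123 * 456 -> [8, 8, 0, 6, 5], i.e., 56088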
  • Adaptive-Acceleration Data Center TCP

    Publication Year: 2015, Page(s): 1522 - 1533
    PDF (1973 KB) | HTML

    Providing deadline-sensitive services is a challenge in data centers. Because of the conservativeness of additive-increase congestion avoidance, current transmission control protocols are inefficient at utilizing the very high bandwidth of data centers, which may cause many deadline-sensitive flows to miss their deadlines before reaching their available bandwidth. We propose an Adaptive-Acceleration Data Center TCP, A²DTCP, which takes into account both network congestion and the latency requirements of application services. By using congestion avoidance with an adaptive increase rate that varies between additive and multiplicative, A²DTCP accelerates bandwidth detection, thus achieving high bandwidth utilization efficiency. At-scale simulations and real testbed implementations show that A²DTCP significantly reduces the missed-deadline ratio compared to D²TCP and DCTCP. In addition, A²DTCP can co-exist with conventional TCP without requiring more changes to switch hardware than D²TCP and DCTCP.

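    The abstract's key idea, an increase rate that varies between additive and multiplicative, can be sketched with a toy update rule. This model is purely illustrative, not the authors' formula; urgency and congestion are hypothetical inputs normalized to [0, 1].

        # Interpolate the window increase between +1 (additive) and +cwnd
        # (multiplicative) according to deadline urgency minus congestion.
        def next_cwnd(cwnd, urgency, congestion):
            alpha = max(0.0, urgency - congestion)  # 0 = additive, 1 = multiplicative
            increase = 1 + alpha * (cwnd - 1)       # between +1 and +cwnd
            return cwnd + increase

        print(next_cwnd(10.0, urgency=0.9, congestion=0.1))  # 18.2: aggressive ramp-up
        print(next_cwnd(10.0, urgency=0.2, congestion=0.1))  # 11.9: near-additive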
  • Adaptive Selection of Cache Indexing Bits for Removing Conflict Misses

    Publication Year: 2015, Page(s): 1534 - 1547
    PDF (1498 KB) | HTML

    The design of cache memories is a crucial part of the design cycle of a modern processor, since they bridge the performance gap between the processor and the memory. Unfortunately, caches with low degrees of associativity suffer a large number of conflict misses. Although increasing the associativity removes a significant fraction of these misses, it comes at a high cost in power, area, and access time. In this work, we address the problem of the high number of conflict misses in low-associativity caches by proposing an indexing policy that adaptively selects the bits from the block address used to index the cache. The basic premise of this work is that the non-uniformity in set usage is caused by a poor selection of the indexing bits. By instead selecting at run time those bits that disperse the working set more evenly across the available sets, a large fraction of the conflict misses (85 percent, on average) can be removed. This leads to IPC improvements of 10.9 percent for the SPEC CPU2006 benchmark suite. By issuing fewer accesses to the L2 cache, our proposal also reduces the energy consumption of the cache hierarchy by 13.2 percent. These benefits come with a negligible area overhead.

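    The premise that the choice of index bits controls how evenly a working set spreads across cache sets can be demonstrated with a small simulation. This exhaustive off-line search is illustrative only; the paper selects the bits adaptively at run time.

        # Score each subset of candidate index bits by its worst-case set
        # occupancy; a lower maximum means a more uniform spread and fewer
        # conflict misses.
        from itertools import combinations

        def set_histogram(addresses, bit_positions):
            counts = {}
            for addr in addresses:
                idx = 0
                for i, b in enumerate(bit_positions):
                    idx |= ((addr >> b) & 1) << i
                counts[idx] = counts.get(idx, 0) + 1
            return counts

        def best_index_bits(addresses, candidate_bits, n_index_bits):
            return min(combinations(candidate_bits, n_index_bits),
                       key=lambda bits: max(set_histogram(addresses, bits).values()))

        # A 64-byte-strided trace aliases badly under the conventional low-order
        # bits; the search picks bits that actually vary.
        trace = [i * 64 for i in range(256)]
        print(best_index_bits(trace, candidate_bits=range(16), n_index_bits=4))  # (6, 7, 8, 9)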
  • A Nearly Optimal Packet Scheduling Algorithm for Input Queued Switches with Deadline Guarantees

    Publication Year: 2015, Page(s): 1548 - 1563
    PDF (1231 KB) | HTML

    Deadline-guaranteed packet scheduling for switches is a fundamental issue in providing guaranteed QoS in digital networks. It is a historically difficult NP-hard problem when three or more distinct deadlines are involved. All existing algorithms have throughput too low for practical use. A key reason is that they use packet deadlines as default priorities to decide which packets to drop whenever conflicts occur. Although such a priority structure eases scheduling by focusing on one deadline at a time, it hurts throughput greatly. Since deadlines do not necessarily represent the actual importance of packets, throughput can be greatly improved if deadline-induced priority is not enforced. This paper first presents an algorithm that guarantees the maximum throughput for the case where only two different deadlines are allowed. Then, an algorithm called iterative scheduling with no priority (ISNOP) is proposed for the general case where k > 2 different deadlines may occur. Not only does this algorithm have dramatically better average performance than all existing algorithms, but it also guarantees an approximation ratio of 2. ISNOP thus offers a practical solution for this historically difficult packet scheduling problem.

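    For contrast, here is a toy of the deadline-as-priority behavior the paper argues against: one packet is served per slot, the earliest deadline always wins, and expired packets are dropped regardless of their importance. ISNOP's actual iterative procedure is described in the paper; this sketch only shows the baseline.

        import heapq

        def edf_schedule(packets, n_slots):
            """packets: (deadline, id) pairs; serve one packet per time slot."""
            heap = list(packets)
            heapq.heapify(heap)                    # min-heap ordered by deadline
            served = []
            for slot in range(1, n_slots + 1):
                while heap and heap[0][0] < slot:  # deadline passed: drop packet
                    heapq.heappop(heap)
                if heap:
                    served.append(heapq.heappop(heap)[1])
            return served

        print(edf_schedule([(1, 'a'), (1, 'b'), (3, 'c')], n_slots=3))  # ['a', 'c']; 'b' is lost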
  • A Scalable Formal Debugging Approach with Auto-Correction Capability Based on Static Slicing and Dynamic Ranking for RTL Datapath Designs

    Publication Year: 2015, Page(s): 1564 - 1578
    PDF (1616 KB) | HTML

    With the increasing complexity of digital systems, verification and debugging have become major engineering and economic issues. Although many computer-aided design (CAD) solutions have been suggested to improve the efficiency of existing debugging approaches, they still fail to provide a small set of potential error locations or an automatic correction mechanism. Meanwhile, the ever-growing use of digital signal processing (DSP), computer graphics, and embedded systems applications whose datapath designs can be modeled as polynomial computations necessitates an effective method for their verification, debugging, and correction. In this paper, we introduce a formal debugging approach based on static slicing and dynamic ranking methods to derive a reduced, ordered set of potential error locations. In addition, to speed up finding true errors in the presence of multiple design errors, error candidates are sorted in decreasing order of their probability of being an error. A mutation-based technique is then employed to automatically correct bugs, even when multiple bugs are present. To evaluate the effectiveness of our approach, we have applied it to several industrial designs. The experimental results show that the proposed technique locates and corrects even multiple bugs with high confidence in a short run time, even for complex designs of up to several thousand lines of RTL code.

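    The dynamic-ranking step can be pictured as ordering the statements that survive static slicing by how strongly they correlate with failing test vectors. The scoring formula below is a generic suspiciousness metric chosen for illustration, not necessarily the one used in the paper.

        # Rank statements: those executed mostly by failing tests score highest.
        def rank_candidates(coverage, verdicts):
            """coverage[stmt] = test ids touching stmt; verdicts[test] = True if failing."""
            n_fail = sum(verdicts.values()) or 1
            n_pass = (len(verdicts) - sum(verdicts.values())) or 1
            def score(stmt):
                fail = sum(verdicts[t] for t in coverage[stmt])
                ok = len(coverage[stmt]) - fail
                return (fail / n_fail) / (fail / n_fail + ok / n_pass + 1e-9)
            return sorted(coverage, key=score, reverse=True)

        coverage = {'s1': {0, 1, 2}, 's2': {1, 2}, 's3': {0}}
        verdicts = {0: False, 1: True, 2: True}      # tests 1 and 2 fail
        print(rank_candidates(coverage, verdicts))   # ['s2', 's1', 's3']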
  • Automatic High-Level Data-Flow Synthesis and Optimization of Polynomial Datapaths Using Functional Decomposition

    Publication Year: 2015, Page(s): 1579 - 1593
    PDF (1544 KB) | HTML

    This paper concentrates on high-level data-flow optimization and synthesis techniques for datapath-intensive designs, such as those in digital signal processing (DSP), computer graphics, and embedded systems applications, which are modeled as polynomial computations from $Z_{2^{n_1}} \times Z_{2^{n_2}} \times \cdots \times Z_{2^{n_d}}$ to $Z_{2^m}$. Our main contribution is an optimization method based on the functional decomposition of multivariate polynomials in the form $f(x) = (g \circ h)(x) + f_0 = g(h(x)) + f_0$ to obtain good building blocks, and on vanishing polynomials over $Z_{2^m}$ to add/delete redundancy to/from given polynomial functions in order to extract further common sub-expressions. Experimental results for combinational implementations of the designs show an average saving of 38.85 and 18.85 percent in the number of gates and the critical path delay, respectively, compared with state-of-the-art techniques. Compared with our previous works, the area and delay are improved by 10.87 and 11.22 percent, respectively. Furthermore, experimental results for sequential implementations show an average saving of 39.26 and 34.70 percent in the area and the latency, respectively, compared with state-of-the-art techniques.

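    A small hand-picked instance of the functional decomposition used above (not taken from the paper): with h(x) = x^2 and g(y) = y^2 + 2y + 1, we get g(h(x)) = x^4 + 2x^2 + 1, so the larger datapath f can be built from two smaller blocks g and h.

        # Verify f(x) = g(h(x)) over sampled points of Z_{2^m} with m = 16.
        def h(x):
            return x * x

        def g(y):
            return y * y + 2 * y + 1

        def f(x):
            return x**4 + 2 * x**2 + 1

        M = 2**16
        for x in range(0, M, 257):
            assert g(h(x)) % M == f(x) % M
        print("f(x) = g(h(x)) holds on all sampled points")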
  • By-Passing Infected Areas in Wireless Sensor Networks Using BPR

    Publication Year: 2015, Page(s): 1594 - 1606
    PDF (1455 KB) | HTML

    Abnormalities in sensed data streams indicate the spread of malicious attacks, hardware failure, and software corruption among the different nodes in a wireless sensor network. These forms of node infection can affect generated and incoming data streams, resulting in a high likelihood of inaccurate data, misleading packet translation, wrong decision making, and severe communication disruption. This problem is detrimental to real-time applications with stringent quality-of-service (QoS) requirements. Sensed data from uninfected regions might also get stuck in an infected region if no prior alternative arrangements are made. Although several existing methods (BOUNDHOLE and GAR) can be used to mitigate these issues, their performance is bounded by some limitations, mainly the high risk of falling into routing loops and involvement in unnecessary transmissions. This paper provides a solution that by-passes the infected nodes dynamically using a twin rolling balls technique and also diverts the packets trapped inside the identified area. Infected nodes are identified by a fuzzy data clustering approach that classifies the nodes based on the fraction of anomalous data detected in their individual data streams. This information is then used in the proposed by-passed routing (BPR), which rotates two balls in two directions simultaneously: clockwise and counter-clockwise. The first uninfected node that hits any ball in either direction is selected as the next hop. We also handle the incoming packets, or packets-on-the-fly, that may be affected when this problem occurs. Besides solving both problems of the existing methods, the proposed BPR technique greatly improves the studied QoS parameters, with an almost 40 percent increase in overall performance.

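    A much-simplified geometric reading of the twin-ball idea: from the current node, sweep the neighbours clockwise and counter-clockwise simultaneously and take the first uninfected neighbour hit in either direction. The real BPR rotates two balls as in rolling-ball boundary traversal; this sketch, with invented coordinates, captures only the two-direction selection.

        import math

        def next_hop(neighbours, infected, heading=0.0):
            """neighbours: {node: (x, y)} relative to the current node."""
            best = None
            for node, (x, y) in neighbours.items():
                if node in infected:
                    continue
                a = (math.atan2(y, x) - heading) % (2 * math.pi)
                sweep = min(a, 2 * math.pi - a)   # smaller of the CW/CCW sweeps
                if best is None or sweep < best[0]:
                    best = (sweep, node)
            return best[1] if best else None

        nbrs = {'u': (1, 0.2), 'v': (0.5, -0.4), 'w': (-1, 0)}
        print(next_hop(nbrs, infected={'u'}))     # 'v': the first clean neighbour hit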
  • Compositional Model Checking of Concurrent Systems

    Publication Year: 2015, Page(s): 1607 - 1621
    PDF (1147 KB) | HTML

    This paper presents a compositional framework to address the state explosion problem in model checking of concurrent systems. The framework takes as input a system model described as a network of communicating components in a high-level description language, finds the local state transition model for each individual component, where local properties can be verified, and then iteratively reduces and composes the component state transition models to form a reduced global model for the entire system, where global safety properties can be verified. The state space reductions used in this framework yield a reduced model that contains exactly the same set of observably equivalent executions as the original model; therefore, no false counter-examples result from verifying the reduced model. This approach allows designs that cannot be handled monolithically or with partial-order reduction to be verified without difficulty. The experimental results show significant scale-up of this compositional verification framework on a number of non-trivial concurrent system models.

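    To make the compose step concrete, here is a textbook parallel composition of two component state machines that synchronize on shared actions (shown for illustration; the paper's framework additionally reduces each component model before and during composition).

        # Parallel composition: shared actions fire jointly, others interleave.
        def compose(ts1, ts2, shared):
            """ts = (states, {(state, action): next_state})."""
            trans = {}
            for (s1, a), t1 in ts1[1].items():
                if a in shared:
                    for s2 in ts2[0]:
                        if (s2, a) in ts2[1]:
                            trans[((s1, s2), a)] = (t1, ts2[1][(s2, a)])
                else:
                    for s2 in ts2[0]:
                        trans[((s1, s2), a)] = (t1, s2)
            for (s2, a), t2 in ts2[1].items():
                if a not in shared:
                    for s1 in ts1[0]:
                        trans[((s1, s2), a)] = (s1, t2)
            return trans

        producer = ({'idle', 'full'}, {('idle', 'put'): 'full', ('full', 'sync'): 'idle'})
        consumer = ({'wait', 'busy'}, {('wait', 'sync'): 'busy', ('busy', 'get'): 'wait'})
        print(sorted(compose(producer, consumer, shared={'sync'})))  # 'sync' fires only jointly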
  • Conditional (t,k)-Diagnosis in Graphs by Using the Comparison Diagnosis Model

    Publication Year: 2015, Page(s): 1622 - 1632
    PDF (530 KB) | HTML

    $(t,k)$-Diagnosis, which is a generalization of sequential diagnosis, requires that at least $k$ faulty processors be identified and repaired in each iteration when there are at most $t$ faulty processors, where $t \ge k$. Based on the assumption that each vertex is adjacent to at least one fault-free vertex, the conditional $(t,k)$-diagnosis of graphs was investigated by using the comparison diagnosis model. Lower bounds on the conditional $(t,k)$-diagnosability of graphs were derived, and applied to obtain the following results. 1) Symmetric $d$-dimensional grids are conditionally $(\frac{N}{2d+1}-1, 2d-1)$-diagnosable when $d \ge 2$ and $N$ (the number of vertices) $\ge 4^d$. 2) Symmetric $d$-dimensional tori are conditionally $(\frac{1}{5}(N+\min\lbrace \frac{8}{5}N^{\frac{2}{3}}, \frac{2N-20}{15}\rbrace -2), 6)$-diagnosable when $d=2$ and $N \ge 49$, and $(\frac{N}{2d+1}-1, 4d-2)$-diagnosable when $d \ge 3$ and $N \ge 4^d$.

  • Credit-Based Runtime Placement of Virtual Machines on a Single NUMA System for QoS of Data Access Performance

    Publication Year: 2015, Page(s): 1633 - 1646
    PDF (3430 KB) | HTML

    While NUMA systems are widely used as target machines for virtualization, each data access request issued by a virtual machine (VM) on a NUMA system may have a different access time depending not only on remote access conditions but also on shared resource contention. Mainly because of this, each VM running on the NUMA system experiences irregular data access performance at different times. Because existing hypervisors, such as KVM, VMware, and Xen, have yet to take this into account, users of VMs can neither predict their data access performance nor even recognize the data access performance they have experienced. In this paper, we propose a novel VM placement technique to resolve the irregular data access performance of VMs running on NUMA systems. A hypervisor with our technique provides the illusion of a private memory subsystem to each VM, which guarantees the average data access latency required by each VM. To enable this feature, we periodically evaluate the average data access latency of each VM using hardware performance monitoring units. After every evaluation, our Mcredit-based VM migration algorithm tries to migrate the VCPU or memory of any VM not meeting its required data access latency to another node that gives the VM lower data access latency. We implemented a prototype for the KVM hypervisor on Linux 3.10.10. Experimental results show that, on a four-node NUMA system, our technique keeps the required data access performance levels of VMs running various workloads while consuming less than 1 percent of the cycles of a core and 0.3 percent of the system memory bandwidth.

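    The placement loop suggested by the abstract can be sketched schematically: every period, measure each VM's average data access latency from the performance monitoring units and migrate the ones missing their target toward a better node. All names, numbers, and the prediction table below are hypothetical.

        import dataclasses

        @dataclasses.dataclass
        class VM:
            name: str
            required_latency: float    # target average data access latency (ns)
            predicted_latency: dict    # node -> estimated latency after migration

        def rebalance(vms, measure_latency, migrate):
            # one evaluation period: migrate any VM missing its latency target
            for vm in vms:
                if measure_latency(vm) > vm.required_latency:
                    target = min(vm.predicted_latency, key=vm.predicted_latency.get)
                    migrate(vm, target)

        vm = VM('web', required_latency=120.0, predicted_latency={0: 150.0, 1: 90.0})
        rebalance([vm], measure_latency=lambda v: 140.0,
                  migrate=lambda v, n: print(f'migrate {v.name} -> node {n}'))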
  • Differential Fault Attack against Grain Family with Very Few Faults and Minimal Assumptions

    Publication Year: 2015, Page(s): 1647 - 1657
    PDF (297 KB) | HTML

    Previously published works on differential fault attacks (DFA) against the Grain family require quite a large number (hundreds) of faults, as well as several assumptions on the locations and timings of the faults injected. In this paper, we present a significantly improved scenario, from the adversarial point of view, for DFA against the Grain family of stream ciphers. Our model is the most realistic one so far, as it assumes only that the cipher is re-keyed a few times and that faults can be injected at any random location and at any random point of time; that is, no precise control is needed over the location or timing of fault injections. We construct equations based on the algebraic description of the cipher, introducing new variables so that the degrees of the equations do not increase. In line with algebraic cryptanalysis, we accumulate such equations based on the fault-free and faulty key-stream bits and solve them using the SAT solver Cryptominisat-2.9.5 installed with SAGE 5.7. In a few minutes we can recover the state of Grain v1, Grain-128, and Grain-128a with as few as 10, 4, and 10 faults, respectively.

  • Efficient Rule Engine for Smart Building Systems

    Publication Year: 2015, Page(s): 1658 - 1669
    PDF (1287 KB) | HTML

    In smart building systems, the automatic control of devices relies on matching sensed environment information against customized rules. With the development of wireless sensor and actuator networks (WSANs), low-cost and self-organized wireless sensors and actuators can enhance smart building systems, but they produce abundant sensing data. A rule engine capable of efficient rule matching is therefore the foundation of WSAN-based smart building systems. However, traditional rule engines mainly focus on complex processing mechanisms and ignore the volume of sensing data, making them unsuitable for large-scale WSAN-based smart building systems. To address these issues, we build an efficient rule engine. Specifically, we design an atomic event extraction module for extracting atomic events from data messages, and then build a β-network to acquire the atomic conditions for parsing the atomic trigger events. Taking the atomic trigger events as the key set of a minimal perfect hash function (MPHF), we construct a minimal perfect hash table that can filter out the majority of unused atomic events with O(1) time overhead. Moreover, a rule engine adaptation scheme is proposed to minimize the rule matching overhead. We implement the proposed rule engine in a practical smart building system. The experimental results show that the rule engine performs efficiently and flexibly under high data throughput and a large rule set.

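    The O(1) filtering step can be illustrated with a Python dict standing in for the minimal perfect hash table; a real MPHF avoids the dict's memory overhead by hashing only the fixed key set of atomic trigger events. The event tuples here are invented.

        # Map each atomic trigger event to a slot; anything else is discarded.
        TRIGGER_EVENTS = {('room1', 'temp_high'), ('room1', 'motion'), ('lobby', 'smoke')}
        trigger_table = {e: i for i, e in enumerate(sorted(TRIGGER_EVENTS))}

        def filter_event(atomic_event):
            """Return the rule-engine slot for a trigger event, or None."""
            return trigger_table.get(atomic_event)   # O(1) lookup

        print(filter_event(('room1', 'temp_high')))  # matched: slot index
        print(filter_event(('room2', 'humidity')))   # unused event: None, filtered out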
  • Energy-Efficient Cooperative Communications for Multimedia Applications in Multi-Channel Wireless Networks

    Publication Year: 2015, Page(s): 1670 - 1679
    PDF (576 KB) | HTML

    The dramatic growth of mobile multimedia communications imposes new requirements on quality-of-service and energy efficiency in wireless networks. In this paper, we study the energy- and spectrum-efficient cooperative communication (ESCC) problem by exploiting the benefits of cooperative communication (CC) for mobile multimedia applications in multi-channel wireless networks. In a static network, it is formulated as a mixed-integer nonlinear programming problem. To solve this problem, we use linearization and reformulation techniques to transform it into a mixed-integer linear programming problem that is solved by a branch-and-bound algorithm with enhanced performance. To deal with the problem in dynamic networks, we propose an online algorithm with low computational complexity and deployment overhead. Extensive simulations show that the proposed algorithms can significantly improve energy efficiency in both static and dynamic networks.

    Open Access
  • Energy-Efficient Real-Time Human Mobility State Classification Using Smartphones

    Publication Year: 2015, Page(s): 1680 - 1693
    PDF (1022 KB) | HTML

    The key benefits of using the smartphone accelerometer for human mobility analysis, with or without location determination based upon GPS, Wi-Fi, or GSM, are that it is energy-efficient, provides real-time contextual information, and has high availability. Using measurements from an accelerometer for human mobility analysis presents its own challenges, as we all carry our smartphones differently and the measurements depend on body placement. Such analysis also often relies on on-demand remote data exchange for processing, which is less energy-efficient, has higher network costs, and is not real-time. We present a novel accelerometer framework based upon a probabilistic algorithm that neutralizes the effect of different smartphone on-body placements and orientations, allowing human movements to be identified more accurately and energy-efficiently. Using solely the embedded smartphone accelerometer, without referencing historical data or filtering accelerometer noise, our method can identify the human mobility state in real time within a time constraint of 2 seconds. The method achieves an overall average classification accuracy of 92 percent when evaluated on a dataset gathered from fifteen individuals covering nine different urban human mobility states.

    Open Access
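
    One plausible reading of the orientation-neutral pipeline, sketched under assumed details the abstract does not spell out: the magnitude of the 3-axis acceleration vector is placement-independent, and simple statistics over a 2-second window already separate stationary from moving.

        import math

        def window_features(samples):
            """samples: (ax, ay, az) tuples over one 2 s window -> (mean, std)."""
            mags = [math.sqrt(ax**2 + ay**2 + az**2) for ax, ay, az in samples]
            mean = sum(mags) / len(mags)
            var = sum((m - mean) ** 2 for m in mags) / len(mags)
            return mean, math.sqrt(var)

        still = [(0.02, -0.01, 9.81)] * 100      # at rest, in any orientation
        walking = [(0.3 * (-1) ** i, 0.1, 9.81 + 1.5 * (-1) ** i) for i in range(100)]
        print(window_features(still))            # low deviation -> stationary
        print(window_features(walking))          # high deviation -> moving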
  • Ensuring Cache Reliability and Energy Scaling at Near-Threshold Voltage With Macho

    Publication Year: 2015, Page(s): 1694 - 1706
    Multimedia
    PDF (1910 KB) | HTML

    Nanoscale process variations in conventional SRAM cells are known to limit voltage scaling in microprocessor caches. Recently, a number of novel cache architectures have been proposed that substitute faulty words of one cache line with healthy words of others to tolerate these failures at low voltages. These schemes rely on fault maps to identify faulty words, inevitably increasing the chip area, and the relationship between word sizes and cache failure rates is not well studied in these works. In this paper, we analyze word substitution schemes by employing a fault tree model and a collision graph model. A novel cache architecture (Macho) is then proposed based on this analysis. Macho is dynamically reconfigurable and is locally optimized (tailored to the local fault density) using two algorithms: 1) a graph coloring algorithm for moderate fault densities and 2) a bipartite matching algorithm for high fault densities. An adaptive matching algorithm enables on-demand reconfiguration of Macho to concentrate available resources on cache working sets. As a result, voltage scaling down to 400 mV is possible, tolerating bit failure rates reaching 1 percent (one failure in every 100 cells). This near-threshold voltage (NTV) operation achieves a 44 percent energy reduction in our simulated system (CPU + DRAM models) with a 1 MB L2 cache.

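    The word-substitution idea behind the matching algorithms can be illustrated with a greedy toy: two cache lines may be paired whenever no word position is faulty in both, so each line can borrow its partner's healthy words. Macho's actual graph-coloring and bipartite-matching algorithms are only approximated by this sketch.

        # Greedily pair lines whose fault maps do not overlap.
        def pair_lines(fault_maps):
            """fault_maps[line] = set of faulty word indices -> list of pairs."""
            pairs, used = [], set()
            for a in fault_maps:
                if a in used:
                    continue
                for b in fault_maps:
                    if b != a and b not in used and not (fault_maps[a] & fault_maps[b]):
                        pairs.append((a, b))
                        used.update((a, b))
                        break
            return pairs

        faults = {'L0': {1, 3}, 'L1': {1}, 'L2': {0}, 'L3': set()}
        print(pair_lines(faults))  # [('L0', 'L2'), ('L1', 'L3')]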
  • Fault Attacks on Pairing-Based Protocols Revisited

    Publication Year: 2015, Page(s): 1707 - 1714
    Cited by: Papers (2)
    Multimedia
    PDF (162 KB) | HTML

    Several papers have studied fault attacks on computing a pairing value e(P,Q), where P is a public point and Q is a secret point. In this paper, we observe that these attacks are in fact effective only on a small number of pairing-based protocols, and even then only when the protocols are implemented with specific symmetric pairings. We demonstrate the effectiveness of the fault attacks on a public-key encryption scheme, an identity-based encryption scheme, and an oblivious transfer protocol when implemented with a symmetric pairing derived from a supersingular elliptic curve with embedding degree 2.

  • Knowledge Sharing in the Online Social Network of Yahoo! Answers and Its Implications

    Publication Year: 2015, Page(s): 1715 - 1728
    PDF (4549 KB) | HTML

    Question and Answer (Q&A) websites such as Yahoo! Answers provide a platform where users can post questions and receive answers. These systems take advantage of the collective intelligence of users to find information. In this paper, we analyze the online social network (OSN) in Yahoo! Answers. Based on a large amount of collected data, we study the OSN's structural properties, which reveal strikingly distinct characteristics such as low link symmetry and weak correlation between in-degree and out-degree. After studying the knowledge base and behaviors of the users, we find that a small number of top contributors answer most of the questions in the system, that each top contributor focuses on only a few knowledge categories, and that the knowledge categories of the users are highly clustered. We also study the knowledge base in a user's social network, which reveals that the members of a user's social network share only a few knowledge categories. Based on these findings, we provide guidance for the design of spammer detection algorithms and distributed Q&A systems. We also propose a friendship-knowledge-oriented Q&A framework that synergistically combines current OSN-based Q&A and web Q&A. We believe that the results presented in this paper are crucial to understanding collective intelligence in web Q&A OSNs and lay a cornerstone for the evolution of next-generation Q&A systems.

  • On-Demand Block-Level Address Mapping in Large-Scale NAND Flash Storage Systems

    Publication Year: 2015, Page(s): 1729 - 1741
    PDF (1410 KB) | HTML

    The density of flash memory chips has doubled every two years over the past decade, and the trend is expected to continue. The increasing capacity of NAND flash memory leads to a large RAM footprint for address mapping management. This paper proposes a novel demand-based block-level address mapping scheme with a two-level caching mechanism (DAC) for large-scale NAND flash storage systems. The objective is to reduce the RAM footprint without excessively compromising system response time. In our technique, the block-level address mapping table is stored in fixed pages (called translation pages) in the flash memory. Exploiting the temporal locality that workloads exhibit, we maintain one cache in RAM to store on-demand address mapping entries. Meanwhile, exploiting both the spatial locality and the access frequency of workloads with another two caches, the second-level cache is designed to cache selected translation pages. In this way, both the most frequently accessed and the sequentially accessed address mapping entries can be stored in the cache, so the cache hit ratio is increased and the system response time is improved. To the best of our knowledge, this is the first work to reduce the RAM cost by applying the demand-based approach to block-level address mapping schemes. The experiments have been conducted on a real embedded platform. The experimental results show that our technique can effectively reduce the RAM footprint while maintaining an average system response time similar to that of previous work.

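    A structural sketch of the demand-based lookup path as assumed from the abstract: check a small cache of individual mapping entries first, then a cache of whole translation pages, and read a translation page from flash only on a double miss. Class and parameter names are hypothetical.

        from collections import OrderedDict

        class DemandMap:
            def __init__(self, read_translation_page, entry_slots=4, page_slots=2):
                self.read_page = read_translation_page  # flash read callback
                self.entries = OrderedDict()            # block -> physical block (LRU)
                self.pages = OrderedDict()              # page no -> {block: phys} (LRU)
                self.entry_slots, self.page_slots = entry_slots, page_slots

            def lookup(self, block, per_page=128):
                if block in self.entries:               # first-level hit
                    self.entries.move_to_end(block)
                    return self.entries[block]
                page_no = block // per_page
                if page_no not in self.pages:           # second-level miss: flash read
                    self.pages[page_no] = self.read_page(page_no)
                    if len(self.pages) > self.page_slots:
                        self.pages.popitem(last=False)
                phys = self.pages[page_no][block]
                self.entries[block] = phys
                if len(self.entries) > self.entry_slots:
                    self.entries.popitem(last=False)
                return phys

        dm = DemandMap(lambda p: {b: b + 1000 for b in range(p * 128, (p + 1) * 128)})
        print(dm.lookup(5), dm.lookup(5))  # 1005 1005; the second lookup is a cache hit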
  • On the Interplay between Global DVFS and Scheduling Tasks with Precedence Constraints

    Publication Year: 2015, Page(s): 1742 - 1754
    PDF (502 KB) | HTML

    Many multicore processors can decrease the voltage and clock frequency to save energy at the cost of increased delay. While a large part of the theory-oriented literature focuses on local dynamic voltage and frequency scaling (local DVFS), where every core's voltage and clock frequency can be set separately, this article presents an in-depth theoretical study of the more commonly available global DVFS, which makes such changes for the entire chip. This article shows how to choose the optimal clock frequencies that minimize the energy for global DVFS, and it discusses the relationship between scheduling and optimal global DVFS. Formulas are given to find this optimum under time constraints, including proofs thereof. The problem of simultaneously choosing clock frequencies and a schedule that together minimize the energy consumption is discussed, and based on this a scheduling criterion is derived that implicitly assigns frequencies and minimizes energy consumption. Furthermore, this article studies the effectiveness of a large class of scheduling algorithms with respect to the derived criterion, and a bound on the maximal relative deviation is given. Simulations show that our techniques achieve an energy reduction of 30 percent with respect to state-of-the-art research.

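    A worked instance of the usual global-DVFS trade-off under the standard convex power model (not necessarily the article's exact formulation): dynamic power grows roughly like f^3 while runtime shrinks like 1/f, so the energy for a fixed cycle count grows like f^2, and the deadline fixes the lowest feasible frequency, which is then energy-optimal.

        # Pick the lowest clock level that still meets the deadline.
        def min_energy_frequency(cycles, deadline, f_levels):
            feasible = [f for f in f_levels if cycles / f <= deadline]
            return min(feasible) if feasible else None

        levels = [0.8e9, 1.2e9, 1.6e9, 2.0e9]   # available clock levels (Hz)
        print(min_energy_frequency(cycles=3e9, deadline=2.0, f_levels=levels))
        # 1.6 GHz: 3e9 cycles take 1.875 s <= 2 s; slower levels miss the deadline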
  • Optimization of Composite Cloud Service Processing with Virtual Machines

    Publication Year: 2015, Page(s): 1755 - 1768
    PDF (1485 KB) | HTML

    By leveraging virtual machine (VM) technology, we optimize cloud system performance based on refined resource allocation when processing user requests with composite services. Our contribution is three-fold. (1) We devise a VM resource allocation scheme that minimizes the processing overhead of task execution. (2) We comprehensively investigate the best-suited task scheduling policy under different design parameters. (3) We also explore the best-suited resource sharing scheme with adjusted divisible resource fractions on running tasks in terms of the proportional-share model (PSM), which can be split into an absolute mode (called AAPSM) and a relative mode (RAPSM). We implement a prototype system over a cluster environment deployed with 56 real VM instances and summarize valuable lessons from our evaluation. When system resources are in short supply, lightest workload first (LWF) is generally recommended because it minimizes the overall response extension ratio (RER) for both sequential-mode and parallel-mode tasks. In a competitive situation with over-commitment of resources, the best policy combines LWF with both AAPSM and RAPSM; it outperforms other solutions by 16+ percent with respect to worst-case response time and by 7.4+ percent with respect to fairness.

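    A toy of lightest-workload-first dispatch as named in the abstract (the queueing details are my own simplification): among the waiting tasks, always start the one with the smallest estimated workload.

        import heapq

        def lwf_order(tasks):
            """tasks: (workload, name) pairs -> service order."""
            heap = list(tasks)
            heapq.heapify(heap)
            return [heapq.heappop(heap)[1] for _ in range(len(heap))]

        print(lwf_order([(9.0, 'render'), (2.5, 'thumb'), (4.0, 'encode')]))
        # ['thumb', 'encode', 'render']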
  • SpiNNaker—Programming Model

    Publication Year: 2015, Page(s): 1769 - 1782
    PDF (1609 KB) | HTML

    SpiNNaker is a multi-core computing engine with a bespoke and specialised communication infrastructure that supports almost perfect scalability up to a hard limit of 2^16 × 18 = 1,179,648 cores. This remarkable property is achieved at the cost of ignoring memory coherency, global synchronisation, and even deterministic message passing, yet it is still possible to perform meaningful computations. Whilst we have yet to assemble the full machine, the scalability properties make it possible to demonstrate the capabilities of the machine whilst it is being assembled; the more cores we connect, the larger the problems we are able to attack become. Even with isolated printed circuit boards of 864 cores, interesting capabilities are emerging. This paper is the third in a series charting the development trajectory of the system. In the first two, we outlined the hardware build. Here, we lay out the (rather unusual) low-level foundation software developed so far to support the operation of the machine.

    Open Access
  • Unified Mitchell-Based Approximation for Efficient Logarithmic Conversion Circuit

    Publication Year: 2015, Page(s): 1783 - 1797
    PDF (1348 KB) | HTML

    This paper presents a novel method, named the Unified Mitchell-based Approximation (UMA), for obtaining an optimized Mitchell-based logarithmic conversion circuit for any desired conversion accuracy up to 14 bits. UMA is the first method able to produce a conversion circuit when a specific accuracy is required. In this work, we study and analyze five design parameters and their impact on accuracy and hardware merits. We formulate a hardware model of the error correction circuit in the conversion circuit for performance evaluation. Given an accuracy requirement, the proposed method explores the design space of the five design parameters. As the design space is theoretically huge, we propose constraints on the range of the parameter values and develop a systematic search algorithm for exploring the design space. UMA is able to obtain an area-delay-product-optimized circuit for each of the conversion accuracies achieved by existing Mitchell-based designs. Synthesis results in 90 nm CMOS technology show that the circuits obtained are comparable to or better than existing Mitchell-based designs with the same accuracy objective; nine of the fifteen circuits obtained improve the area-delay product by more than 50 percent. In addition, UMA can obtain circuits for any accuracy from 4 to 14 bits, while the best accuracy achieved by existing Mitchell-based methods is less than 12 bits.

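    For readers unfamiliar with the underlying approximation, here is a runnable illustration of Mitchell's method that these circuits refine: writing x = 2^k (1 + m) with 0 <= m < 1, log2(x) ~ k + m, i.e., the mantissa itself approximates the fractional logarithm, with a maximum error of about 0.086 that the error correction circuit then reduces.

        import math

        def mitchell_log2(x):
            k = x.bit_length() - 1         # position of the leading one
            m = (x - (1 << k)) / (1 << k)  # fractional mantissa in [0, 1)
            return k + m

        print(mitchell_log2(48), math.log2(48))  # 5.5 vs ~5.585
        worst = max(abs(mitchell_log2(x) - math.log2(x)) for x in range(1, 1 << 14))
        print(round(worst, 3))                   # ~0.086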
  • JEM: Just in Time/Just Enough Energy Management Methodology for Computing Systems

    Publication Year: 2015, Page(s): 1798 - 1804
    PDF (818 KB) | HTML

    This paper presents the Just in Time/Just Enough Energy Management (JEM) methodology, which is applicable to a broad range of computing systems. The conventional concept of a fixed supply voltage (VDD) for both the performance and power-saving modes of computing systems is revisited and improved with JEM. JEM consists of an efficient DC/DC converter and a power management integrated circuit (PMIC) with feedback to monitor the activities within a given computing system, providing a new means for dynamic voltage scaling at the system level. JEM is tested and validated on a blade server, yielding 15.11 percent power savings at the motherboard level. A significant thermal improvement of 9.0°C is also measured in a 16 GB memory module of the blade server. Moreover, a JEM-enabled CMOS circuit exhibits a remarkable reduction in supply current. Finally, JEM is compared to a conventional power supply design, showing significant improvement in processor performance and considerable power savings in the blade server.


Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.


Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org