• ### State of the Journal

Publication Year: 2016, Page(s):2014 - 2018
• ### A Systematic Methodology for Optimization of Applications Utilizing Concurrent Data Structures

Publication Year: 2016, Page(s):2019 - 2031

Modern multicore embedded systems often execute applications that rely heavily on concurrent data structures. The selection of efficient concurrent data structure implementations for a specific application is usually a complex and time-consuming task, because each design decision often affects the performance and the energy consumption of the embedded system in various and occasionally unpredictab...

• ### Comparison between Binary and Decimal Floating-Point Numbers

Publication Year: 2016, Page(s):2032 - 2044
Cited by:  Papers (1)

We introduce an algorithm to compare a binary floating-point (FP) number and a decimal FP number, assuming the “binary encoding” of the decimal formats is used, and with a special emphasis on the basic interchange formats specified by the IEEE 754-2008 standard for FP arithmetic. It is a two-step algorithm: a first pass, based on the exponents only, quickly eliminates most cases, the...

• ### Configurable XOR Hash Functions for Banked Scratchpad Memories in GPUs

Publication Year: 2016, Page(s):2045 - 2058
Cited by:  Papers (1)

Scratchpad memories in GPU architectures are employed as software-controlled caches to increase the effective GPU memory bandwidth. Through the use of well-known optimization techniques, such as privatization and tiling, they are properly exploited. Typically, they are banked memories which are addressed with a $\text{mod}(2^N)$ ...

• ### Efficient Resource Constrained Scheduling Using Parallel Structure-Aware Pruning Techniques

Publication Year: 2016, Page(s):2059 - 2073
Cited by:  Papers (1)

Branch-and-bound approaches are promising for pruning fruitless search space during resource-constrained scheduling. However, such approaches only compare the estimated upper and lower bounds of an incomplete schedule to the length of the best feasible schedule at that iteration, which does not fully exploit the potential of pruning during the search. Aiming to improve the performance of re...

• ### Extended Generalized Feistel Networks Using Matrix Representation to Propose a New Lightweight Block Cipher: Lilliput

Publication Year: 2016, Page(s):2074 - 2089
Cited by:  Papers (5)

While Generalized Feistel Networks (GFNs) have been widely studied in the literature as a building block of a block cipher, we recall in this paper the results of [1] where a unified vision to easily represent them through a matrix representation is proposed. We also introduce a new class of such schemes called Extended Generalized Feistel Networks well suited for cryptographic applications. We in...

• ### Fair Flow Control and Fairness Evaluation in Computer Networks and Systems

Publication Year: 2016, Page(s):2090 - 2103

Fairness is an important property of computer networks and systems. In a wide range of these systems, such as distributed multi-hop wireless networks, multihomed networks, and cloud computing, each user may be allocated a number of system resources; this resembles a many-to-many relationship between the sets of users and resources, which raises the problem of system-wide fair resource allocation. I...

• ### Global Optimization for Multi-Channel Wireless Data Broadcast with AH-Tree Indexing Scheme

Publication Year: 2016, Page(s):2104 - 2117
Cited by:  Papers (4)

Multi-channel wireless data broadcast is an appropriate approach for disseminating data to a large number of mobile clients. In this paper, we present a global optimization for multi-channel wireless data broadcast with the Alphabetic Huffman Tree (AH-Tree) indexing scheme, which can deal well with skewed access frequencies. We present three novel designs to reduce the access latency and tuning time of th...

• ### Link-Layer Multicast in Large-Scale 802.11n Wireless LANs with Smart Antennas

Publication Year: 2016, Page(s):2118 - 2133
Cited by:  Papers (1)

In wireless local area networks (WLANs), link-layer multicast is a promising technology for many multimedia applications, e.g., video streaming, as multicast frames can reach multiple clients simultaneously. However, the efficiency of multicast in WLANs is usually low since multicast frames are transmitted at a basic data rate to reach clients with poor channel quality. Moreover, the reliability o...

• ### Modeling of Gaussian Network-Based Reconfigurable Network-on-Chip Designs

Publication Year: 2016, Page(s):2134 - 2142

In network-on-chip (NoC) design, reconfiguration of the NoC is a very effective option for minimizing power consumption, and Gaussian networks can provide a significant advantage over mesh networks in terms of network diameter, average hop distance, and so on. In this paper, based on the special topology structure and the static connection rules within Gaussian networks, we present the reconfigurat...

• ### Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation

Publication Year: 2016, Page(s):2143 - 2157
Cited by:  Papers (1)

As a result of huge advancements in VLSI technology, increasingly complex circuits are being implemented, making not only the whole digital system more prone to faults but also the fault detector itself susceptible to faults. This creates a requirement for concurrent fault detection architectures for the encoders and decoders. In this paper, we present a multiple-bit parity-based fault detection ar...

• ### New Formats for Computing with Real-Numbers under Round-to-Nearest

Publication Year: 2016, Page(s):2158 - 2168
Cited by:  Papers (4)

In this paper, a new family of formats to deal with real numbers for applications requiring round-to-nearest is proposed. They are based on shifting the set of exactly represented numbers which are used in conventional radix-$\beta$ ...

• ### NV-Tree: A Consistent and Workload-Adaptive Tree Structure for Non-Volatile Memory

Publication Year: 2016, Page(s):2169 - 2183

Non-volatile memory (NVM), which can provide DRAM-like performance and disk-like persistency, has the potential to build single-level systems by replacing both DRAM and disk. Keeping data consistency in such systems is non-trivial because memory writes may be reordered by the CPU. Although ordered memory writes for achieving data consistency can be implemented using the memory fence and the CPU cach...

• ### Performance Prediction for Large-Scale Parallel Applications Using Representative Replay

Publication Year: 2016, Page(s):2184 - 2198

Automatically predicting performance of parallel applications has been a long-standing goal in the area of high performance computing. However, accurate performance prediction is challenging, since the execution time of parallel applications is determined by several factors, such as sequential computation time, communication time and their complex interactions. Despite previous efforts, accurately...

• ### PSBS: Practical Size-Based Scheduling

Publication Year: 2016, Page(s):2199 - 2212
Cited by:  Papers (3)

Size-based schedulers have very desirable performance properties: optimal or near-optimal response time can be coupled with strong fairness. Despite this, such systems are rarely implemented in practical settings, because they require knowing a priori the amount of work needed to complete jobs, an assumption that is difficult to satisfy in concrete systems. It is definitely more likely to in...

• ### Q-DRAM: Quick-Access DRAM with Decoupled Restoring from Row-Activation

Publication Year: 2016, Page(s):2213 - 2227
Cited by:  Papers (2)

The relatively high latency of DRAM is mostly caused by the long row-activation time, which in fact consists of sensing time and restoring time. Memory controllers cannot distinguish between them since they are performed consecutively by a single row-activation command. If these two steps are separated, the restoring can be delayed until DRAM access is uncongested. Hence, we propose Quick-Access DRAM (Q...

• ### Remote Transaction Commit: Centralizing Software Transactional Memory Commits

Publication Year: 2016, Page(s):2228 - 2240

Software Transactional Memory (STM) has recently emerged as a promising synchronization abstraction for multicore architectures. State-of-the-art STM algorithms, however, suffer from performance challenges due to contention and spinning on locks during the transaction commit phase. In this paper, we introduce Remote Transaction Commit (or RTC), a mechanism for executing commit phases of STM transa...

• ### Resource Conscious Diagnosis and Reconfiguration for NoC Permanent Faults

Publication Year: 2016, Page(s):2241 - 2256

Networks-on-chip (NoCs) have been increasingly adopted in recent years due to the extensive integration of many components in modern multicore processors and system-on-chip designs. At the same time, transistor reliability is becoming a major concern due to the continuous scaling of silicon. As the sole medium of on-chip communication, it is critical for a NoC to be able to tolerate many permanent...

• ### Scalable Multi-Match Packet Classification Using TCAM and SRAM

Publication Year: 2016, Page(s):2257 - 2269

Packet classification is an enabling technology for various network services. Fast single-match packet classification can be achieved by using ternary content addressable memory (TCAM) because of its superior speed. However, TCAM has some drawbacks, including the inability to store arbitrary ranges, confined capacity, and limited choices of entry lengths. Moreover, TCAM only reports the first...

• ### Symbol Shifting: Tolerating More Faults in PCM Blocks

Publication Year: 2016, Page(s):2270 - 2283

Phase-change memory (PCM) has emerged as a candidate that overcomes the physical limitations faced by DRAM and NAND flash memory. While PCM has desirable properties in terms of scalability and density, it suffers from limited endurance. Repeated writes cause PCM cells to wear out and get permanently stuck at a specific value. Recovering from stuck-at faults through a proactive error correcting sch...

• ### Test Algorithms for ECC-Based Memory Repair in Ultimate CMOS and Post-CMOS

Publication Year: 2016, Page(s):2284 - 2298
Cited by:  Papers (4)

In modern SoCs, embedded memories should be protected by ECC against field failures to achieve acceptable reliability. They should also be repaired after fabrication to achieve acceptable fabrication yield. In technologies affected by high defect densities, conventional repair induces very high costs. To reduce them, we can use ECC-based repair, which consists of using the ECC for fixing words comprising...

• ### Truthful Mechanisms for Competitive Reward-Based Scheduling

Publication Year: 2016, Page(s):2299 - 2312
Cited by:  Papers (1)

We consider a competitive environment for reward-based scheduling of periodic tasks, where the execution of each task consists of a mandatory and an optional part. Each task obtains a value if the processor successfully schedules its entire mandatory part, and an additional reward value if the processor successfully schedules a part of its optional execution. Each task is owned by a self-interes...

• ### Write Mode Aware Loop Tiling for High Performance Low Power Volatile PCM in Embedded Systems

Publication Year: 2016, Page(s):2313 - 2324
Cited by:  Papers (1)

Architecting PCM, especially MLC PCM, as main memory for MCUs is a promising technique to replace conventional DRAM deployment. However, PCM/MLC PCM suffers from long write latency and large write energy. Recent work has proposed a compiler-directed dual-write (CDDW) scheme to combat the drawbacks of PCM by adopting fast or slow mode for different write operations. For large-scale loops, we observ...

• ### Achieving Simple, Secure and Efficient Hierarchical Access Control in Cloud Computing

Publication Year: 2016, Page(s):2325 - 2331
Cited by:  Papers (5)

Access control is an indispensable security component of cloud computing, and hierarchical access control is of particular interest since in practice one is entitled to different access privileges. This paper presents a hierarchical key assignment scheme based on linear geometry as a solution for flexible and fine-grained hierarchical access control in cloud computing. In our scheme, the encrypti...

• ### Improving the Accuracy of Defect Diagnosis with Multiple Sets of Candidate Faults

Publication Year: 2016, Page(s):2332 - 2338
Cited by:  Papers (1)

Given a chip that produced a faulty output response to a test set, a defect diagnosis procedure produces a set of candidate faults that is expected to identify the defects that are present in the chip. The accuracy of the set of candidate faults is higher when the set is smaller or when its overlap with the defects that are present in the chip is larger. To increase the accuracy of a set of candid...

