• ### Approximate DCT Image Compression Using Inexact Computing

Publication Year: 2018, Page(s):149 - 159
This paper proposes a new framework for digital image processing; it relies on inexact computing to address some of the challenges associated with discrete cosine transform (DCT) compression. The proposed framework has three levels of processing; the first level uses approximate DCT for image compression to eliminate all computationally intensive floating-point multiplications and executing the ...

• ### DaDianNao: A Neural Network Supercomputer

Publication Year: 2017, Page(s):73 - 88
Cited by:  Papers (5)
Many companies are deploying services largely based on machine-learning algorithms for sophisticated processing of large amounts of data, either for consumers or industry. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be computationally and memory intensive. A number of neural network accelerato...

• ### Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network Packet Transmission Based on Deep Learning

Publication Year: 2017, Page(s):1946 - 1960
Cited by:  Papers (1)
In recent years, Software Defined Routers (SDRs) (programmable routers) have emerged as a viable solution to provide a cost-effective packet processing platform with easy extensibility and programmability. Multi-core platforms significantly promote SDRs' parallel computing capacities, enabling them to adopt artificial intelligence techniques, i.e., deep learning, to manage routing paths. In this paper...

• ### Improving Energy Efficiency of GPUs through Data Compression and Compressed Execution

Publication Year: 2017, Page(s):834 - 847
GPU design trends show that the register file size will continue to increase to enable even more thread-level parallelism. As a result, the register file consumes a large fraction of the total GPU chip power. This paper explores register file data compression for GPUs to improve power efficiency. Compression reduces the width of the register file read and write operations, which in turn reduces dynamic...

• ### Design of Approximate Radix-4 Booth Multipliers for Error-Tolerant Computing

Publication Year: 2017, Page(s):1435 - 1441
Approximate computing is an attractive design methodology to achieve low power, high performance (low delay) and reduced circuit complexity by relaxing the requirement of accuracy. In this paper, approximate Booth multipliers are designed based on approximate radix-4 modified Booth encoding (MBE) algorithms and a regular partial product array that employs an approximate Wallace tree. Two approxima...

• ### Hybrid Method for Minimizing Service Delay in Edge Cloud Computing Through VM Migration and Transmission Power Control

Publication Year: 2017, Page(s):810 - 819
Cited by:  Papers (3)
Due to physical limitations, mobile devices are restricted in memory, battery, and processing, among other characteristics. As a result, many applications cannot be run on such devices. This problem is addressed by Edge Cloud Computing, where users offload tasks they cannot run to cloudlet servers at the edge of the network. The main requirement of such a system is having a low Service Delay, ...

• ### Privacy-Preserving Public Auditing for Secure Cloud Storage

Publication Year: 2013, Page(s):362 - 375
Cited by:  Papers (253)
Using cloud storage, users can remotely store their data and enjoy the on-demand high-quality applications and services from a shared pool of configurable computing resources, without the burden of local data storage and maintenance. However, the fact that users no longer have physical possession of the outsourced data makes the data integrity protection in cloud computing a formidable task, espec...

• ### High Performance Parallel Decimal Multipliers Using Hybrid BCD Codes

Publication Year: 2017, Page(s):1994 - 2004
A parallel decimal multiplier with improved performance is proposed in this paper by exploiting the properties of three different binary coded decimal (BCD) codes, namely the redundant BCD excess-3 code (XS-3), the overloaded decimal digit set (ODDS) code and the BCD-4221/5211 code. The signed-digit radix-10 recoding is used to recode the BCD multiplier to the digit set [-5, 5] from [0, 9]. The re...

• ### Genetic Programming for Energy-Efficient and Energy-Scalable Approximate Feature Computation in Embedded Inference Systems

Publication Year: 2018, Page(s):222 - 236
With the increasing interest in deploying embedded sensors in a range of applications, there is also interest in deploying embedded inference capabilities. Doing so under the strict and often variable energy constraints of the embedded platforms requires algorithmic, in addition to circuit and architectural, approaches to reducing energy. A broad approach that has recently received considerable at...

• ### Efficient Protection of the Register File in Soft-Processors Implemented on Xilinx FPGAs

Publication Year: 2018, Page(s):299 - 304
Soft-processors implemented on SRAM-based FPGAs are increasingly being adopted in on-board computing for space and avionics applications due to their flexibility and ease of integration. However, efficient component-level protection techniques for these processors against radiation-induced upsets are necessary, as system failures could otherwise manifest. A register file is one of the critical stru...

• ### Multi-User Computation Partitioning for Latency Sensitive Mobile Cloud Applications

Publication Year: 2015, Page(s):2253 - 2266
Cited by:  Papers (16)
Elastic partitioning of computations between mobile devices and the cloud is an important and challenging research topic for mobile cloud computing. Existing works focus on single-user computation partitioning, which aims to optimize the application completion time for one particular user. These works assume that the cloud always has enough resources to execute the computations immediately ...

• ### Elliptic Curve Cryptography with Efficiently Computable Endomorphisms and Its Hardware Implementations for the Internet of Things

Publication Year: 2017, Page(s):773 - 785
Cited by:  Papers (5)
Verification of an ECDSA signature requires a double scalar multiplication on an elliptic curve. In this work, we study the computation of this operation on a twisted Edwards curve with an efficiently computable endomorphism, which allows reducing the number of point doublings by approximately 50 percent compared to a conventional implementation. In particular, we focus on a curve defined over the...

• ### Provably Secure Key-Aggregate Cryptosystems with Broadcast Aggregate Keys for Online Data Sharing on the Cloud

Publication Year: 2017, Page(s):891 - 904
Cited by:  Papers (2)
Online data sharing for increased productivity and efficiency is one of the primary requirements today for any organization. The advent of cloud computing has pushed the limits of sharing across geographical boundaries, and has enabled a multitude of users to contribute and collaborate on shared data. However, protecting online data is critical to the success of the cloud, which leads to the requi...

• ### A Semantic Approach to Host-Based Intrusion Detection Systems Using Contiguous and Discontiguous System Call Patterns

Publication Year: 2014, Page(s):807 - 819
Cited by:  Papers (33)
Host-based anomaly intrusion detection system design is very challenging due to the notoriously high false alarm rate. This paper introduces a new host-based anomaly intrusion detection methodology using discontiguous system call patterns, in an attempt to increase detection rates whilst reducing false alarm rates. The key concept is to apply a semantic structure to kernel level system calls in or...

• ### A Karatsuba-Based Algorithm for Polynomial Multiplication in Chebyshev Form

Publication Year: 2010, Page(s):835 - 841
Cited by:  Papers (5)
In this paper, we present a new method for multiplying polynomials in Chebyshev form. Our approach has two steps. First, the well-known Karatsuba algorithm is applied to polynomials constructed by using Chebyshev coefficients. Then, from the obtained result, extra arithmetic operations are used to write the final result in Chebyshev form. The proposed algorithm has a quadratic computational comp...

• ### DuCNoC: A High-Throughput FPGA-Based NoC Simulator Using Dual-Clock Lightweight Router Micro-Architecture

Publication Year: 2018, Page(s):208 - 221
On-chip interconnections play an important role in multi/many-processor systems-on-chip (MPSoCs). In order to achieve efficient optimization, each specific application must utilize a specific architecture, and consequently a specific interconnection network. For design space exploration and finding the best NoC solution for each specific application, a fast and flexible NoC simulator is necessary,...

• ### EXTREME: Exploiting Page Table for Reducing Refresh Power of 3D-Stacked DRAM Memory

Publication Year: 2018, Page(s):32 - 44
For future exascale computing systems, ultra-high-density memories would be required that consume low power to process massive data. Of the various memory devices, 3D-stacked DRAMs using TSVs are a perfect solution for this purpose. In addition to providing high capacity, these provide functional flexibility to the computing system by attaching a logic die in each 3D-stacked DRAM chip. However, t...

• ### Probabilistic Error Analysis of Approximate Recursive Multipliers

Publication Year: 2017, Page(s):1982 - 1990
Approximate multipliers are gaining importance in energy-efficient computing and require careful error analysis. In this paper, we present the error probability analysis for recursive approximate multipliers with approximate partial products. Since these multipliers are constructed from smaller approximate multiplier building blocks, we propose to derive the error probability in an arbitrary bit-w...

• ### Joint Optimization of Task Scheduling and Image Placement in Fog Computing Supported Software-Defined Embedded System

Publication Year: 2016, Page(s):3702 - 3712
Cited by:  Papers (10)
Traditional standalone embedded systems are limited in their functionality, flexibility, and scalability. The fog computing platform, characterized by pushing the cloud services to the network edge, is a promising solution to support and strengthen traditional embedded systems. Resource management is always a critical issue to the system performance. In this paper, we consider a fog computing supported s...

• ### A Generic Construction of Quantum-Oblivious-Key-Transfer-Based Private Query with Ideal Database Security and Zero Failure

Publication Year: 2018, Page(s):2 - 8
Higher security and lower failure probability have always been people’s pursuits in quantum-oblivious-key-transfer-based private query (QOKT-PQ) protocols since Jacobi et al. [Phys. Rev. A 83, 022301 (2011)] proposed the first protocol of this kind. However, higher database security generally has to be obtained at the cost of a higher failure probability, and vice v...

• ### Privacy Protection for Preventing Data Over-Collection in Smart City

Publication Year: 2016, Page(s):1339 - 1350
Cited by:  Papers (41)
In a smart city, all kinds of users' data are stored in electronic devices to make everything intelligent. A smartphone is the most widely used electronic device and it is the pivot of all smart systems. However, current smartphones are not competent to manage users' sensitive data, and they are facing the privacy leakage caused by data over-collection. Data over-collection, which means smartphones ...

• ### Principal Component Analysis Based Filtering for Scalable, High Precision k-NN Search

Publication Year: 2018, Page(s):252 - 267
Approximate $k$ Nearest Neighbours (A$k$NN) search is widely used in domains such as comput...

• ### Bi-Objective Optimization of Data-Parallel Applications on Homogeneous Multicore Clusters for Performance and Energy

Publication Year: 2018, Page(s):160 - 177
Performance and energy are now the most dominant objectives for optimization on modern parallel platforms composed of multicore CPU nodes. The existing intra-node and inter-node optimization methods employ a large set of decision variables but do not consider problem size as a decision variable and assume a linear relationship between performance and problem size and between energy consumption and...

• ### Cost Aware Service Placement and Load Dispatching in Mobile Cloud Systems

Publication Year: 2016, Page(s):1440 - 1452
Cited by:  Papers (7)
With the proliferation of smart phones and an increasing number of services provisioned by clouds, it is commonplace for users to request cloud services from their mobile devices. Accessing services directly from the Internet data centers inherently incurs high latency due to long RTTs and possible congestion in the WAN. To lower the latency, some researchers propose to 'cache' the services at edge cloud...

• ### New Metrics for the Reliability of Approximate and Probabilistic Adders

Publication Year: 2013, Page(s):1760 - 1771
Cited by:  Papers (64)
Addition is a fundamental function in arithmetic operation; several adder designs have been proposed for implementations in inexact computing. These adders show different operational profiles; some of them are approximate in nature while others rely on probabilistic features of nanoscale circuits. However, there has been a lack of appropriate metrics to evaluate the efficacy of various inexact des...

• ### Overview of the SpiNNaker System Architecture

Publication Year: 2013, Page(s):2454 - 2467
Cited by:  Papers (127)  |  Patents (2)
SpiNNaker (a contraction of Spiking Neural Network Architecture) is a million-core computing engine whose flagship goal is to be able to simulate the behavior of aggregates of up to a billion neurons in real time. It consists of an array of ARM9 cores, communicating via packets carried by a custom interconnect fabric. The packets are small (40 or 72 bits), and their transmission is brokered entire...

• ### On-Chip Fault Monitoring Using Self-Reconfiguring IEEE 1687 Networks

Publication Year: 2018, Page(s):237 - 251
Efficient handling of faults during operation is highly dependent on the interval (latency) from the time embedded monitoring instruments detect faults to the time when the fault manager localizes the faults. In this article, we propose a self-reconfiguring IEEE 1687 network in which all instruments that have detected faults are automatically included in the scan path, and a fault detection and lo...

• ### PowerCool: Simulation of Cooling and Powering of 3D MPSoCs with Integrated Flow Cell Arrays

Publication Year: 2018, Page(s):73 - 85
Integrated Flow-Cell Arrays (FCAs) represent a combination of integrated liquid cooling and on-chip power generation, converting chemical energy of the flowing electrolyte solutions to electrical energy. The FCA technology provides a promising way to address both heat removal and power delivery issues in 3D Multiprocessor Systems-on-Chips (MPSoCs). In this paper we motivate the benefits of ...

• ### D$^{3}$: A Dynamic Dual-Phase Deduplication Framework for Distributed Primary Storage

Publication Year: 2018, Page(s):193 - 207
Deploying deduplication for distributed primary storage is a sophisticated and challenging task, considering that the demands of low read/write latency, stable read/write performance, and efficient space saving are all of paramount importance. Unfortunately, existing schemes cannot present a satisfactory solution for the aforementioned requirements simultaneously. In this article, we propose D$^{3}$...

• ### Extending Unix Pipelines to DAGs

Publication Year: 2017, Page(s):1547 - 1561
The Unix shell dgsh provides an expressive way to construct sophisticated and efficient non-linear pipelines. Such pipelines can use standard Unix tools, as well as third-party and custom-built components. Dgsh allows the specification of pipelines that perform non-uniform non-linear processing. These form a directed acyclic process graph, which is typically executed by multiple processor cores, t...

• ### Two-Factor Data Security Protection Mechanism for Cloud Storage System

Publication Year: 2016, Page(s):1992 - 2004
Cited by:  Papers (13)
In this paper, we propose a two-factor data security protection mechanism with factor revocability for cloud storage systems. Our system allows a sender to send an encrypted message to a receiver through a cloud storage server. The sender only needs to know the identity of the receiver but no other information (such as its public key or its certificate). The receiver needs to possess two things in ...

• ### Building an Intrusion Detection System Using a Filter-Based Feature Selection Algorithm

Publication Year: 2016, Page(s):2986 - 2998
Cited by:  Papers (10)
Redundant and irrelevant features in data have caused a long-term problem in network traffic classification. These features not only slow down the process of classification but also prevent a classifier from making accurate decisions, especially when coping with big data. In this paper, we propose a mutual information based algorithm that analytically selects the optimal feature for classification...

• ### Off-the-Hook: An Efficient and Usable Client-Side Phishing Prevention Application

Publication Year: 2017, Page(s):1717 - 1733
Phishing is a major problem on the Web. Despite the significant attention it has received over the years, there has been no definitive solution. While the state-of-the-art solutions have reasonably good performance, they suffer from several drawbacks including potential to compromise user privacy, difficulty of detecting phishing websites whose content changes dynamically, and reliance on features ...

• ### Phase-Change Memory Optimization for Green Cloud with Genetic Algorithm

Publication Year: 2015, Page(s):3528 - 3540
Cited by:  Papers (136)
Green cloud is an emerging new technology in the computing world in which memory is a critical component. Phase-change memory (PCM) is one of the most promising alternative techniques to the dynamic random access memory (DRAM) that faces the scalability wall. Recent research has been focusing on the multi-level cell (MLC) of PCM. By precisely arranging multiple levels of resistance inside a PCM ce...

• ### Compact CA-Based Single Byte Error Correcting Codec

Publication Year: 2018, Page(s):291 - 298
Memory contents are usually corrupted due to soft errors caused by external radiation and hence the reliability of memory systems is reduced. In order to enhance the reliability of memory systems, error correcting codes (ECC) are widely used to detect and correct errors. Single-bit error correcting and double-bit error detecting codes are generally used in memory systems. But in case of multipl...

• ### Graph-Based Algorithms for Boolean Function Manipulation

Publication Year: 1986, Page(s):677 - 691
Cited by:  Papers (3490)  |  Patents (140)
In this paper we present a new data structure for representing Boolean functions and an associated set of manipulation algorithms. Functions are represented by directed, acyclic graphs in a manner similar to the representations introduced by Lee [1] and Akers [2], but with further restrictions on the ordering of decision variables in the graph. Although a function requires, in the worst case, a gr...

• ### DFT Computation Using Gauss-Eisenstein Basis: FFT Algorithms and VLSI Architectures

Publication Year: 2017, Page(s):1442 - 1448
Cited by:  Papers (1)
A joint numerical representation based on both Gaussian and Eisenstein integers is proposed. This Gauss-Eisenstein representation maps complex numbers into four-tuples of integers with arbitrarily high precision. The representation furnishes the computation of the 3-, 6-, and 12-point discrete Fourier transform (DFT) at any desired accuracy. The associated fast algorithms based on the Gauss-Eisens...

• ### Approximate Radix-8 Booth Multipliers for Low-Power and High-Performance Operation

Publication Year: 2016, Page(s):2638 - 2644
Cited by:  Papers (4)
The Booth multiplier has been widely used for high performance signed multiplication by encoding and thereby reducing the number of partial products. A multiplier using the radix-4 (or modified Booth) algorithm is very efficie...

• ### Customizing Clos Network-on-Chip for Neural Networks

Publication Year: 2017, Page(s):1865 - 1877
Large-scale neural network accelerators are often implemented as a many-core chip and rely on a network-on-chip to manage the huge amount of inter-neuron traffic. The baseline and different variations of the well-known mesh and tree topologies are the most popular topologies in prior many-core implementations of neural networks. However, the grid-like mesh and hierarchical tree topologies suffer f...

• ### Aging-aware Workload Management on Embedded GPU Under Process Variation

Publication Year: 2018, Page(s): 1
Graphics Processing Units (GPUs) have been employed in embedded systems to handle increased amounts of computation and to satisfy the timing requirement. Due to the small feature size, chip aging and within-die parameter variations have been considered to be among the challenging problems for state-of-the-art processors, including GPUs. In order to deal with the process variation, several processo...

• ### Stochastic neural computation. I. Computational elements

Publication Year: 2001, Page(s):891 - 905
Cited by:  Papers (186)  |  Patents (1)
This paper examines a number of stochastic computational elements employed in artificial neural networks, several of which are introduced for the first time, together with an analysis of their operation. We briefly include multiplication, squaring, addition, subtraction, and division circuits in both unipolar and bipolar formats, the principles of which are well-known, at least for unipolar signal...

• ### Cross-Platform Resource Scheduling for Spark and MapReduce on YARN

Publication Year: 2017, Page(s):1341 - 1353
Cited by:  Papers (1)
While MapReduce is inherently designed for batch and high throughput processing workloads, there is an increasing demand for non-batch processes on big data, e.g., interactive jobs, real-time queries, and stream computations. The emerging Apache Spark fills this gap; it can run on an established Hadoop cluster and take advantage of the existing HDFS. As a result, the deployment model of Spark-on-YA...

• ### Discrete Cosine Transform

Publication Year: 1974, Page(s):90 - 93
Cited by:  Papers (1645)  |  Patents (58)
A discrete cosine transform (DCT) is defined and an algorithm to compute it using the fast Fourier transform is developed. It is shown that the discrete cosine transform can be used in the area of digital processing for the purposes of pattern recognition and Wiener filtering. Its performance is compared with that of a class of orthogonal transforms and is found to compare closely to that of the K...

• ### STABLE: Stress-Aware Boolean Matching to Mitigate BTI-Induced SNM Reduction in SRAM-Based FPGAs

Publication Year: 2018, Page(s):102 - 114
Biased-Temperature-Instability (BTI) aging mechanism reduces Static-Noise-Margin (SNM) of SRAM cells. This leads to a higher Soft-Error-Rate (SER), lower reliability, and lower SRAMs’ stability in FPGAs. SNM partially improves by leveraging the recovery phase of BTI through flipping SRAM content. We propose STABLE, a three-step post-synthes...

• ### Non-Volatile Memory Based Page Swapping for Building High-Performance Mobile Devices

Publication Year: 2017, Page(s):1918 - 1931
Smartphones are getting increasingly high-performance with advances in mobile processors and larger main memories to support feature-rich applications. However, the storage subsystem has always been a prohibitive factor that slows down the pace of reaching even higher performance while maintaining good user experience. Although today's smartphones are equipped with larger-than-ever main memories, t...

• ### Majority Logic Formulations for Parallel Adder Designs at Reduced Delay and Circuit Complexity

Publication Year: 2017, Page(s):1824 - 1830
The design of high-performance adders has experienced a renewed interest in the last few years; among high performance schemes, parallel prefix adders constitute an important class. They require a logarithmic number of stages and are typically realized using AND-OR logic; moreover with the emergence of new device technologies based on majority logic, new and improved adder designs are possible. Ho...

• ### Identity-Based Encryption with Outsourced Revocation in Cloud Computing

Publication Year: 2015, Page(s):425 - 437
Cited by:  Papers (60)
Identity-Based Encryption (IBE), which simplifies the public key and certificate management at Public Key Infrastructure (PKI), is an important alternative to public key encryption. However, one of the main efficiency drawbacks of IBE is the overhead computation at Private Key Generator (PKG) during user revocation. Efficient revocation has been well studied in traditional PKI setting, but the cumbe...

• ### Public Integrity Auditing for Shared Dynamic Cloud Data with Group User Revocation

Publication Year: 2016, Page(s):2363 - 2373
Cited by:  Papers (5)
The advent of cloud computing has made storage outsourcing a rising trend, which in turn has made secure remote data auditing a hot topic in the research literature. Recently, some research has considered the problem of secure and efficient public data integrity auditing for shared dynamic data. However, these schemes are still not secure against the collusion of cloud storage server an...

• ### Latch-Based Structure: A High Resolution and Self-Reference Technique for Hardware Trojan Detection

Publication Year: 2017, Page(s):100 - 113
Hardware Trojan detection has been the subject of many studies in the realm of hardware security in recent years. The effectiveness of current techniques proposed for Trojan detection is limited by some factors, process variation noise being a major one. This paper introduces latch-based structures as a self-reference detection technique which uses in-circuit path delays as golden reference mo...

• ### A Secure Phase-Encrypted IEEE 802.15.4 Transceiver Design

Publication Year: 2017, Page(s):1421 - 1427
Cited by:  Papers (1)
With the proliferation of the Internet of Things (IoT), the IEEE 802.15.4 physical layer is becoming increasingly popular due to its low power consumption. However, secure data communication over the network is a challenging issue because vulnerabilities in the existing security primitives lead to several attacks. The mitigation of these attacks separately adds significant computing burden on the legi...

