• ### Efficient digital implementation of the sigmoid function for reprogrammable logic

Publication Year: 2003, Page(s):403 - 411



Special attention must be paid to an efficient approximation of the sigmoid function when implementing artificial neural networks in FPGA-based reprogrammable hardware. Four previously published piecewise linear and one piecewise second-order approximation of the sigmoid function are compared with SIG-sigmoid, a purely combinational approximation. The approximations are compared in terms of speed...
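As a rough illustration of what a piecewise linear sigmoid approximation looks like, the sketch below uses three segments whose slopes are powers of two, so each segment could be realised with a shift and an add. The breakpoints and coefficients are illustrative assumptions, not values taken from the paper, and `sigmoid_pwl` and `max_abs_error` are hypothetical names.

```python
import math


def sigmoid_pwl(x: float) -> float:
    """Piecewise linear sigmoid sketch with power-of-two slopes (assumed coefficients)."""
    a = abs(x)
    if a >= 5.0:
        y = 1.0
    elif a >= 2.375:
        y = 0.03125 * a + 0.84375  # slope 2**-5
    elif a >= 1.0:
        y = 0.125 * a + 0.625      # slope 2**-3
    else:
        y = 0.25 * a + 0.5         # slope 2**-2
    # Use the symmetry sigmoid(-x) = 1 - sigmoid(x) for negative inputs.
    return y if x >= 0 else 1.0 - y


def max_abs_error(lo: float = -8.0, hi: float = 8.0, steps: int = 3201) -> float:
    """Largest absolute deviation from the exact sigmoid on a uniform grid."""
    worst = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        exact = 1.0 / (1.0 + math.exp(-x))
        worst = max(worst, abs(sigmoid_pwl(x) - exact))
    return worst
```

Calling `max_abs_error()` gives a quick accuracy figure for this particular choice of segments; a speed or area comparison of the kind the abstract mentions would need a hardware model.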

• ### Modified Montgomery modular multiplication and RSA exponentiation techniques

Publication Year: 2004, Page(s):402 - 408



Modified Montgomery multiplication and associated RSA modular exponentiation algorithms and circuit architectures are presented. These modified multipliers use carry save adders (CSAs) to perform large word length additions. These have the attraction that, when repeatedly used to perform RSA modular exponentiation, the (carry save) format of the output words is compatible with that required by the...

• ### Finite state machine encoding for VHDL synthesis

Publication Year: 2001, Page(s):23 - 30



Finite state machine (FSM) optimisation has usually been studied through state assignment, state vector encoding, and combinational logic optimisation. Such details should not be consequential in behavioural descriptions. On the other hand, describing correct and efficient hardware structures in VHDL (VHSIC hardware description language), or generally in any high-level description language, is mor...

• ### Retargetable compilers and architecture exploration for embedded processors

Publication Year: 2005, Page(s):209 - 223



Retargetable compilers can generate assembly code for a variety of different target processor architectures. Owing to their use in the design of application-specific embedded processors, they bridge the gap between the traditionally separate disciplines of compiler construction and electronic design automation. In particular, they assist in architecture exploration for tailoring processors towards...

• ### Relationship between user models in HCI and AI

Publication Year: 1994, Page(s):99 - 103



Considers the relationship between the two independent lines of research into user modelling activities in human-computer interaction (HCI) and artificial intelligence (AI). The paper begins by considering the potential use of AI user models in HCI. The user modelling research conducted in HCI and AI respectively is then discussed along with the goals of HCI and AI. Considerable differences underl...

• ### Alternative systolic array for non-square-root Cholesky decomposition

Publication Year: 1997, Page(s):57 - 64



A novel systolic array for the non-square-root LDL^T method of Cholesky decomposition is presented. The systolic array is an alternative to that proposed by Brent and Luk (1983), displaying an improvement in efficiency and a lower hardware cost, arising from a reduction in the number of multipliers required. An application for the array is in real-time digital signal processing hardware ...
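For readers unfamiliar with the non-square-root form, the sketch below shows the sequential textbook factorisation A = L·D·Lᵀ, with L unit lower triangular and D diagonal. It is plain software, not the systolic array of the paper, and the function name `ldlt` is an assumption made for illustration.

```python
def ldlt(a):
    """Factor a symmetric matrix as L * diag(d) * L^T without square roots.

    Returns (L, d), where L is unit lower triangular (list of rows) and d
    holds the diagonal of D. Assumes the leading minors are non-zero.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: remove the contribution of earlier columns.
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            # Entries below the diagonal need one division and no square root.
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d
```

For example, `ldlt([[4, 12, -16], [12, 37, -43], [-16, -43, 98]])` yields a unit lower triangular factor and a diagonal whose product reconstructs the input.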

• ### Comparison of dynamic and static load-balancing strategies in heterogeneous distributed systems

Publication Year: 1997, Page(s):100 - 106



Although dynamic load-balancing strategies have the potential of performing better than static strategies, they are inevitably more complex. Their complexity and the overheads involved may negate their benefits. A heterogeneous distributed system, with computers of different processing capability but the same functionality, has been examined for two dynamic and two static policies. The results sho...

• ### Modular dynamic reconfiguration in Virtex FPGAs

Publication Year: 2006, Page(s):157 - 164



Modular systems implemented on field-programmable gate arrays (FPGAs) can benefit from being able to load and unload modules at run-time, a concept that is of much interest in the research community. Although dynamic partial reconfiguration is possible in Virtex and Spartan series FPGAs, the configuration architecture of these devices is not amenable to modular reconfiguration, a limitation which ...

• ### Ant colony optimisation for task matching and scheduling

Publication Year: 2006, Page(s):373 - 380



PC clusters have recently received considerable interest as cost-effective parallel platforms for CPU-intensive applications. A cluster of PCs generally comprises a collection of heterogeneous process elements (PEs). To make effective use of a PC cluster, a parallel program, which is characterised by a node- and edge-weighted directed acyclic graph (DAG), can usually be decomposed into a set of...

• ### Constant-division algorithms

Publication Year: 1994, Page(s):334 - 340



There exist many types of special-purpose systems that require rapid and repeated division by a set of known constant divisors. Numerous solutions have been proposed in response to the deficiencies of the conventional division algorithms for applications which involve repeated divisions by known constants. Six approaches are reviewed in detail and their relationships are shown by reducing them to ...
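One widely used way to divide by a known constant is to multiply by a precomputed scaled reciprocal and shift. The sketch below illustrates that idea for unsigned operands; it is not claimed to be one of the six approaches the paper reviews, and `magic_for` and `div_by_const` are hypothetical helper names.

```python
def magic_for(d: int, nbits: int = 32):
    """Return (multiplier, shift) so that (x * multiplier) >> shift == x // d
    for every unsigned x below 2**nbits. Assumes d >= 1."""
    l = (d - 1).bit_length()       # ceil(log2(d))
    shift = nbits + l
    multiplier = -(-(1 << shift) // d)  # ceil(2**shift / d)
    return multiplier, shift


def div_by_const(x: int, d: int, nbits: int = 32) -> int:
    """Divide unsigned x by the constant d using one multiply and one shift."""
    multiplier, shift = magic_for(d, nbits)
    return (x * multiplier) >> shift
```

In hardware or compiled code the pair from `magic_for` would be computed once per divisor, so each division costs a multiplication and a shift rather than an iterative divide.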

• ### Critique of the paper 'Novel design of arithmetic coding for data compression'

Publication Year: 1997, Page(s):394 - 396


The author provides a critique of the paper 'Novel design of arithmetic coding for data compression' by J. Jiang (ibid., vol. 142, no. 6, pp. 419-24, 1995). A number of problems in the proposed implementation are pointed out and the origins of some of the described mechanisms are discussed.

• ### Multiplier architectures for GF(p) and GF(2^n)

Publication Year: 2004, Page(s):147 - 160



Two new hardware architectures are proposed for performing multiplication in GF(p) and GF(2^n), which are the most time-consuming operations in many cryptographic applications. The architectures provide very fast and efficient execution of multiplication in both GF(p) and GF(2^n), and can be mainly used in elliptic curve cryptography. Both architectures are scalable and theref...

• ### Efficient sharing (broadcasting) of multiple secrets

Publication Year: 1995, Page(s):237 - 240



Instead of using the conventional m-out-of-n perfect secret sharing scheme to protect a single secret among n users, the authors propose a secret sharing scheme based on one cryptographic assumption to protect multiple secrets. It is shown that, with this relaxation of the security requirement, secret sharing and some related secret-sharing problems, such as cheater detection and secret broadcasti...

• ### Modular multiplication method

Publication Year: 1998, Page(s):317 - 318



The Montgomery algorithm has been widely used in modern cryptography because it is effective for modular exponentiation. However, it is not efficient when used for just a few modular multiplications. Inefficiency is due to the large overhead involved in the residue transformation of arguments. A new modular multiplication method using the Montgomery reduction algorithm is presented which can elimi...
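The overhead mentioned above can be seen in a minimal sketch of standard Montgomery reduction (REDC) with R = 2^k. A single product needs two conversions into the residue domain and one conversion back. This is the textbook scheme, not the new method of the paper; the names `redc` and `mont_mul_once` are assumptions, and `pow(n, -1, R)` needs Python 3.8 or later.

```python
def redc(t: int, n: int, k: int, n_prime: int) -> int:
    """Montgomery reduction: returns t * R^-1 mod n for R = 2**k, 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k          # exact division by R
    return u - n if u >= n else u


def mont_mul_once(a: int, b: int, n: int, k: int) -> int:
    """Compute a * b mod n for odd n < 2**k via the Montgomery domain."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime == -1 (mod R)
    # Overhead step 1: transform both arguments into residue form.
    a_bar = (a << k) % n
    b_bar = (b << k) % n
    # The Montgomery product itself is cheap: one multiply and one REDC.
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)
    # Overhead step 2: transform the result back out of residue form.
    return redc(prod_bar, n, k, n_prime)
```

In a long exponentiation the two transformation steps are paid once and amortised over many products, which is why the cost only dominates when just a few multiplications are needed.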

• ### System level performance analysis - the SymTA/S approach

Publication Year: 2005, Page(s):148 - 166



SymTA/S is a system-level performance and timing analysis approach based on formal scheduling analysis techniques and symbolic simulation. The tool supports heterogeneous architectures, complex task dependencies and context aware analysis. It determines system-level performance data such as end-to-end latencies, bus and processor utilisation, and worst-case scheduling scenarios. SymTA/S furthermor...

• ### Reconfigurable computing: architectures and design methods

Publication Year: 2005, Page(s):193 - 207



Reconfigurable computing is becoming increasingly attractive for many applications. This survey covers two aspects of reconfigurable computing: architectures and design methods. The paper includes recent advances in reconfigurable architectures, such as the Altera Stratix II and Xilinx Virtex 4 FPGA devices. The authors identify major trends in general-purpose and special-purpose design methods. I...

• ### 1 GHz 64-bit high-speed comparator using ANT dynamic logic with two-phase clocking

Publication Year: 1998, Page(s):433 - 436



A high-speed 64-bit comparator using two-phase clocking dynamic CMOS logic with modified noninverting all-N-transistor block is presented. The pull-up charging and pull-down discharging of a comparator unit are accelerated by inserting two feedback MOS transistors between the evaluation N-block and the output. Detailed simulation results reveal appropriate L/W guidelines for the all-N-transistor b...

• ### Integrated approach for fault tolerance and digital signature in RSA

Publication Year: 1999, Page(s):151 - 159



Data security and fault tolerance are two important issues in modern communications. In most cases, they are studied and implemented separately. The author proposes an integrated approach for both fault tolerance and digital signature in the RSA implementation. It shares the same computations required by the hash function, which is the major part of the digital signature and error detections and c...

• ### Dynamic scheduling of tasks on partially reconfigurable FPGAs

Publication Year: 2000, Page(s):181 - 188



Field-programmable gate arrays (FPGAs) which allow partial reconfiguration at run time can be shared among multiple independent tasks. When the sequence of tasks to be performed is unpredictable, the FPGA controller needs to make allocation decisions online. Since online allocation suffers from fragmentation, tasks can end up waiting despite there being sufficient, albeit noncontiguous, resources ...

• ### Pollaczek-Khinchin formula for the M/G/1 queue in discrete time with vacations

Publication Year: 1997, Page(s):222 - 226



The continuous-time M/G/1 queue with vacations has been studied by many researchers. In the paper the authors report on an investigation of the discrete-time M/G/1 queue using Little's formula and conditional expectation. This direct approach can also be adopted to study the continuous-time case.
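For orientation, the classical continuous-time results that the title refers to can be stated as follows. Here λ is the arrival rate, S the service time, V the vacation length and ρ = λE[S] < 1; these symbols are assumptions chosen for illustration, and the discrete-time formula derived in the paper is not reproduced here.

```latex
% Mean waiting time in queue for the continuous-time M/G/1 queue
% (Pollaczek-Khinchin mean-value formula):
W_q = \frac{\lambda\, E[S^2]}{2\,(1-\rho)}, \qquad \rho = \lambda\, E[S] < 1 .

% With multiple i.i.d. server vacations of length V, the mean wait
% gains an additive residual-vacation term:
W_q^{\mathrm{vac}} = \frac{\lambda\, E[S^2]}{2\,(1-\rho)} + \frac{E[V^2]}{2\,E[V]} .

% Little's formula then links the mean wait to the mean queue length:
L_q = \lambda\, W_q .
```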

• ### Densely packed decimal encoding

Publication Year: 2002, Page(s):102 - 104



Chen-Ho encoding is a lossless compression of three binary coded decimal digits into 10 bits using an algorithm which can be applied or reversed using only simple Boolean operations. An improvement to the encoding which has the same advantages but is not limited to multiples of three digits is described. The new encoding allows arbitrary-length decimal numbers to be coded efficiently while keeping...

• ### High-performance compensation technique for the radix-4 CORDIC algorithm

Publication Year: 2002, Page(s):219 - 228



Although the full radix-4 CORDIC algorithm is efficient compared to the standard radix-2 version, the scale-factor overhead causes its improvement to be limited. In this work, an algorithm and its associated architecture have been proposed for parallel compensation of the scale factor for the radix-4 CORDIC algorithm in the rotation mode. The proposed method, which makes no prior assumptions about...

• ### Efficient group signature scheme based on the discrete logarithm

Publication Year: 1998, Page(s):15 - 18



Group signatures, introduced by D. Chaum and E. Van Heyst (1991), allow individual members to make signatures on behalf of the group while providing anonymity. All previously proposed schemes, as far as we know, are not very efficient in terms of computational, communication and storage costs. In the paper, we describe a novel group signature that is used to reflect and project the actual needs ar...

• ### Low-power variable-length fast Fourier transform processor

Publication Year: 2005, Page(s):499 - 506



Fast Fourier transform (FFT) processing is one of the key procedures in the popular orthogonal frequency division multiplexing (OFDM) communication systems. Structured pipeline architectures and low power consumption are the main concerns for its VLSI implementation. In the paper, the authors report a variable-length FFT processor design that is based on a radix-2/4/8 algorithm and a single-path d...

• ### Signature-monitoring technique based on instruction-bit grouping

Publication Year: 2005, Page(s):527 - 536



A new concurrent error-detection scheme monitors the signatures in online detection of instruction memory and control flow errors caused by transient and intermittent faults. The proposed signature-monitoring technique is based on the grouping of column bit information of instructions in a block to produce the block signature. The grouping size that represents the number of bits in a group could a...

• ### Three-stage compression approach to reduce test data volume and testing time for IP cores in SOCs

Publication Year: 2005, Page(s):704 - 712



A three-stage compression technique that reduces test data volume and test application time for scan-based testing of intellectual property (IP) cores in system-on-chip integrated circuits is presented. In the first stage, referred to as width compression, the concept of scan chain compatibilities is combined with a method that exploits the logic dependencies between scan chains. This leads to a g...

• ### Using hyperprediction to compensate for delayed updates in value predictors

Publication Year: 2005, Page(s):596 - 608


Value prediction has been proposed as a technique to break true data dependences in order to increase the instruction-level parallelism available in programs. Recent work has pointed out, however, that the delay inherent in updating the value prediction table with the actual correct value can introduce a substantial number of wrong value predictions, which can then decrease the overall processor p...

• ### Gigabyte per second streaming lossless data compression hardware based on a configurable variable-geometry CAM dictionary

Publication Year: 2006, Page(s):47 - 58



A high-throughput lossless data compression IP core built around a CAM-based dictionary whose number of available entries and data word width adjust to the characteristics of the incoming data stream is presented. These two features enhance model adaptation to the input data, improving compression efficiency, and enable greater throughputs as a multiplicity of bytes can be processed per cycle. A p...

• ### High-security asynchronous circuit implementation of AES

Publication Year: 2006, Page(s):71 - 77



The authors present a novel circuit implementation of the advanced encryption standard using self-timed dual-rail technology. The design reduces leakage of internal information through balanced power consumption, which is achieved by avoidance of glitches and by data-independent switching behaviour. The design utilises a pipeline structure with built-in controllers and novel, highly balanced secur...

• ### Low-power bus encoding with crosstalk delay elimination

Publication Year: 2006, Page(s):93 - 100



In deep-submicron technology, minimising the propagation delay and power consumption on buses is the most important design objective in system-on-chip design. In particular, the coupling effects between wires on the bus can cause serious problems such as crosstalk delay, noise and power consumption. Most of the previous work on bus encoding targeted either (1) minimising the power consumption...

• ### Low power system on chip bus encoding scheme with crosstalk noise reduction capability

Publication Year: 2006, Page(s):101 - 108



Inter-wire coupling is a major source of wire load and delay faults for on-chip buses implemented in ultra-deep submicron system on chip (SoC) systems. Elimination or minimisation of such faults is crucial to the performance and reliability of SoC designs. A novel on-chip bus encoding scheme targeting high-performance generic SoC systems is presented. In addition to its efficiency in terms of powe...

• ### Leakage power analysis and reduction: models, estimation and tools

Publication Year: 2005, Page(s):353 - 368



The high leakage current in the nanometre regime is becoming a significant proportion of power dissipation in CMOS circuits as threshold voltage, channel length and gate oxide thickness are scaled. Consequently, the identification and estimation of different leakage currents are very important in designing low power circuits. In the paper a methodology for accurate estimation of the total leakage ...

• ### Low-power RT-level synthesis techniques: a tutorial

Publication Year: 2005, Page(s):333 - 343



Power consumption and power-related issues have become a first-order concern for most designs and loom as fundamental barriers for many others. While the primary method used to date for reducing power has been supply voltage reduction, this technique begins to lose its effectiveness as voltages drop to below one volt and further reductions in the supply voltage begin to create more problems than a...

• ### Architecture description languages for programmable embedded systems

Publication Year: 2005, Page(s):285 - 297



Embedded systems present a tremendous opportunity to customise designs by exploiting the application behaviour. Shrinking time-to-market, coupled with short product lifetimes, creates a critical need for rapid exploration and evaluation of candidate architectures. Architecture description languages (ADL) enable exploration of programmable architectures for a given set of application programs under ...

• ### Resource-constrained system-on-a-chip test: a survey

Publication Year: 2005, Page(s):67 - 81



Manufacturing test is a key step in the implementation flow of modern integrated electronic products. It certifies the product quality, accelerates yield learning and influences the final cost of the device. With the ongoing shift towards the core-based system-on-a-chip (SOC) design paradigm, unique test challenges, such as test access and test reuse, are confronted. In addition, when addressing t...

• ### Virtually scaling-free adaptive CORDIC rotator

Publication Year: 2004, Page(s):448 - 456



The authors propose a coordinate rotation digital computer (CORDIC) rotator algorithm that eliminates the problems of scale factor compensation and limited range of convergence associated with the classical CORDIC algorithm. In the proposed scheme, depending on the target angle or the initial coordinate of the vector, a scaling by 1 or 1/√2 is needed that can be realised with minimal hardwar...
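The two problems named above are easy to see in a sketch of the classical radix-2 CORDIC in rotation mode: every result comes out scaled by a fixed gain that must be compensated, and the iteration only converges for target angles up to roughly 1.74 rad. This floating-point model is an illustration of the classical algorithm only, not the proposed rotator, and the function names are assumptions.

```python
import math


def cordic_rotate(x: float, y: float, angle: float, iterations: int = 24):
    """Classical CORDIC rotation. Returns the rotated vector scaled by the
    CORDIC gain; the caller must divide by cordic_gain() to compensate."""
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i                      # a right shift in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)              # table lookup in hardware
    return x, y


def cordic_gain(iterations: int = 24) -> float:
    """Scale factor accumulated over the micro-rotations (about 1.6468)."""
    return math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
```

Dividing the outputs of `cordic_rotate(1.0, 0.0, theta)` by `cordic_gain()` recovers approximations of cos(theta) and sin(theta) for angles inside the convergence range.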

• ### Realisation of multiple-valued functions using the capacitive threshold logic gate

Publication Year: 2004, Page(s):435 - 447



The circuit-level hardware realisation of several multiple-valued logic functions using the capacitive threshold logic design style is presented. The generic design approach for multiple-input, multiple-output and multiple-level transfer functions is shown. SPICE simulations of complex operators demonstrate correct operation which qualifies the proposed circuits for integration into larger multipl...

• ### Efficient hardware architecture for fast IP address lookup

Publication Year: 2003, Page(s):43 - 52



A multi-gigabit Internet protocol (IP) router may receive several million packets per second from each input link. For each packet, the router needs to find the longest matching prefix in the forwarding table in order to determine the packet's next-hop. An efficient hardware solution for the IP address lookup problem is presented. The problem is modelled as a searching problem on a binary-trie. Th...
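The longest-prefix-match problem on a binary trie can be sketched in a few lines of software. The version below handles IPv4 only and walks one bit per step; it illustrates the search problem, not the hardware architecture of the paper, and `TrieNode`, `insert` and `lookup` are hypothetical names.

```python
import ipaddress


class TrieNode:
    """One node of a binary trie; children[0] and children[1] follow the next bit."""

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None


def insert(root: TrieNode, cidr: str, next_hop: str) -> None:
    """Store next_hop at the node reached by the prefix bits of cidr."""
    net = ipaddress.ip_network(cidr)
    bits = format(int(net.network_address), "032b")[: net.prefixlen]
    node = root
    for b in bits:
        i = int(b)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop


def lookup(root: TrieNode, addr: str):
    """Return the next hop of the longest prefix matching addr, or None."""
    bits = format(int(ipaddress.ip_address(addr)), "032b")
    node, best = root, root.next_hop
    for b in bits:
        node = node.children[int(b)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop   # remember the deepest match seen so far
    return best
```

With entries for `0.0.0.0/0`, `10.0.0.0/8` and `10.1.0.0/16`, a lookup of `10.1.2.3` follows the trie down to the /16 entry, while `10.2.3.4` stops at the /8 entry.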

• ### Self-diagnostic tools of the APEmille parallel machine

Publication Year: 2002, Page(s):273 - 279



The authors describe the self-diagnostic tools of the APEmille SIMD machine, whose logical architecture is a three-dimensional torus of processors. The tools are aimed at implementing system-level diagnosis using a comparison model. The diagnostic model accounts for some critical features of the APEmille architecture, and has been validated by means of VHDL simulation. Essentially, diagnostic tool...

• ### Earle latch design for high performance pipeline

Publication Year: 2002, Page(s):245 - 248


A modified Earle latch that requires only one copy of the input data, instead of the two copies required in the original Earle scheme, is presented. In SPICE simulations, the modified Earle latch has the smallest area and lowest power dissipation compared to other static latches. Chip implementation shows that the speed of an adder using the proposed latch outputs can be improved from 33 MHz to 60 ...

• ### PayFair: a prepaid internet micropayment scheme ensuring customer fairness

Publication Year: 2001, Page(s):207 - 213



A software-based prepaid micropayment scheme is developed. As with existing prepaid micropayment schemes, the profits of the merchants are protected. Furthermore, in this proposed scheme, fairness for the customers is also assured. More precisely, in this new scheme, the merchant, after receiving prepaid money, can only claim that a customer has already spent a specific amount of money by showing ...

• ### High-bandwidth x86 instruction fetching based on instruction pointer table

Publication Year: 2001, Page(s):113 - 118



Providing higher degree superscalar instruction fetching is a major concern in a high performance superscalar processor design. In x86 architectures, the variable-length instructions make fetching multiple instructions in a cycle difficult. A common practice is to use predecoded information to help in instruction fetching, while the complex instruction formats induce high redundancies in storing a...

• ### Efficient coverage analysis metric for HDL design validation

Publication Year: 2001, Page(s):1 - 6



Simulation is still the primary approach for the functional verification of register-transfer level circuit descriptions written in hardware description language (HDL). The major problem of the simulation approach is to choose a good metric to gauge the quality of the test patterns. The finite state machine (FSM) coverage test can find most of the design errors in an FSM. However, it is impractical...

• ### Tree-structured LFSR synthesis scheme for pseudo-exhaustive testing of VLSI circuits

Publication Year: 2000, Page(s):343 - 348



A new test architecture, called TLS (tree-LFSR/SR), that generates pseudo-exhaustive test patterns for both combinational and sequential VLSI circuits is presented. Instead of using a single scan chain, the proposed test architecture routes a scan tree driven by the LFSR to generate all possible input patterns for each output cone. The new test architecture is able to take advantage of both signal shar...

• ### Hardware-efficient systolic architecture for inversion and division in GF(2^m)

Publication Year: 1998, Page(s):272 - 278



Two parallel-in parallel-out systolic arrays for computing inverses and divisions in finite fields GF(2^m) with the standard basis representation are presented. Both architectures realise a new variant of Euclid's algorithm. One of the proposed arrays involves O(m^2) area complexity and O(1) time complexity, while the other involves O(m) area complexity and O(m) time complexity...
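As background, inversion in a binary field with a standard (polynomial) basis can be carried out with the textbook extended Euclidean algorithm on polynomials over GF(2). The sketch below encodes polynomials as Python integers, bit i holding the coefficient of x^i. It is sequential software and is not the new variant or the systolic arrays of the paper; the function names are assumptions.

```python
def gf2m_mul(a: int, b: int, f: int) -> int:
    """Multiply a and b in GF(2^m) defined by the irreducible polynomial f.
    Assumes a and b are already reduced (degree below m)."""
    m = f.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= f                  # reduce as soon as the degree reaches m
    return r


def gf2m_inverse(a: int, f: int) -> int:
    """Inverse of non-zero a modulo irreducible f via extended Euclid over GF(2)."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    u, v = a, f
    g1, g2 = 1, 0                   # invariants: g1*a == u, g2*a == v (mod f)
    while u != 1:
        j = u.bit_length() - v.bit_length()
        if j < 0:
            u, v = v, u
            g1, g2 = g2, g1
            j = -j
        u ^= v << j                 # cancel the leading term of u
        g1 ^= g2 << j
    # Final reduction keeps the result below degree m.
    while g1.bit_length() >= f.bit_length():
        g1 ^= f << (g1.bit_length() - f.bit_length())
    return g1
```

With the AES field polynomial `0x11B` (m = 8), `gf2m_inverse(0x53, 0x11B)` can be checked by multiplying the result back with `gf2m_mul`.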

• ### CodeSign: an embedded system design environment

Publication Year: 1998, Page(s):171 - 180



A modelling environment is described for the automated design of embedded systems. The basic model of computation consists of a class of high-level time Petri nets augmented with object-oriented mechanisms. It is formal, ensuring unambiguous specification, supports a high level of analysis and is general enough to support other more specialised formalisms. This model constitutes a major part of th...

• ### Deadlock detection using (0, 1)-labelling of resource allocation graphs

Publication Year: 1998, Page(s):68 - 72



A deadlock detection method based on the use of the resource allocation graph is presented. The method is different from the existing deadlock avoidance techniques in that the original directed resource allocation graph is first transformed into an undirected (0, 1)-labelled graph in which the deadlock would occur only if a cycle has been labelled alternately with 0s and 1s. The algorithm is appl...

• ### Balanced Boolean functions

Publication Year: 1998, Page(s):52 - 62



Many common logic circuits such as adders, parity checkers and multiplexers realise Boolean functions that are true for exactly half their input combinations, and false for the other half; we refer to such functions as balanced. Recently, these functions have been shown to be very useful for testing logic circuits, and for data encryption in cryptography. Here, we present a general theory of balan...
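The definition of balance given above can be checked by brute force for small functions. The sketch below enumerates the truth table and counts the true rows; `is_balanced` and the example functions are hypothetical names chosen for illustration.

```python
from itertools import product


def is_balanced(f, n: int) -> bool:
    """True if the n-input Boolean function f is 1 for exactly half its inputs."""
    ones = sum(1 for bits in product((0, 1), repeat=n) if f(*bits))
    return 2 * ones == 2 ** n


def parity3(a, b, c):
    """3-input parity, which is also the sum output of a full adder."""
    return a ^ b ^ c


def carry3(a, b, c):
    """Carry output of a full adder (majority of three)."""
    return (a & b) | (a & c) | (b & c)


def mux2(sel, a, b):
    """2-to-1 multiplexer: passes b when sel is 1, otherwise a."""
    return b if sel else a


def and2(a, b):
    """2-input AND, true for only one of four inputs, hence not balanced."""
    return a & b
```

Here `is_balanced(parity3, 3)`, `is_balanced(carry3, 3)` and `is_balanced(mux2, 3)` all hold, while `is_balanced(and2, 2)` does not.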

• ### Design for the discrete cosine transform in VLSI

Publication Year: 1998, Page(s):127 - 133



The discrete cosine transform is reviewed with the aid of recent implementations of the 8×8 transform. The distinct roles of algorithmic and multiplier design are identified, and key circuit and logic innovations are highlighted.

• ### Gaussian-elimination-based algorithm for solving linear equations on mesh-connected processors

Publication Year: 1996, Page(s):407 - 412


The problem of solving a system of N linear equations on a mesh-connected multiprocessor structure is considered. The solution to the problem is obtained by using a Gaussian-elimination-based algorithm called 'successive Gaussian elimination'. The new algorithm does not contain a separate backsubstitution phase. A two-dimensional array of N×(N+1) processors is employed to obtain the solution...
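To show what elimination without a separate back-substitution phase can look like, the sketch below uses Gauss-Jordan elimination on an N×(N+1) augmented matrix. Gauss-Jordan is a standard stand-in chosen for illustration; it is not claimed to be the successive Gaussian elimination algorithm of the paper, and it runs sequentially rather than on a processor mesh. The function name is an assumption.

```python
from fractions import Fraction


def gauss_jordan_solve(aug):
    """Solve a linear system given as an N x (N+1) augmented matrix.

    Each pivot column is cleared both below and above the pivot, so the
    solution appears in the last column with no back-substitution pass.
    """
    n = len(aug)
    m = [[Fraction(v) for v in row] for row in aug]
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it into place.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        # Eliminate the column from every other row, above and below.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[n] for row in m]
```

Exact rational arithmetic is used here only to keep the example free of rounding effects; a hardware array would use fixed or floating point.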

