By Topic

Computers and Digital Techniques, IEE Proceedings -

Issue 6 • Date Nov 1996

Filter Results

Displaying Results 1 - 15 of 15
  • Exact minimisation of Kronecker expressions for symmetric functions

    Page(s): 349 - 354
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (604 KB)  

    In the paper, an algorithm for the exact minimisation of Kronecker expressions (KROs) for totally symmetric functions is presented. KROs are a class of AND/EXOR forms closely related to ordered Kronecker functional decision diagrams (OKFDDs). This close relation is used to obtain a polynomial time minimisation algorithm. A generalisation to partially symmetric functions is investigated. Experimental results in comparison to previously published methods are given to show the efficiency of the approach View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • XMESH interconnection network for massively parallel computers

    Page(s): 401 - 406
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (508 KB)  

    An XMESH is proposed as a suitable interconnection network for massively parallel computers, and the performance of the proposed interconnection network is analysed. The XMESH has the same horizontal links as those of the toroidal mesh (TMESH); however, it has diagonally crossed links instead of vertical links. The proposed XMESH shows desirable characteristics as an interconnection network for massively parallel computers as the number of nodes increases, while retaining the structural advantages of the TMESH such as the symmetric structure and constant degree of K=4. Analytical performance evaluations show that the XMESH has a shorter diameter, a shorter mean internode distance, and a higher message-completion rate than the TMESH or the diagonal mesh (DMESH). To confirm these results, an optimal self-routing algorithm for the proposed topology is developed and is used to simulate the maximum delay, the average delay and the throughput in the presence of contention. In all cases, the XMESH is shown to outperform the TMESH and the DMESH, and can provide an attractive alternative to those networks in implementing massively parallel computers View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Cost-effective novel flexible cell-level systolic architecture for high throughput implementation of 2-D FIR filters

    Page(s): 436 - 439
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (300 KB)  

    Recurrence relations and fully pipelined novel cell-level systolic architectures are suggested for massively parallel implementation of two-dimensional FIR and linear phase FIR filters. Owing to the higher level of parallelism, the proposed structures would yield more throughput over the existing structures. Besides, it can be flexibly configured according to the throughput requirement of the application for the chosen processor technology for cost-effective implementation View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Three-valued quasi-linear transformation for logic synthesis

    Page(s): 391 - 400
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (984 KB)  

    A nonlinear transformation, called the sign transform, is defined and studied. It maps all the ternary functions onto themselves, is uniquely invertible and is inherently related to Walsh and Walsh Galois transforms. Decomposable Boolean functions and ternary product terms are mapped analytically into the sign domain. It is shown that sign coefficients describe the structure of a ternary output logic network implemented by a complete basis: EXOR of variables and sign function. Methods are discussed for direct transformation of switching functions without having to use the butterfly algorithm: Shannon's decomposition in the sign domain and, in general, merging sign spectra of subfunctions View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Optimum partitioning of rectilinear layouts

    Page(s): 440 - 442
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (372 KB)  

    Given a set S of nonoverlapping axis-parallel rectangles placed inside a rectangular region B, a partition of the space B/S is called a `free space partition'. A simple proof of a formula on the number of rectangles in a minimum free space partition is presented. Based on this formula, it is shown that a minimum free space partition can be computed using a well known geometric graph search algorithm in O(n3/2 log n) time View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Reliability levels for fault-tolerant linear processing using real number error correction

    Page(s): 355 - 363
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (1000 KB)  

    Fault-tolerant linear processing systems using real number codes, as in algorithm-based fault tolerance methodologies, can be extended to include error correction for correcting intermittent errors in the processed output data. The reliability function for a protected system containing correction is considered under simple but realistic assumptions on the arrival of failures in both the normal processing and corrector subassemblies. An error-correcting procedure employing a real, convolutional code is described briefly, and a system diagram of a corrector subsystem containing self-checking comparators for detecting its internal failures is presented. Failures are modelled as arriving according to Poisson processes in both parts of the fault-tolerant system. Formulas for the reliability of the protected system are given and an efficient lower bound is developed. This bound depends only on broad details such as relative areas used in a VLSI design of the processing and corrector parts. Computational methods employing computer algebra packages are discussed, and some typical reliability curves for two configurations demonstrate the dramatic improvement which error correction introduces View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Genetic algorithm for variable ordering of OBDDs

    Page(s): 364 - 368
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (576 KB)  

    A genetic algorithm (GA) is applied to find a variable ordering that minimises the size of ordered binary decision diagrams (OBDDs). OBDDs are a data structure for representation and manipulation of Boolean functions often applied in CAD. The choice of the variable ordering largely influences the size of the OBDD (i.e. its size may vary from polynomial to exponential in the number of variables). Dynamic variable ordering is the state-of-the-art method for finding good variable orderings. In the paper it is shown by experimental results that better sizes can be obtained using GAs. The authors' GA approach is a practical alternative to the exact algorithm for variable ordering. It produced optimal results for most considered benchmarks, but it is also applicable to functions with more than 20 variables due to its short runtimes View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • New program model for program partitioning on NUMA multiprocessor systems

    Page(s): 431 - 435
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (512 KB)  

    A new program model is presented to accurately represent parallel programs for partitioning and scheduling problems. This model extends the graphic representation of the macrodataflow by considering the complex communication options supported by NUMA systems. The proposed model shows not only task precedence relations but also data sharing status. Moreover, a new partitioning method based on the proposed model is also developed. Experimental results show that many conventional partitioning algorithms operate more efficiently using the proposed model, and that the proposed algorithm surpasses existing algorithms View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Alternative algorithm for optimisation of Reed-Muller universal logic module networks

    Page(s): 385 - 390
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (584 KB)  

    Reed-Muller universal logic modules (RM-ULMs) can be used as individual modules or as networks to implement RM functions of any number of variables. For a large RM expansion, it is generally more efficient to realise it using cascades of low-order RM-ULMs. A programmed algorithm for the optimisation of the number of modules at the sub-system level has already been published. In the paper an alternative algorithm is presented which performs similar optimisation of fixed-polarity generalised RM expansions but without the need to maintain a saved-branch counter and an input record. Also, whenever the number of piterms having the specific control-tuple-state exceeds two, it is not necessary for the algorithm to perform separate variable examinations. Consequently, the saving in computation time will be especially significant in RM expansions with large number of variables. The resulting algorithm is simple in structure and can be easily implemented using high-level languages such as C or Fortran View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Difference clocks: a new scheme for logical time in distributed systems

    Page(s): 426 - 430
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (536 KB)  

    Logical clocks and vector clocks were proposed in the past to capture causality between events of different processes of a distributed computation. However, these clocks could not capture intraprocess concurrency. Later, bit-matrix clocks and hierarchical clocks were developed to capture interprocess concurrency as well as intraprocess concurrency. The major disadvantages of these clocks are the associated storage and communication overheads. To overcome these disadvantages, the authors introduce the concept of difference clocks. The difference clocks maintain minimal information about the differences among various local clocks. It is shown that they can be used to reconstruct the bit-matrix clocks and result in substantial reduction in storage space and communication overhead View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Branch merging for scheduling concurrent executions of branch operations

    Page(s): 369 - 375
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (884 KB)  

    Branches are a major limiting factor to instruction-level parallelism. One solution is to execute several branches simultaneously using multiway branching architectures. Such architectures are especially important when the instruction issue width becomes large. The authors study the problem of compile-time scheduling of branch operations on such architectures: an optimisation called branch merging. The scheduling attempts to bring profitable branches together for concurrent execution. It is shown that finding the optimal solution to the branch merging problem is NP-hard. A heuristic is then proposed, which relies on a cost model to direct the merging of branches and their associated basic blocks. Merged branches are then scheduled together for concurrent execution. The authors used simulation to evaluate the effectiveness of the proposed algorithm. Experiments on selected benchmark programs show that the heuristic achieves roughly a 10% performance improvement on multiway branching architectures View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Branch merging for scheduling concurrent executions of branch operations

    Page(s): 369 - 375
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (708 KB)  

    Branches are a major limiting factor to instruction-level parallelism. One solution is to execute several branches simultaneously using multiway branching architectures. Such architectures are especially important when the instruction issue width becomes large. The authors study the problem of compile-time scheduling of branch operations on such architectures: an optimisation called branch merging. The scheduling attempts to bring profitable branches together for concurrent execution. It is shown that finding the optimal solution to the branch merging problem is NP-hard. A heuristic is then proposed, which relies on a cost model to direct the merging of branches and their associated basic blocks. Merged branches are then scheduled together for concurrent execution. The authors used simulation to evaluate the effectiveness of the proposed algorithm. Experiments on selected benchmark programs show that the heuristic achieves roughly a 10% performance improvement on multiway branching architectures View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Layout-driven chaining of scan flip-flops

    Page(s): 421 - 425
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (524 KB)  

    In an era of submicron technology, routing is becoming a dominant factor in area, timing and power consumption. The problem of scan flip-flops chaining with the objective of achieving minimum routing area overhead is studied. The first attempt is to chain the flip-flops at the logic level. To make more accurate decisions on chaining flip-flops, the second attempt is to perform the chaining of scan flip-flops taking layout information into consideration. Specifically, the authors show that the chaining problem is a travelling salesman problem (TSP). Then, two heuristics, greedy and matching-based algorithms, are proposed to solve the TSP problem. Various cost functions are defined which take layout information into account. Benchmarking results show that the cost function achieves the best results when it considers placement and routing information and is dynamically updated View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Removing CSC violations in asynchronous circuits by delay padding

    Page(s): 413 - 420
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (936 KB)  

    A novel alternative for removing CSC (complete state coding) violations in asynchronous circuit synthesis for STGs (signal transition graphs) is presented. The main feature of the work is to exploit delays in the physical circuit to remove CSC violations. Its main advantages are that it: does not need to obey the noninput constraint: and saves area overhead when a CSC violation in the state graph does not actually appear in the physical circuit. The delay constraint for removing each CSC violation is formulated. Then an algorithm is proposed to derive a consistent set of constraints to ensure that all violations are removed. If a consistent set exists, it is shown that those constraints can always be satisfied by padding delays during hazard analysis, and therefore hazard-free circuits without any CSC violation can be derived. Based on this approach, the marked-graph benchmarks, hitherto unsolvable due to the noninput constraint in existing methods, are now resolved View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Gaussian-elimination-based algorithm for solving linear equations on mesh-connected processors

    Page(s): 407 - 412
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (560 KB)  

    The problem of solving a system of N linear equations on a mesh-connected multiprocessor structure is considered. The solution to the problem is obtained by using a Gaussian-elimination-based algorithm called `successive Gaussian elimination'. The new algorithm does not contain a separate backsubstitution phase. A two-dimensional array of N×(N+1) processors is employed to obtain the solution in (5N-log N-4) time steps. This scheme eliminates the use of two processor structures in conjunction, one for triangulation and the other for backsubstitution, for producing the complete solution using the existing Gaussian elimination algorithm. Most importantly, the new algorithm supports pairwise pivoting to assure numerical stability. The proposed processor-array structure is amenable for VLSI implementation as identical processors with only simple and regular interconnections are required View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.