IEE Proceedings - Computers and Digital Techniques

Issue 2, March 1995

  • Genetic algorithm for mapping tasks onto a reconfigurable parallel processor

    Page(s): 81 - 86

    The authors describe a genetic algorithm for a difficult optimisation problem that arises in parallel processing. The problem is to assign each task in a given task graph T to a processor so as to minimise the total execution time of the tasks. Total execution time is computed from the individual run times of the tasks and the communication requirements among them. The intertask communication time depends on the interconnection network that connects the processors. No prior knowledge of the interconnection topology is assumed: the algorithm finds the interconnection architecture best suited to the task graph T, which is useful when the target architecture is reconfigurable through programmable switches, e.g. transputer-based parallel processors. The algorithm is also extended to heterogeneous platforms, where each task t can be executed on a particular class of processors. The authors describe an efficient chromosome representation, genetic operators and a fitness measure suitable for the application.
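
    To make the genetic formulation concrete, here is a minimal sketch, not the authors' encoding or fitness measure. It assumes a hypothetical cost model: a uniform, fully connected network in which the cost of a mapping is the busiest processor's compute load plus the total communication volume between tasks placed on different processors. The chromosome is a list whose gene i is the processor assigned to task i; the names `cost` and `evolve` and all parameters are illustrative.

```python
import random


def cost(assign, run_time, comm, n_procs):
    """Hypothetical cost: max processor load plus cross-processor traffic.
    comm maps (task_a, task_b) -> communication volume."""
    load = [0.0] * n_procs
    for task, proc in enumerate(assign):
        load[proc] += run_time[task]
    cross = sum(vol for (a, b), vol in comm.items() if assign[a] != assign[b])
    return max(load) + cross


def evolve(run_time, comm, n_procs, pop_size=30, generations=200,
           p_mut=0.1, seed=0):
    """Simple generational GA; returns (best chromosome, its cost)."""
    rng = random.Random(seed)
    n = len(run_time)

    def c(chrom):
        return cost(chrom, run_time, comm, n_procs)

    # Chromosome: gene i is the processor assigned to task i.
    pop = [[rng.randrange(n_procs) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=c)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: the best mapping so far always survives
        while len(nxt) < pop_size:
            # Binary tournament selection of two parents.
            p1 = min(rng.sample(pop, 2), key=c)
            p2 = min(rng.sample(pop, 2), key=c)
            # One-point crossover.
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Random-reset mutation: move a task to a random processor.
            for i in range(n):
                if rng.random() < p_mut:
                    child[i] = rng.randrange(n_procs)
            nxt.append(child)
        pop = nxt
        best = min(pop, key=c)
    return best, c(best)
```

    For example, `evolve([4, 3, 2, 5, 1, 3], {(0, 1): 1, (2, 4): 3}, 2)` returns a two-processor mapping and its cost under this toy model. Elitism keeps the best cost from ever increasing between generations; the paper's reconfigurable-topology search and heterogeneous processor classes are not modelled here.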

  • Fine grain scheduler for shared-memory multiprocessor systems

    Page(s): 98 - 106

    The Tatung fine grain scheduler (TFGS), which works at the machine instruction level for multiprocessor systems, is described. The objective of TFGS is to minimise the total execution time of an application program executed on a shared-memory multiprocessor system. The application program is compiled to generate intermediate code, which is then represented by a data/control dependence graph, a branch nest tree and a priority list. When scheduling, TFGS takes into account the data dependences between operations, the pipeline effect of each processing element, and the branches in the application program. The processors are assumed to be interconnected through a shared memory, and hardware support for the shared memory is designed. A status recording mechanism is proposed to handle branches and loops within the application program. The hardware has been designed and simulated, and TFGS has been implemented and tested with some application programs as inputs. The results are very encouraging.
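
    Scheduling from a dependence graph and a priority list is commonly done by list scheduling. The sketch below shows that general technique, not TFGS itself. It assumes integer operation latencies, an acyclic dependence graph, identical processors, and no branches or pipeline effects; the function name and the dictionary inputs are hypothetical.

```python
def list_schedule(latency, deps, priority, n_procs):
    """Cycle-by-cycle list scheduling of an acyclic dependence graph.

    latency[op]  -- cycles the operation occupies a processor
    deps[op]     -- operations whose results op needs (must have finished)
    priority[op] -- higher value is scheduled first among ready operations
    Returns ({op: (start_cycle, processor)}, schedule_length).
    """
    finish = {}
    schedule = {}
    remaining = set(latency)
    busy_until = [0] * n_procs
    cycle = 0
    while remaining:
        # An operation is ready once all its predecessors have finished.
        ready = [op for op in remaining
                 if all(d in finish and finish[d] <= cycle
                        for d in deps.get(op, ()))]
        # Highest priority first; the name breaks ties deterministically.
        ready.sort(key=lambda op: (-priority[op], op))
        for op in ready:
            free = [p for p in range(n_procs) if busy_until[p] <= cycle]
            if not free:
                break  # every processor is busy this cycle
            p = free[0]
            schedule[op] = (cycle, p)
            finish[op] = cycle + latency[op]
            busy_until[p] = finish[op]
            remaining.discard(op)
        cycle += 1
    return schedule, max(finish.values(), default=0)
```

    With latencies `{"a": 1, "b": 2, "c": 1, "d": 1}`, `c` depending on `a`, `d` depending on `b` and `c`, and two processors, `a` and `b` start in cycle 0, `c` in cycle 1 and `d` in cycle 2, giving a schedule length of 3.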

  • Routing and performance of the double tree (DOT) network

    Page(s): 93 - 97

    The paper analyses an irregular multistage interconnection network called the double tree (DOT) network. A dynamic shortest-path routing algorithm for the packet-switched DOT network is proposed. Because it is irregular, the DOT network can provide biased pairwise service to favoured connections. Its performance under varying degrees of localised communication is analysed, and a comparison with the Omega network is also made.

  • Design and VLSI implementation of an address generation coprocessor

    Page(s): 145 - 151

    Most applications of general-purpose VLSI processors are developed in high-level languages, in which information is generally handled in structured form. Compilers generate a considerable amount of code to navigate through these data structures, and considerable processing time is spent on the address calculations required to access them. As an alternative to software address generation, a hardware memory reconfiguring unit, or address generation coprocessor, is presented. To demonstrate its VLSI feasibility, the device is implemented in VLSI using the Octtool suite of tools; the tools used and the implementation procedure are described. VLSI design aspects such as regularity, modularity and scalability are discussed. The performance of the device is evaluated using assembly language programs that implement popular signal processing algorithms such as convolution, correlation, FFT and matrix multiplication. A system with the address generation unit exhibits a speedup of approximately 1.5 to 2.5.
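
    The address arithmetic that a compiler emits in software, and that a dedicated unit could take over, can be illustrated with a few typical patterns. This is a hedged illustration only: the abstract does not specify which addressing modes the coprocessor implements. Row-major indexing, constant-stride column walks (as in matrix multiplication) and bit-reversed indices (as in radix-2 FFTs) are assumed here as representative examples, and the function names are hypothetical.

```python
def row_major_address(base, i, j, n_cols, elem_size):
    """Byte address of element [i][j] of a row-major 2-D array."""
    return base + (i * n_cols + j) * elem_size


def column_addresses(base, j, n_rows, n_cols, elem_size):
    """Addresses visited when walking down column j of a row-major array.
    Successive addresses differ by a constant stride, so they can be
    produced by repeated addition instead of a multiply per access."""
    stride = n_cols * elem_size
    addr = row_major_address(base, 0, j, n_cols, elem_size)
    out = []
    for _ in range(n_rows):
        out.append(addr)
        addr += stride
    return out


def bit_reversed(k, n_bits):
    """Reverse the low n_bits of k: the index order used by radix-2 FFTs."""
    r = 0
    for _ in range(n_bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r
```

    For instance, `row_major_address(1000, 2, 3, 4, 4)` is 1044, `column_addresses(0, 1, 3, 4, 4)` is `[4, 20, 36]`, and the bit-reversed order of 0 to 7 is `[0, 4, 2, 6, 1, 5, 3, 7]`.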

  • Efficient parallel algorithms on optically interconnected arrays of processors

    Page(s): 87 - 92

    Arrays of processors with pipelined optical buses are introduced for the efficient implementation of computationally intensive applications. Techniques for transmitting messages concurrently over the optical bus without message collisions are shown. Convenient parallel data movement operations are derived for this architecture and then used to design parallel algorithms for some important numerical problems: solving systems of linear equations and finding the roots of nonlinear equations. Although this array of processors can operate in MIMD mode, it is better suited to SIMD mode because it can easily be synchronised and scaled to a massive number of processors; the algorithms have therefore been designed for SIMD operation. Their time complexities are analysed and shown to compare favourably with those of implementations on processors connected by electronic buses or by point-to-point links such as the hypercube. Moreover, whereas a processing element of a hypercube of size N has log N ports, a processing element of an array with optical buses has a constant number of ports. An array of processors with optical buses thus appears to be a promising, and possibly better, alternative for future supercomputing systems.

  • Manifestations of faults in single- and double-BJT BiCMOS logic gates

    Page(s): 135 - 144

    Combining the inherent advantages of bipolar and CMOS, BiCMOS is emerging as a major technology for high-speed, high-performance digital and mixed-signal applications. The logic behaviour of single- and double-BJT BiCMOS devices under transistor-level shorts and opens is examined. In addition to exhibiting sequential behaviour, some stuck-open faults cause increased delay. While most stuck-on faults can be detected by logic-level testing, some can only be detected by monitoring the power supply current (IDDQ monitoring). A stuck-open fault in a double-BJT BiCMOS device that manifests as an enhanced dynamic IDD current is shown. The faulty behaviour of the bipolar (TTL) and CMOS logic families is compared with that of BiCMOS. The testability of both single- and double-BJT BiCMOS devices is discussed, along with a design-for-testability approach for detecting stuck-open faults in S-BJT BiCMOS devices.

  • Fast algorithms for LUC digital signature computation

    Page(s): 165 - 169

    A digital signature scheme based on a special type of Lucas function, and free from the multiplicative attack on the RSA digital signature, has recently been proposed (P. Smith and M. Lennon, 1993). A disadvantage of this scheme, LUC, is that it requires more computation than RSA. An important property of this special type of Lucas function, V(x+y)=V(x)×V(y)-V(x-y), is exploited to develop fast algorithms that make LUC digital signature computation more efficient, and a parallel architecture for these algorithms is developed. Besides constructing the fast algorithms, the paper shows that exponentiation and this special type of Lucas function share many computational and mathematical aspects.
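
    The property V(x+y)=V(x)×V(y)-V(x-y) already yields a square-and-multiply style method. This holds for the Lucas sequence with Q = 1, where V(0) = 2 and V(1) = P. Setting x = y = k gives V(2k) = V(k)² - 2, and setting x = k+1, y = k gives V(2k+1) = V(k)×V(k+1) - P. The sketch below walks the exponent bits while keeping the pair (V(k), V(k+1)). It is a standard binary Lucas ladder, offered only as an illustration of how the identity is used, not as the paper's fast algorithms; the function names are hypothetical.

```python
def lucas_v(e, p, n):
    """V_e(P, Q=1) mod n via a binary ladder on the pair (V_k, V_{k+1})."""
    vk, vk1 = 2 % n, p % n  # (V_0, V_1)
    for bit in bin(e)[2:]:  # most significant bit first
        if bit == "1":
            # k -> 2k+1: (V_{2k+1}, V_{2k+2})
            vk, vk1 = (vk * vk1 - p) % n, (vk1 * vk1 - 2) % n
        else:
            # k -> 2k: (V_{2k}, V_{2k+1})
            vk, vk1 = (vk * vk - 2) % n, (vk * vk1 - p) % n
    return vk


def lucas_v_naive(e, p, n):
    """Reference: the recurrence V_k = P*V_{k-1} - V_{k-2}, O(e) steps."""
    a, b = 2 % n, p % n
    for _ in range(e):
        a, b = b, (p * b - a) % n
    return a
```

    The ladder needs O(log e) modular multiplications, like RSA exponentiation, and the two products in each step are independent, which suggests why the computation lends itself to a parallel architecture. With Q = 1 the functions also compose like powers: V_3(V_5(P)) equals V_15(P) modulo n.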

  • Intelligent code migration technique for synchronisation operations on a multiprocessor

    Page(s): 107 - 116

    A compiler technique for migrating synchronisation operations is proposed. The traditional code motion technique is used only to move loop-invariant statements ahead of the loop; it cannot migrate synchronisation operations inside the loop. Moreover, none of the current statement-level synchronisation schemes can handle code migration, even though migrating synchronisation operations can enhance performance significantly. The new technique reorders the sequence of synchronisation instructions, moving Send Signal(S) up and Wait Signal(S, i-d) down, to improve system performance. However, migrating some synchronisation operations does not improve performance, so an intelligent code migration algorithm is proposed to migrate efficiently. With this algorithm, only a few synchronisation migration operations are needed to achieve a large speedup.

  • Efficient global strategy for designing and testing scanned sequential circuits

    Page(s): 170 - 176

    Scan design has been widely used to ease the burden of test generation, and reducing the extra cost introduced by scan design has become a major goal. Previous approaches have tried separately to enhance the test generation algorithm and the scan cell selection strategy. In contrast, a global strategy is proposed that accounts for the close relationship between these factors and incorporates a new scan structure. The aim of this research is to reduce test application time, which may dominate the cost of testing a mass-produced product. First, a new scan structure, the gradually-on (GO) structure, which allows the scan cells in the scan chain to be turned on gradually, is used. Next, two assertions for the design of the test generator are proposed, and a fault-list-oriented test generation algorithm is developed in accordance with them. Finally, a simulation-based partial scan methodology is introduced that selects suitable flip-flops for scanning one by one by fully exploiting the dynamic information generated during fault simulation. Experimental results show that considering scan design and test generation together speeds up test generation and greatly reduces test application time.

  • Performance of a neural binary pattern classifier

    Page(s): 152 - 156

    The paper describes a binary neural network architecture and its performance in pattern classification. The network is called binary because its inputs are binary and its main components are binary neurons. Apart from the usual input and output layers, the network has two 'hidden' layers, called the code layer and the linear plane, connected in a feedforward structure. The weights of these feedforward connections are also binary. The performance of the network is demonstrated through binary pattern classification experiments, including comparisons with many one- and two-hidden-layer backpropagation networks. The proposed network shows superior performance in all the cases studied.

  • Orientation assignment of standard cells using a fuzzy mathematical transformation

    Page(s): 157 - 164

    It is well known that minimising total wire length reduces routing area in a standard cell layout. After the placement phase, total wire length can be improved further by assigning the orientations of the standard cells. The authors develop two-way constrained fuzzy graph clustering, based on fuzzy c-means clustering, together with a transformation between the orientation assignment of standard cells and constrained graph bisection, to minimise total wire length. The proposed algorithm has been tested on several standard cell layouts, and the experimental results show that it produces a significant reduction in total wire length.
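
    The clustering building block named here, fuzzy c-means, alternates two updates. Each point's membership in cluster i is u_ki = 1 / Σ_j (d_ki / d_kj)^(2/(m-1)), and each centre becomes the mean of the points weighted by u_ki^m. The sketch below is plain fuzzy c-means on one-dimensional data under a fixed iteration count. It does not implement the authors' two-way constrained graph clustering or the mapping to cell orientations; the function name, the fuzzifier m = 2 and the deterministic initialisation are assumptions.

```python
def fuzzy_c_means(points, c, m=2.0, iters=100, eps=1e-9):
    """Standard fuzzy c-means on 1-D data.
    Returns (centres, u) where u[k][i] is the membership of point k in
    cluster i; each row of u sums to 1."""
    n = len(points)
    # Deterministic start: spread initial centres across the sorted data.
    s = sorted(points)
    centres = [s[(i * (n - 1)) // max(c - 1, 1)] for i in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update; eps avoids division by zero at a centre.
        for k, x in enumerate(points):
            d = [abs(x - v) + eps for v in centres]
            for i in range(c):
                u[k][i] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
        # Centre update: mean of the points weighted by u_ki^m.
        centres = [
            sum(u[k][i] ** m * points[k] for k in range(n))
            / sum(u[k][i] ** m for k in range(n))
            for i in range(c)
        ]
    return centres, u
```

    With c = 2 the result is a soft two-way split. On `[0.0, 0.1, 0.2, 5.0, 5.1, 5.2]` the centres settle near 0.1 and 5.1. A bisection, such as a choice between two orientations, can then be read off by assigning each item to its higher-membership cluster, subject to whatever constraints the application imposes.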

  • Cost-performance analysis of cascaded crossbar interconnected multiprocessors

    Page(s): 117 - 134

    The cost-performance ratio of interconnection networks for multiprocessors depends on the connectivity of the network: the processing efficiency of these systems increases as the number of links and switches increases. The authors study the performance of a class of interconnection networks in which processors and memory modules are grouped into stages, each consisting of a crossbar network and each connected to its two neighbouring stages. A queueing model is presented to analyse cascaded crossbar architectures organised into bidirectional rings and operating under synchronous packet switching. Cost-performance comparisons of various cascaded crossbar network configurations are presented. It is shown that, even with poor memory reference locality, it is preferable to have more stages and fewer processors per stage.
