IEEE Transactions on Computers

Issue 5 • May 1997

  • Comments on "Theory and applications of cellular automata in cryptography" [with reply]

    Page(s): 637 - 639

    This paper argues that the cipher systems based on cellular automata (CA) proposed by S. Nandi et al. (1994) are affine and insecure. A reply by S. Nandi and P. Pal Chaudhuri is included. The reply emphasizes that the regular, modular, cascadable structure of local-neighborhood CA can be used to build low-cost cipher hardware, and that this cost-effective engineering solution can achieve the desired level of security with larger CA.

  • On optimal strategies for cycle-stealing in networks of workstations

    Page(s): 545 - 557

    We study the parallel scheduling problem for a new modality of parallel computing: having one workstation “steal cycles” from another. We focus on a draconian mode of cycle-stealing, in which the owner of workstation B allows workstation A to take control of B's processor whenever it is idle, with the promise of relinquishing control immediately upon demand. The typically high communication overhead for supplying workstation B with work and receiving its results militates in favor of supplying B with large amounts of work at a time; the risk of losing work in progress when the owner of B reclaims the workstation militates in favor of supplying B with a sequence of small packets of work. The challenge is to balance these two pressures in a way that maximizes the amount of work accomplished. We formulate two models of cycle-stealing. The first attempts to maximize the expected work accomplished during a single episode, when one knows the probability distribution of the return of B's owner. The second attempts to match the productivity of an omniscient cycle-stealer, when one knows how much work that stealer can accomplish. We derive optimal scheduling strategies for sample scenarios within each of these models.

  • Minimum-congestion hypergraph embedding in a cycle

    Page(s): 600 - 602

    The minimum-congestion hypergraph embedding in a cycle (MCHEC) problem is to embed the n edges of an m-vertex hypergraph as paths in a cycle on the same number of vertices so that the congestion (the maximum number of paths that use any single edge of the cycle) is minimized. The MCHEC problem has applications in electronic design automation and parallel computing. In this paper, it is proven that the MCHEC problem is NP-complete. An $O((nm)^{k+1})$ algorithm is described that computes an embedding with congestion $k$ or determines that no such embedding exists. Finally, a linear-time approximation algorithm for arbitrary instances is presented that computes an embedding whose congestion is at most three times optimal.

  • Multirate VLSI arrays and their synthesis

    Page(s): 515 - 529

    Many applications in signal and image processing can be efficiently implemented on regular VLSI architectures such as systolic arrays. Multirate arrays (MRAs) are an extension of systolic arrays in which different data streams are propagated with different clocks. We address the analysis and synthesis problems for this class of architectures. We present a formal definition of MRAs as systems of recurrence equations defined over sparse polyhedral domains. We also give transformation rules for this class of recurrences and use them to show that MRAs constitute a particular subset of systems of affine recurrence equations (SoAREs). We then address the synthesis problem and show how an MRA can be systematically derived from an initial specification in the form of a mathematical equation. The main transformations we use are domain rescalings and dependency decomposition, and we illustrate our method by deriving a previously unknown decimation filter array.

  • Modular arithmetic using low order redundant bases

    Page(s): 611 - 616

    N-digit, radix-$\alpha$ bases are proposed for VLSI implementation of redundant arithmetic mod $m$, where $\langle \alpha^N \rangle_m = \pm 1$, $\langle \alpha^j \rangle_m \neq \pm 1$ for $0 < j < N$, and $m$ is prime. These bases simplify the handling of arithmetic overflow and are well suited to redundant arithmetic. The representations provide competitive, multiplierless $T$-point number theoretic transforms mod $m$, where $T \mid N$ or $T \mid 2N$.

  • Skewed associativity improves program performance and enhances predictability

    Page(s): 530 - 544

    Performance tuning becomes harder as computer technology advances. One of the factors is the increasing complexity of memory hierarchies. Most modern machines now use at least one level of cache memory. To reduce execution stalls, cache miss rates must be kept very low. Software techniques to improve locality, such as loop blocking and copying, have been developed for numerical codes. Unfortunately, the behavior of direct-mapped and set-associative caches is still erratic when large data arrays are accessed. Execution time can vary drastically for the same loop kernel depending on uncontrolled factors such as the array leading dimension. The only software method available to improve execution time stability is copying frequently used data, which is itself costly in execution time. Users are not usually cache organization experts; they are not aware of such phenomena and have no control over them. In this paper, we show that the recently proposed four-way skewed associative cache yields very stable execution times and good average miss ratios on blocked algorithms. As a result, execution time is faster and much more predictable than with conventional caches. It is therefore possible to use larger block sizes in blocked algorithms, which further reduces blocking overhead costs.

  • Toward optimal broadcast in a star graph using multiple spanning trees

    Page(s): 593 - 599

    In a multicomputer network, sending a packet typically incurs two costs: start-up time and transmission time. This work is motivated by the observation that most broadcast algorithms in the literature for star graph networks try to minimize only one of these costs. Thus, many algorithms, though claimed to be optimal, are optimal only when one of the costs is negligible. In this paper, we try to optimize both costs simultaneously for four types of broadcast problems: one-to-all or all-to-all broadcasting in an n-star network with either one-port or all-port communication capability. Unlike earlier solutions, the main technique used in this paper is to construct multiple spanning trees rooted at the source node and to transmit one partition of the broadcast message along each tree.

  • Selective victim caching: a method to improve the performance of direct-mapped caches

    Page(s): 603 - 610

    Although direct-mapped caches suffer from higher miss ratios than set-associative caches, they are attractive for today's high-speed pipelined processors, which require very low access times. Victim caching was proposed by Jouppi (1990) as a way to reduce the miss rate of direct-mapped caches without affecting their access time. This approach augments the direct-mapped main cache with a small fully associative cache, called the victim cache, that stores cache blocks evicted from the main cache as a result of replacements. We propose and evaluate an improvement of this scheme, called selective victim caching. In this scheme, blocks entering the first-level cache are placed selectively in either the main cache or a small victim cache by a prediction scheme based on their past history of use. In addition, interchanges of blocks between the main cache and the victim cache are also performed selectively. We show that the scheme yields significant improvements in miss rate as well as in average memory access time, for both small and large caches (4 Kbytes to 128 Kbytes). For example, simulations with ten instruction traces from the SPEC '92 benchmark suite showed an average improvement of approximately 21 percent in miss rate over simple victim caching for a 16-Kbyte cache with a block size of 32 bytes, while the number of blocks interchanged between the main and victim caches was reduced by approximately 70 percent. Implementation alternatives for the scheme in an on-chip processor cache are also described.

  • The circulating processor model of parallel systems

    Page(s): 572 - 587

    This paper introduces the circulating processor model for parallel computer systems. Models of parallel systems tend to be computationally complex due to synchronization constraints such as task forking and joining. However, product-form queuing network models remain computationally efficient as the system grows because they calculate only the mean performance metrics of the system. The circulating processor model is a product-form queuing network model that differs from more traditional models in that the processors circulate among the parallel applications, whereas in traditional models the tasks of the parallel application circulate among the processors. Behaviors such as forking and joining of tasks and barrier synchronization are better captured by this new approach. The circulating processor model may be load-dependent or load-independent. For systems that contain a single parallel application, the load-dependent circulating processor model is exact, while the load-independent model is not; in the latter case, the error can be calculated exactly. For systems that contain multiple parallel applications, the load-dependent circulating processor model is a good approximation to the actual system, while the load-independent model is not. A case study using Parallel Virtual Machine (PVM) on a network of workstations illustrates the applicability of the circulating processor model.

  • Stochastic bounds for parallel program execution times with processor constraints

    Page(s): 630 - 636

    A parallel program can be modeled as a directed acyclic graph, in which a node represents a task (the smallest grain of computation to be assigned to a processor) and arcs represent precedence (synchronization) constraints among the tasks. Because of varying input data and unpredictable dynamic run-time environments, the execution times of the tasks, and of the entire program, can be treated as random variables. In this paper, we develop stochastic lower and upper bounds on parallel program execution times when the number of processors is limited. Such analysis can provide important information for job scheduling and resource allocation. For several typical classes of parallel programs, we derive very accurate closed-form approximations for the bounds. Examples are also given to demonstrate the quality of the bounds derived.

  • Communication in multicomputers with nonconvex faults

    Page(s): 616 - 622

    A technique to enhance multicomputer routers for fault-tolerant routing with a modest increase in routing complexity and resource requirements is described. This method handles solid faults in meshes, which include all convex faults and many practical nonconvex faults, for example, faults in the shape of an L or a T. As examples of the proposed method, adaptive and nonadaptive fault-tolerant routing algorithms using four virtual channels per physical channel are described.

  • A search of minimal key functions for normal basis multipliers

    Page(s): 588 - 592

    The circuit complexity of a Massey-Omura normal basis multiplier for a finite field $GF(2^m)$ depends on the key function for multiplication. Key functions with minimum complexity, called minimal key functions, are desirable. This paper investigates the complexity of a key function and reports search results of minimal key functions. A table of minimal key functions for $m$ up to 31 is included.

  • Validated roundings of dot products by sticky accumulation

    Page(s): 623 - 629

    The dot product operation is very prevalent in scientific computation and has therefore been incorporated as a primitive operation in some languages. Implementing the dot product as a sequence of IEEE standard multiplications and additions neither prevents a substantial accumulation of round-off errors nor warns the user about catastrophic cancellation. We present the design of a double precision dot product operation employing sticky accumulation, in which the final rounded result is validated by raising a new exception flag if the result incurred catastrophic cancellation. Sticky accumulation can be implemented in a pipelined or parallel environment to sustain double precision with extended control of the error. In the absence of catastrophic cancellation, our design guarantees one-ulp accuracy.

  • Associative nets: a graph-based parallel computing model

    Page(s): 558 - 571

    This paper presents a new parallel computing model called Associative Nets. This model relies on basic primitives called associations, which consist of applying an associative operator over the connected components of a subgraph of the physical interprocessor connection graph. Associations can be implemented very efficiently (in terms of hardware cost or processing time) thanks to asynchronous computation. This model is quite effective for image analysis and several other fields; as an example, graph processing algorithms are presented. Although they rely on a much simpler architecture, these algorithms generally have a complexity equivalent to that obtained by more expensive computing models such as the PRAM model.

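To make the congestion objective in the MCHEC abstract concrete, here is a minimal Python sketch. It assumes the cycle's vertices are numbered 0 to m - 1, that cycle edge i joins vertices i and (i + 1) mod m, and that each hyperedge is embedded as the clockwise arc that leaves exactly one of the gaps between its consecutive vertices unused. The gap-index encoding and the names `congestion` and `min_congestion` are hypothetical illustrations, not the paper's notation. Because the abstract states that MCHEC is NP-complete, the exhaustive `min_congestion` search is only meant for checking tiny instances; it is neither the paper's O((nm)^(k+1)) algorithm nor its linear-time 3-approximation.

```python
from itertools import product
from typing import Sequence


def congestion(m: int, hyperedges: Sequence[Sequence[int]], gaps: Sequence[int]) -> int:
    """Congestion of one embedding of hyperedges into the m-vertex cycle.

    Cycle edge i joins vertices i and (i + 1) % m. For a hyperedge whose sorted
    vertices are v[0] < ... < v[k-1], choosing gap g (hypothetical encoding)
    leaves the arc from v[g] to v[(g + 1) % k] unused, so the path runs
    clockwise from v[(g + 1) % k] to v[g] and still visits every vertex.
    """
    if len(hyperedges) != len(gaps):
        raise ValueError("need exactly one gap choice per hyperedge")
    load = [0] * m  # load[i] = number of paths that use cycle edge i
    for verts, g in zip(hyperedges, gaps):
        vs = sorted(set(verts))
        if not 0 <= g < len(vs):
            raise ValueError(f"gap index {g} out of range for {verts}")
        u, end = vs[(g + 1) % len(vs)], vs[g]
        while u != end:  # walk clockwise, charging each cycle edge traversed
            load[u] += 1
            u = (u + 1) % m
    return max(load, default=0)


def min_congestion(m: int, hyperedges: Sequence[Sequence[int]]) -> int:
    """Exhaustive optimum over all gap choices (exponential: tiny inputs only)."""
    choices = [range(len(set(h))) for h in hyperedges]
    return min(congestion(m, hyperedges, gaps) for gaps in product(*choices))


if __name__ == "__main__":
    print(congestion(4, [(0, 1), (2, 3)], [1, 1]))  # 1: arcs 0->1 and 2->3 are disjoint
    print(congestion(4, [(0, 1), (2, 3)], [0, 1]))  # 2: arc 1->2->3->0 overlaps arc 2->3
    print(min_congestion(4, [(0, 2), (1, 3)]))      # 2: "crossing" pairs always share an edge
```

The last call shows why the choice of arcs matters: on a 4-cycle the two crossing pairs {0, 2} and {1, 3} share a cycle edge whichever arcs are chosen, so no embedding achieves congestion 1.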

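The "Modular arithmetic using low order redundant bases" abstract requires α^N ≡ ±1 (mod m) while α^j ≢ ±1 (mod m) for 0 < j < N, with m prime; in other words, N is the smallest positive exponent at which a power of α reduces to +1 or -1 modulo m. The Python sketch below computes that N and checks a given (α, N, m) triple. The function names are hypothetical, and the code checks only these number-theoretic conditions, not the paper's digit representation or hardware. Since α^N ≡ ±1 (mod m), a carry out of the top digit position has weight ±1 and can be folded back into the lowest position (negated in the -1 case), which is one way to see why such bases keep overflow handling simple.

```python
def base_length(alpha: int, m: int) -> int:
    """Least N >= 1 with alpha**N congruent to +1 or -1 modulo the odd prime m.

    Returning the *least* such N means the second condition
    (alpha**j is not +-1 mod m for 0 < j < N) holds automatically.
    """
    a = alpha % m
    if m < 3 or a == 0:
        raise ValueError("need an odd prime m and alpha not divisible by m")
    power = a
    for n in range(1, m):  # Fermat: a**(m-1) == 1 (mod m), so prime m always terminates
        if power in (1, m - 1):
            return n
        power = power * a % m
    raise ValueError("m is not prime")  # reachable only for composite m


def is_valid_base(alpha: int, n: int, m: int) -> bool:
    """Check both conditions quoted in the abstract for an N-digit, radix-alpha base."""
    def pm_one(e: int) -> bool:
        return pow(alpha, e, m) in (1, m - 1)  # is alpha**e congruent to +-1 mod m?
    return n >= 1 and pm_one(n) and not any(pm_one(j) for j in range(1, n))


if __name__ == "__main__":
    print(base_length(2, 17))        # 4, since 2**4 = 16 = -1 (mod 17)
    print(base_length(2, 7))         # 3, since 2**3 = 8 = 1 (mod 7)
    print(is_valid_base(2, 8, 17))   # False: 2**4 is already -1 (mod 17)
```

For example, with m = 17 and α = 2 this gives 4-digit, radix-2 bases, because 2, 4, and 8 are not ±1 modulo 17 but 16 is.
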
Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org