
IEEE Journal on Selected Areas in Communications

Issue 6 • June 1999

  • Buffer management schemes for supporting TCP in gigabit routers with per-flow queueing

    Page(s): 1159 - 1169

    There has been much interest in using active queue management in routers in order to protect users from connections that are not very responsive to congestion notification. An Internet draft recommends schemes based on random early detection for achieving these goals, to the extent that it is possible, in a system without “per-flow” state. However, a “stateless” system with first-in/first-out (FIFO) queueing is very much handicapped in the degree to which flow isolation and fairness can be achieved. Starting with the observation that a “stateless” system is but one extreme in a spectrum of design choices and that per-flow queueing for a large number of flows is possible, we present active queue management mechanisms that are tailored to provide a high degree of isolation and fairness for TCP connections in a gigabit IP router using per-flow queueing. We show that IP flow state in a router can be bounded if the scheduling discipline used has finite memory, and we investigate the performance implications of different buffer management strategies in such a system. We show that merely using per-flow scheduling is not sufficient to achieve effective isolation and fairness, and it must be combined with appropriate buffer management strategies.
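    The kind of buffer management policy the abstract discusses can be illustrated with a minimal sketch: a shared buffer with per-flow queues that, when full, pushes out a packet from the longest queue so that aggressive flows absorb the losses. This is one well-known strategy in this design space, not necessarily the paper's exact scheme; the class and method names are hypothetical.

```python
from collections import deque

class SharedBufferPerFlow:
    """Shared packet buffer with per-flow queues.

    When the buffer is full, a packet is pushed out from the tail of
    the currently longest flow queue ("drop from longest queue"), so
    heavy flows bear the loss and well-behaved TCP flows are isolated.
    """

    def __init__(self, capacity):
        self.capacity = capacity   # total packets the buffer may hold
        self.queues = {}           # flow id -> deque of packets
        self.occupancy = 0

    def enqueue(self, flow, packet):
        q = self.queues.setdefault(flow, deque())
        if self.occupancy == self.capacity:
            # Push out from the longest queue to make room.
            victim = max(self.queues, key=lambda f: len(self.queues[f]))
            self.queues[victim].pop()
            self.occupancy -= 1
        q.append(packet)
        self.occupancy += 1
```

With a capacity of 4 packets, a flow sending 6 packets cannot crowd out a flow sending 2: the push-out policy converges toward an even split of the buffer.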

  • A framework for optimizing the cost and performance of next-generation IP routers

    Page(s): 1013 - 1029

    The explosive growth of Internet users, the increased user demand for bandwidth, and the declining cost of technology have all resulted in the emergence of new classes of high-speed distributed IP-router architectures with packet-forwarding rates of the order of gigabits, or even terabits, per second. This paper develops an analytical framework for modeling and analyzing the impact of technological factors on the cost-performance tradeoffs in distributed-router architectures. The main tradeoff in a distributed router results naturally from moving the main packet-forwarding and processing power from a centralized forwarding engine to an ensemble of smaller forwarding engines, either dedicated to or shared among the line cards. Processing packets in these smaller engines can be much cheaper (by as much as two to three orders of magnitude) than in a centralized forwarding engine. Therefore, the main goal of our modeling framework is to determine an optimal allocation of processing power to the forwarding engines (in a distributed router) to minimize overall router cost while achieving a given level of packet-forwarding performance. Two types of router models are analyzed using the proposed framework: a distributed-router architecture and a parallel-router architecture.

  • On the speedup required for work-conserving crossbar switches

    Page(s): 1057 - 1066

    This paper describes the architecture for a work-conserving server using a combined I/O-buffered crossbar switch. The switch employs a novel algorithm based on output occupancy, the lowest occupancy output first algorithm (LOOFA), and a speedup of only two. A work-conserving switch provides the same throughput performance as an output-buffered switch. The work-conserving property of the switch is independent of the switch size and input traffic pattern. We also present a suite of algorithms that can be used in combination with LOOFA. These algorithms determine the fairness and delay properties of the switch. We also describe a mechanism to provide delay bounds for real-time traffic using LOOFA. These delay bounds are achievable without requiring output-buffered switch emulation.
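    The core idea of a lowest-occupancy-output-first scheduler can be sketched as a greedy matching pass: outputs are considered in order of increasing output-buffer occupancy, and each is matched with some unmatched input holding a cell for it. This is an illustrative greedy variant only; the paper's actual algorithm and tie-breaking rules may differ, and the function name is an assumption.

```python
def loofa_match(voq, occupancy):
    """One scheduling phase in the spirit of LOOFA.

    voq[i][j]   -- number of cells input i holds for output j (VOQs)
    occupancy[j] -- current depth of output j's buffer

    Outputs with the emptiest buffers get matched first, which is what
    keeps the switch work-conserving: an output is left idle only if
    no input has a cell for it.
    """
    n = len(occupancy)
    matched_in, match = set(), {}
    for j in sorted(range(n), key=lambda j: occupancy[j]):
        for i in range(n):
            if i not in matched_in and voq[i][j] > 0:
                match[i] = j          # transfer one cell i -> j this phase
                matched_in.add(i)
                break
    return match
```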

  • Flow aggregated, traffic driven label mapping in label-switching networks

    Page(s): 1170 - 1177

    Label-switching technology enables high performance and flexible layer-3 packet forwarding based on the fixed-length label information that is mapped to the layer-3 packet stream. A label-switching router (LSR) forwards layer-3 packets based on their layer-3 address information or their label information that is mapped to the layer-3 address information. Two label-mapping policies have been proposed. One is traffic-driven mapping, where the label is mapped for a layer-3 packet stream of each host-pair according to the actual packet arrival. The other is topology-driven mapping, where the label is mapped in advance for a layer-3 packet stream toward the same destination network, regardless of actual packet arrival at the LSR. This paper evaluates the required number of labels under each of these two label-mapping policies using real backbone traffic traces. The evaluation shows that both label-mapping policies require a large number of labels. In order to reduce the required number of labels, we propose a label-mapping policy that is a combination of the two label-mapping policies above: traffic-driven label mapping for the packet stream toward the same destination network. The evaluation shows that the proposed label-mapping policy requires only about one-tenth as many labels as the traffic-driven label mapping for the host-pair packet stream and the topology-driven label mapping for the destination-network packet stream.

  • IP-address lookup using LC-tries

    Page(s): 1083 - 1092

    There has been a notable interest in the organization of routing information to enable fast lookup of IP addresses. The interest is primarily motivated by the goal of building multigigabit routers for the Internet, without having to rely on multilayer switching techniques. We address this problem by using an LC-trie, a trie structure with combined path and level compression. This data structure enables us to build efficient, compact, and easily searchable implementations of an IP-routing table. The structure can store both unicast and multicast addresses with the same average search times. The search depth increases as Θ(log log n) with the number of entries in the table for a large class of distributions, and it is independent of the length of the addresses. A node in the trie can be coded with four bytes. Only the size of the base vector, which contains the search strings, grows linearly with the length of the addresses when extended from 4 to 16 bytes, as mandated by the shift from IP version 4 to IP version 6. We present the basic structure as well as an adaptive version that roughly doubles the number of lookups/s. More general classifications of packets that are needed for link sharing, quality-of-service provisioning, and multicast and multipath routing are also discussed. Our experimental results compare favorably with those reported previously in the research literature.
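    To make the underlying operation concrete, here is the longest-prefix match that an LC-trie accelerates, sketched over a plain (uncompressed) binary trie; the LC-trie itself adds path and level compression on top of exactly this walk. Names and the bit-string interface are illustrative.

```python
class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]  # one branch per address bit
        self.next_hop = None          # set if some prefix ends here

def insert(root, prefix_bits, next_hop):
    """Insert a route, e.g. insert(root, '101', 'if2')."""
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop

def lookup(root, addr_bits):
    """Longest-prefix match: walk the trie along the address bits,
    remembering the last next hop seen; the deepest match wins."""
    node, best = root, None
    for b in addr_bits:
        node = node.children[int(b)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best
```

In the uncompressed trie the search depth grows with the address length; the paper's compression is what brings it down to Θ(log log n) on average.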

  • Matching output queueing with a combined input/output-queued switch

    Page(s): 1030 - 1039

    The Internet is facing two problems simultaneously: there is a need for a faster switching/routing infrastructure and a need to introduce guaranteed qualities-of-service (QoS). Each problem can be solved independently: switches and routers can be made faster by using input-queued crossbars instead of shared memory systems; QoS can be provided using weighted-fair queueing (WFQ)-based packet scheduling. Until now, however, the two solutions have been mutually exclusive: all of the work on WFQ-based scheduling algorithms has required that switches/routers use output-queueing or centralized shared memory. This paper demonstrates that a combined input/output-queueing (CIOQ) switch running twice as fast as an input-queued switch can provide precise emulation of a broad class of packet-scheduling algorithms, including WFQ and strict priorities. More precisely, we show that for an N×N switch, a “speedup” of 2-1/N is necessary, and a speedup of two is sufficient for this exact emulation. Perhaps most interestingly, this result holds for all traffic arrival patterns. On its own, the result is primarily a theoretical observation; it shows that it is possible to emulate purely OQ switches with CIOQ switches running at approximately twice the line rate. To make the result more practical, we introduce several scheduling algorithms that with a speedup of two can emulate an OQ switch. We focus our attention on the simplest of these algorithms, critical cells first (CCF), and consider its running time and implementation complexity. We conclude that additional techniques are required to make the scheduling algorithms implementable at a high speed and propose two specific strategies.

  • Implementing scheduling algorithms in high-speed networks

    Page(s): 1145 - 1158

    The fluid generalized processor sharing (GPS) algorithm has desirable properties for integrated services networks and many packet fair queueing (PFQ) algorithms have been proposed to approximate GPS. However, there have been few high-speed implementations of PFQ algorithms that can support a large number of sessions with diverse rate requirements and at the same time maintain all the important properties of GPS. The implementation cost of a PFQ algorithm is determined by: (1) computation of the system virtual time function; (2) maintenance of the relative ordering of the packets via their timestamps (scheduling); and (3) regulation of packets based on eligibility time, in some algorithms. While most of the recently proposed PFQ algorithms reduce the complexity of computing the system virtual time function, the complexity of scheduling and traffic regulation is still a function of the number of active sessions. In addition, while reducing the algorithmic or asymptotic complexity has been the focus of most analysis, it is also important to reduce the complexity of basic operations in order for the algorithm to run at high speed. We develop techniques to reduce both types of complexities for networks of both fixed and variable size packets. Regulation and scheduling are implemented in an integrated architecture that can be viewed as logically performing sorting in two dimensions simultaneously. By using a novel grouping architecture, we are able to perform this with an algorithmic complexity independent of the number of sessions in the system at the cost of a small controllable amount of relative error. To reduce the cost of basic operations, we propose a hardware-implementation framework and several novel techniques that reduce the on-chip memory size, off-chip memory bandwidth, and off-chip access latency. The proposed implementation techniques have been incorporated into commercial ATM switch and IP router products.

  • A novel IP-routing lookup scheme and hardware architecture for multigigabit switching routers

    Page(s): 1093 - 1104

    One of the pertinent design issues for new-generation IP routers is the route-lookup mechanism. For each incoming IP packet, the router is required to perform a longest-prefix match on the route lookup in order to determine the packet's next hop. This study presents a fast unicast route-lookup mechanism that needs only a small SRAM and can be implemented using a hardware pipeline. The forwarding table, based on the proposed scheme, is small enough to fit into fast SRAM at low cost. For example, a large routing table with 40000 routing entries can be compacted into a forwarding table of 450-470 kbytes costing less than US$30. Most route lookups need only one memory access; no lookup needs more than three memory accesses. When implemented using a hardware pipeline, the proposed mechanism can achieve one routing lookup every memory access. With current 10-ns SRAMs, this mechanism furnishes approximately 100×10⁶ routing lookups/s, which is much faster than any current commercially available routing-lookup scheme.
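    Small-table direct-indexed lookup schemes of this family generally trade memory for bounded access counts. As a toy sketch only (16-bit addresses split 8/8, not the paper's actual compaction or widths): short prefixes are expanded into a first-level table, and longer prefixes allocate a second-level block, so every lookup costs at most two memory reads.

```python
def build_tables(routes):
    """Build a two-level direct-indexed forwarding table.

    routes: list of (prefix, length, next_hop) over toy 16-bit
    addresses. Prefixes of length <= 8 are expanded into the
    256-entry first-level table; longer prefixes allocate a
    256-entry second-level block that inherits the covering route.
    Sorting by length lets longer prefixes overwrite shorter ones.
    """
    level1 = [None] * 256
    for prefix, length, nh in sorted(routes, key=lambda r: r[1]):
        if length <= 8:
            start = prefix >> 8
            for i in range(start, start + (1 << (8 - length))):
                level1[i] = nh
        else:
            hi = prefix >> 8
            if not isinstance(level1[hi], list):
                level1[hi] = [level1[hi]] * 256  # inherit covering route
            lo = prefix & 0xFF
            for i in range(lo, lo + (1 << (16 - length))):
                level1[hi][i] = nh
    return level1

def lookup(level1, addr):
    """At most two memory reads per address."""
    entry = level1[addr >> 8]
    return entry[addr & 0xFF] if isinstance(entry, list) else entry
```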

  • Performance issues in VC-merge capable switches for multiprotocol label switching

    Page(s): 1178 - 1189

    In a multiprotocol label switching (MPLS) domain, ATM label-switching routers (LSRs) are potentially capable of providing the highest forwarding capacity in the backbone network. Virtual circuit (VC) merging is a mechanism in an ATM-LSR that allows many IP routes to be mapped to the same VC label and provides a scalable mapping method that can support thousands of destinations. VC merging requires reassembly buffers so that cells belonging to different packets intended for the same destination do not interleave with each other. In this paper, the impact of VC merging on the buffering requirement for the reassembly buffers is investigated. We propose a realistic architecture that supports VC merging. We study the performance of this architecture using an analytic approach and using simulation driven by empirical Internet packet-size distribution. At the cell level, our main finding indicates that VC merging incurs a minimal overhead compared to non-VC merging, in terms of additional buffering. Moreover, the overhead decreases as utilization increases or as the traffic becomes more bursty with long-range dependence. The finding has important practical consequences since routers and switches are dimensioned for high utilization and stressful traffic conditions. At the packet level, VC merging generally achieves a higher goodput than non-VC merging with EPD for the same buffer size. We also study the delay performance and find that the additional delay due to VC merging is insignificant at high speed.

  • Design and evaluation of a high-performance ATM firewall switch and its applications

    Page(s): 1190 - 1200

    We present the design of a value-added ATM switch that is capable of performing packet-level (IP) filtering at the maximum throughput of 2.88 Gbit/s per port. This firewall switch nicely integrates the IP level security mechanisms into the hardware components of an ATM switch so that most of the filtering operations are performed in parallel with the normal cell processing, and most of its cost is absorbed into the base cost of the switch. The firewall switch employs the concept of “last cell hostage” (LCH) to avoid or reduce the latency caused by filtering. We analyze in detail the performance of the firewall switch in terms of the throughput and the latency and address related design issues. Applications of our firewall switch as Internet and intranet security solutions are also discussed.

  • Priority queue schedulers with approximate sorting in output-buffered switches

    Page(s): 1127 - 1144

    All recently proposed packet-scheduling algorithms for output-buffered switches that support quality-of-service (QoS) transmit packets in some priority order, e.g., according to deadlines, virtual finishing times, eligibility times, or other time stamps that are associated with a packet. Since maintaining a sorted priority queue introduces significant overhead, much emphasis on QoS scheduler design is put on methods to simplify the task of maintaining a priority queue. In this paper, we consider an approach that attempts to approximate a sorted priority queue at an output-buffered switch. The goal is to trade off less accurate sorting for lower computational overhead. Specifically, this paper presents a scheduler that approximates the sorted queue of an earliest-deadline-first (EDF) scheduler. The approximate scheduler is implemented using a set of prioritized first-in/first-out (FIFO) queues that are periodically relabeled. The scheduler can be efficiently implemented with a fixed number of pointer manipulations, thus enabling an implementation in hardware. Necessary and sufficient conditions for the worst-case delays of the scheduler with approximate sorting are presented. Numerical examples, including traces based on MPEG video, demonstrate that in realistic scenarios, scheduling with approximate sorting is a viable option.
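    The "prioritized FIFOs that are periodically relabeled" idea can be sketched as a ring of deadline buckets: each FIFO covers one deadline interval, service always takes the lowest nonempty bucket, and relabeling is a pointer rotation rather than a re-sort. This is a minimal illustration of the general technique (a calendar-queue-style approximation), with hypothetical names, not the paper's exact scheduler.

```python
from collections import deque

class RotatingFifoEDF:
    """Approximate EDF using k prioritized FIFOs.

    Queue i holds packets whose deadlines fall in the bucket
    [base_time + i*g, base_time + (i+1)*g). Dequeueing serves the
    lowest nonempty bucket; packets within a bucket stay FIFO, which
    is exactly the approximation error traded for cheap sorting.
    """
    def __init__(self, k, granularity):
        self.k, self.g = k, granularity
        self.queues = [deque() for _ in range(k)]
        self.base_time = 0   # deadline covered by queue `head`
        self.head = 0        # index of the highest-priority FIFO

    def enqueue(self, packet, deadline):
        offset = max(0, min((deadline - self.base_time) // self.g,
                            self.k - 1))
        self.queues[(self.head + offset) % self.k].append(packet)

    def dequeue(self):
        for i in range(self.k):
            q = self.queues[(self.head + i) % self.k]
            if q:
                return q.popleft()
        return None

    def advance(self):
        """Every g time units the front bucket expires and becomes the
        farthest-future bucket: relabeling by rotating one pointer."""
        self.head = (self.head + 1) % self.k
        self.base_time += self.g
```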

  • On fast address-lookup algorithms

    Page(s): 1067 - 1082

    The growth and acceptance of the Internet have sparked keen interest in the research community with respect to the many apparent scaling problems of a large infrastructure based on IP technology. A self-contained problem of considerable practical and theoretical interest is the longest-prefix lookup operation, perceived as one of the decisive bottlenecks. Several novel approaches have been proposed to speed up this operation that promise to scale forwarding technology into gigabit speeds. This paper surveys these new lookup algorithms and classifies them based on applied techniques, accompanied by a set of practical requirements that are critical to the design of high-speed routing devices. We also propose several new algorithms to provide lookup capability at gigabit speeds. In particular, we show the theoretical limitations of routing table size and show that one of our new algorithms is almost optimal, while requiring only a small number of memory accesses to perform each address lookup.

  • Design of packet-fair queuing schedulers using a RAM-based searching engine

    Page(s): 1105 - 1126

    The implementation of packet-fair queuing (PFQ) schedulers, which aim at approximating the generalized processor sharing (GPS) policy, is a central issue for providing multimedia services with various quality-of-service (QoS) requirements in packet-switching networks. In the PFQ scheduler, packets are usually time stamped with a value based on some algorithm and are transmitted in increasing order of their time-stamp values. One of the most challenging issues is to search for the smallest time-stamp value among hundreds of thousands of sessions. In this paper, we propose a novel RAM-based searching engine (RSE) to speed up the searching process by using the concept of hierarchical searching with a tree data structure. The time for searching the smallest time stamp is independent of the number of sessions in the system and is only bounded by the memory accesses needed. The RSE can be implemented with commercial memory and field programmable gate array (FPGA) chips in a cost-effective manner. With the extension of the RSE, we propose a two-dimensional (2-D) RSE architecture to implement a general shaper-scheduler. Other challenging issues, such as time-stamp overflow and aging, are also addressed in the paper.
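    Hierarchical searching over a tree, as the abstract describes, can be sketched in software as a tournament tree: each internal node caches the minimum of its two children, so finding, updating, or deleting the smallest time stamp costs O(log n) node reads, independent of how many sessions are queued. This mirrors the concept only; the RSE's actual memory layout and word widths are hardware-specific, and the class name is hypothetical.

```python
class MinTimestampTree:
    """Tournament tree over n session slots (n a power of two).

    tree[1] is the root; leaves live at tree[n:]. Each internal node
    holds the minimum time stamp of its subtree, so every operation
    touches one root-to-leaf path -- a fixed number of memory accesses.
    """
    INF = float('inf')

    def __init__(self, n):
        self.n = n
        self.tree = [self.INF] * (2 * n)

    def _update_path(self, i):
        i //= 2
        while i >= 1:
            self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def set_timestamp(self, session, ts):
        i = self.n + session
        self.tree[i] = ts
        self._update_path(i)

    def clear(self, session):          # session became idle
        self.set_timestamp(session, self.INF)

    def min_session(self):
        """Walk down from the root toward the child holding the min."""
        if self.tree[1] == self.INF:
            return None
        i = 1
        while i < self.n:
            i = 2 * i if self.tree[2 * i] == self.tree[i] else 2 * i + 1
        return i - self.n
```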

  • Linear-complexity algorithms for QoS support in input-queued switches with no speedup

    Page(s): 1040 - 1056

    We present several fast, practical linear-complexity scheduling algorithms that enable provision of various quality-of-service (QoS) guarantees in an input-queued switch with no speedup. Specifically, our algorithms provide per-virtual-circuit transmission rate and cell delay guarantees using a credit-based bandwidth reservation scheme. Our algorithms also provide approximate max-min fair sharing of unreserved switch capacity. The novelties of our algorithms derive from judicious choices of edge weights in a bipartite matching problem. The edge weights are certain functions of the amount and waiting times of queued cells and credits received by a virtual circuit. By using a linear-complexity variation of the well-known stable-marriage matching algorithm, we present theoretical proofs and demonstrate by simulations that the edge weights are bounded. This implies various QoS guarantees or contracts about bandwidth allocations and cell delays. Network management can then provide these contracts to the clients. We present several different algorithms of varied complexity and performance (as measured by the usefulness of each algorithm's contract). While most of this paper is devoted to the study of “soft” guarantees, a few “hard” guarantees can also be proved rigorously for some of our algorithms. As can be expected, the provable guarantees are weaker than the observed performance bounds in simulations. Although our algorithms are designed for switches with no speedup, we also derive upper bounds on the minimal buffer requirement in the output queues necessary to prevent buffer overflow when our algorithms are used in switches with speedup larger than one.


Aims & Scope

IEEE Journal on Selected Areas in Communications focuses on all telecommunications, including telephone, telegraphy, facsimile, and point-to-point television, by electromagnetic propagation.


Meet Our Editors

Editor-in-Chief
Muriel Médard
MIT