Parallel and Distributed Systems, IEEE Transactions on

Issue 2 • Date Feb 2002

  • Performance of CORBA-based client-server architectures

    Page(s): 111 - 127

    Middleware has been introduced to provide interoperability as well as transparent location of servers in heterogeneous client-server environments. Although such benefits accrue from the use of middleware, careful consideration of system architecture is required to achieve high performance. Based on implementation and measurements made on the system, this paper is concerned with the impact of client-server interaction architecture on the performance of CORBA systems. CORBA, or Common Object Request Broker Architecture, proposed by the Object Management Group, is one of the commonly used standards for middleware architectures. Using a commercially available CORBA-compliant ORB software called ORBeline, four different architectures were designed and implemented for client-server interaction on a network of workstations. In the Handle-Driven ORB (H-ORB) architecture, the client gets the address of the server from the agent and communicates with the server directly. In the Forwarding ORB (F-ORB) architecture, the client request is automatically forwarded by the agent to the appropriate server, which then returns the results of the computations to the client directly. In the Process Planner (P-ORB) architecture, the agent combines request forwarding with concurrent invocation of multiple servers for complex requests that require the services of multiple servers. The Adaptive ORB (A-ORB) combines the functionalities of both the H-ORB and the F-ORB and can switch dynamically from an H-ORB mode to an F-ORB mode and vice versa, depending on the load condition. Our measurements show that the differences among the performances of these architectures change with a change in the workload. The paper reports on the relative performances of these four architectures under different workload conditions. The results provide insights into system behavior for designers as well as users of systems. In particular, the impact of internode delays, message size, and request service times on the latency and scalability attributes of these architectures is analyzed. A discussion of how agent cloning can improve system performance is also included.

  • Optimal schedules for cycle-stealing in a network of workstations with a bag-of-tasks workload

    Page(s): 179 - 191

    We refine the model underlying our prior work on scheduling bag-of-tasks ("embarrassingly parallel") workloads via cycle-stealing in networks of workstations (S.N. Bhatt et al., 1997; A.L. Rosenberg, 1999), obtaining a model wherein the scheduling guidelines of Rosenberg produce optimal schedules for every such cycle-stealing opportunity. We thereby render prescriptive the descriptive model of those sources. Although computing optimal schedules usually requires the use of general function-optimizing methods, we show how to compute optimal schedules efficiently for the broad class of opportunities whose durations come from a concave probability distribution. Even when no such efficient computation of an optimal schedule is available, our refined model often suggests a natural notion of approximately optimal schedule, which may be efficiently computable. We illustrate such efficient approximability via the important class of cycle-stealing opportunities whose durations come from a heavy-tailed distribution. Such opportunities do not admit any optimal schedule, nor even a natural notion of approximately optimal schedule, within the model of Bhatt and Rosenberg. Within our refined model, though, we derive computationally simple schedules for heavy-tailed opportunities, which can be "tuned" to accomplish an expected amount of work that is arbitrarily close to optimal.

  • SMT layout overhead and scalability

    Page(s): 142 - 155

    Simultaneous Multi-Threading (SMT) is a hardware technique that increases processor throughput by issuing instructions simultaneously from multiple threads. However, while SMT can be added to an existing microarchitecture with relatively low overhead, this additional chip area could be used for other resources such as more functional units, larger caches, or better branch predictors. How large is the SMT overhead and at what point does SMT no longer pay off for maximum throughput compared to adding other architecture features? This paper evaluates the silicon overhead of SMT by performing a transistor/interconnect-level analysis of the layout. We discuss microarchitecture issues that impact SMT implementations and show how the Instruction Set Architecture (ISA) and microarchitecture can have a large effect on the SMT overhead and performance. Results show that SMT yields large performance gains with small to moderate area overhead.

  • Efficient parallel execution of irregular recursive programs

    Page(s): 167 - 178

    Programs whose parallelism stems from multiple recursion form an interesting subclass of parallel programs with many practical applications. The highly irregular shape of many recursion trees makes it difficult to obtain good load balancing with small overhead. We present a system, called REAPAR, that executes recursive C programs in parallel on SMP machines. Based on data from a single profiling run of the program, REAPAR selects a load-balancing strategy that is both effective and efficient, and it generates parallel code implementing that strategy. The performance obtained by REAPAR on a diverse set of benchmarks matches that published for much more complex systems requiring high-level, problem-oriented, explicitly parallel constructs. A case study even found REAPAR to be competitive with handwritten (low-level, machine-oriented) thread-parallel code.

  • Near-optimal all-to-all broadcast in multidimensional all-port meshes and tori

    Page(s): 128 - 141

    All-to-all communication is one of the most dense collective communication patterns and occurs in many important applications in parallel and distributed computing. In this paper, we present a new all-to-all broadcast algorithm in multidimensional all-port mesh and torus networks. We propose a broadcast pattern which ensures a balanced traffic load in all dimensions in the network so that the all-to-all broadcast algorithm can achieve a very tight near-optimal transmission time. The algorithm also takes advantage of overlapping of message switching time and transmission time, and the total communication delay asymptotically matches the lower bound of all-to-all broadcast. Finally, the algorithm is conceptually simple and symmetrical for every message and every node so that it can be easily implemented in hardware and achieves the near-optimum in practice.

  • Rate-based borrowing scheme for QoS provisioning in multimedia wireless networks

    Page(s): 156 - 166

    Now that cellular networks are being called upon to support real-time interactive multimedia traffic such as video teleconferencing, these networks must be able to provide their users with quality-of-service (QoS) guarantees. Although the QoS provisioning problem arises in wireline networks as well, mobility of hosts, scarcity of bandwidth, and channel fading make QoS provisioning a challenging task in wireless networks. It has been noticed that multimedia applications can tolerate and gracefully adapt to transient fluctuations in the QoS that they receive from the network. The management of such adaptive multimedia applications is becoming a new research area in wireless networks. As it turns out, the additional flexibility afforded by the ability of multimedia applications to tolerate and adapt to transient changes in the QoS parameters can be exploited by protocol designers to significantly improve the overall performance of wireless systems. The main contribution of this paper is to propose a novel, rate-based, borrowing scheme for QoS provisioning in high-speed cellular networks carrying multimedia traffic. Our scheme attempts to allocate the desired bandwidth to every multimedia connection originating in a cell or being handed off to the cell. The novelty of our scheme is that, in case of insufficient bandwidth, in order not to deny service to requesting connections (new or hand-off), bandwidth will be borrowed, on a temporary basis, from existing connections. Our borrowing scheme guarantees that no connection gives up more than its fair share of bandwidth, in the sense that the amount of bandwidth borrowed from a connection is proportional to its tolerance to bandwidth loss. Importantly, our scheme ensures that the borrowed bandwidth is promptly returned to the degraded connections. Extensive simulation results show that our rate-based QoS provisioning scheme outperforms the best previously known schemes in terms of call dropping probability, call blocking probability, and bandwidth utilization.

  • A novel data distribution technique for host-client type parallel applications

    Page(s): 97 - 110

    This paper considers an analytic data distribution for improving the performance of host-client type parallel applications which exhibit serialized communication patterns. The technique involves assuming that serialized communication is enforced, which simplifies data analysis and can provide the basis for real-time dynamic load balancing. This distribution has been tested using a parallel matrix multiplication implementation and a parallel MPEG compression implementation. The key results of this paper are that analytic distribution can reduce execution time and increase scalability of certain parallel applications over typical equal data distributions.

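
The rate-based borrowing abstract in this issue states that the bandwidth borrowed from a connection is proportional to its tolerance to bandwidth loss. The sketch below illustrates only that proportional-sharing idea. It is not the paper's algorithm: the `Connection` class, the `tolerance` field, and the `borrow` function are hypothetical names chosen for illustration, and the abstract does not specify how tolerance is measured or how borrowed bandwidth is later returned.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    name: str
    tolerance: float  # hypothetical: maximum bandwidth this connection can give up


def borrow(connections, needed):
    """Split `needed` bandwidth across connections in proportion to tolerance.

    Returns a dict mapping connection name to the amount borrowed, or None
    when the request exceeds the combined tolerance (assumed rejection rule).
    """
    if needed <= 0:
        return {c.name: 0.0 for c in connections}
    total_tolerance = sum(c.tolerance for c in connections)
    if needed > total_tolerance:
        return None
    # Because needed <= total_tolerance, each share is at most that
    # connection's own tolerance, so no connection gives up more than it can.
    return {c.name: needed * c.tolerance / total_tolerance for c in connections}
```

In this sketch, a connection that can tolerate three times the loss of another contributes three times as much of the borrowed bandwidth.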

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology