
IEEE Transactions on Parallel and Distributed Systems

Issue 5 • May 2014


Displaying Results 1 - 25 of 26
  • A Model Approach to the Estimation of Peer-to-Peer Traffic Matrices

    Publication Year: 2014 , Page(s): 1101 - 1111

    Peer-to-Peer (P2P) applications have seen increasing popularity in recent years, which brings new challenges to network management and traffic engineering (TE). As basic input information, P2P traffic matrices are of significant importance for TE. Because of the excessively high cost of direct measurement, many studies aim to model and estimate general traffic matrices, but few focus on P2P traffic matrices. In this paper, we propose a model to estimate P2P traffic matrices in operational networks. Important factors are considered, including the number of peers, the localization ratio of P2P traffic, and the network distance, where distance can be measured in AS hop counts or geographic distance. To validate our model, we evaluate its performance using traffic traces collected from real P2P video-on-demand (VoD) and file-sharing applications. Evaluation results show that the proposed model outperforms two typical models for general traffic matrix estimation on several metrics, including spatial and temporal estimation errors, stability under oscillating and dynamic flows, and estimation bias. To the best of our knowledge, this is the first study of P2P traffic matrix estimation. P2P traffic matrices derived from the model can be applied to P2P traffic optimization and other TE tasks.
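The abstract names the model's inputs (peer counts, traffic localization ratio, and AS-hop or geographic distance) but not its closed form. A hypothetical gravity-style estimator built from just those factors might look like the sketch below; the formula, the `localization` default, and the distance exponent `alpha` are assumptions for illustration, not the paper's model.

```python
# Hypothetical gravity-style P2P traffic matrix estimator (illustrative only).
def estimate_p2p_matrix(peers, dist, total_traffic, localization=0.7, alpha=1.0):
    """peers[i]: number of peers in network i; dist[i][j]: AS-hop or
    geographic distance; localization: fraction of traffic kept local."""
    n = len(peers)
    # Unnormalized gravity weights: more peers attract more traffic,
    # more distant networks exchange less.
    w = [[peers[i] * peers[j] / (dist[i][j] ** alpha) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    off_diag = sum(sum(row) for row in w)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Local (intra-network) share, split by peer population.
                t[i][j] = localization * total_traffic * peers[i] / sum(peers)
            else:
                # Remaining traffic spread by gravity weight.
                t[i][j] = (1 - localization) * total_traffic * w[i][j] / off_diag
    return t
```

By construction, the matrix entries sum to the given total traffic, with the local share controlled by the localization ratio.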

  • A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs

    Publication Year: 2014 , Page(s): 1112 - 1123
    Cited by:  Papers (2)

    This paper presents a performance modeling and optimization analysis tool to predict and optimize the performance of sparse matrix-vector multiplication (SpMV) on GPUs. We make the following contributions: 1) We present an integrated analytical and profile-based performance model that accurately predicts the kernel execution times of CSR, ELL, COO, and HYB SpMV kernels. Our approach is general, neither limited to particular GPU programming languages nor restricted to specific GPU architectures. In this paper, we use CUDA-based SpMV kernels and an NVIDIA Tesla C2050 for our performance modeling and experiments. According to our experiments, for 77 out of 82 test cases, the differences between the predicted and measured execution times are less than 9 percent; for the remaining five test cases, the differences are between 9 and 10 percent. For the CSR, ELL, COO, and HYB SpMV CUDA kernels, the average differences are 6.3, 4.4, 2.2, and 4.7 percent, respectively. 2) Based on the performance model, we design a dynamic-programming-based auto-selection algorithm that reports an optimal solution (i.e., optimal storage strategy, storage format(s), and execution time) for a target sparse matrix. In our experiments, the average performance improvements of the optimal solutions are 41.1, 49.8, and 37.9 percent over NVIDIA's CSR, COO, and HYB CUDA kernels, respectively.
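For reference, the CSR kernel whose execution time the tool models is a standard sparse matrix-vector product. A minimal pure-Python version (a reference sketch, not a CUDA kernel) looks like this:

```python
# y = A @ x for a sparse matrix A stored in CSR form:
# row_ptr[r]..row_ptr[r+1] delimits row r's nonzeros in col_idx/vals.
def spmv_csr(row_ptr, col_idx, vals, x):
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]]
row_ptr, col_idx, vals = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(spmv_csr(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```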

  • A Scalable and Mobility-Resilient Data Search System for Large-Scale Mobile Wireless Networks

    Publication Year: 2014 , Page(s): 1124 - 1134

    This paper addresses the data search problem in large-scale, highly mobile, and dense wireless networks, for which current data search systems are not suitable. It presents LORD, a scalable and mobility-resilient LOcality-based distRibuted Data search system for large-scale wireless networks with high mobility and density. Taking advantage of the high density, rather than mapping data to a single location point, LORD maps file metadata to a geographical region and stores it on multiple nodes in that region, enhancing mobility resilience. LORD has a novel region-based geographic data routing protocol that relies on neither flooding nor GPS for data publishing and querying, and a coloring-based partial replication algorithm that reduces data replicas in a region while maintaining querying efficiency. LORD also works for unbalanced wireless networks with sparse regions. Simulation results show the superior performance of LORD compared to representative data search systems in terms of scalability, overhead, and mobility resilience in a highly dense and mobile network. The results also show the high scalability and mobility resilience of LORD in an unbalanced wireless network with sparse regions, and the effectiveness of its coloring-based partial replication algorithm.
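The core idea of mapping file metadata to a region rather than to a point can be illustrated with a simple hash-to-grid-cell sketch; the grid layout, the SHA-256 choice, and the function shape are illustrative assumptions, not LORD's actual protocol.

```python
import hashlib

# Hash a file name to a grid cell of the deployment area, so any node can
# compute which region stores the file's metadata; all nodes currently
# inside that cell would hold replicas. (Illustrative, not LORD's scheme.)
def metadata_region(filename, area_w, area_h, cell):
    h = int.from_bytes(hashlib.sha256(filename.encode()).digest()[:8], "big")
    cols, rows = area_w // cell, area_h // cell
    cell_id = h % (cols * rows)
    cx, cy = cell_id % cols, cell_id // cols
    # Bounding box (x_min, y_min, x_max, y_max) of the chosen cell.
    return (cx * cell, cy * cell, (cx + 1) * cell, (cy + 1) * cell)

print(metadata_region("video.mp4", 1000, 1000, 100))
```

Because the mapping is deterministic, a querying node computes the same region as the publishing node without any flooding.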

  • A Scalable and Modular Architecture for High-Performance Packet Classification

    Publication Year: 2014 , Page(s): 1135 - 1144

    Packet classification is widely used as a core function for various applications in network infrastructure. With increasing throughput demands, performing wire-speed packet classification has become challenging. Moreover, the performance of today's packet classification solutions depends on the characteristics of rulesets. In this work, we propose a novel modular Bit-Vector (BV) based architecture to perform high-speed packet classification on Field Programmable Gate Arrays (FPGAs). We introduce an algorithm named StrideBV and modularize the BV architecture to achieve better scalability than traditional BV methods. Further, we incorporate range search in our architecture to eliminate the ruleset expansion caused by range-to-prefix conversion. Post place-and-route results of our implementation on a state-of-the-art FPGA show that the proposed architecture operates at 100+ Gbps for minimum-size packets while supporting large rulesets of up to 28 K rules using only on-chip memory resources. Our solution is ruleset-feature independent, i.e., the above performance is guaranteed for any ruleset regardless of its composition.
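The classical bit-vector lookup that such architectures build on can be sketched in a few lines: each field lookup returns a bit vector over the ruleset, the vectors are ANDed, and the lowest set bit gives the highest-priority matching rule. The dict-based field tables below are a software stand-in for the per-field hardware lookup structures:

```python
# Bit-vector packet classification sketch.
# field_tables[f][value] -> int bitmask where bit i means rule i matches
# this value in field f.
def classify(field_tables, packet):
    match = (1 << 64) - 1  # assume at most 64 rules
    for f, value in enumerate(packet):
        match &= field_tables[f].get(value, 0)
    if match == 0:
        return None
    return (match & -match).bit_length() - 1  # lowest set bit = best rule

# Two fields, three rules (bit i set = rule i matches that value).
tables = [
    {"10.0.0.1": 0b011, "10.0.0.2": 0b100},  # source address
    {80: 0b101, 443: 0b010},                 # destination port
]
print(classify(tables, ("10.0.0.1", 80)))   # 0 (rule 0 wins)
print(classify(tables, ("10.0.0.2", 443)))  # None (no rule matches)
```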

  • An Extensible System for Multilevel Automatic Data Partition and Mapping

    Publication Year: 2014 , Page(s): 1145 - 1154
    Cited by:  Papers (1)

    Automatic data distribution is a key feature for obtaining efficient implementations from abstract and portable parallel codes. We present a highly efficient and extensible runtime library that integrates techniques for automatic data partition and mapping. It uses a novel approach to define an abstract interface and a plug-in system that encapsulates different types of regular and irregular techniques, helping to generate codes that are independent of the exact mapping functions selected. Currently, it supports hierarchical tiling of arrays with dense and strided domains, which allows the implementation of both data and task parallelism using an SPMD model. It automatically computes appropriate domain partitions for a selected virtual topology, mapping them to available processors with static or dynamic load-balancing techniques. Our library also allows the construction of reusable communication patterns that efficiently exploit MPI communication capabilities. The use of our library greatly reduces the complexity of data distribution and communication, hiding the details of the underlying architecture. The library can also be used as an abstract layer for building generic tiling operations. Our experimental results show that the library achieves performance similar to that of carefully implemented manual versions of several well-known parallel kernels and benchmarks on distributed and multicore systems, while substantially reducing programming effort.
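As a toy illustration of the kind of partition such a library computes automatically (this is not the library's API), a one-dimensional block decomposition of an index domain over k processors can be written as:

```python
# Split an index domain [0, n) into contiguous tiles over k processors,
# distributing the remainder one extra element at a time.
def block_partition(n, k):
    base, extra = divmod(n, k)
    parts, start = [], 0
    for p in range(k):
        size = base + (1 if p < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts

print([list(r) for r in block_partition(7, 3)])  # [[0, 1, 2], [3, 4], [5, 6]]
```

A runtime library generalizes this idea to multidimensional, hierarchical, and strided domains and maps the resulting tiles onto a virtual processor topology.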

  • Application-Aware Local-Global Source Deduplication for Cloud Backup Services of Personal Storage

    Publication Year: 2014 , Page(s): 1155 - 1165

    In personal computing devices that rely on a cloud storage environment for data backup, an imminent challenge facing source deduplication for cloud backup services is low deduplication efficiency, due to a combination of the resource-intensive nature of deduplication and the limited system resources. In this paper, we present ALG-Dedupe, an Application-aware Local-Global source deduplication scheme that improves data deduplication efficiency by exploiting application awareness, and further combines local and global duplicate detection to strike a good balance between cloud storage capacity savings and deduplication time reduction. Experiments with a prototype implementation demonstrate that our scheme can significantly improve deduplication efficiency over state-of-the-art methods with low system overhead, resulting in a shortened backup window, increased power efficiency, and reduced cost for cloud backup services of personal storage.
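The local-then-global duplicate detection described above can be sketched as a two-tier fingerprint lookup; the SHA-1 fingerprints and set-based indices are illustrative simplifications, and the application-aware chunking that ALG-Dedupe adds is omitted:

```python
import hashlib

# A chunk is uploaded only if its fingerprint is in neither the client's
# local index nor the cloud-side global index.
def backup(chunks, local_index, global_index):
    uploaded = []
    for chunk in chunks:
        fp = hashlib.sha1(chunk).hexdigest()
        if fp in local_index:      # cheap client-side check first
            continue
        local_index.add(fp)
        if fp in global_index:     # cross-client duplicate, skip upload
            continue
        global_index.add(fp)
        uploaded.append(chunk)     # only truly new data goes to the cloud
    return uploaded

local, remote = set(), {hashlib.sha1(b"os-image").hexdigest()}
sent = backup([b"os-image", b"photo", b"photo"], local, remote)
print(sent)  # [b'photo'] -- duplicates suppressed locally and globally
```

Checking the local index first avoids a round trip to the cloud for duplicates the client has already seen, which is where most of the deduplication time reduction comes from.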

  • Architectural Support for Handling Jitter in Shared Memory Based Parallel Applications

    Publication Year: 2014 , Page(s): 1166 - 1176

    With an increasing number of cores per chip, it is becoming harder to guarantee optimal performance for parallel shared memory applications because of interference from kernel threads, interrupts, bus contention, and temperature management schemes (collectively referred to as jitter). We demonstrate that the performance of parallel programs degrades by up to 35.22 percent in large CMP-based systems. In this paper, we characterize the jitter in large multi-core processors and evaluate the resulting performance loss. We propose a novel jitter measurement unit that uses a distributed protocol to keep track of the number of wasted cycles. Subsequently, we compensate for jitter by applying DVFS across a region of timing-critical instructions called a frame. Additionally, we propose an OS cache that intelligently manages OS cache lines to reduce memory interference. Through detailed cycle-accurate simulations, we show that we can execute a suite of Splash2 and Parsec benchmarks with a deterministic timing overhead limited to 2 percent for 14 out of 17 benchmarks with modest DVFS factors. We reduce overall jitter by an average of 13.5 percent for Splash2 and 6.4 percent for Parsec. The area overhead of our scheme is limited to 1 percent.
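The frame-based DVFS compensation admits a simple back-of-the-envelope check: if jitter steals `lost` cycles out of a `frame`-cycle window, finishing the frame on time requires scaling frequency by frame / (frame - lost). The numbers below are hypothetical:

```python
# Frequency multiplier needed to absorb `lost_cycles` of jitter within a
# frame of `frame_cycles` useful work (hypothetical illustration).
def dvfs_factor(frame_cycles, lost_cycles):
    if lost_cycles >= frame_cycles:
        raise ValueError("jitter exceeds the frame budget")
    return frame_cycles / (frame_cycles - lost_cycles)

print(round(dvfs_factor(1_000_000, 20_000), 3))  # 1.02 -> ~2% frequency boost
```

This is why modest DVFS factors suffice as long as jitter consumes only a small fraction of each frame.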

  • BitTorrent Locality and Transit Traffic Reduction: When, Why, and at What Cost?

    Publication Year: 2014 , Page(s): 1177 - 1189

    A substantial amount of work has recently gone into localizing BitTorrent traffic within an ISP in order to avoid excessive, and often unnecessary, transit costs. Several architectures and systems have been proposed, and the initial results from specific ISPs and a few torrents have been encouraging. In this work we attempt to deepen and scale our understanding of locality and its potential. Looking at specific ISPs, we consider tens of thousands of concurrent torrents, and thus capture ISP-wide implications that cannot be appreciated by looking at only a handful of torrents. Second, we go beyond individual case studies and present results for a few thousand ISPs represented in our data set of up to 40K torrents involving more than 3.9M concurrent peers, and more than 20M over the course of a day, spread across 11K ASes. Finally, we develop scalable methodologies that allow us to process this huge data set and derive accurate traffic matrices of torrents. Using these methods we obtain the following main findings: i) although a large number of very small ISPs lack the resources for localizing traffic, our analysis of the 100 largest ISPs shows that locality policies are expected to significantly reduce their transit traffic relative to the default random overlay construction method; ii) contrary to popular belief, increasing the access speed of an ISP's clients does not necessarily help localize more traffic; iii) by studying several real ISPs, we show that soft speed-aware locality policies guarantee win-win situations for ISPs and end users. Furthermore, the maximum transit traffic savings that an ISP can achieve without limiting the number of inter-ISP overlay links is bounded by “unlocalizable” torrents with few local clients. Restricting the number of inter-ISP links yields a higher transit traffic reduction, but the QoS of clients downloading “unlocalizable” torrents would be severely harmed.

  • CPU Scheduling for Power/Energy Management on Multicore Processors Using Cache Miss and Context Switch Data

    Publication Year: 2014 , Page(s): 1190 - 1199

    Power and energy have become increasingly important concerns in the design and implementation of today's multicore/manycore chips. In this paper, we present two priority-based CPU scheduling algorithms, Algorithm Cache Miss Priority CPU Scheduler (CM-PCS) and Algorithm Context Switch Priority CPU Scheduler (CS-PCS), which take advantage of often ignored dynamic performance data, in order to reduce power consumption by over 20 percent with a significant increase in performance. Our algorithms utilize Linux cpusets and cores operating at different fixed frequencies. Many other techniques, including dynamic frequency scaling, can lower a core's frequency during the execution of a non-CPU intensive task, thus lowering performance. Our algorithms match processes to cores better suited to execute those processes in an effort to lower the average completion time of all processes in an entire task, thus improving performance. They also consider a process's cache miss/cache reference ratio, number of context switches and CPU migrations, and system load. Finally, our algorithms use dynamic process priorities as scheduling criteria. We have tested our algorithms using a real AMD Opteron 6134 multicore chip and measured results directly using the “KillAWatt” meter, which samples power periodically during execution. Our results show not only a power (energy/execution time) savings of 39 watts (21.43 percent) and 38 watts (20.88 percent), but also a significant improvement in the performance, performance per watt, and execution time · watt (energy) for a task consisting of 24 concurrently executing benchmarks, when compared to the default Linux scheduler and CPU frequency scaling governor.

  • Dynamic Trust Management for Delay Tolerant Networks and Its Application to Secure Routing

    Publication Year: 2014 , Page(s): 1200 - 1210

    Delay tolerant networks (DTNs) are characterized by high end-to-end latency, frequent disconnection, and opportunistic communication over unreliable wireless links. In this paper, we design and validate a dynamic trust management protocol for secure routing optimization in DTN environments in the presence of well-behaved, selfish, and malicious nodes. We develop a novel model-based methodology for the analysis of our trust protocol and validate it via extensive simulation. Moreover, we address dynamic trust management, i.e., determining and applying the best operational settings at runtime in response to dynamically changing network conditions to minimize trust bias and to maximize the routing application performance. We perform a comparative analysis of our proposed routing protocol against Bayesian trust-based and non-trust based (PROPHET and epidemic) routing protocols. The results demonstrate that our protocol is able to deal with selfish behaviors and is resilient against trust-related attacks. Furthermore, our trust-based routing protocol can effectively trade off message overhead and message delay for a significant gain in delivery ratio. Our trust-based routing protocol operating under identified best settings outperforms Bayesian trust-based routing and PROPHET, and approaches the ideal performance of epidemic routing in delivery ratio and message delay without incurring high message or protocol maintenance overhead.

  • Energy-Efficient Identification in Large-Scale RFID Systems with Handheld Reader

    Publication Year: 2014 , Page(s): 1211 - 1222

    Efficient identification of tags is an essential operation for Radio Frequency IDentification (RFID) systems. In this paper, we consider the crucial problem of collecting all tags in a large-scale system through a handheld RFID reader. The reader has to move around because of the tags' limited communication range. We focus on minimizing the power consumption of the reader given a constraint on its movement distance. Two challenges must be addressed. First, the communication range of a tag depends on the reader, so there is an intrinsic tradeoff between power saving and movement distance. Second, the number of sites at which the reader can collect tags can be numerous, making the problem complexity extremely high. We theoretically prove that minimizing the energy consumption of the reader is NP-complete. To solve the problem, we first show analytically that the time needed for reading a given number of tags is linearly proportional to the number of tags only. With this insight, we propose an approach called ePath that constructs an energy-efficient candidate path and then incrementally prunes it when the tag locations are given. We further relax the assumption on tag locations by extending ePath to exploit only knowledge of the tag distribution density. Extensive simulations show that our approach significantly reduces the power consumption of the reader compared to an existing approach.

  • Hop-by-Hop Message Authentication and Source Privacy in Wireless Sensor Networks

    Publication Year: 2014 , Page(s): 1223 - 1232

    Message authentication is one of the most effective ways to thwart unauthorized and corrupted messages from being forwarded in wireless sensor networks (WSNs). For this reason, many message authentication schemes have been developed, based on either symmetric-key or public-key cryptosystems. Most of them, however, suffer from high computational and communication overhead, in addition to a lack of scalability and resilience to node compromise attacks. To address these issues, a polynomial-based scheme was recently introduced. However, this scheme and its extensions all have the weakness of a built-in threshold determined by the degree of the polynomial: when the number of messages transmitted exceeds this threshold, the adversary can fully recover the polynomial. In this paper, we propose a scalable authentication scheme based on elliptic curve cryptography (ECC). While enabling intermediate node authentication, our proposed scheme allows any node to transmit an unlimited number of messages without suffering the threshold problem. In addition, our scheme can also provide message source privacy. Both theoretical analysis and simulation results demonstrate that our proposed scheme is more efficient than the polynomial-based approach in terms of computational and communication overhead under comparable security levels, while providing message source privacy.
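The threshold weakness of polynomial-based schemes is easy to demonstrate: any degree-d polynomial over a finite field is fully determined by d + 1 of its evaluations, which an adversary can interpolate. A toy demonstration over a small prime field (all parameters chosen for illustration only):

```python
P = 97  # small prime field for illustration

def eval_poly(coeffs, x):
    """Evaluate a polynomial with the given coefficients at x, mod P."""
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

def lagrange_eval(points, x):
    """Evaluate the unique degree-(len(points)-1) polynomial through
    `points` at x, by Lagrange interpolation mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

secret = [13, 42, 7]  # hidden degree-2 polynomial (the scheme's secret)
leaked = [(x, eval_poly(secret, x)) for x in (1, 2, 3)]  # 3 observed values
forged = lagrange_eval(leaked, 5)
print(forged == eval_poly(secret, 5))  # True: adversary predicts unseen value
```

Once d + 1 evaluations have leaked, the adversary can compute the polynomial's value anywhere, which is exactly the "built-in threshold" the ECC-based scheme avoids.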

  • Improvement of Real-Time Multi-Core Schedulability with Forced Non-Preemption

    Publication Year: 2014 , Page(s): 1233 - 1243

    While tasks may be preemptive or non-preemptive (due to their transactional operations), deadline guarantees in multi-core systems have been made only for task sets in which all tasks are preemptive or all are non-preemptive, i.e., fully preemptive or fully non-preemptive, not a mixture thereof. In this paper, we first develop a schedulability analysis framework that guarantees the timing requirements of a task set in which each task can be either preemptive or non-preemptive in multi-core systems. We then apply this framework to the prioritization policies of EDF (earliest deadline first) and FP (fixed priority), yielding the schedulability tests mpn-EDF (Mixed Preemptive/Non-preemptive EDF) and mpn-FP, which generalize the corresponding fully preemptive and non-preemptive algorithms, i.e., fp-EDF and np-EDF, and fp-FP and np-FP. In addition to providing timing guarantees for any task set consisting of a mixture of preemptive and non-preemptive tasks, the tests outperform the existing schedulability tests of np-EDF and np-FP (i.e., special cases of mpn-EDF and mpn-FP). Using these tests, we also improve schedulability by developing an algorithm that optimally disallows preemption of a preemptive task under a certain assumption. We demonstrate via simulation that the algorithm finds up to 47.6 percent additional task sets that are schedulable with mpn-FP (likewise mpn-EDF), but not with fp-FP and np-FP (likewise fp-EDF and np-EDF).

  • Improving the Performance of Independent Task Assignment Heuristics MinMin, MaxMin and Sufferage

    Publication Year: 2014 , Page(s): 1244 - 1256
    Cited by:  Papers (1)

    MinMin, MaxMin, and Sufferage are constructive heuristics that are widely and successfully used in assigning independent tasks to processors in heterogeneous computing systems. All three heuristics are known to run in O(KN²) time when assigning N tasks to K processors. In this paper, we propose an algorithmic improvement that asymptotically decreases the running time complexity of MinMin to O(KN log N) without affecting its solution quality. Furthermore, we combine the newly proposed MinMin algorithm with MaxMin as well as Sufferage, obtaining two hybrid algorithms. The motivation behind the former hybrid is to address the drawback of MaxMin on problem instances with highly skewed cost distributions while also improving the running time of MaxMin. The latter hybrid improves the running time of Sufferage without degrading its solution quality. The proposed algorithms are easy to implement, and we illustrate them through detailed pseudocodes. Experimental results over a large number of real-life data sets show that the proposed fast MinMin algorithm and the hybrid algorithms perform significantly better than their traditional counterparts as well as more recent state-of-the-art assignment heuristics. For the large data sets used in the experiments, MinMin, MaxMin, and Sufferage, as well as recent state-of-the-art heuristics, require days, weeks, or even months to produce a solution, whereas all of the proposed algorithms produce solutions within only two or three minutes.
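For concreteness, here is the classical O(KN²) MinMin baseline that the paper accelerates (this sketch does not reproduce the proposed O(KN log N) variant): at each step, the unassigned task with the smallest earliest completion time over all processors is assigned to the processor achieving it.

```python
# Reference (unoptimized) MinMin heuristic.
def minmin(cost):
    """cost[t][p] = execution time of task t on processor p.
    Returns (assignment, makespan)."""
    n, k = len(cost), len(cost[0])
    ready = [0.0] * k            # current finish time of each processor
    unassigned = set(range(n))
    assignment = [None] * n
    while unassigned:
        t_best, p_best, ect_best = None, None, float("inf")
        # Earliest completion time of every task over every processor.
        for t in unassigned:
            for p in range(k):
                ect = ready[p] + cost[t][p]
                if ect < ect_best:
                    t_best, p_best, ect_best = t, p, ect
        assignment[t_best] = p_best
        ready[p_best] = ect_best
        unassigned.remove(t_best)
    return assignment, max(ready)

asg, makespan = minmin([[2, 4], [3, 1], [4, 4]])
print(asg, makespan)  # [0, 1, 1] 5
```

The inner double loop is what makes the classical version O(KN²); the paper's contribution is reorganizing this minimum search so each step costs O(K log N) overall.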

    Open Access
  • Liquid: A Scalable Deduplication File System for Virtual Machine Images

    Publication Year: 2014 , Page(s): 1257 - 1266

    A virtual machine (VM) has been serving as a crucial component in cloud computing with its rich set of convenient features. The high overhead of VMs has been well addressed by hardware support such as Intel Virtualization Technology (VT) and by improvements in recent hypervisor implementations such as Xen and KVM. However, the high demand on VM image storage remains a challenging problem. Existing systems have made efforts to reduce VM image storage consumption by means of deduplication within a storage area network (SAN) cluster. Nevertheless, a SAN cannot satisfy the increasing demand of large-scale VM hosting for cloud computing because of its cost. In this paper, we propose Liquid, a scalable deduplication file system designed specifically for large-scale VM deployment. Its design provides fast VM deployment with peer-to-peer (P2P) data transfer and low storage consumption by means of deduplication on VM images. It also provides a comprehensive set of storage features, including instant cloning of VM images, on-demand fetching through a network, and caching with local disks via copy-on-read techniques. Experiments show that Liquid's features perform well and introduce minor performance overhead.

  • Localized Movement-Assisted Sensor Deployment Algorithm for Hole Detection and Healing

    Publication Year: 2014 , Page(s): 1267 - 1277

    One of the fundamental services provided by a wireless sensor network (WSN) is the monitoring of a specified region of interest (RoI). Since the emergence of holes in the RoI is unavoidable, owing to the inner nature of WSNs, random deployment, environmental factors, and external attacks, ensuring that the RoI is completely and continuously covered is very important. This paper addresses the problem of hole detection and healing in mobile WSNs. We discuss the main drawbacks of existing solutions and identify four key elements that are critical for ensuring effective coverage in mobile WSNs: 1) determining the boundary of the RoI, 2) detecting coverage holes and estimating their characteristics, 3) determining the best target locations to which mobile nodes should relocate to repair holes, and 4) dispatching mobile nodes to the target locations while minimizing the moving and messaging cost. We propose a lightweight and comprehensive solution, called holes detection and healing (HEAL), that addresses all of these aspects. The computational complexity of HEAL is O(v²), where v is the average number of 1-hop neighbors. HEAL is a distributed and localized algorithm that operates in two distinct phases. The first identifies the boundary nodes and discovers holes using a lightweight localized protocol over the Gabriel graph of the network. The second handles hole healing, introducing the novel concept of a hole healing area. We propose a distributed virtual-forces-based local healing approach in which only the nodes located at an appropriate distance from the hole are involved in the healing process. Through extensive simulations we show that HEAL deals with holes of various forms and sizes, and provides a cost-effective and accurate solution for hole detection and healing.
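The Gabriel graph used in the first phase has a simple membership test: nodes u and v are Gabriel neighbors iff no third node lies strictly inside the circle whose diameter is the segment uv. A minimal sketch of that test:

```python
# Gabriel-graph edge test: (u, v) is an edge iff the open disk with
# diameter uv contains no other node.
def gabriel_edge(u, v, nodes):
    mx, my = (u[0] + v[0]) / 2, (u[1] + v[1]) / 2          # disk center
    r2 = ((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2) / 4     # radius squared
    for w in nodes:
        if w in (u, v):
            continue
        if (w[0] - mx) ** 2 + (w[1] - my) ** 2 < r2:
            return False  # w blocks the edge
    return True

pts = [(0, 0), (2, 0), (1, 0.2)]
print(gabriel_edge((0, 0), (2, 0), pts))    # False: (1, 0.2) blocks the edge
print(gabriel_edge((0, 0), (1, 0.2), pts))  # True
```

Because the test involves only nearby nodes, each sensor can compute its Gabriel edges from 1-hop neighbor positions, which is what makes the boundary-detection protocol localized.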

  • Modeling and Performance of a Mesh Network with Dynamically Appearing and Disappearing Femtocells as Additional Internet Gateways

    Publication Year: 2014 , Page(s): 1278 - 1288

    The number of hops from Mesh Routers (MRs) to an Internet gateway (IGW) plays an important role in determining the performance of a wireless mesh network (WMN). A recent patent introduced a mechanism for using an existing femtocell (FC) as an additional potential IGW to enhance WMN performance. Such an integration of a WMN with FCs increases the WMN's overall capacity. However, because of FCs' unpredictable operating times and unannounced disconnections, MRs require a reliable and efficient schedule for switching between available FCs, which the patent does not analyze. The switching can be done in a push-based preemptive or a pull-based non-preemptive manner. In this paper, we formulate both switching schemes for a WMN-FC integrated network as approximate statistical models based on a Markovian process. We also determine a switching schedule of FCs for each MR based on the reliable uncapacitated facility location (RUFL) problem, extending an existing RUFL formulation to incorporate the dynamic operating nature of multiple FCs, which exhibit dynamic available/unavailable patterns as additional potential IGWs. Extensive simulations validate our proposed statistical models and establish the performance of these switching schemes.

  • Randomized Gathering of Mobile Agents in Anonymous Unidirectional Ring Networks

    Publication Year: 2014 , Page(s): 1289 - 1296

    We consider the gathering problem of multiple (mobile) agents in anonymous unidirectional ring networks under the constraint that each agent knows neither the number of nodes nor the number of agents. For this problem, we fully characterize the relation between probabilistic solvability and termination detection. First, we prove for any (small) constant p (0 < p ≤ 1) that no randomized algorithm exists that solves, with probability p, the gathering problem with (termination) detection. For this reason, we consider the relaxed gathering problem, called the gathering problem without detection, which does not require termination detection. We propose a randomized algorithm that solves, with any given constant probability p (0 < p < 1), the gathering problem without detection. Finally, we prove that no randomized algorithm exists that solves, with probability 1, the gathering problem without detection.

  • Reservation-Based Packet Buffers with Deterministic Packet Departures

    Publication Year: 2014 , Page(s): 1297 - 1305

    High-performance routers need to temporarily store a large number of packets in response to congestion. DRAM is typically needed to implement large packet buffers, but the worst-case random access latencies of DRAM devices are too slow to match the bandwidth requirements of high-performance routers. Existing DRAM-based architectures for supporting linespeed queue operations can be classified into two categories: prefetching-based and randomization-based. They are all based on interleaving memory accesses across multiple parallel DRAM banks for achieving higher memory bandwidths, but they differ in their packet placement and memory operation scheduling mechanisms. In this paper, we describe novel reservation-based packet buffer architectures with interleaved memories that take advantage of the known packet departure times to achieve simplicity and determinism. The number of interleaved DRAM banks required to implement the proposed packet buffer architectures is independent of the number of logical queues, yet the proposed architectures can achieve the performance of an SRAM implementation. Our reservation-based solutions are scalable to growing packet storage requirements in routers while matching increasing line rates. View full abstract»
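    One way to picture the reservation idea (a simplified sketch, not the paper's exact architecture): if departure times are known in advance and the number of interleaved banks B covers the worst-case DRAM access latency in slots, then assigning each packet to bank `departure_slot mod B` guarantees each bank is read at most once every B slots. The slot numbers and B below are assumptions.

    ```python
    def assign_banks(departure_slots, n_banks):
        """Map each packet's known departure slot to a DRAM bank."""
        return {slot: slot % n_banks for slot in departure_slots}

    def conflict_free(assignment, n_banks):
        """Check that consecutive reads of the same bank are at least
        n_banks slots apart, i.e., each bank always gets the full
        access latency to recover before its next read."""
        last_read = {}
        for slot in sorted(assignment):
            bank = assignment[slot]
            if bank in last_read and slot - last_read[bank] < n_banks:
                return False
            last_read[bank] = slot
        return True

    B = 4  # banks == worst-case DRAM access latency in slots (assumed)
    departures = [0, 1, 2, 3, 4, 6, 7, 9, 11, 12]
    plan = assign_banks(departures, B)
    ```

    Because two departures mapped to the same bank always differ by a multiple of B, the schedule is deterministic and conflict-free regardless of how many logical queues the packets belong to.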

  • Risk-Constrained Operation for Internet Data Centers in Deregulated Electricity Markets

    Publication Year: 2014 , Page(s): 1306 - 1316

    In this paper, we study the problem of achieving the optimal tradeoff between operation risk and expected energy cost for Internet data center (IDC) operators in deregulated electricity markets according to the risk preferences of IDC operators. To achieve the target above, we propose a risk-constrained stochastic programming decision framework. Then, we formulate a risk-constrained expected energy cost minimization problem with the uncertainties in spot price and workload. To solve the formulated problem, we use a decomposition-based cutting plane algorithm. Finally, extensive evaluations based on real-life data show the effectiveness of the proposed decision framework. View full abstract»
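    The risk/cost tradeoff can be illustrated with a tiny scenario-based sketch. All numbers are invented, the risk measure is a crude worst-case cost standing in for the paper's risk constraint, and the grid search stands in for the decomposition-based cutting-plane algorithm, which is not reproduced here.

    ```python
    def cost(frac_forward, spot_price, forward_price=75.0, demand_mwh=100.0):
        """Energy cost when frac_forward of demand is bought at the forward
        price and the remainder at the realized spot price."""
        return demand_mwh * (frac_forward * forward_price
                             + (1 - frac_forward) * spot_price)

    # Spot-price scenarios ($/MWh) with probabilities -- illustrative numbers.
    scenarios = [(40.0, 0.5), (70.0, 0.3), (150.0, 0.2)]

    def expected_cost(f):
        return sum(p * cost(f, s) for s, p in scenarios)

    def worst_case_cost(f):  # crude risk measure standing in for the real one
        return max(cost(f, s) for s, _ in scenarios)

    risk_cap = 9000.0  # operator's risk budget (assumed)
    candidates = [i / 100 for i in range(101)]
    feasible = [f for f in candidates if worst_case_cost(f) <= risk_cap]
    best = min(feasible, key=expected_cost)
    ```

    With these numbers the forward price exceeds the expected spot price, so hedging raises expected cost; the risk cap nevertheless forces the operator to contract 80 percent forward, which is exactly the tradeoff the paper optimizes.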

  • Robust Component-Based Localization in Sparse Networks

    Publication Year: 2014 , Page(s): 1317 - 1327
    Cited by:  Papers (1)

    Accurate localization is crucial for wireless ad-hoc and sensor networks. Among localization schemes, component-based approaches stand out for their localization performance. By grouping nodes into increasingly large rigid components, component-based localization algorithms can effectively cope with both network sparseness and anchor sparseness. However, such designs are sensitive to measurement errors. Existing robust localization methods focus on eliminating the positioning error of a single node. A single node, however, has two degrees of freedom in 2D space and suffers from only one type of transformation: translation. As a rigid 2D structure, a component suffers from three possible transformations: translation, rotation, and reflection. This higher degree of freedom brings about more complicated cases of error production and greater difficulty in error control. This study is the first to address how to deal with ranging noise in component-based methods. By exploiting a set of robust patterns, we present an Error-TOlerant Component-based algorithm (ETOC) that not only inherits the high-performance characteristic of component-based methods, but also achieves robustness of the result. We evaluate ETOC through a real-world sensor network consisting of 120 TelosB motes as well as extensive large-scale simulations. Experimental results show that, compared with state-of-the-art designs, ETOC works properly in sparse networks and provides more accurate localization results. View full abstract»
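    The extra transformations a component is exposed to can be seen directly in a small sketch (illustrative only; the coordinates and transform parameters are assumptions): a 2D rigid transform combining reflection, rotation, and translation preserves all pairwise distances, which is why a misplaced component can still look internally consistent to a ranging-based check.

    ```python
    import math

    def rigid_transform(points, theta, tx, ty, reflect=False):
        """Apply a 2D rigid transform: optional reflection about the
        x-axis, rotation by theta, then translation by (tx, ty)."""
        out = []
        for x, y in points:
            if reflect:
                y = -y
            xr = x * math.cos(theta) - y * math.sin(theta)
            yr = x * math.sin(theta) + y * math.cos(theta)
            out.append((xr + tx, yr + ty))
        return out

    def pairwise_dists(points):
        return sorted(math.dist(p, q) for i, p in enumerate(points)
                      for q in points[i + 1:])

    component = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
    moved = rigid_transform(component, theta=math.pi / 3,
                            tx=5.0, ty=-2.0, reflect=True)
    ```

    Every inter-node range survives the transform, so error control for components must look beyond internal consistency, which is the gap the abstract highlights.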

  • SANE: Semantic-Aware Namespace in Ultra-Large-Scale File Systems

    Publication Year: 2014 , Page(s): 1328 - 1338

    The explosive growth in data volume and complexity imposes great challenges for file systems. To address these challenges, an innovative namespace management scheme is in desperate need to provide both the ease and efficiency of data access. In almost all today's file systems, the namespace management is based on hierarchical directory trees. This tree-based namespace scheme is prone to severe performance bottlenecks and often fails to provide real-time response to complex data lookups. This paper proposes a Semantic-Aware Namespace scheme, called SANE, which provides dynamic and adaptive namespace management for ultra-large storage systems with billions of files. SANE introduces a new naming methodology based on the notion of semantic-aware per-file namespace, which exploits semantic correlations among files, to dynamically aggregate correlated files into small, flat but readily manageable groups to achieve fast and accurate lookups. SANE is implemented as a middleware in conventional file systems and works orthogonally with hierarchical directory trees. The semantic correlations and file groups identified in SANE can also be used to facilitate file prefetching and data de-duplication, among other system-level optimizations. Extensive trace-driven experiments on our prototype implementation validate the efficacy and efficiency of SANE. View full abstract»
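    The flat-group idea can be sketched as follows. This is a hypothetical, heavily simplified stand-in for SANE's correlation analysis: the attribute vectors, the similarity measure, and the threshold are all assumptions. Files whose attributes are cosine-similar to a group's representative are aggregated into one small, flat group, and lookups then scan only within a group.

    ```python
    import math

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.hypot(*u) * math.hypot(*v))

    def group_files(files, threshold=0.9):
        """Greedy flat grouping: each group keeps its first file's
        attribute vector as representative; a file joins the first group
        it is sufficiently similar to, else it starts a new group."""
        groups = []   # list of (representative_vector, [names])
        for name, vec in files:
            for rep, members in groups:
                if cosine(vec, rep) >= threshold:
                    members.append(name)
                    break
            else:
                groups.append((vec, [name]))
        return [members for _, members in groups]

    # Toy per-file attribute vectors (e.g., access frequency, size) -- assumed.
    files = [
        ("a.log", (1.0, 0.1, 0.0)),
        ("b.log", (0.9, 0.15, 0.0)),
        ("photo.jpg", (0.1, 1.0, 0.0)),
        ("c.log", (1.0, 0.05, 0.0)),
    ]
    flat_groups = group_files(files)
    ```

    Correlated log files end up in one group while the unrelated image forms its own, mirroring how semantic aggregation keeps lookup sets small without touching the directory tree.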

  • Signature Searching in a Networked Collection of Files

    Publication Year: 2014 , Page(s): 1339 - 1348

    A signature is a data pattern of interest in a large data file or set of large data files. Signatures that need to be found arise in applications such as DNA sequence analysis, network intrusion detection, biometrics, large scientific experiments, speech recognition, and sensor networks. Closely related is string matching. More specifically, we envision a problem where long linear data files (i.e., flat files) contain multiple signatures that are to be found using a multiplicity of processors (a parallel processor). This paper evaluates the performance of finding signatures in files residing in the nodes of parallel processors configured as trees, two-dimensional meshes, and hypercubes. We assume various combinations of sequential and parallel searching. A unique feature of this work is the assumption that data is pre-loaded onto the processors, as may occur in practice, so load distribution time need not be accounted for. Elegant expressions are found for average signature searching time and speedup, and graphical results are provided. View full abstract»
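    The partitioning step common to such parallel searches can be sketched in a few lines (illustrative; the paper's tree/mesh/hypercube timing analysis is not reproduced): when a flat file is split across processors, each chunk must overlap its neighbor by len(signature)-1 bytes so matches straddling a chunk boundary are not lost.

    ```python
    def find_all(data, sig):
        """All start offsets of sig in data (naive sequential scan)."""
        return [i for i in range(len(data) - len(sig) + 1)
                if data[i:i + len(sig)] == sig]

    def parallel_search(data, sig, n_workers):
        """Simulate n_workers each scanning one chunk; chunks overlap by
        len(sig)-1 bytes so boundary-straddling matches are still seen."""
        chunk = -(-len(data) // n_workers)        # ceiling division
        overlap = len(sig) - 1
        hits = set()                              # dedupe overlap matches
        for w in range(n_workers):
            start = w * chunk
            end = min(len(data), start + chunk + overlap)
            for i in find_all(data[start:end], sig):
                hits.add(start + i)
        return sorted(hits)

    data = b"xxACGTxxxxACGTacgtACGT"
    sig = b"ACGT"
    ```

    The parallel result matches the sequential scan exactly, which is the correctness precondition before any of the topology-specific speedup analysis applies.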

  • TRACON: Interference-Aware Scheduling for Data-Intensive Applications in Virtualized Environments

    Publication Year: 2014 , Page(s): 1349 - 1358

    Large-scale data centers leverage virtualization technology to achieve excellent resource utilization, scalability, and high availability. Ideally, the performance of an application running inside a virtual machine (VM) shall be independent of co-located applications and VMs that share the physical machine. However, adverse interference effects exist and are especially severe for data-intensive applications in such virtualized environments. In this work, we present TRACON, a novel Task and Resource Allocation CONtrol framework that mitigates the interference effects from concurrent data-intensive applications and greatly improves the overall system performance. TRACON utilizes modeling and control techniques from statistical machine learning and consists of three major components: the interference prediction model that infers application performance from resource consumption observed from different VMs, the interference-aware scheduler that is designed to utilize the model for effective resource management, and the task and resource monitor that collects application characteristics at the runtime for model adaption. We implement and validate TRACON with a variety of cloud applications. The evaluation results show that TRACON can achieve up to 25 percent improvement on application throughput on virtualized servers. View full abstract»
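    The interference-prediction component can be pictured as a simple regression (a hypothetical sketch; TRACON's actual statistical models and scheduler are richer): fit an application's runtime against the I/O rate of its co-located VM by ordinary least squares, then let the scheduler prefer the placement with the lowest predicted runtime.

    ```python
    def fit_line(xs, ys):
        """Ordinary least squares for y = a + b*x (closed form)."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
        return my - b * mx, b

    # Observed (co-located VM I/O rate MB/s, app runtime s) -- synthetic.
    io_rate = [0.0, 50.0, 100.0, 150.0, 200.0]
    runtime = [10.0, 12.5, 15.0, 17.5, 20.0]   # perfectly linear for the demo
    a, b = fit_line(io_rate, runtime)

    def predicted_runtime(x):
        return a + b * x

    # Interference-aware choice: place on the host whose co-runner is quietest.
    hosts = {"host1": 180.0, "host2": 40.0}    # current I/O load per host
    best_host = min(hosts, key=lambda h: predicted_runtime(hosts[h]))
    ```

    The model here is deliberately trivial; the point is the control loop shape: monitor resource consumption, refit the model, and feed predictions to the scheduler.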

  • ZEBRA: Data-Centric Contention Management in Hardware Transactional Memory

    Publication Year: 2014 , Page(s): 1359 - 1369

    Transactional contention management policies show considerable variation in relative performance with changing workload characteristics. Consequently, incorporation of fixed-policy Transactional Memory (TM) in general-purpose computing systems is suboptimal by design and renders such systems susceptible to pathologies. Of particular concern are Hardware TM (HTM) systems where traditional designs have hardwired policies in silicon. Adaptive HTMs hold promise, but pose major challenges in terms of design and verification costs. In this paper, we present the ZEBRA HTM design, which lays down a simple yet high-performance approach to implement adaptive contention management in hardware. Prior work in this area has associated contention with transactional code blocks. However, we discover that by associating contention with data (cache blocks) accessed by transactional code rather than the code block itself, we achieve a neat match in granularity with that of the cache coherence protocol. This leads to a design that is very simple and yet able to track closely or exceed the performance of the best performing policy for a given workload. ZEBRA, therefore, brings together the inherent benefits of traditional eager HTMs (parallel commits) and lazy HTMs (good optimistic concurrency without deadlock-avoidance mechanisms), combining them into a low-complexity design. View full abstract»
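    The data-centric idea can be sketched in software (a simplified illustration, not the hardware design; the cache-line size and conflict threshold are assumptions): track conflicts per cache block rather than per transaction, and flip only the proven-hot blocks from eager to lazy handling.

    ```python
    from collections import defaultdict

    class BlockContentionTable:
        """Per-cache-block contention mode, keyed at line granularity."""
        LINE = 64          # cache-line size in bytes (assumed)
        THRESHOLD = 3      # conflicts before a block is handled lazily (assumed)

        def __init__(self):
            self.conflicts = defaultdict(int)

        def _line(self, addr):
            return addr // self.LINE

        def record_conflict(self, addr):
            self.conflicts[self._line(addr)] += 1

        def mode(self, addr):
            """'eager' for quiet blocks, 'lazy' once a block is contended."""
            if self.conflicts[self._line(addr)] >= self.THRESHOLD:
                return "lazy"
            return "eager"

    table = BlockContentionTable()
    for _ in range(3):                  # repeated conflicts on one hot line
        table.record_conflict(0x1000)
    table.record_conflict(0x2000)       # a single conflict elsewhere
    ```

    Keying the table at cache-line granularity is what aligns the policy decision with the coherence protocol: any address falling in a hot line inherits the lazy treatment, while untouched data keeps the eager fast path.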


Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology