
Proceedings of the 2005 IEEE International Symposium on Workload Characterization (IISWC 2005)

Date: 6-8 Oct. 2005


Displaying Results 1 - 25 of 30
  • Proceedings of the 2005 IEEE International Symposium on Workload Characterization

    Publication Year: 2005 , Page(s): i
    PDF (22 KB)
    Freely Available from IEEE
  • IEEE International Symposium on Workload Characterization - Copyright

    Publication Year: 2005 , Page(s): ii
    PDF (73 KB)
    Freely Available from IEEE
  • Message from the General Chair

    Publication Year: 2005 , Page(s): iii
    PDF (130 KB) | HTML
    Freely Available from IEEE
  • Message from the Program Chair

    Publication Year: 2005 , Page(s): iv
    PDF (93 KB) | HTML
    Freely Available from IEEE
  • IISWC-2005 people

    Publication Year: 2005 , Page(s): v
    PDF (98 KB)
    Freely Available from IEEE
  • IISWC-2005 reviewers

    Publication Year: 2005 , Page(s): vi
    PDF (64 KB)
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2005 , Page(s): vii - viii
    PDF (83 KB)
    Freely Available from IEEE
  • Summarizing performance is no mean feat [computer performance analysis]

    Publication Year: 2005
    PDF (228 KB)

    Since the mid-1980s, computer benchmarkers have fought a war of means, arguing over the proper uses of arithmetic, harmonic, and geometric means. One would think this basic issue of computer performance analysis would have been resolved long ago, but contradictions are still present in some excellent and widely used textbooks. This paper offers a framework that resolves these issues and covers both workload analyses and relative performance analyses (such as SPEC or the Livermore Loops), emphasizing the differences between algebraic and statistical approaches. In some cases the lognormal distribution proves quite useful, especially when appropriate forms of standard deviation and confidence interval are used to augment the usual geometric mean. The results can be used to indicate the relative importance of careful workload analysis.

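    As a hedged illustration of the statistics this entry discusses (standard definitions, not the paper's own notation): for performance ratios r_1, ..., r_n, the geometric mean and a lognormal-based spread can be written in LaTeX as

        \[
        \mathrm{GM} = \Bigl(\prod_{i=1}^{n} r_i\Bigr)^{1/n}
                    = \exp\!\Bigl(\frac{1}{n}\sum_{i=1}^{n} \ln r_i\Bigr),
        \qquad
        \mathrm{GSD} = e^{s}, \quad s = \operatorname{stdev}(\ln r_i),
        \]
        \[
        \text{95\% CI for the geometric mean} \approx
        \Bigl[\mathrm{GM}\, e^{-1.96\, s/\sqrt{n}},\;
              \mathrm{GM}\, e^{+1.96\, s/\sqrt{n}}\Bigr],
        \]

    the confidence interval following from modeling ln r_i as approximately normal (i.e., r_i as lognormal).
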
  • Exploiting program microarchitecture independent characteristics and phase behavior for reduced benchmark suite simulation

    Publication Year: 2005 , Page(s): 2 - 12
    Cited by:  Papers (8)
    PDF (945 KB) | HTML

    Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry-standard benchmark can take weeks to complete, and simulating the full execution of the whole benchmark suite for one architecture configuration can take months. To address this issue, researchers have examined using targeted sampling based on phase behavior to significantly reduce the simulation time of each program in the benchmark suite. However, even with this sampling approach, simulating the full benchmark suite across a large range of architecture designs can take days to weeks to complete. The goal of this paper is to further reduce simulation time for architecture design space exploration. We reduce simulation time by finding similarity between benchmarks and program inputs at the level of samples (100M instructions of execution). This allows us to use a representative sample of execution from one benchmark to accurately represent a sample of execution of other benchmarks and inputs. The end result of our analysis is a small number of sample points of execution, selected across the whole benchmark suite to accurately represent the complete simulation of the whole benchmark suite for design space exploration. We show that this provides approximately the same accuracy as the SimPoint sampling approach while reducing the number of simulated instructions by a factor of 1.5.

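    A minimal sketch of the kind of cross-benchmark sample clustering the entry above describes, assuming per-100M-instruction, microarchitecture-independent feature vectors have already been collected; the function and variable names are illustrative, not the authors' code:

        # Cluster fixed-length execution samples from a whole benchmark suite and
        # keep one representative sample (plus a weight) per cluster.
        import numpy as np
        from sklearn.cluster import KMeans

        def pick_representative_samples(features, sample_ids, k):
            """features: (n_samples, n_features) array of per-sample metrics;
            sample_ids: list of (benchmark, input, sample_index) tuples."""
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
            reps = []
            for c in range(k):
                members = np.where(km.labels_ == c)[0]
                # the member closest to the centroid stands in for the cluster
                d = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
                reps.append((sample_ids[members[d.argmin()]], len(members)))
            return reps  # (representative sample, weight = cluster size) pairs
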
  • Detecting recurrent phase behavior under real-system variability

    Publication Year: 2005 , Page(s): 13 - 23
    Cited by:  Papers (4)
    PDF (623 KB) | HTML

    As computer systems become ever more complex and power hungry, research on dynamic, on-the-fly system management and adaptation receives increasing attention. Such research relies on recognizing and responding to patterns or phases in application execution, which has therefore become an important and widely studied research area. While application phase analysis has received significant attention, much of that attention has focused on simulation-based studies. In these cycle-level simulations, without nondeterministic operating system intervention, applications display behavior that is repeatable from phase to phase and from run to run. A natural question, therefore, is how these phases appear in real system runs, where interrupts and time variability can influence the timing and behavior of the program. Our paper examines the phase behavior of applications running on real systems. The key goals of our work are to reliably discern and recover phase behavior in the face of application variability stemming from real system effects and time sampling. We propose a set of new, "transition-based" phase detection techniques. Our techniques can detect repeatable workload phase information from time-varying, real system measurements with false-alarm probabilities below 5%. In comparison to previous value-based detection methods, our transition-based techniques achieve on average 6x higher recurrent phase detection efficiency under real system variability.

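    A rough, illustrative sketch of transition-based phase detection in the spirit of the entry above (the metric, threshold, and period heuristic are assumptions, not the paper's method):

        # Flag phase transitions as large relative sample-to-sample changes in a
        # sampled metric, then estimate a recurrent period from transition spacing.
        import numpy as np

        def detect_transitions(metric, threshold=0.2):
            """metric: 1-D array sampled over time (e.g., per-interval IPC)."""
            delta = np.abs(np.diff(metric)) / (np.abs(metric[:-1]) + 1e-9)
            return np.where(delta > threshold)[0] + 1

        def recurring_period(transitions):
            """Most common spacing between detected transitions (crude heuristic)."""
            if len(transitions) < 2:
                return None
            gaps = np.diff(transitions)
            values, counts = np.unique(gaps, return_counts=True)
            return int(values[counts.argmax()])
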
  • A performance characterization of high definition digital video decoding using H.264/AVC

    Publication Year: 2005 , Page(s): 24 - 33
    Cited by:  Papers (14)
    PDF (404 KB) | HTML

    H.264/AVC is a new international video coding standard that provides higher coding efficiency than previous standards at the expense of higher computational complexity. The complexity is even higher when H.264/AVC is used in applications with high bandwidth and high quality, such as high definition (HD) video decoding. In this paper, we analyze the computational requirements of the H.264 decoder with special emphasis on HD video and compare it with previous standards and lower resolutions. The analysis was done with a SIMD-optimized decoder using hardware performance monitoring. The main objective is to identify the application bottlenecks and to suggest the necessary architectural support for processing HD video efficiently. We have found that H.264/AVC decoding of HD video performs many more operations per frame than MPEG-4 and MPEG-2, has new kernels with more demanding memory access patterns, and has many data-dependent branches that are difficult to predict. In order to improve the H.264/AVC decoding process at HD resolutions, it is necessary to explore better support for media instructions, specialized prefetching techniques, and possibly the use of some kind of multiprocessor architecture.

  • The ALPBench benchmark suite for complex multimedia applications

    Publication Year: 2005 , Page(s): 34 - 45
    Cited by:  Papers (47)  |  Patents (1)
    PDF (334 KB) | HTML

    Multimedia applications are becoming increasingly important for a large class of general-purpose processors. Contemporary media applications are highly complex and demand high performance. A distinctive feature of these applications is that they have significant parallelism, including thread-, data-, and instruction-level parallelism, that is potentially well aligned with the increasing parallelism supported by emerging multi-core architectures. Designing systems to meet the demands of these applications therefore requires a benchmark suite that comprises these complex applications and exposes the parallelism present in them. This paper makes two contributions. First, it presents ALPBench, a publicly available benchmark suite that pulls together five complex media applications from various sources: speech recognition (CMU Sphinx 3), face recognition (CSU), ray tracing (Tachyon), MPEG-2 encode (MSSG), and MPEG-2 decode (MSSG). We have modified the original applications to expose thread-level and data-level parallelism using POSIX threads and sub-word SIMD (Intel's SSE2) instructions, respectively. Second, the paper provides a performance characterization of the ALPBench benchmarks, with a focus on parallelism. Such a characterization is useful for architects and compiler writers designing systems and compiler optimizations for these applications.

  • Reducing overheads for acquiring dynamic memory traces

    Publication Year: 2005 , Page(s): 46 - 55
    Cited by:  Papers (4)
    PDF (349 KB) | HTML

    Tools for acquiring dynamic memory address information for large-scale applications are important for performance modeling, optimization, and trace-driven simulation. However, straightforward use of binary instrumentation tools for such a fine-grained task as address tracing can cause an astonishing slowdown in application run time. For example, in a large-scale FY05 collaboration with the Department of Defense High Performance Computing Modernization Office (HPCMO), over 1 million processor hours were expended to gather address information on 7 parallel applications. In this paper, we discuss in detail the issues surrounding the performance of memory address acquisition using low-level binary instrumentation tracing. We present three techniques and optimizations to improve performance: 1) SimPoint-guided sampling, 2) instrumentation tool routine optimization, and 3) reduction of instrumentation points through static application analysis. The use of these three techniques together reduces instrumented application slowdown by an order of magnitude. The techniques are generally applicable and have been deployed in the MetaSim tracer, thereby enabling memory address acquisition for real-sized applications. We expect the optimizations reported here to reduce the HPCMO effort by approximately 80% in FY06.

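    To make the sampling idea concrete, here is a hedged, tool-agnostic sketch (not MetaSim itself) of recording memory addresses only inside a few pre-selected instruction intervals, which is what lets SimPoint-guided sampling cut instrumentation overhead:

        # Record memory addresses only while the dynamic instruction count falls
        # inside one of the intervals chosen offline (e.g., by SimPoint).
        def trace_sampled_intervals(instruction_stream, intervals):
            """instruction_stream: iterable of (instruction_count, address_or_None);
            intervals: list of (start, end) instruction-count ranges."""
            trace = []
            for icount, addr in instruction_stream:
                if addr is None:
                    continue  # instruction without a memory operand
                if any(start <= icount < end for start, end in intervals):
                    trace.append((icount, addr))
            return trace
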
  • Accurate statistical approaches for generating representative workload compositions

    Publication Year: 2005 , Page(s): 56 - 66
    Cited by:  Papers (5)  |  Patents (1)
    PDF (382 KB) | HTML

    Composing a representative workload is a crucial step in the design process of a microprocessor. The workload should be composed in such a way that it is representative of the target application domain, yet the amount of redundancy in the workload should be minimized so as not to overly increase the total simulation time. As a result, an important trade-off needs to be made between workload representativeness and simulation accuracy on the one hand and simulation speed on the other. Previous work used statistical data analysis techniques to identify representative benchmarks and corresponding inputs, also called a subset, from a large set of potential benchmarks and inputs. These methodologies measure a number of program characteristics on which principal components analysis (PCA) is applied before identifying distinct program behaviors among the benchmarks using cluster analysis. In this paper we propose independent components analysis (ICA) as a better alternative to PCA, as it does not assume that the original data set has a Gaussian distribution, which allows ICA to better find the important axes in the workload space. Our experimental results using SPEC CPU2000 benchmarks show that ICA significantly outperforms PCA in that ICA achieves smaller benchmark subsets that are more accurate than those found by PCA.

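    A small sketch of the PCA-versus-ICA subsetting flow described above, using scikit-learn; feature collection, normalization, and the component and cluster counts are assumptions for illustration, not the paper's setup:

        # Project standardized program characteristics with PCA or FastICA, then
        # cluster the benchmarks and keep the one closest to each cluster center.
        import numpy as np
        from sklearn.decomposition import PCA, FastICA
        from sklearn.cluster import KMeans

        def subset_benchmarks(X, names, n_components=4, n_subset=8, use_ica=True):
            """X: (n_benchmarks, n_characteristics) standardized matrix."""
            model = (FastICA(n_components=n_components, random_state=0)
                     if use_ica else PCA(n_components=n_components))
            Z = model.fit_transform(X)
            km = KMeans(n_clusters=n_subset, n_init=10, random_state=0).fit(Z)
            subset = []
            for c in range(n_subset):
                members = np.where(km.labels_ == c)[0]
                d = np.linalg.norm(Z[members] - km.cluster_centers_[c], axis=1)
                subset.append(names[members[d.argmin()]])
            return subset
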
  • A multi-level comparative performance characterization of SPECjbb2005 versus SPECjbb2000

    Publication Year: 2005 , Page(s): 67 - 75
    PDF (454 KB) | HTML

    SPEC has released SPECjbb2005, a new server-side Java benchmark that supersedes SPECjbb2000. SPECjbb2005 is a substantial update to SPECjbb2000, intended to make the workload more representative of current Java development practices. SPECjbb2000 has been in existence for about five years and has been a valuable tool for optimizing the performance of commercial JVMs as well as supporting research activities. Since SPECjbb2005 replaces SPECjbb2000, it is important to understand the key differences between the two, as well as the implications for JVM and hardware designers. In this paper, we present a comparative characterization of these two workloads based on detailed measurements on an Intel® Xeon™ processor-based commercial server. First, we describe key functional changes introduced in SPECjbb2005. Using low-intrusion application profiling tools, we compare application execution profiles. Through JVM monitoring tools, we compare JVM behavior, including JIT optimization and garbage collection. Using operating system monitoring tools, we compare key system-level metrics including CPU utilization. With the aid of processor performance monitoring events, we compare key architectural characteristics such as cache miss rates, memory/bus utilization, and branch behavior. Finally, we summarize key findings, provide recommendations to JVM developers and hardware designers, and suggest areas for future work.

  • Workload characterization of biometric applications on Pentium 4 microarchitecture

    Publication Year: 2005 , Page(s): 76 - 86
    Cited by:  Papers (2)
    PDF (468 KB) | HTML

    Biometric computing uses physiological and behavioral characteristics to identify and authenticate individuals. Due to the increasing demand for security, privacy, and anti-terrorism, biometric applications represent a rapidly growing class of computing workloads. However, very few results on the execution characteristics of these applications on state-of-the-art microprocessor and memory systems have been published so far. This paper proposes a suite of biometric applications and reports the results of a biometric workload characterization effort, focusing on various architectural features. To understand the impacts and implications of biometric workloads on processor and memory architecture design, we contrast the characteristics of biometric workloads with the widely used SPEC 2000 integer benchmarks. Our experiments show that biometric applications typically have a small instruction footprint that fits in the L1 instruction cache. Loads and stores account for more than 50% of the dynamic instructions, indicating that biometric applications are data-centric in nature. Although biometric applications work across large-scale datasets to identify matched patterns, the active working sets of these workloads are usually small. As a result, prefetching and a large L2 cache effectively handle the data footprints of a majority of the studied benchmarks. The branch misprediction rate is less than 4% on all studied workloads. The IPC of the studied benchmarks ranges from 0.13 to 0.77, indicating that out-of-order superscalar execution is not very efficient for these workloads. The developed biometric benchmark suite (BMW) and input data sets are freely available and can be downloaded from http://www.ideal.ece.ufl.edu/BMW.

  • Characterization and analysis of HMMER and SVM-RFE parallel bioinformatics applications

    Publication Year: 2005 , Page(s): 87 - 98
    Cited by:  Papers (3)  |  Patents (1)
    PDF (549 KB) | HTML

    Bioinformatics applications constitute an emerging data-intensive, high-performance computing (HPC) domain. While there is much research on algorithmic improvements, the actual performance of an application also depends on how well the program maps to the target hardware. This paper presents a performance study of two parallel bioinformatics applications, HMMER (sequence alignment) and SVM-RFE (gene expression analysis), on Intel x86 based hyperthreading-capable shared-memory multiprocessor systems. The performance characteristics varied according to the application and target hardware characteristics. For instance, HMMER is compute intensive and showed better scalability on a 3.0 GHz system than on a 2.2 GHz system, whereas SVM-RFE is memory intensive and showed better absolute performance on the 2.2 GHz machine, which has better memory bandwidth. The performance is also impacted by processor features such as hyperthreading (HT) and prefetching. With HMMER, enabling HT yielded roughly 75% of the performance gain obtained by doubling the number of CPUs. While load-balancing optimizations can provide a speedup of roughly 30% for HMMER on a hyperthreading-enabled system, the load balancing has to adapt to the target number of processors and threads. SVM-RFE benefits differently from the same load-balancing and thread scheduling tuning. We conclude that compiler and runtime optimizations play an important role in achieving the best performance for a given bioinformatics algorithm.

  • Parallel processing in biological sequence comparison using general purpose processors

    Publication Year: 2005 , Page(s): 99 - 108
    Cited by:  Papers (2)
    PDF (787 KB) | HTML

    The comparison and alignment of DNA and protein sequences are important tasks in molecular biology and bioinformatics. One of the best known algorithms for the string-matching operation present in these tasks is the Smith-Waterman algorithm (SW). However, it is a computation-intensive algorithm, and many researchers have developed heuristic strategies to avoid using it, especially when searching large databases. There are several efficient implementations of the SW algorithm on general-purpose processors. These implementations try to extract data-level parallelism by taking advantage of single-instruction multiple-data (SIMD) extensions, capable of performing several operations in parallel on a set of data. In this paper, we propose a more efficient data-parallel implementation of the SW algorithm. Our proposed implementation obtains a 30% reduction in execution time relative to the previous best data-parallel alternative. We also review alternative implementations of the SW algorithm, compare them with our proposal, and present preliminary results for some heuristic implementations. Finally, we present a detailed study of the computational complexity of the different alignment algorithms and their behavior with respect to different aspects of the CPU microarchitecture.

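    For reference, the standard Smith-Waterman recurrence that such data-parallel implementations vectorize (shown here with a simple linear gap penalty; production scoring schemes usually use affine gaps):

        \[
        H_{i,j} = \max\bigl\{\, 0,\;
                  H_{i-1,j-1} + s(a_i, b_j),\;
                  H_{i-1,j} - g,\;
                  H_{i,j-1} - g \,\bigr\},
        \qquad H_{i,0} = H_{0,j} = 0,
        \]

    where s(a_i, b_j) is the substitution score and g the gap penalty; SIMD variants evaluate several (i, j) cells per instruction, e.g., along anti-diagonals or stripes of a query column.
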
  • Visualization techniques for system performance characterization

    Publication Year: 2005
    PDF (227 KB)

    As computer environments increase in size and complexity, it is becoming more challenging to analyze and identify factors that limit performance and scalability. Application developers and system administrators need easy-to-use tools that enable them to quickly identify system bottlenecks and configure systems for best performance. In this paper, we present techniques that allow users to view the load on all major system components simultaneously, with negligible overhead and no changes to the application. We include several case studies where such techniques have been used to characterize and improve the performance of applications, system software, and future CPU/platform designs.

  • Efficient power analysis using synthetic testcases

    Publication Year: 2005 , Page(s): 110 - 118
    Cited by:  Papers (2)
    PDF (395 KB) | HTML

    Power dissipation has become an important consideration for processor designs. Assessing power using simulators is problematic given the long runtimes of real applications. Researchers have responded with techniques to reduce the total number of simulated instructions while still maintaining representative simulation behavior. Synthetic testcases have been shown to reduce the number of necessary instructions significantly while still achieving accurate performance results for many workload characteristics. In this paper, we show that the synthetic testcases can rapidly and accurately assess the dynamic power dissipation of real programs. Synthetic versions of the SPEC2000 and STREAM benchmarks can predict the total power per cycle to within 6.8% error on average, with a maximum of 15% error, and total power per instruction to within 4.4% error. In addition, for many design changes for which IPC and power change significantly, the synthetic testcases show small errors, many less than 5%. We also show that simulated power dissipation for both applications and synthetics correlates well with the IPCs of the real programs, often giving a correlation coefficient greater than 0.9.

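    One plausible reading of the metrics quoted above (the paper's exact definitions may differ): with E the total simulated energy, the "power per cycle" and "power per instruction" figures and the reported relative error take the form

        \[
        P_{\mathrm{cyc}} = \frac{E}{N_{\mathrm{cycles}}}, \qquad
        P_{\mathrm{inst}} = \frac{E}{N_{\mathrm{instructions}}}, \qquad
        \mathrm{error} = \frac{\lvert P_{\mathrm{synthetic}} - P_{\mathrm{real}} \rvert}
                              {P_{\mathrm{real}}} \times 100\%.
        \]
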
  • A study of Java virtual machine scalability issues on SMP systems

    Publication Year: 2005 , Page(s): 119 - 128
    Cited by:  Papers (2)  |  Patents (1)
    PDF (1524 KB) | HTML

    This paper studies the scalability issues of the Java virtual machine (JVM) on symmetric multiprocessing (SMP) systems. Using a cycle-accurate simulator, we evaluate how the performance of multithreaded Java benchmarks scales with the number of processors and application threads. By correlating low-level hardware performance data with two high-level software constructs, thread types and memory regions, we present a detailed performance analysis and study the potential performance impacts of memory system latencies and resource contention on scalability. Several key findings emerge from this study. First, among the memory access latency components, the primary portion of memory stalls is produced by L2 cache misses and cache-to-cache transfers. Second, among the regions of memory, the Java heap space produces the most memory stalls. Additionally, a large majority of memory stalls occur in application threads, as opposed to other JVM threads. Furthermore, we find that increasing the number of processors or application threads, independently of each other, leads to increases in the L2 cache miss ratio and the cache-to-cache transfer ratio. This problem can be alleviated by using a thread-local heap or allocation buffer, which can improve L2 cache performance. For certain benchmarks such as Raytracer, cache-to-cache transfers, mainly dominated by false sharing, can be significantly reduced. Our experiments also show that a thread-local allocation buffer with a size between 16KB and 256KB often leads to optimal performance.

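    A conceptual sketch of the thread-local allocation buffer (TLAB) idea mentioned above; the heap.reserve interface and the buffer size are hypothetical, and real JVM allocators are considerably more involved:

        # Each thread bump-allocates from its own chunk of the shared heap, so the
        # common allocation path touches no shared cache lines.
        import threading

        class ThreadLocalAllocator:
            def __init__(self, heap, tlab_size=64 * 1024):
                self.heap = heap            # placeholder for a synchronized carve-out source
                self.tlab_size = tlab_size
                self.local = threading.local()

            def alloc(self, nbytes):
                tl = self.local
                if getattr(tl, "free", 0) < nbytes:
                    # rare, synchronized refill of this thread's private buffer
                    tl.base = self.heap.reserve(self.tlab_size)
                    tl.offset, tl.free = 0, self.tlab_size
                addr = tl.base + tl.offset  # common case: thread-private bump pointer
                tl.offset += nbytes
                tl.free -= nbytes
                return addr
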
  • Workload characterization for the design of future servers

    Publication Year: 2005 , Page(s): 129 - 136
    Cited by:  Papers (1)
    PDF (423 KB) | HTML

    Workload characterization has become an integral part of the design of future servers, since workload characteristics guide developers in understanding workload requirements and how the underlying architecture can optimize the performance of the intended workload. In this paper, we give an overview of the POWER5 architecture. We also introduce the POWER5 performance monitor facilities and the performance events that lead to the construction of a CPI (cycles per instruction) breakdown model. For our study, we characterize four different groups of workloads: commercial, HPC, memory, and scientific. Using the data obtained from the POWER5 performance counters, we break the CPI stack down into a base component, covering cycles in which the processor is completing work, and a stall component, covering cycles in which the processor is not completing instructions. The stall component can be further divided into cycles when the pipeline was empty and cycles when the pipeline was not empty but completion was stalled. With this model, we enumerate the processing cycles (i.e., the fraction of the CPI) a workload spends progressing through the core resources and the penalty incurred upon encountering resource usage inhibitors. The results show the CPI breakdown for each workload, identify where each workload spends its processing cycles, and quantify the associated CPI cost of accessing the core resources.

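    A generic form of the CPI breakdown model described above (the actual POWER5 event decomposition is more detailed):

        \[
        \mathrm{CPI} = \mathrm{CPI}_{\mathrm{base}} + \mathrm{CPI}_{\mathrm{stall}},
        \qquad
        \mathrm{CPI}_{\mathrm{stall}} = \mathrm{CPI}_{\mathrm{empty}}
                                      + \mathrm{CPI}_{\mathrm{stall,\;non\text{-}empty}},
        \]
        \[
        \text{equivalently}\quad
        \mathrm{CPI} = \frac{C_{\mathrm{complete}} + C_{\mathrm{empty}} + C_{\mathrm{stall}}}{I},
        \]

    where the C terms are cycle counts attributed by the performance counters and I is the number of completed instructions.
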
  • Understanding the causes of performance variability in HPC workloads

    Publication Year: 2005 , Page(s): 137 - 149
    Cited by:  Papers (8)
    PDF (510 KB) | HTML

    While most workload characterization focuses on application and architecture performance, variability in performance also has wide-ranging impacts on the users and managers of large-scale computing resources. Performance variability, though secondary to absolute performance itself, can significantly detract from both the overall performance realized by parallel workloads and the suitability of a given architecture for a workload. In making choices about how to best match an HPC workload to an HPC architecture, most examinations focus primarily on application performance, often in terms of nominal or optimal performance. A practical concern that brackets the degree to which one can expect to see this performance in a multi-user production computing environment is the degree to which performance varies. Without an understanding of the performance variability a computer exhibits for a given workload, the effective performance that can be realized remains, in a practical sense, undetermined. In this work we examine both architectural and application causes of variability, quantify their impacts, and demonstrate performance gains realized by reducing variability.

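    A minimal, generic way to quantify the run-to-run variability discussed above (the paper's own metrics may differ):

        # Coefficient of variation and worst-case slowdown over repeated timings.
        import statistics

        def variability(runtimes):
            """runtimes: list of wall-clock times (seconds) for repeated runs of one job."""
            mean = statistics.fmean(runtimes)
            cov = statistics.pstdev(runtimes) / mean        # relative spread
            worst_case = max(runtimes) / min(runtimes)      # best-to-worst ratio
            return {"mean_s": mean, "cov": cov, "worst_case_slowdown": worst_case}
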
  • The FeasNewt benchmark

    Publication Year: 2005 , Page(s): 150 - 154
    PDF (266 KB)

    We describe the FeasNewt mesh-quality optimization benchmark. The performance of the code is dominated by three phases - gradient evaluation, Hessian evaluation and assembly, and sparse matrix-vector products - that have very different mixtures of floating-point operations and memory access patterns. The code includes an optional runtime data- and iteration-reordering phase, making it suitable for research on irregular memory access patterns. Mesh-quality optimization (or "mesh smoothing") is an important ingredient in the solution of nonlinear partial differential equations (PDEs) as well as an excellent surrogate for finite-element or finite-volume PDE solvers.

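    A textbook sketch of a compressed sparse row (CSR) matrix-vector product, the kind of irregular-access kernel the benchmark's third phase exercises (a generic formulation, not FeasNewt's code):

        # y = A @ x for A stored in CSR form (values, col_idx, row_ptr).
        import numpy as np

        def csr_matvec(values, col_idx, row_ptr, x):
            n_rows = len(row_ptr) - 1
            y = np.zeros(n_rows)
            for i in range(n_rows):
                # indirect loads through col_idx make the access pattern irregular
                for k in range(row_ptr[i], row_ptr[i + 1]):
                    y[i] += values[k] * x[col_idx[k]]
            return y
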
  • Comprehensive throughput evaluation of LANs in clusters of PCs with Switchbench - or how to bring your switch to its knees

    Publication Year: 2005 , Page(s): 155 - 162
    PDF (288 KB) | HTML

    Understanding the performance of parallel applications on prevalent clusters of commodity PCs is still not an easy task: one must understand the performance characteristics of all subsystems in the cluster machine, in addition to the inherently required knowledge of the applications' behaviour. While there are already many benchmarks that characterise a single node's subsystems such as CPU, memory, and I/O, as well as a few that evaluate its network interface with point-to-point data streams, there are, to the best of our knowledge, no benchmarks available that characterise a cluster network or LAN as a whole. We present Switchbench, a set of three microbenchmarks that thoroughly evaluate the throughput characteristics of networks for clusters. The first microbenchmark tests the basic processing limitations of the switches by sending and receiving data concurrently at maximum throughput on all network interfaces. The second microbenchmark tests arbitrary communication patterns by pairwise connecting nodes for high-speed throughput tests. The third and slightly more realistic microbenchmark executes an all-to-all personalised communication (AAPC) algorithm to test many different patterns and critical bisections in the network. The microbenchmarks have already proved extremely useful in a previous study to experimentally quantify performance limitations in different networks of clusters of PCs with up to 128 nodes. We also establish the suitability of our microbenchmarks by comparing their results with two application benchmarks. The benchmarks consist of two C programs supported by shell scripts that start the programs on all nodes of the cluster with the correct execution parameters and automatically scale the workloads from a few nodes up to the full cluster size.

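    For illustration, one common way to schedule the all-to-all personalised communication (AAPC) pattern that the third microbenchmark exercises; the XOR pairing below is a standard textbook schedule and may differ from Switchbench's actual algorithm:

        # In round r, node i exchanges its personalized block with partner i XOR r,
        # giving n-1 rounds of disjoint pairs (assumes a power-of-two node count).
        def aapc_schedule(n_nodes):
            assert n_nodes > 0 and n_nodes & (n_nodes - 1) == 0, \
                "power-of-two node count assumed"
            rounds = []
            for r in range(1, n_nodes):
                rounds.append([(i, i ^ r) for i in range(n_nodes) if i < (i ^ r)])
            return rounds

        # Example: aapc_schedule(4) -> 3 rounds, each pairing all 4 nodes.
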