
2012 IEEE Conference on High Performance Extreme Computing (HPEC)

10-12 September 2012


Results 1-25 of 28
  • [Copyright notice]

    Page(s): 1
    Freely Available from IEEE
  • [Title page]

    Page(s): 1
    Freely Available from IEEE
  • A third generation many-core processor for secure embedded computing systems

    Page(s): 1 - 3

    As compute-intensive products proliferate, there is an ever-growing need to provide security features to detect tampering, identify cloned or counterfeit hardware, and deter cybersecurity threats. This paper describes the security features of the third-generation 100-core HyperX™ processor, which addresses these needs. Programmable security barriers allow the processor to implement a red-black System-on-Chip solution. The implementation of Physically Unclonable Functions (PUFs), encryption/decryption engines, a secure boot controller, and anti-tamper features enables the engineer to realize a secure embedded computing solution in an ultra-low-power, many-core, C-programmable processor-memory network.

  • Exploiting SPM-aware Scheduling on EPIC architectures for high-performance real-time systems

    Page(s): 1 - 2

    In contemporary computer architectures, Explicitly Parallel Instruction Computing (EPIC) permits microprocessors to implement Instruction Level Parallelism (ILP) by using the compiler, rather than complex on-die circuitry as in superscalar architectures, to control parallel instruction execution. Building on EPIC, this paper proposes a time-predictable two-level scratchpad-based memory architecture and a scratchpad-aware scheduling method that improves performance by optimizing the load-to-use distance.

  • High locality and increased intra-node parallelism for solving finite element models on GPUs by novel element-by-element implementation

    Page(s): 1 - 5

    The utilization of Graphics Processing Units (GPUs) for the element-by-element (EbE) finite element method (FEM) is demonstrated. EbE FEM is a long-known technique by which a conjugate gradient (CG) type iterative solution scheme can be entirely decomposed into computations on the element level, i.e., without assembling the global system matrix. In our implementation, NVIDIA's parallel computing solution, the Compute Unified Device Architecture (CUDA), is used to perform the required element-wise computations in parallel. Since element matrices need not be stored, the memory requirement can be kept extremely low. It is shown that this low-storage but computation-intensive technique is better suited for GPUs than those requiring the massive manipulation of large data sets. This study of the proposed parallel model illustrates a highly improved locality and minimization of data movement, which could also significantly reduce energy consumption in other heterogeneous HPC architectures.

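    As a minimal illustration of the element-by-element idea in the abstract above, the sketch below runs a conjugate gradient solve whose matrix-vector product is computed one element at a time, so the global system matrix is never assembled. It is plain NumPy with illustrative element matrices, not the authors' CUDA implementation; on a GPU the element loop would become parallel threads.

        import numpy as np

        def ebe_matvec(elem_mats, elem_dofs, x):
            """y = A @ x accumulated element by element; A is never assembled."""
            y = np.zeros_like(x)
            for Ke, dofs in zip(elem_mats, elem_dofs):   # parallel on a GPU
                y[dofs] += Ke @ x[dofs]                  # gather, multiply, scatter
            return y

        def cg(elem_mats, elem_dofs, b, tol=1e-8, max_iter=1000):
            """Conjugate gradient using only element-level products."""
            x = np.zeros_like(b)
            r = b - ebe_matvec(elem_mats, elem_dofs, x)
            p = r.copy()
            rs = r @ r
            for _ in range(max_iter):
                Ap = ebe_matvec(elem_mats, elem_dofs, p)
                alpha = rs / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                rs_new = r @ r
                if np.sqrt(rs_new) < tol:
                    break
                p = r + (rs_new / rs) * p
                rs = rs_new
            return x

        # toy 1D chain of overlapping 2x2 element matrices (SPD assembly)
        n = 10
        elem_mats = [np.array([[2.0, 1.0], [1.0, 2.0]])] * (n - 1)
        elem_dofs = [np.array([i, i + 1]) for i in range(n - 1)]
        x = cg(elem_mats, elem_dofs, np.ones(n))
        print(np.allclose(ebe_matvec(elem_mats, elem_dofs, x), np.ones(n)))  # True
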
  • Accelerating fully homomorphic encryption using GPU

    Page(s): 1 - 5

    As a major breakthrough, in 2009 Gentry introduced the first plausible construction of a fully homomorphic encryption (FHE) scheme. FHE allows the evaluation of arbitrary functions directly on encrypted data on untrusted servers. In 2010, Gentry and Halevi presented the first FHE implementation on an IBM x3500 server. However, this implementation remains impractical due to the high latency of encryption and recryption. The Gentry-Halevi (GH) FHE primitives utilize multi-million-bit modular multiplications and additions, which are time-consuming tasks for a general-purpose computer. In the GH-FHE implementation, the most computationally intensive arithmetic operation is modular multiplication. In this paper, the million-bit modular multiplication is computed in two steps. For large-number multiplication, Strassen's FFT-based algorithm is employed and accelerated on a graphics processing unit (GPU) through its massive parallelism. Subsequently, the Barrett modular reduction algorithm is applied to implement modular reduction. As an experimental study, we implement the GH-FHE primitives for the small setting with a dimension of 2048 on an NVIDIA C2050 GPU. The experimental results show speedup factors of 7.68, 7.4 and 6.59 for encryption, decryption and recryption, respectively, when compared with the existing CPU implementation.

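    The abstract above pairs an FFT-based big-integer multiply with Barrett reduction. The sketch below shows only the Barrett step, with Python's arbitrary-precision integers standing in for the multi-million-bit GPU limbs and for the FFT multiply; the modulus is illustrative, not a Gentry-Halevi parameter.

        def barrett_setup(m):
            k = m.bit_length()
            mu = (1 << (2 * k)) // m               # precomputed once per modulus
            return k, mu

        def barrett_reduce(x, m, k, mu):
            """Reduce x (with x < m**2) modulo m without a hardware divide."""
            q = ((x >> (k - 1)) * mu) >> (k + 1)   # cheap estimate of x // m
            r = x - q * m                          # r is off by at most 2*m
            while r >= m:
                r -= m
            return r

        m = (1 << 2048) - 159                      # illustrative 2048-bit modulus
        k, mu = barrett_setup(m)
        a, b = 3**1000 % m, 7**900 % m
        assert barrett_reduce(a * b, m, k, mu) == (a * b) % m
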
  • Use of CUDA for the Continuous Space Language Model

    Page(s): 1 - 5

    The training phase of the Continuous Space Language Model (CSLM) was implemented on NVIDIA's hardware/software architecture, the Compute Unified Device Architecture (CUDA). The implementation was accomplished using a combination of CUBLAS library routines and CUDA kernel calls on three CUDA-enabled devices of varying compute capability, and a time savings over the traditional CPU approach was demonstrated.

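    To see why CSLM training maps naturally onto CUBLAS, the sketch below writes the forward pass of a continuous-space LM as dense matrix products (here in NumPy; on the GPU each @ would become a GEMM call). The layer sizes are illustrative, not those of the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        batch, ctx, emb, hid, vocab = 128, 3, 256, 512, 16384
        E  = rng.standard_normal((vocab, emb)) * 0.01     # shared word embeddings
        W1 = rng.standard_normal((ctx * emb, hid)) * 0.01
        W2 = rng.standard_normal((hid, vocab)) * 0.01

        context = rng.integers(0, vocab, size=(batch, ctx))  # word-index inputs
        x = E[context].reshape(batch, ctx * emb)  # projection layer (table lookup)
        h = np.tanh(x @ W1)                       # hidden layer: one GEMM
        logits = h @ W2                           # output layer: one GEMM
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)         # softmax over the vocabulary
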
  • Graph programming model: An efficient approach for sensor signal processing

    Page(s): 1 - 2

    The HPC community has struggled to find a parallel programming model or language that can efficiently expose algorithmic parallelism in a sequential program and automate the implementation of a highly efficient parallel program. A plethora of parallel programming languages have been developed along with sophisticated compilers and runtimes, but none of these approaches has been successful enough to become a de facto standard. The Graph Programming Model has the capability and efficiency to become that ubiquitous standard for the signal processing domain.

  • An application of the constraint programming to the design and operation of synthetic aperture radars

    Page(s): 1 - 2

    The design and operation of synthetic aperture radars require compatible sets of hundreds of quantities. Compatibility is achieved when these quantities satisfy constraints arising from physics, geometry, etc. In the aggregate, these quantities and constraints form a logical model of the radar. In practice, the logical model is distributed over multiple people, documents, and software modules, thereby becoming fragmented. Fragmentation gives rise to inconsistencies and errors. The SAR Inference Engine addresses the fragmentation problem by implementing the logical model of a Sandia synthetic aperture radar in a form that is intended to be usable from system design to mission planning to actual operation of the radar. These diverse contexts require extreme flexibility, which is achieved by employing the constraint programming paradigm.

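    A toy version of the constraint-programming idea: quantities are tied together by physical relations, and whichever quantity is left unknown is inferred by propagating the constraints to a fixed point. The two relations below are textbook radar equations, not Sandia's actual model.

        C = 299_792_458.0  # speed of light, m/s

        def propagate(q):
            """Fill in unknown (None) quantities from the known ones."""
            changed = True
            while changed:
                changed = False
                # range resolution <-> bandwidth: dr = c / (2 * B)
                if q["range_res"] is None and q["bandwidth"]:
                    q["range_res"] = C / (2 * q["bandwidth"]); changed = True
                if q["bandwidth"] is None and q["range_res"]:
                    q["bandwidth"] = C / (2 * q["range_res"]); changed = True
                # unambiguous range <-> PRF: Ru = c / (2 * PRF)
                if q["unamb_range"] is None and q["prf"]:
                    q["unamb_range"] = C / (2 * q["prf"]); changed = True
                if q["prf"] is None and q["unamb_range"]:
                    q["prf"] = C / (2 * q["unamb_range"]); changed = True
            return q

        spec = {"range_res": 0.3, "bandwidth": None,
                "prf": None, "unamb_range": 30e3}
        print(propagate(spec))   # bandwidth and PRF are inferred
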
  • Fast functional simulation with a dynamic language

    Page(s): 1 - 3

    Simulation of large computational systems-on-a-chip (SoCs) is increasingly challenging as the number and complexity of components is scaled up. With the ubiquity of programmable components in computational SoCs, fast functional instruction-set simulation (ISS) is increasingly important. Much ISS has been done with straightforward functional models of a non-pipelined fetch-decode-execute iteration written in a low-to-mid-level C-family static language, delivering mid-level efficiency. Some ISS programs, such as QEMU, perform dynamic binary translation to allow software emulation to reach more usable speeds. This relatively complex methodology has not been widely adopted for system modeling. We demonstrate a fresh approach to ISS that achieves performance comparable to a fast dynamic binary translator by exploiting recent advances in just-in-time (JIT) compilers for dynamic languages, such as JavaScript and Lua, together with a specific programming idiom inspired by pipelined processor design. We believe that this approach is relatively accessible to system designers familiar with C-family functional simulator coding styles, and may be generally useful for fast modeling of complex SoC components.

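    For contrast with the JIT approach, the sketch below is the "straightforward functional model" the abstract mentions: a fetch-decode-execute loop over a toy three-instruction ISA (not any real core). Written in a dynamic language, exactly this kind of loop is what a tracing JIT can specialize to near-native speed.

        def run(program, steps=100):
            regs = [0] * 8
            pc = 0
            for _ in range(steps):
                op, a, b, c = program[pc]       # fetch
                if op == "addi":                # decode + execute
                    regs[a] = regs[b] + c
                elif op == "add":
                    regs[a] = regs[b] + regs[c]
                elif op == "bne":               # branch if regs[a] != regs[b]
                    if regs[a] != regs[b]:
                        pc = c
                        continue
                pc += 1
                if pc >= len(program):
                    break
            return regs

        prog = [
            ("addi", 2, 0, 5),   # r2 = 5
            ("addi", 1, 1, 1),   # r1 += 1
            ("bne",  1, 2, 1),   # loop back while r1 != r2
        ]
        print(run(prog)[:3])     # [0, 5, 5]
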
  • Synthetic Aperture Radar on low power multi-core Digital Signal Processor

    Page(s): 1 - 6

    Commercial off-the-shelf (COTS) components have recently gained popularity in Synthetic Aperture Radar (SAR) applications. The compute capabilities of these devices have advanced to a level where real-time processing of complex SAR algorithms has become feasible. In this paper, we focus on a low-power multi-core Digital Signal Processor (DSP) from Texas Instruments Inc. and evaluate its capability for SAR signal processing. The specific DSP studied here is an eight-core device, the TMS320C6678, that provides a peak performance of 128 GFLOPS (single precision) for only 10 watts. We describe how the basic SAR operations can be implemented efficiently on such a device. Our results indicate that a baseline SAR range-Doppler algorithm takes around 0.25 seconds for a 16-megapixel (4K × 4K) image, achieving real-time performance.

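    As a flavor of the "basic SAR operations" mentioned above, the sketch below does fast-time (range) compression as FFT, multiply, inverse FFT: the FFT-heavy matched-filter pattern a range-Doppler pipeline maps onto the DSP cores. Chirp and array sizes are illustrative only.

        import numpy as np

        fs, T, K = 100e6, 10e-6, 5e12      # sample rate, pulse width, chirp rate
        t = np.arange(int(fs * T)) / fs
        chirp = np.exp(1j * np.pi * K * (t - T / 2) ** 2)

        def range_compress(raw):
            """raw: (pulses, samples); one FFT/IFFT pair per pulse."""
            n = raw.shape[1]
            ref = np.conj(np.fft.fft(chirp, n))          # matched-filter spectrum
            return np.fft.ifft(np.fft.fft(raw, axis=1) * ref, axis=1)

        # a single point target starting at sample 400 in each of 64 pulses
        raw = np.zeros((64, 2048), complex)
        raw[:, 400:400 + chirp.size] = chirp
        print(np.abs(range_compress(raw)[0]).argmax())   # peak at sample 400
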
  • Ruggedization of MXM graphics modules

    Page(s): 1 - 2

    MXM modules, used to package graphics processing devices for use in benign environments, have been tested for use in harsh environments typical of deployed defense and aerospace systems. Results show that MXM GP-GPU modules with suitable mechanical design can survive these environments and successfully provide the enormous processing capability of the latest generation of GPUs to harsh-environment applications.

  • Parallel search of k-nearest neighbors with synchronous operations

    Page(s): 1 - 6

    We present a new study of parallel algorithms for locating k-nearest neighbors (kNN) of each single query in a high dimensional (feature) space on a many-core processor or accelerator that favors synchronous operations, such as on a graphics processing unit. Exploiting the intimate relationships between two primitive operations, select and sort, we introduce a cohort of truncated sort algorithms for parallel kNN search. The truncated bitonic sort (TBiS) in particular has desirable data locality, synchronous concurrency and simple data and program structures. Its implementation on a graphics processing unit outperforms the other existing implementations for kNN search based on either sort or select operations. We provide algorithm analysis and experimental results.

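    A pure-Python stand-in for the truncated-sort idea behind TBiS: keep only the current k best candidates and fold in each new block with a bitonic merge, whose fixed compare-exchange pattern is synchronous and branch-free (hence GPU-friendly). This sketch assumes k is a power of two and the input length is a multiple of k; it is not the paper's kernel.

        import numpy as np

        def bitonic_merge(v):
            """Sort a bitonic sequence of power-of-two length in place."""
            n = len(v)
            step = n // 2
            while step:
                for i in range(n):
                    j = i ^ step              # butterfly partner, as on a GPU
                    if j > i and v[i] > v[j]:
                        v[i], v[j] = v[j], v[i]
                step //= 2
            return v

        def smallest_k(dists, k):
            """Return the k smallest distances via truncated bitonic merging."""
            top = np.sort(dists[:k])
            for start in range(k, len(dists), k):
                block = np.sort(dists[start:start + k])[::-1]         # descending
                merged = bitonic_merge(np.concatenate([top, block]))  # bitonic in
                top = merged[:k]                                      # truncate
            return top

        d = np.random.default_rng(1).random(4096)
        assert np.allclose(smallest_k(d, 8), np.sort(d)[:8])
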
  • HPC-VMs: Virtual machines in high performance computing systems

    Page(s): 1 - 6

    The concept of virtual machines dates back to the 1960s. Both IBM and MIT developed operating system features that enabled user and peripheral time sharing, the underpinnings of which were early virtual machines. Modern virtual machines present a translation layer of system devices between a guest operating system and the host operating system executing on a computer system, while isolating each of the guest operating systems from the others. In the past several years, enterprise computing has embraced virtual machines to deploy a wide variety of capabilities from business management systems to email server farms. Those who have adopted virtual deployment environments have capitalized on a variety of advantages including server consolidation, service migration, and higher service reliability. But they have also ended up with some challenges, including a sacrifice in performance and more complex system management. Some of these advantages and challenges also apply to HPC in virtualized environments. In this paper, we analyze the effectiveness of using virtual machines in a high performance computing (HPC) environment. We propose adding some virtual machine capability to already robust HPC environments for specific scenarios where the productivity gained outweighs the performance lost for using virtual machines. Finally, we discuss an implementation that adds virtual machines to the software stack of an HPC cluster, and we analyze the effect of this implementation on job launch time.

  • Multithreaded FPGA acceleration of DNA sequence mapping

    Page(s): 1 - 6

    In bioinformatics, short read alignment is a computationally intensive operation that involves matching millions of short strings (called reads) against a reference genome. At the time of writing, a representative run requires matching tens of millions of reads, each about 100 symbols long, against a genome that can consist of a few billion characters. Existing short read aligners are expected to report all the occurrences of each read as well as allow users to control the number of allowed mismatches between reads and reference genome. Popular software implementations such as Bowtie [8] or BWA [10] can take many hours or days to execute, making the problem an ideal candidate for hardware acceleration. In this paper, we describe FHAST (FPGA Hardware Accelerated Sequencing-matching Tool), a hardware accelerator that acts as a drop-in replacement for short read alignment software. Our architecture masks memory latency by executing many concurrent hardware threads that access memory simultaneously, and consists of multiple parallel engines to exploit the parallelism available on an FPGA. We have implemented and tested FHAST on the Convey HC-1 [9], taking advantage of the large amount of memory bandwidth available to the system and the shared memory image between hardware and software. By comparing the performance of FHAST against Bowtie on the Convey HC-1, we observed up to ~70X improvement in total end-to-end execution time, reducing runs that take several hours to a few minutes. We also show that FHAST scales favorably to multiple FPGAs compared with Bowtie running on multiple CPUs.

  • Large scale network situational awareness via 3D gaming technology

    Page(s): 1 - 5

    Obtaining situational awareness of network activity across an enterprise presents unique visualization challenges. IT analysts are required to quickly gather and correlate large volumes of disparate data to identify the existence of anomalous behavior. This paper shows how the MIT Lincoln Laboratory LLGrid team has approached obtaining network situational awareness using the Unity 3D video game engine. We have developed a 3D environment of the physical plant in the format of a networked multiplayer First Person Shooter (FPS) to provide a virtual depiction of the current state of the network and the machines operating on it. Within the game or virtual world, an analyst or player can gather critical information on all network assets as well as perform physical system actions on machines in question. 3D gaming technology provides tools to create an environment that is visually familiar to the player and that can display immense amounts of system data in a meaningful and easy-to-absorb format. Our prototype system was able to monitor and display 5000 assets in ~10% of our network time window.

  • Scalable cryptographic authentication for high performance computing

    Page(s): 1 - 2

    High performance computing (HPC) uses supercomputers and computing clusters to solve large computational problems. Frequently, HPC resources are shared systems, and access to restricted data sets or resources must be authenticated. These authentication needs can take multiple forms, both internal and external to the HPC cluster. A computational stack that uses web services among nodes in the HPC system may need to perform authentication between nodes of the same job, or a job may need to reach out to data sources outside the HPC system. Traditional authentication mechanisms such as passwords or digital certificates encounter issues with the distributed and potentially disconnected nature of HPC systems. Distributing and storing plain-text passwords or cryptographic keys among nodes in an HPC system without special protection is a poor security practice. Systems that reach back to the user's terminal for access to the authenticator are possible, but only in fully interactive supercomputing where connectivity to the user's terminal can be guaranteed. Point solutions can be enabled for these use cases, such as software-based roles or self-signed certificates; however, they require significant expertise in digital certificates to configure. A more general solution is called for that is both secure and easy to use. This paper presents an overview of a solution implemented on the interactive, on-demand LLGrid computing system [1,2,3] at MIT Lincoln Laboratory and its use to solve one such authentication problem.

  • An update on SIPHER (Scalable Implementation of Primitives for Homomorphic EncRyption) — FPGA implementation using Simulink

    Page(s): 1 - 5

    Accelerating the development of a practical Fully Homomorphic Encryption (FHE) scheme is the goal of the DARPA PROCEED program. For the past year, this program has focused on accelerating various aspects of the FHE concept toward practical implementation and use. FHE would be a game-changing technology, enabling secure, general computation on encrypted data, e.g., on untrusted off-site hardware. However, FHE still requires several orders of magnitude improvement in computation before it will be practical for widespread use. Recent theoretical breakthroughs demonstrated the existence of FHE schemes [1, 2], and to date much progress has been made in both algorithmic and implementation improvements. Specifically, our contribution to the PROCEED program has been the development of FPGA-based hardware primitives to accelerate computation on encrypted data using FHE based on lattice techniques [3]. Our project, SIPHER, has been using a state-of-the-art tool-chain developed by MathWorks to implement VHDL code for FPGA circuits directly from Simulink models. Our baseline homomorphic encryption prototypes are developed directly in Matlab, using the fixed-point toolbox to perform the required integer arithmetic. Constant improvements in algorithms require us to be able to implement them quickly in a high-level language such as Matlab. We reported our initial results at HPEC 2011 [4]. In the past year, increases in algorithm complexity have introduced several new design requirements for our FPGA implementation. This report presents new Simulink primitives that were developed to meet these requirements.

  • Scrubbing optimization via availability prediction (SOAP) for reconfigurable space computing

    Page(s): 1 - 6

    Reconfigurable computing with FPGAs can be highly effective in terms of performance, adaptability, and power for accelerating space applications, but FPGA configuration memory must be scrubbed to prevent the accumulation of single-event upsets. Many scrubbing techniques currently exist, each with different advantages, making it difficult for the system designer to choose the optimal scrubbing strategy for a given mission. This paper surveys the currently available scrubbing techniques and introduces the SOAP method for predicting system availability for various scrubbing strategies using Markov models. We then apply the method to compare hypothetical Virtex-5 and Virtex-6 systems for blind, CRC-32, and Frame ECC scrubbing strategies in LEO and HEO. We show that availability in excess of 5 nines can be obtained with modern, FPGA-based systems using scrubbing. Furthermore, we show the value of the SOAP method by observing that different scrubbing strategies are optimal for different types of missions.

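    A minimal sketch of the Markov-model step in an availability analysis of this kind: a two-state chain (operational vs. down after an upset) whose steady-state distribution gives availability. The rates are assumed for illustration, not mission or device numbers from the paper.

        import numpy as np

        lam = 1e-4   # upsets per second that take the system down (assumed)
        mu  = 1e-1   # recoveries per second via scrubbing/repair (assumed)

        # continuous-time Markov chain generator: rows sum to zero
        Q = np.array([[-lam,  lam],
                      [  mu,  -mu]])

        # steady state: pi @ Q = 0 with pi summing to 1
        A = np.vstack([Q.T, np.ones(2)])
        b = np.array([0.0, 0.0, 1.0])
        pi, *_ = np.linalg.lstsq(A, b, rcond=None)
        print(f"availability = {pi[0]:.6f}")   # ~0.999001 with these rates
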
  • CUDA and OpenCL implementations of 3D CT reconstruction for biomedical imaging

    Page(s): 1 - 6

    Biomedical image reconstruction applications with large datasets can benefit from acceleration. Graphics Processing Units (GPUs) are particularly useful in this context, as they can produce high-fidelity images rapidly. An image reconstruction algorithm for cone-beam computed tomography (CT) using two-dimensional projections is implemented on GPUs. The implementation takes slices of the target, weights the projection data, and then filters the weighted data to backproject it and create the final three-dimensional reconstruction. This is implemented on two types of hardware: a CPU and a heterogeneous system combining CPU and GPU. The CPU codes in C and MATLAB are compared with the heterogeneous versions written in CUDA-C and OpenCL. The relative performance is tested and evaluated on a mathematical phantom as well as on mouse data.

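    The weight, filter, backproject pattern the abstract describes, reduced here to 2D parallel-beam filtered backprojection for brevity (the cone-beam version adds per-ray weighting and a third dimension). Plain NumPy with nearest-neighbor interpolation; a sketch, not the paper's kernels.

        import numpy as np

        def fbp(sinogram, angles):
            """sinogram: (n_angles, n_det) -> (n_det, n_det) image."""
            n = sinogram.shape[1]
            ramp = np.abs(np.fft.fftfreq(n))               # Ram-Lak filter
            filtered = np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp,
                                   axis=1).real
            xs = np.arange(n) - n / 2
            X, Y = np.meshgrid(xs, xs)
            image = np.zeros((n, n))
            for proj, theta in zip(filtered, angles):      # backprojection
                t = X * np.cos(theta) + Y * np.sin(theta) + n / 2
                image += proj[np.clip(t.astype(int), 0, n - 1)]
            return image * np.pi / len(angles)

        angles = np.linspace(0, np.pi, 180, endpoint=False)
        sino = np.zeros((180, 128))
        sino[:, 64] = 1.0           # crude sinogram of a centered point
        img = fbp(sino, angles)     # bright spot near the image center
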
  • Optimized parallel distribution load flow solver on commodity multi-core CPU

    Page(s): 1 - 6

    Solving a large number of load flow problems quickly is required for Monte Carlo analysis and various power system problems, including long-term steady-state simulation and system benchmarking, among others. Due to the computational burden, such applications are considered time-consuming and infeasible for online or real-time use. In this work we developed a high-performance framework for high-throughput distribution load flow computation, taking advantage of performance-enhancing features of multi-core CPUs and various code optimization techniques. We optimized data structures to better fit the memory hierarchy. We use the SPIRAL code generator to exploit inherent patterns of the load flow model through code specialization. We use SIMD instructions and multithreading to parallelize our solver. Finally, we designed a Monte Carlo thread scheduling infrastructure to enable real-time operation. The optimized solver is able to achieve more than 50% of peak performance on an Intel Core i7 CPU, which translates to solving millions of load flow problems within a second for the IEEE 37 test feeder.

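    One classic distribution load-flow kernel, sketched for orientation: the backward/forward sweep on a radial feeder. This is the class of problem the optimized solver targets, not the paper's code; the four-bus feeder and per-unit values are invented.

        import numpy as np

        parent = [-1, 0, 1, 1]                # bus 0 is the source
        z = np.array([0, 0.01 + 0.02j, 0.02 + 0.04j, 0.015 + 0.03j])  # lines
        s_load = np.array([0, 0.5 + 0.2j, 0.3 + 0.1j, 0.4 + 0.15j])   # loads

        v = np.ones(4, complex)               # flat start
        for _ in range(50):
            i_branch = np.conj(s_load / v)    # backward sweep: load currents
            for b in range(3, 0, -1):         # accumulate toward the source
                i_branch[parent[b]] += i_branch[b]
            v_new = v.copy()
            for b in range(1, 4):             # forward sweep: voltage drops
                v_new[b] = v_new[parent[b]] - z[b] * i_branch[b]
            if np.max(np.abs(v_new - v)) < 1e-12:
                v = v_new
                break
            v = v_new
        print(np.abs(v))                      # per-unit voltage magnitudes
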
  • Efficient and scalable computations with sparse tensors

    Page(s): 1 - 6

    For applications that deal with large amounts of high dimensional multi-aspect data, it becomes natural to represent such data as tensors or multi-way arrays. Multi-linear algebraic computations such as tensor decompositions are performed for summarization and analysis of such data. Their use in real-world applications can span across domains such as signal processing, data mining, computer vision, and graph analysis. The major challenges with applying tensor decompositions in real-world applications are (1) dealing with large-scale high dimensional data and (2) dealing with sparse data. In this paper, we address these challenges in applying tensor decompositions in real data analytic applications. We describe new sparse tensor storage formats that provide storage benefits and are flexible and efficient for performing tensor computations. Further, we propose an optimization that improves data reuse and reduces redundant or unnecessary computations in tensor decomposition algorithms. Furthermore, we couple our data reuse optimization and the benefits of our sparse tensor storage formats to provide a memory-efficient scalable solution for handling large-scale sparse tensor computations. We demonstrate improved performance and address memory scalability using our techniques on both synthetic small data sets and large-scale sparse real data sets.

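    For orientation, the sketch below shows the baseline coordinate (COO) sparse-tensor storage and one kernel such formats must support efficiently: a mode-n tensor-times-vector product. The paper's formats are more sophisticated; the class and example tensor here are illustrative.

        import numpy as np

        class SparseTensor:
            def __init__(self, coords, vals, shape):
                self.coords = np.asarray(coords)   # (nnz, ndim) indices
                self.vals = np.asarray(vals)       # (nnz,) nonzero values
                self.shape = shape

            def ttv(self, vec, mode):
                """Contract dimension `mode` with `vec`; returns dense."""
                rest = [d for d in range(len(self.shape)) if d != mode]
                out = np.zeros([self.shape[d] for d in rest])
                for idx, val in zip(self.coords, self.vals):
                    out[tuple(idx[rest])] += val * vec[idx[mode]]
                return out

        T = SparseTensor([(0, 1, 2), (1, 0, 0), (1, 1, 2)],
                         [1.0, 2.0, 3.0], (2, 2, 3))
        print(T.ttv(np.array([1.0, 0.5, 2.0]), mode=2))  # [[0. 2.] [2. 6.]]
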
  • Benchmarking parallel eigen decomposition for residuals analysis of very large graphs

    Page(s): 1 - 5

    Graph analysis is used in many domains, from the social sciences to physics and engineering. The computational driver for one important class of graph analysis algorithms is the computation of leading eigenvectors of matrix representations of a graph. This paper explores the computational implications of performing an eigen decomposition of a directed graph's symmetrized modularity matrix using commodity cluster hardware and freely available eigensolver software, for graphs with 1 million to 1 billion vertices, and 8 million to 8 billion edges. Working with graphs of these sizes, parallel eigensolvers are of particular interest. Our results suggest that graph analysis approaches based on eigen space analysis of graph residuals are feasible even for graphs of these sizes.

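    The kernel being benchmarked, sketched matrix-free: leading eigenpairs of a directed graph's symmetrized modularity matrix, with the dense rank-one term kept implicit so B is never formed. Here B = A - k_out k_in^T / m before symmetrization; the paper's exact residual definition may differ, and the random graph is a stand-in.

        import numpy as np
        import scipy.sparse as sp
        from scipy.sparse.linalg import LinearOperator, eigsh

        rng = np.random.default_rng(0)
        n, nnz = 10_000, 80_000
        A = sp.csr_matrix((np.ones(nnz),
                           (rng.integers(0, n, nnz), rng.integers(0, n, nnz))),
                          shape=(n, n))
        k_out = np.asarray(A.sum(axis=1)).ravel()
        k_in = np.asarray(A.sum(axis=0)).ravel()
        m = k_out.sum()

        def Bx(x):   # (B + B^T)/2 times x, rank-one terms applied implicitly
            ax = A @ x + A.T @ x
            rank1 = k_out * (k_in @ x) + k_in * (k_out @ x)
            return 0.5 * (ax - rank1 / m)

        B = LinearOperator((n, n), matvec=Bx)
        vals, vecs = eigsh(B, k=4, which="LA")   # a few leading eigenpairs
        print(vals)
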
  • Driving big data with big compute

    Page(s): 1 - 6

    Big Data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community, and MPI clusters provide high parallel efficiency for compute-intensive workloads. Bringing the big data and big compute communities together is an active area of research. The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds. LLGrid MapReduce allows the map/reduce parallel programming model to be used quickly and efficiently in any language on any compute cluster. D4M (Dynamic Distributed Dimensional Data Model) provides a high-level distributed-arrays interface to the Apache Accumulo database. The accessibility of these technologies is assessed by measuring the effort required to use them, which is typically a few lines of code. The performance is assessed by measuring the insert rate into the Accumulo database. Using these tools, a database insert rate of 4M inserts/second has been achieved on an 8-node cluster.

  • Anatomy of a globally recursive embedded LINPACK benchmark

    Page(s): 1 - 6

    We present a complete bottom-up implementation of an embedded LINPACK benchmark on the iPad 2. We use a novel formulation of LU factorization that is recursive and parallel at the global scope. We believe our new algorithm presents an alternative to existing linear algebra parallelization techniques such as master-worker and DAG-based approaches. We show an assembly API that affords a much higher level of abstraction and provides rapid code development within the confines of the mobile device SDK. We use performance modeling to work around the limitations of the device and the limited access to it from a development environment not geared toward HPC application tuning.

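    A small sketch of a globally recursive LU of the flavor the abstract describes (no pivoting, so the test matrix is made diagonally dominant): factor the leading block, solve the two triangular panels, update the trailing matrix, and recurse. The paper parallelizes these steps globally; this is just the recursion skeleton.

        import numpy as np

        def rlu(A, block=64):
            """In-place recursive LU; unit-lower L and U share A's storage."""
            n = A.shape[0]
            if n <= block:                       # base case: unblocked LU
                for k in range(n - 1):
                    A[k+1:, k] /= A[k, k]
                    A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
                return A
            h = n // 2
            rlu(A[:h, :h], block)                # A11 = L11 @ U11
            L11 = np.tril(A[:h, :h], -1) + np.eye(h)
            A[:h, h:] = np.linalg.solve(L11, A[:h, h:])            # U12 panel
            A[h:, :h] = np.linalg.solve(np.triu(A[:h, :h]).T,
                                        A[h:, :h].T).T             # L21 panel
            A[h:, h:] -= A[h:, :h] @ A[:h, h:]   # trailing update (big GEMM)
            rlu(A[h:, h:], block)
            return A

        n = 256
        M = np.random.default_rng(2).random((n, n)) + n * np.eye(n)
        A = M.copy()
        rlu(A)
        L = np.tril(A, -1) + np.eye(n)
        U = np.triu(A)
        print(np.allclose(L @ U, M))   # True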