2010 International Conference on Dependable Systems and Networks Workshops (DSN-W)

Date: June 28 to July 1, 2010

  • [Front cover]

    Publication Year: 2010, Page(s): c1
    Freely Available from IEEE
  • DSN 2010 sponsors

    Publication Year: 2010, Page(s): i
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2010, Page(s): ii
    Freely Available from IEEE
  • Message from the general chair and conference coordinator

    Publication Year: 2010, Page(s): iii
    Freely Available from IEEE
  • DSN 2010 organizers

    Publication Year: 2010, Page(s): iv
    Freely Available from IEEE
  • DSN 2010 Steering Committee

    Publication Year: 2010, Page(s): v
    Freely Available from IEEE
  • FTXS Committees

    Publication Year: 2010, Page(s): vi
    Freely Available from IEEE
  • FTXS program

    Publication Year: 2010, Page(s): vii - viii
    Freely Available from IEEE
  • PFARM committees

    Publication Year: 2010, Page(s): ix
    Freely Available from IEEE
  • PFARM program

    Publication Year: 2010, Page(s): x - xi
    Freely Available from IEEE
  • WDSN committees

    Publication Year: 2010, Page(s): xii
    Freely Available from IEEE
  • WDSN program

    Publication Year: 2010, Page(s): xiii - xiv
    Freely Available from IEEE
  • WRAITS committees

    Publication Year: 2010, Page(s): xv
    Freely Available from IEEE
  • WRAITS program

    Publication Year: 2010, Page(s): xvi - xvii
    Freely Available from IEEE
  • Citation information

    Publication Year: 2010, Page(s): xviii
    Freely Available from IEEE
  • DSN-W 2010 [Copyright notice]

    Publication Year: 2010, Page(s): xix
    Freely Available from IEEE
  • DSN-W 2010 trademark information

    Publication Year: 2010, Page(s): xx
    Freely Available from IEEE
  • Author index

    Publication Year: 2010, Page(s): xxi
    Freely Available from IEEE
  • 1st workshop on fault-tolerance for HPC at extreme scale FTXS 2010

    Publication Year: 2010, Page(s): 1
    Freely Available from IEEE
  • Quantifying effectiveness of failure prediction and response in HPC systems: Methodology and example

    Publication Year: 2010, Page(s): 2 - 7
    Cited by: Papers (1)

    Effective failure prediction and mitigation strategies in high-performance computing systems could provide huge gains in resilience of tightly coupled large-scale scientific codes. These gains would come from prediction-directed process migration and resource servicing, intelligent resource allocation, and checkpointing driven by failure predictors rather than at regular intervals based on nominal...
  • Accurate fault prediction of BlueGene/P RAS logs via geometric reduction

    Publication Year: 2010, Page(s): 8 - 14
    Cited by: Papers (2)

    This investigation presents two distinct and novel approaches for the prediction of system failures occurring in Oak Ridge National Laboratory's Blue Gene/P supercomputer. Each technique uses raw numeric and textual subsets of large data logs of physical system information such as fan speeds and CPU temperatures. This data is used to develop models of the system capable of sensing anomalies, or de...
  • A practical failure prediction with location and lead time for Blue Gene/P

    Publication Year: 2010, Page(s): 15 - 22
    Cited by: Papers (13) | Patents (1)

    Analyzing, understanding and predicting failure is of paramount importance to achieve effective fault management. While various fault prediction methods have been studied in the past, many of them are not practical for use in real systems. In particular, they fail to address two crucial issues: one is to provide location information (i.e., the components where the failure is expected to occur on) ...
  • Distributed object storage rebuild analysis via simulation with GOBS

    Publication Year: 2010, Page(s): 23 - 28
    Cited by: Patents (1)

    Community acceptance of the object storage device model as represented by standards and use in existing HPC filesystems has enabled the development of more complex data storage systems. Object replicas may be placed in a variety of ways to obtain various properties, such as scalable lookup times, concurrent access to multiple objects, and efficient reorganization. The construction of a fully funct...
  • See applications run and throughput jump: The case for redundant computing in HPC

    Publication Year: 2010, Page(s): 29 - 34
    Cited by: Papers (5)

    For future parallel-computing systems with as few as twenty-thousand nodes we propose redundant computing to reduce the number of application interrupts. The frequency of faults in exascale systems will be so high that traditional checkpoint/restart methods will break down. Applications will experience interruptions so often that they will spend more time restarting and recovering lost work, than ...
  • Second workshop on proactive failure avoidance, recovery, and maintenance (PFARM)

    Publication Year: 2010, Page(s): 35 - 37
    Freely Available from IEEE