2013 1st International Workshop on Data Analysis Patterns in Software Engineering (DAPSE)

21 May 2013

  • [Front cover]

    Publication Year: 2013, Page(s):i - ii
  • Contents

    Publication Year: 2013, Page(s): 1
  • Foreword

    Publication Year: 2013, Page(s):iii - iv
  • Building Statistical Language Models of code

    Publication Year: 2013, Page(s):1 - 3

    We present the Source Code Statistical Language Model data analysis pattern. Statistical language models have been an enabling tool for a wide array of important language technologies. Speech recognition, machine translation, and document summarization (to name a few) all rely on statistical language models to assign probability estimates to natural language utterances or sentences. In this data a...
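
    The abstract is truncated in this listing, so the pattern itself is not shown here. As a rough illustration of how a statistical language model assigns probability estimates to a sequence, the sketch below trains a bigram model with add-one smoothing. Applying it to pre-tokenized lines of code, the `<s>`/`</s>` markers, and the function names `train_bigram` and `log_prob` are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def train_bigram(token_seqs):
    """Count bigram contexts and bigrams over a list of token sequences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}  # hypothetical start/end markers
    for tokens in token_seqs:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        vocab.update(tokens)
        contexts.update(padded[:-1])              # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))   # adjacent token pairs
    return contexts, bigrams, len(vocab)


def log_prob(tokens, contexts, bigrams, vocab_size):
    """Add-one smoothed log-probability of a token sequence."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


# Toy corpus of tokenized lines (assumed input format).
corpus = [
    ["for", "i", "in", "range", "(", "n", ")", ":"],
    ["for", "j", "in", "range", "(", "m", ")", ":"],
]
model = train_bigram(corpus)
seen = log_prob(corpus[0], *model)
scrambled = log_prob(list(reversed(corpus[0])), *model)
```

    Under this model a line seen in training scores higher than the same tokens in reverse order, which is the basic signal such models provide.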

  • Commit graphs

    Publication Year: 2013, Page(s):4 - 5

    We present commit graphs, a graph representation of the commit history in version control systems. The graph is structured by commonly changed files between commits. We derive two analysis patterns relating to bug-fixing commits and system modularity.
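
    One plausible reading of "structured by commonly changed files" can be sketched as follows: commits are nodes, and two commits are joined by an edge when they changed at least one file in common. The abstract does not say whether edges are weighted or restricted to neighbouring commits, so the all-pairs construction, the input format, and the name `build_commit_graph` are assumptions.

```python
from itertools import combinations


def build_commit_graph(commits):
    """Return edges between commits that share changed files.

    commits: dict mapping a commit id to an iterable of changed file paths
             (assumed input format).
    Result:  dict mapping a sorted (commit_a, commit_b) pair to the set of
             files both commits changed.
    """
    edges = {}
    for a, b in combinations(sorted(commits), 2):
        shared = set(commits[a]) & set(commits[b])
        if shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical history: c1 and c2 both touch b.py, c3 is unrelated.
history = {
    "c1": ["a.py", "b.py"],
    "c2": ["b.py"],
    "c3": ["z.py"],
}
graph = build_commit_graph(history)
```

    Comparing all pairs is quadratic in the number of commits; for a large history, indexing commits by file first would avoid most of the comparisons.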

  • Concept to commit: A pattern designed to trace code changes from user requests to change implementation by analyzing mailing lists and code repositories

    Publication Year: 2013, Page(s):6 - 8

    The concept to commit pattern is used for tracing code changes from user requests (analyzing the mailing list) to change implementation (analyzing the code repository). The analysis is done via text mining of both emails and commit descriptions in four stages. The first stage identifies a search time window for the mailing list by evaluating a targeted commit time stamp. Once a window is establi...

  • Data analysis anti-patterns in empirical software engineering

    Publication Year: 2013, Page(s):9 - 10

    The paper introduces the concept of data analysis anti-patterns, i.e., data analysis procedures that may lead to invalid results that may mislead decision makers. Two examples of anti-patterns are presented and discussed.

  • Effect size analysis

    Publication Year: 2013, Page(s):11 - 13

    When we seek insight into collected data, we are most often forced to limit our measurements to a portion of all individuals that could hypothetically be considered for observation. Nevertheless, as researchers, we want to draw more general conclusions that are valid beyond the restricted subset we are currently analyzing. Statistical significance testing is a fundamental pattern of data analysis that ...

  • Exploring software engineering data with formal concept analysis

    Publication Year: 2013, Page(s):14 - 16

    Software engineering (SE) data contains binary relationships between entities and their properties. Users are usually interested in meaningful groupings of entities and properties. Formal concept analysis (FCA) is a powerful technique that uses the binary relation between entities and their properties to infer a hierarchy of concepts. The output of F...

  • Extracting artifact lifecycle models from metadata history

    Publication Year: 2013, Page(s):17 - 19

    Software developers and managers make decisions based on the understanding they have of their software systems. This understanding is built up both experientially and through investigating various software development artifacts. While artifacts can be investigated individually, being able to summarize characteristics about a set of development artifacts can be useful. In this paper we propose life...

  • Measure what counts: An evaluation pattern for software data analysis

    Publication Year: 2013, Page(s):20 - 22

    The 'Measure what counts' pattern consists of evaluating software data analysis techniques against problem-specific measures related to cost and other stakeholders' goals, instead of relying solely on generic metrics such as recall, precision, F-measure, and area under the Receiver Operating Characteristic curve.

  • Parametric classification over multiple samples

    Publication Year: 2013, Page(s):23 - 25

    This pattern was originally designed to classify sequences of events in log files by error-proneness. Sequences of events trace application use in real contexts. As such, identifying error-prone sequences helps understand and predict application use. The classification problem we describe is typical in supervised machine learning, but the composite pattern we propose investigates it with several t...

  • Patterns for cleaning up bug data

    Publication Year: 2013, Page(s):26 - 28

    Bug reports provide insight into the quality of evolving software and into its development process. Such data, however, is often incomplete and inaccurate, and thus should be cleaned before analysis. In this paper, we present patterns that help both novice and experienced data scientists discard invalid bug data that could lead to wrong conclusions.

  • Patterns for extracting high level information from bug reports

    Publication Year: 2013, Page(s):29 - 31

    Bug reports record tasks performed by users and developers while collaborating to resolve bugs. Such data can be transformed into higher level information that helps data scientists understand various aspects of the team's development process. In this paper, we present patterns that show, step by step, how to extract higher level information about software verification from bug report data.

  • Structural and temporal patterns-based features

    Publication Year: 2013, Page(s):32 - 34

    In this paper, we propose a data transformation pattern to transform sequential data into a set of binary/categorical features and numerical features to enable data analysis. These features capture both structural and temporal information inherent in sequential data.

  • The chunking pattern

    Publication Year: 2013, Page(s):35 - 37

    Chunks are sets of code that have the property that a change that touches a chunk touches only that chunk. The pattern described in this paper defines chunks, indicates their usefulness, and provides an algorithm for calculating them.
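
    The paper's own definition and algorithm are not reproduced in this listing. One way to satisfy the stated property is to treat chunks as the smallest groups of code units such that every observed change falls entirely inside one group, which amounts to the connected components of a co-change relation. The sketch below computes them with a union-find structure. The input format (one set of touched units per change) and the name `compute_chunks` are assumptions for illustration.

```python
def compute_chunks(changes):
    """Group code units so that each change touches exactly one group.

    changes: iterable of sets, each holding the code units (files,
             functions, ...) touched by one change (assumed input format).
    Returns a list of sets; units touched together, directly or
    transitively, end up in the same set.
    """
    parent = {}

    def find(unit):
        parent.setdefault(unit, unit)
        while parent[unit] != unit:
            parent[unit] = parent[parent[unit]]  # path halving
            unit = parent[unit]
        return unit

    for change in changes:
        units = list(change)
        if not units:
            continue
        root = find(units[0])
        for unit in units[1:]:
            parent[find(unit)] = root  # merge the unit's group into root's group

    chunks = {}
    for unit in list(parent):
        chunks.setdefault(find(unit), set()).add(unit)
    return list(chunks.values())


# Hypothetical change history: a and c never change together directly,
# but both change with b, so all three land in one chunk.
changes = [{"a", "b"}, {"b", "c"}, {"d"}]
chunks = compute_chunks(changes)
```

    Because the grouping is transitive, a single wide-reaching change can merge many otherwise separate chunks, so filtering out bulk changes before running it may be worthwhile.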
