
2013 20th Working Conference on Reverse Engineering (WCRE)

Date: 14-17 Oct. 2013


Displaying Results 1 - 25 of 67
  • [Front cover]

    Publication Year: 2013, Page(s): 1
  • [Front matter]

    Publication Year: 2013, Page(s): 1-2
  • Contents

    Publication Year: 2013, Page(s): 1-4
  • Author index

    Publication Year: 2013, Page(s): 1
  • Message from the Chairs

    Publication Year: 2013, Page(s): iii-iv
  • Organizing Committee

    Publication Year: 2013, Page(s): v-ix
  • Genetic programming for Reverse Engineering

    Publication Year: 2013, Page(s): 1-10

    This paper overviews the application of Search Based Software Engineering (SBSE) to reverse engineering, with a particular emphasis on the growing importance of recent developments in genetic programming and genetic improvement for reverse engineering. This includes work on SBSE for remodularisation, refactoring, regression testing, syntax-preserving slicing and dependence analysis, concept assignment and feature location, bug fixing, and code migration. We also explore the possibilities for new directions in research using GP and GI for partial evaluation, amorphous slicing, and product lines, with a particular focus on code transplantation. This paper accompanies the keynote given by Mark Harman at the 20th Working Conference on Reverse Engineering (WCRE 2013).

  • The first decade of GUI ripping: Extensions, applications, and broader impacts

    Publication Year: 2013, Page(s): 11-20
    Cited by: Papers (1)

    This paper provides a retrospective examination of GUI Ripping - reverse engineering a workflow model of the graphical user interface of a software application - born a decade ago out of recognition of the severe need for improving the then largely manual state-of-the-practice of functional GUI testing. In these last 10 years, GUI ripping has turned out to be an enabler for much research, both within our group at Maryland and other groups. Researchers have found new and unique applications of GUI ripping, ranging from measuring human performance to re-engineering legacy user interfaces. GUI ripping has also enabled large-scale experimentation involving millions of test cases, thereby helping to understand the nature of GUI faults and characteristics of test cases to detect them. It has resulted in large multi-institutional Government-sponsored research projects on test automation and benchmarking. GUI ripping tools have been ported to many platforms, including Java AWT and Swing, iOS, Android, UNO, Microsoft Windows, and the web. In essence, the technology has transformed the way researchers and practitioners think about the nature of GUI testing, no longer considered a manual activity; rather, thanks largely to GUI Ripping, automation has become the primary focus of current GUI testing techniques.

  • Reverse Engineering in Industry

    Publication Year: 2013, Page(s): 21

    This extended abstract gives a description of the panel “Reverse Engineering in Industry”, which forms part of the 20th Working Conference on Reverse Engineering (WCRE 2013).

  • Who allocated my memory? Detecting custom memory allocators in C binaries

    Publication Year: 2013, Page(s): 22-31

    Many reversing techniques for data structures rely on knowledge of memory allocation routines. Typically, they interpose on the system's malloc and free functions and track each chunk of memory thus allocated as a data structure. However, many performance-critical applications implement their own custom memory allocators. Examples include web servers, database management systems, and compilers like gcc and clang. As a result, current binary analysis techniques for tracking data structures fail on such binaries. We present MemBrush, a new tool to detect memory allocation and deallocation functions in stripped binaries with high accuracy. We evaluated the technique on a large number of real-world applications that use custom memory allocators. As we show, we can furnish existing reversing tools with detailed information about the memory management API, and as a result perform an analysis of the actual application-specific data structures designed by the programmer. Our system uses dynamic analysis and detects memory allocation and deallocation routines by searching for functions that comply with a set of generic characteristics of allocators and deallocators. (An illustrative sketch follows this entry.)

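    A minimal sketch of the kind of generic allocator/deallocator heuristic the abstract describes, assuming a simplified, pre-recorded dynamic call trace. The trace format, function names, and scoring are invented for illustration; MemBrush itself instruments stripped binaries rather than consuming a trace like this.

```python
# Toy heuristic for spotting allocator/deallocator pairs in a dynamic call
# trace. Illustrative only: real tools such as MemBrush work on instrumented
# stripped binaries, not on a pre-digested trace like this.
from collections import defaultdict

# Hypothetical trace records: (function_name, argument_values, return_value)
TRACE = [
    ("my_alloc", (64,),     0x1000),
    ("my_alloc", (128,),    0x2000),
    ("compute",  (0x1000,), 0),
    ("my_free",  (0x1000,), None),
    ("my_alloc", (32,),     0x1000),  # address handed out again only after release
    ("my_free",  (0x2000,), None),
]

def score_allocators(trace):
    """Score functions against generic allocator characteristics: they return
    pointer-like values that are not still live, and those values are later
    released by some other routine (the candidate deallocator)."""
    returns = defaultdict(set)       # fn -> pointers it returned
    releases = defaultdict(set)      # fn -> live pointers it released
    returned_while_live = defaultdict(int)
    live = set()

    for fn, args, ret in trace:
        # A routine whose single argument is a live pointer and that returns
        # nothing behaves like a deallocation candidate.
        if ret is None and len(args) == 1 and args[0] in live:
            releases[fn].add(args[0])
            live.discard(args[0])
        # A routine returning a pointer-like value behaves like an allocation
        # candidate; returning a still-live pointer counts against it.
        elif isinstance(ret, int) and ret >= 0x1000:
            if ret in live:
                returned_while_live[fn] += 1
            returns[fn].add(ret)
            live.add(ret)

    report = {}
    for fn, ptrs in returns.items():
        freed_elsewhere = sum(len(ptrs & rel) for rel in releases.values())
        report[fn] = {"pointers_returned": len(ptrs),
                      "returned_while_live": returned_while_live[fn],
                      "later_released": freed_elsewhere}
    return report

for fn, stats in score_allocators(TRACE).items():
    print(fn, stats)
```
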
  • MemPick: High-level data structure detection in C/C++ binaries

    Publication Year: 2013, Page(s): 32-41

    Many existing techniques for reversing data structures in C/C++ binaries are limited to low-level programming constructs, such as individual variables or structs. Unfortunately, without detailed information about a program's pointer structures, forensics and reverse engineering are exceedingly hard. To fill this gap, we propose MemPick, a tool that detects and classifies high-level data structures used in stripped binaries. By analyzing how links between memory objects evolve throughout the program execution, it distinguishes between many commonly used data structures, such as singly- or doubly-linked lists, many types of trees (e.g., AVL, red-black trees, B-trees), and graphs. We evaluate the technique on 10 real-world applications and 16 popular libraries. The results show that MemPick can identify the data structures with high accuracy. (An illustrative sketch follows this entry.)

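    A toy version of the link-shape classification idea, assuming the heap objects and their pointer links have already been recovered as a graph. MemPick infers such graphs from instrumented executions of stripped binaries; the graphs and labels below are made up.

```python
# Toy shape classifier for pointer-linked heap objects, in the spirit of the
# link analysis MemPick performs; here the object graph is simply given.

def classify(nodes, edges):
    """nodes: iterable of object ids; edges: set of (src, dst) pointer links.
    Returns a coarse shape label based on link and degree patterns."""
    out = {n: [d for s, d in edges if s == n] for n in nodes}
    indeg = {n: sum(1 for _, d in edges if d == n) for n in nodes}

    # Doubly-linked list: every link has a matching back link, low fan-out.
    if edges and all((d, s) in edges for s, d in edges) \
            and all(len(out[n]) <= 2 for n in nodes):
        return "doubly-linked list"
    # Singly-linked list: a chain with at most one outgoing/incoming link each.
    if all(len(out[n]) <= 1 for n in nodes) and all(indeg[n] <= 1 for n in nodes):
        return "singly-linked list"
    # Binary tree: one root, every other node has one parent, fan-out <= 2.
    roots = [n for n in nodes if indeg[n] == 0]
    if (len(roots) == 1
            and all(indeg[n] == 1 for n in nodes if n != roots[0])
            and all(len(out[n]) <= 2 for n in nodes)):
        return "binary tree (possibly AVL / red-black)"
    return "general graph"

print(classify({1, 2, 3, 4, 5}, {(1, 2), (1, 3), (2, 4), (2, 5)}))  # binary tree
print(classify({1, 2, 3}, {(1, 2), (2, 3)}))                        # singly-linked list
```
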
  • Reconstructing program memory state from multi-gigabyte instruction traces to support interactive analysis

    Publication Year: 2013, Page(s): 42-51

    Exploitability analysis is the process of attempting to determine whether a vulnerability in a program is exploitable. Fuzzing is a popular method of finding such vulnerabilities, in which a program is subjected to millions of generated program inputs until it crashes. Each program crash indicates a potential vulnerability that needs to be prioritized according to its potential for exploitation. The highest-priority vulnerabilities need to be investigated by a security analyst by re-executing the program with the input that caused the crash while recording a trace of all executed assembly instructions, and then performing analysis on the resulting trace. Recreating the entire memory state of the program at the time of the crash, or at any other point in the trace, is very important for helping the analyst build an understanding of the conditions that led to the crash. Unfortunately, tracing even a small program can create multimillion-line trace files, from which reconstructing memory state is a computationally intensive process and virtually impossible to do manually. In this paper we present an analysis of the problem of memory state reconstruction from very large execution traces. We report on a novel approach for reconstructing the entire memory state of a program from an execution trace that allows near real-time queries on the state of memory at any point in a program's execution trace. Finally, we benchmark our approach, showing storage and performance results in line with our theoretical calculations, and demonstrate memory state query response times of less than 200 ms for trace files of up to 60 million lines. (An illustrative sketch follows this entry.)

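    One common way to obtain the "query memory at any trace point" behavior described above is periodic checkpointing plus replay of the intervening writes. The sketch below is a generic illustration of that idea over an invented write trace, not necessarily the data structures used in the paper.

```python
# Checkpoint-plus-replay index over a memory write trace: answer "what was
# the value at address A at trace line N?" without replaying from the start.
import bisect

class MemoryTimeline:
    """Keep a full memory image every `checkpoint_every` writes and replay
    only the short tail of writes between the checkpoint and the query."""

    def __init__(self, writes, checkpoint_every=1000):
        # writes: list of (trace_line, address, value), sorted by trace_line
        self.writes = writes
        self.write_lines = [w[0] for w in writes]
        self.snap_lines, self.snapshots = [], []
        image = {}
        for i, (line, addr, val) in enumerate(writes):
            image[addr] = val
            if i % checkpoint_every == 0:        # periodic full snapshot
                self.snap_lines.append(line)
                self.snapshots.append(dict(image))

    def value_at(self, address, trace_line):
        """Value stored at `address` as of `trace_line` (None if unwritten)."""
        k = bisect.bisect_right(self.snap_lines, trace_line) - 1
        if k < 0:
            return None                          # queried before the first write
        value = self.snapshots[k].get(address)
        start = bisect.bisect_right(self.write_lines, self.snap_lines[k])
        for line, addr, val in self.writes[start:]:
            if line > trace_line:
                break
            if addr == address:                  # replay the tail of writes
                value = val
        return value

# Synthetic trace: 50,000 writes cycling over four addresses.
trace = [(i, 0x400000 + (i % 4) * 8, i) for i in range(1, 50001)]
timeline = MemoryTimeline(trace, checkpoint_every=5000)
print(timeline.value_at(0x400000, 12345))        # -> 12344, written at line 12344
```
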
  • Static binary rewriting without supplemental information: Overcoming the tradeoff between coverage and correctness

    Publication Year: 2013, Page(s): 52-61

    Binary rewriting is the process of transforming executables while maintaining the original binary's functionality and improving it in one or more metrics, such as energy use, memory use, security, or reliability. Although several technologies for rewriting binaries exist, static rewriting allows arbitrarily complex transformations to be performed. Other technologies, such as dynamic or minimally invasive rewriting, are limited in their transformation ability. We have designed the first static binary rewriter that guarantees 100% code coverage without the need for relocation or symbolic information. A key challenge in static rewriting is content classification (i.e., deciding what portion of the code segment is code versus data). Our contributions are (i) handling portions of the code segment with uncertain classification by using speculative disassembly in case they are code, and retaining the original binary in case they are data; (ii) drastically limiting the number of possible speculative sequences using a new technique called binary characterization; and (iii) avoiding the need for relocation or symbolic information by using call translation at usage points of code pointers (i.e., indirect control transfers), rather than changing addresses at address creation points. Extensive evaluation using stripped binaries for the entire SPEC 2006 benchmark suite (with over 1.9 million lines of code) demonstrates the robustness of the scheme. (An illustrative sketch follows this entry.)

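    The third contribution above, call translation at usage points of indirect control transfers, can be pictured as a lookup table consulted at every indirect call or jump. The Python below is only a conceptual stand-in for the injected machine code, with made-up addresses.

```python
# Sketch of "call translation": instead of rewriting every place an address
# is created, each indirect jump/call is routed through a translation table
# at the point of use.  Addresses and table contents are hypothetical.

# original address -> address of the rewritten copy
TRANSLATION = {
    0x401000: 0x801000,   # speculatively disassembled and rewritten as code
    0x401230: 0x801230,
    # 0x402000 is absent: binary characterization judged it uncertain/data,
    # so the original bytes are retained and executed in place.
}

def translate(target):
    """Called at each indirect call/jump site; unknown targets fall back to
    the original copy, which is kept verbatim for exactly this case."""
    return TRANSLATION.get(target, target)

def indirect_call(fn_ptr_value):
    actual = translate(fn_ptr_value)
    print(f"indirect transfer to {fn_ptr_value:#x} dispatched to {actual:#x}")

indirect_call(0x401000)   # rewritten code: dispatched to the new copy
indirect_call(0x402000)   # uncertain region: executed at its original address
```
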
  • An incremental update framework for efficient retrieval from software libraries for bug localization

    Publication Year: 2013, Page(s): 62-71

    Information Retrieval (IR) based bug localization techniques use a bug report to query a software repository and retrieve relevant source files. These techniques index the source files in the software repository and train a model, which is then queried for retrieval purposes. Much of the current research is focused on improving the retrieval effectiveness of these methods. However, little consideration has been given to the efficiency of such approaches for software repositories that are constantly evolving. As the software repository evolves, the index creation and model learning have to be repeated to ensure accuracy of retrieval for each new bug. In doing so, the query latency may be unreasonably high, and re-computing the index and the model for files that did not change is computationally redundant. We propose an incremental update framework that continuously updates the index and the model using the changes made at each commit. We demonstrate that the same retrieval accuracy can be achieved with a fraction of the time needed by current approaches. Our results are based on two basic IR modeling techniques - the Vector Space Model (VSM) and the Smoothed Unigram Model (SUM). The dataset used in our validation experiments was created by tracking the commit history of the AspectJ and JodaTime software libraries over a span of 10 years. (An illustrative sketch follows this entry.)

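    A minimal incremental vector-space index in the spirit of the framework described above: each commit touches only the postings of changed or deleted files instead of re-indexing the repository. The tokenizer, scoring, and file contents are simplified stand-ins, not the paper's implementation.

```python
# Minimal incremental vector-space index: each commit updates only the
# postings of touched files instead of re-indexing the whole repository.
import math
import re
from collections import Counter

class IncrementalVSM:
    def __init__(self):
        self.doc_terms = {}          # file path -> term frequencies
        self.df = Counter()          # term -> number of files containing it

    @staticmethod
    def _tokens(text):
        return re.findall(r"[a-z_]+", text.lower())

    def apply_commit(self, changed=None, deleted=None):
        """Fold one commit into the index, touching only the affected files."""
        for path in (deleted or []):
            for term in self.doc_terms.pop(path, {}):
                self.df[term] -= 1
        for path, text in (changed or {}).items():
            for term in self.doc_terms.pop(path, {}):   # drop stale postings
                self.df[term] -= 1
            tf = Counter(self._tokens(text))
            self.doc_terms[path] = tf
            for term in tf:
                self.df[term] += 1

    def query(self, bug_report, top_k=5):
        """Rank files by a simple tf-idf similarity to the bug report text."""
        n = max(len(self.doc_terms), 1)
        q = Counter(self._tokens(bug_report))
        scores = {}
        for path, tf in self.doc_terms.items():
            s = sum(q[t] * tf[t] * math.log(1 + n / (1 + self.df[t])) for t in q)
            norm = math.sqrt(sum(v * v for v in tf.values())) or 1.0
            scores[path] = s / norm
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

index = IncrementalVSM()
index.apply_commit(changed={"ui/Button.java": "paint button click handler",
                            "core/Parser.java": "parse tokens into a tree"})
index.apply_commit(changed={"ui/Button.java": "paint button click focus bug"})
print(index.query("button click does nothing"))
```
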
  • Accurate developer recommendation for bug resolution

    Publication Year: 2013, Page(s): 72-81
    Cited by: Papers (1)

    Bug resolution refers to the activity that developers perform to diagnose, fix, test, and document bugs during software development and maintenance. It is a collaborative activity among developers who contribute their knowledge, ideas, and expertise to resolve bugs. Given a bug report, we would like to recommend the set of bug resolvers that could potentially contribute their knowledge to fix it. We refer to this problem as developer recommendation for bug resolution. In this paper, we propose a new and accurate method named DevRec for the developer recommendation problem. DevRec is a composite method which performs two kinds of analysis: bug report based analysis (BR-Based analysis) and developer based analysis (D-Based analysis). In the BR-Based analysis, we characterize a new bug report based on past bug reports that are similar to it. Appropriate developers for the new bug report are found by investigating the developers of similar bug reports appearing in the past. In the D-Based analysis, we compute the affinity of each developer to a bug report based on the characteristics of bug reports that have been fixed by the developer before. This affinity is then used to find a set of developers that are “close” to a new bug report. We evaluate our solution on 5 large bug report datasets, including GCC, OpenOffice, Mozilla, Netbeans, and Eclipse, containing a total of 107,875 bug reports. We show that DevRec achieves recall@5 and recall@10 scores of 0.4826-0.7989 and 0.6063-0.8924, respectively. We also compare DevRec with other state-of-the-art methods, such as Bugzie and DREX. The results show that DevRec on average improves the recall@5 and recall@10 scores of Bugzie by 57.55% and 39.39%, respectively. DevRec also outperforms DREX by improving the average recall@5 and recall@10 scores by 165.38% and 89.36%, respectively. (An illustrative sketch follows this entry.)

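    A simplified, hypothetical combination of a BR-based signal (developers of similar past reports) and a D-based signal (a developer's affinity to the report's terms), in the spirit of DevRec; the similarity measures, the weight alpha, and the data are invented.

```python
# Two-signal developer recommender sketch: BR-based + D-based scores combined
# linearly.  Measures, weights, and data are illustrative stand-ins.
from collections import Counter, defaultdict

past_reports = [  # (terms in report, developers who resolved it)
    ({"crash", "parser", "null"}, {"alice"}),
    ({"ui", "button", "focus"},   {"bob"}),
    ({"parser", "unicode"},       {"alice", "carol"}),
]

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(new_terms, alpha=0.5, top_k=3):
    # BR-based: developers of past reports, weighted by report similarity.
    br = defaultdict(float)
    for terms, devs in past_reports:
        sim = jaccard(new_terms, terms)
        for d in devs:
            br[d] += sim
    # D-based: affinity between each developer's term profile and the report.
    profile = defaultdict(Counter)
    for terms, devs in past_reports:
        for d in devs:
            profile[d].update(terms)
    dbased = {d: sum(profile[d][t] for t in new_terms) / (sum(profile[d].values()) or 1)
              for d in profile}
    # Linear combination of the two signals (alpha is a made-up weight).
    combined = {d: alpha * br.get(d, 0.0) + (1 - alpha) * dbased.get(d, 0.0)
                for d in set(br) | set(dbased)}
    return sorted(combined.items(), key=lambda kv: -kv[1])[:top_k]

print(recommend({"parser", "crash", "unicode"}))
```
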
  • Has this bug been reported?

    Publication Year: 2013, Page(s): 82-91

    Bug reporting is essentially an uncoordinated process. The same bugs may be repeatedly reported because users or testers are unaware of previously reported bugs. As a result, extra time may be spent on bug triaging and fixing. To reduce this redundant effort, it is important to provide bug reporters with the ability to search for previously reported bugs. The search functions provided by existing bug tracking systems use relatively simple ranking functions, which often produce unsatisfactory results. In this paper, we adopt Ranking SVM, a Learning to Rank technique, to construct a ranking model for effective bug report search. We also propose to use the knowledge of Wikipedia to discover the semantic relations among words and documents. Given a user query, the constructed ranking model can search for relevant bug reports in a bug tracking system. Unlike related work on duplicate bug report detection, our approach retrieves existing bug reports based on short user queries, before the complete bug report is submitted. We perform evaluations on more than 16,340 Eclipse and Mozilla bug reports. The evaluation results show that the proposed approach achieves better search results than the existing search functions provided by Bugzilla and Lucene. We believe our work can help users and testers locate potentially relevant bug reports more precisely. (An illustrative sketch follows this entry.)

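    The standard pairwise recipe for a Ranking SVM, realized here with an off-the-shelf linear SVM trained on feature-vector differences. The features (e.g., a lexical score and a Wikipedia-based semantic similarity) and their values are invented, and this is the textbook formulation rather than the paper's exact pipeline.

```python
# Pairwise RankSVM sketch for bug-report search: train a linear SVM on
# feature differences of (relevant, irrelevant) result pairs, then rank
# candidates by the learned weight vector.
import numpy as np
from sklearn.svm import LinearSVC

# Per query: feature vectors of candidate reports plus relevance labels.
# Hypothetical features: [lexical score, title overlap, semantic similarity].
queries = [
    (np.array([[0.9, 0.7, 0.8], [0.2, 0.1, 0.3], [0.4, 0.2, 0.1]]), [1, 0, 0]),
    (np.array([[0.3, 0.9, 0.6], [0.8, 0.2, 0.2], [0.1, 0.1, 0.2]]), [1, 0, 0]),
]

def pairwise_transform(queries):
    X, y = [], []
    for feats, rel in queries:
        for i, ri in enumerate(rel):
            for j, rj in enumerate(rel):
                if ri > rj:                       # i should rank above j
                    X.append(feats[i] - feats[j]); y.append(1)
                    X.append(feats[j] - feats[i]); y.append(-1)
    return np.array(X), np.array(y)

X, y = pairwise_transform(queries)
model = LinearSVC(C=1.0).fit(X, y)

def rank(candidates):
    """Order candidate reports by the learned scoring function w.x."""
    scores = candidates @ model.coef_.ravel()
    return np.argsort(-scores)

new_candidates = np.array([[0.5, 0.4, 0.9], [0.6, 0.1, 0.1]])
print(rank(new_candidates))     # candidate indices, best-ranked first
```
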
  • Automatic recovery of root causes from bug-fixing changes

    Publication Year: 2013, Page(s): 92-101

    What is the root cause of this failure? This question is often among the first asked by software debuggers when they try to address issues raised by a bug report. The root cause consists of the erroneous lines of code that trigger a chain of erroneous program states eventually leading to the failure. Bug tracking and source control systems only record the symptoms of a bug (e.g., bug reports) and its treatments (e.g., committed changes that fix the bug), but not its root cause. Many treatments contain non-essential changes, which are intermingled with the root cause. Reverse engineering the root cause of a bug can help explain why the bug was introduced and help detect and prevent other bugs with similar causes. The recovered root causes also provide better ground truth for bug detection and localization studies. In this work, we propose a combination of machine learning and code analysis techniques to identify root causes from the changes made to fix bugs. We evaluate the effectiveness of our approach on a golden set (i.e., ground truth data) of manually recovered root causes of 200 bug reports from three open source projects. Our approach achieves a precision, recall, and F-measure (i.e., the harmonic mean of precision and recall) of 76.42%, 71.88%, and 74.08%, respectively. Compared with the work by Kawrykow and Robillard, our approach achieves a 60.83% improvement in F-measure. (An illustrative sketch follows this entry.)

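    A toy illustration of classifying bug-fix changes into root-cause versus non-essential edits with a learned model; the features, labels, and model choice are invented stand-ins for the richer machine learning and code analysis the paper combines.

```python
# Toy classifier separating root-cause changes from non-essential ones in
# bug-fixing commits.  Features and training data are illustrative only.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features per changed hunk:
# [touches a condition?, pure rename/formatting?, inside changed control flow?,
#  number of variables referenced]
X_train = [
    [1, 0, 1, 3],   # guard condition fixed            -> root cause
    [0, 1, 0, 1],   # whitespace / rename only         -> non-essential
    [1, 0, 0, 2],   # off-by-one in a loop bound       -> root cause
    [0, 1, 0, 0],   # import reordering                -> non-essential
    [0, 0, 1, 4],   # data-flow change near the fault  -> root cause
    [0, 0, 0, 1],   # logging statement added          -> non-essential
]
y_train = [1, 0, 1, 0, 1, 0]    # 1 = part of the root cause

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

candidate_hunks = [[1, 0, 1, 2], [0, 1, 0, 1]]
print(clf.predict(candidate_hunks))   # e.g. [1 0]: first hunk flagged as root cause
```
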
  • Distilling useful clones by contextual differencing

    Publication Year: 2013, Page(s): 102-111
    Cited by: Papers (1)

    Clone detectors find similar code fragments and report large numbers of them for large systems. Textually similar clones may perform different computations, depending on the program context in which they occur. Understanding these contextual differences is essential to distill useful clones for a specific maintenance task, such as refactoring. Manual analysis of contextual differences is time consuming and error-prone. To mitigate this problem, we present an automated approach that helps developers find and analyze the contextual differences of clones. Our approach represents the context of clones as program dependence graphs and applies a graph differencing technique to identify required contextual differences of clones. We implemented a tool called CloneDifferentiator that identifies contextual differences of clones and allows developers to formulate queries to distill candidate clones that are useful for a given refactoring task. Two empirical studies show that CloneDifferentiator can reduce the effort of post-detection analysis of clones for refactoring. (An illustrative sketch follows this entry.)

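    A sketch of contextual differencing over program dependence graphs, assuming tiny hand-made PDGs for two clone instances. networkx stands in for a real PDG extractor; node names and dependence kinds are invented.

```python
# Contextual differencing of two clone instances via their (hand-made)
# program dependence graphs; edge 'kind' distinguishes data from control
# dependences.
import networkx as nx

def pdg(edges):
    g = nx.DiGraph()
    for src, dst, kind in edges:
        g.add_edge(src, dst, kind=kind)
    return g

# Clone A's context: the parse result feeds a null check before being returned.
pdg_a = pdg([("call:parse", "check:null", "data"),
             ("check:null", "return", "control")])
# Clone B's context: the result is logged and returned without the check.
pdg_b = pdg([("call:parse", "call:log", "data"),
             ("call:parse", "return", "data")])

def contextual_diff(g1, g2):
    """Return the labeled dependences unique to each clone's context."""
    e1 = {(u, v, d["kind"]) for u, v, d in g1.edges(data=True)}
    e2 = {(u, v, d["kind"]) for u, v, d in g2.edges(data=True)}
    return {"only_in_first": e1 - e2, "only_in_second": e2 - e1}

diff = contextual_diff(pdg_a, pdg_b)
print("dependences unique to clone A:", diff["only_in_first"])
print("dependences unique to clone B:", diff["only_in_second"])
```
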
  • Effects of cloned code on software maintainability: A replicated developer study

    Publication Year: 2013, Page(s): 112-121
    Cited by: Papers (2)

    Code clones are a common occurrence in most software systems. Their presence is believed to have an effect on the maintenance process. Although these effects have been studied previously, the results are not yet conclusive. This paper describes an extended replication of a controlled experiment (i.e., a strict replication with an additional task) that analyzes the effects of cloned bugs (i.e., bugs in cloned code) on the program comprehension of programmers. In the strict replication portion, the study participants attempted to isolate and fix two types of bugs, cloned and non-cloned, in one of two small systems. In the extension of the original study, we provided the participants with a clone report describing the location of all cloned code in the other system and asked them to again isolate and fix cloned and non-cloned bugs. The results of the original study showed that cloned bugs were not significantly more difficult to maintain than non-cloned bugs. Conversely, the results of the replication showed that it was significantly more difficult to correctly fix a cloned bug than a non-cloned bug. However, there was no significant difference in the amount of time required to fix a cloned bug vs. a non-cloned bug. Finally, the results of the study extension showed that programmers performed significantly better when given clone information than without it.

  • The influence of non-technical factors on code review

    Publication Year: 2013, Page(s): 122-131
    Cited by: Papers (1)

    When submitting a patch, the primary concerns of individual developers are “How can I maximize the chances of my patch being approved, and minimize the time it takes for this to happen?” In principle, code review is a transparent process that aims to assess the qualities of a patch on its technical merits and in a timely manner; in practice, however, the execution of this process can be affected by a variety of factors, some of which are external to the technical content of the patch itself. In this paper, we describe an empirical study of the code review process for WebKit, a large, open source project; we replicate the impact of previously studied factors - such as patch size, priority, and component - and extend these studies by investigating the influence of organizational (the company) and personal dimensions (reviewer load and activity, patch writer experience) on code review response time and outcome. Our approach uses a reverse engineered model of the patch submission process and extracts key information from the issue tracking and code review systems. Our findings suggest that these non-technical factors can significantly impact code review outcomes.

  • Understanding project dissemination on a social coding site

    Publication Year: 2013, Page(s): 132-141

    Popular social coding sites like GitHub and BitBucket are changing software development. Users follow interesting developers, watch their activities, and find new projects. Social relationships between users are utilized to disseminate projects, attract contributors, and increase their popularity. A deep understanding of project dissemination on social coding sites can provide important insights into the characteristics of project diffusion and into how popularity can be improved. In this paper, we seek a deeper understanding of project dissemination on GitHub. We collect 2,665 projects and 272,874 events. Moreover, we crawl 747,107 developers and 2,234,845 social links to construct social graphs. We analyze the topological characteristics and reciprocity of the social graphs. We then study the speed and the range of project dissemination, and the role of social links. Our main observations are: (1) Social relationships are not reciprocal. (2) Popularity increases gradually over a long period. (3) Projects spread to users far away from their creators. (4) Social links play a notable role in project dissemination. These results can be leveraged to increase a project's popularity. Specifically, we suggest that project owners should (1) encourage experienced developers to choose some promising new developers, follow them in return, and provide guidance; (2) promote projects over a long period; (3) advertise projects to a wide range of developers; and (4) fully utilize social relationships to advertise projects and attract contributors.

  • What help do developers seek, when and how?

    Publication Year: 2013, Page(s): 142-151

    Software development often requires knowledge beyond what developers already possess. In such cases, developers have to seek help from different sources of information. As a metacognitive skill, help seeking influences software developers' efficiency and success in many situations. However, little research has systematically investigated the general process of help-seeking activities in software engineering and the human and system factors affecting help seeking. This paper reports our empirical study aiming to fill this gap. Our study includes two human experiments involving 24 developers and two typical software development tasks. It gathers empirical data that allow us to provide an in-depth analysis of help-seeking task structures, task strategies, information sources, the help-seeking process model, and developers' information needs and behaviors in seeking, using, and managing help information. Our study provides a detailed understanding of help-seeking activities in software engineering, the challenges that software developers face, and the limitations of existing tool support. This can lead to the design and development of more efficient and usable help-seeking support that helps developers become better help seekers.

  • Towards understanding how developers spend their effort during maintenance activities

    Publication Year: 2013, Page(s): 152-161

    For many years, researchers and practitioners have strived to assess and improve the productivity of software development teams. One key step toward achieving this goal is understanding the factors affecting the efficiency of developers performing development and maintenance activities. In this paper, we aim to understand how developers spend their effort during maintenance activities and to study the factors affecting that effort. By knowing how developers spend their effort and which factors affect it, software organisations will be able to take the necessary steps to improve the efficiency of their developers, for example by providing them with adequate program comprehension tools. For this preliminary study, we mine 2,408 developers' interaction histories and 3,395 patches from four open-source software projects (ECF, Mylyn, PDE, Eclipse Platform). We observe that, usually, the complexity of the implementation required for a task does not reflect the effort spent by developers on the task. Most of the effort appears to be spent during the exploration of the program. On average, 62% of the files explored during the implementation of a task are not significantly relevant to the final implementation of the task. Developers who explore a large number of files that are not significantly relevant to the solution to their task take longer to perform the task. We expect that the results of this study will pave the way for better program comprehension tools to guide developers during their exploration of software systems.

  • Leveraging specifications of subcomponents to mine precise specifications of composite components

    Publication Year: 2013, Page(s): 162-171

    Specifications play an important role in many software engineering activities. Despite their usefulness, formal specifications are often unavailable in practice. Specification mining techniques try to automatically recover specifications from existing programs. Unfortunately, mined specifications are often overly general, which hampers their application in downstream analysis and testing. Nowadays, programmers develop software systems by utilizing existing components that usually have some available specifications. However, the benefits of these available specifications are not exploited by current specification miners. In this paper, we propose an approach that leverages the available specifications of subcomponents to improve the precision of the specifications of a composite component mined by state-based mining techniques. We monitor subcomponents against their specifications during the mining process and use the states that are reached to construct abstract states of the composite component. Our approach makes the subcomponents' states, encoded within their specifications, visible to their composite component, and improves the precision of mined specifications by effectively increasing the number of their states. The empirical evaluation shows that our approach can significantly improve the precision of mined specifications by removing erroneous behavior without noticeable loss of recall. (An illustrative sketch follows this entry.)

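    A minimal sketch of the idea of building a composite component's abstract states from the specification states its subcomponents reach during monitored executions; the subcomponent specifications, the composite's methods, and the call mapping below are all invented for illustration.

```python
# Build a composite component's observed transitions where each abstract
# state is the tuple of its subcomponents' specification states.

# Subcomponent specifications as tiny finite-state machines:
# state -> {event: next_state}
ITERATOR_SPEC = {"init": {"hasNext": "checked"},
                 "checked": {"next": "init", "hasNext": "checked"}}
STREAM_SPEC   = {"closed": {"open": "open"},
                 "open": {"read": "open", "close": "closed"}}

# How each composite-component method exercises the subcomponents
# (hypothetical mapping, normally obtained by runtime monitoring).
COMPOSITE_CALLS = {
    "load":   [("stream", "open"), ("stream", "read")],
    "step":   [("iterator", "hasNext"), ("iterator", "next")],
    "finish": [("stream", "close")],
}

def mine(trace):
    """Return observed transitions of the composite component, with abstract
    states of the form (iterator_state, stream_state)."""
    subs = {"iterator": ("init", ITERATOR_SPEC), "stream": ("closed", STREAM_SPEC)}
    state = {name: start for name, (start, _) in subs.items()}
    transitions = set()
    for method in trace:
        before = (state["iterator"], state["stream"])
        for sub, event in COMPOSITE_CALLS[method]:
            spec = subs[sub][1]
            state[sub] = spec[state[sub]][event]     # advance the sub's FSM
        after = (state["iterator"], state["stream"])
        transitions.add((before, method, after))
    return transitions

for t in sorted(mine(["load", "step", "step", "finish"])):
    print(t)
```
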
  • A model-driven graph-matching approach for design pattern detection

    Publication Year: 2013, Page(s): 172-181

    In this paper, an approach to automatically detect Design Patterns (DPs) in Object Oriented systems is presented. It links a system's source code components to the roles they play in each pattern. DPs are modelled by high-level structural properties (e.g., inheritance, dependency, invocation, delegation, type nesting, and membership relationships) that are checked against the system structure and components. The proposed metamodel also allows DP variants to be defined by overriding the structural properties of existing DP models, to improve detection quality. The approach was validated on an open benchmark containing several open-source systems of increasing size. Moreover, for two other systems, the results have been compared with those of a similar approach from the literature. The results obtained on the analyzed systems, the identified variants, and the efficiency and effectiveness of the approach are thoroughly presented and discussed. (An illustrative sketch follows this entry.)

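    A toy structural matcher in the spirit of the approach above: a pattern is a set of typed relations between roles, and detection searches for class tuples whose relations satisfy them. The system model, relation kinds, and the Adapter template are invented, and brute-force permutation stands in for the paper's model-driven graph matching.

```python
# Structural design-pattern detection sketch: candidate role assignments are
# class tuples whose typed relations in the system model satisfy the pattern.
from itertools import permutations

# System model: typed relations between classes (invented example).
SYSTEM = {
    ("LegacyRenderer", "Renderer", "inherits"),
    ("LegacyRenderer", "OldApi",   "delegates"),
    ("SvgRenderer",    "Renderer", "inherits"),
    ("Scene",          "Renderer", "invokes"),
}

# Adapter pattern modelled as structural properties over three roles.
ADAPTER = {
    ("Adapter", "Target",  "inherits"),
    ("Adapter", "Adaptee", "delegates"),
}

def detect(pattern, system):
    roles = sorted({r for edge in pattern for r in edge[:2]})
    classes = sorted({c for edge in system for c in edge[:2]})
    for candidate in permutations(classes, len(roles)):
        binding = dict(zip(roles, candidate))
        if all((binding[a], binding[b], kind) in system for a, b, kind in pattern):
            yield binding

for match in detect(ADAPTER, SYSTEM):
    print(match)   # e.g. Adapter=LegacyRenderer, Target=Renderer, Adaptee=OldApi
```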