
2010 IEEE International Conference on Information Reuse and Integration (IRI)

Date: 4-6 Aug. 2010


  • Message from Program Co-Chairs

    Publication Year: 2010, Page(s): 1
    PDF (78 KB) | HTML
    Freely Available from IEEE
  • Foreword

    Publication Year: 2010, Page(s): 1
    PDF (72 KB) | HTML
    Freely Available from IEEE
  • International Technical Program Committee

    Publication Year: 2010, Page(s): 1 - 4
    PDF (77 KB)
    Freely Available from IEEE
  • Precisiation of meaning—toward computation with natural language

    Publication Year: 2010, Page(s): 1 - 4
    Cited by: Papers (2) | Patents (2)
    PDF (189 KB)
    Freely Available from IEEE
  • Conference organizers

    Publication Year: 2010, Page(s): 1
    PDF (84 KB)
    Freely Available from IEEE
  • Panel title: Critical need for funding of basic and applied research in large-scale computing

    Publication Year: 2010, Page(s): 1
    PDF (106 KB)
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2010, Page(s): 1 - 9
    PDF (182 KB)
    Freely Available from IEEE
  • Author index

    Publication Year: 2010, Page(s): 1 - 6
    PDF (105 KB)
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2010, Page(s): 1
    PDF (175 KB)
    Freely Available from IEEE
  • A message-based interoperability framework with application to astrophysics

    Publication Year: 2010, Page(s): 1 - 6
    PDF (352 KB) | HTML

    Many software applications in astrophysics lack the capability to directly exchange data or share functionality. The proposed framework utilizes a lightweight messaging technique based on the Simple Application Messaging Protocol (SAMP) to enable software to participate in a collaborative system by sharing data and services with one another. A built-in mechanism allows users to non-programmatically create shared services based on application-internal functionality. The messaging components establish communication, handle the service propagation and discovery phases, and manage service requests between participating applications. The effort needed to integrate the framework into existing applications is minimized by insourcing all application-independent processes and by offering language-specific handlers that can be used to interface with the framework from within the host application itself. The feasibility of the proposed framework is analyzed by integrating it into SolarSoftware (SSW) and JHelioviewer, simulating a typical use-case scenario.

  • Effective API navigation and reuse

    Publication Year: 2010, Page(s): 7 - 12
    Cited by: Papers (2)
    PDF (170 KB) | HTML

    Most reuse libraries come with few source-code examples demonstrating how the library at hand should be used. We have developed a source-code recommendation approach for constructing and delivering relevant code snippets that programmers can use to complete a given programming task. Our approach is semantics-based, relying on an explicit ontological representation of source code. We argue that such a representation opens new doors for an improved recommendation mechanism that ensures relevancy and accuracy. Current recommendation systems require an existing repository of relevant code samples; for many libraries, however, such a repository does not exist. We therefore instead utilize points-to analysis to infer precise type information for library components. We have backed our approach with a tool that has been tested on multiple libraries. The obtained results are promising and demonstrate the effectiveness of our approach.

  • Empirical evaluation of active sampling for CRF-based analysis of pages

    Publication Year: 2010, Page(s): 13 - 18
    Cited by: Papers (1)
    PDF (414 KB) | HTML

    We propose an automatic method for extracting bibliographies from academic articles scanned with OCR markup. The method uses conditional random fields (CRF) to label serially OCR-ed text lines on an article's title page with the names of the appropriate bibliographic elements. Although we achieved excellent extraction accuracy for some Japanese academic journals, we needed a substantial amount of training data that had to be obtained through costly manual extraction of bibliographies from printed documents. This paper therefore reports an empirical evaluation of active sampling applied to CRF-based bibliography extraction to reduce the amount of training data. We applied active sampling techniques to three academic journals published in Japan. The experiments revealed that a sampling strategy using the proposed sample-selection criteria could reduce the amount of training data to less than half, or even a third, for two of the journals. This paper also reports the effect of adding pseudo-training data to the training set.

  • Workflow management of simulation based computation processes in transportation domain

    Publication Year: 2010, Page(s): 19 - 24
    PDF (592 KB) | HTML

    Simulation-based computational analysis is a key approach for system design and evaluation in many domains. These computation processes are often complex and experience-based, requiring domain scientists and engineers to deal with heterogeneous simulation models and applications in an integrated manner. As domain users employ simulations for increasingly sophisticated studies, managing the computation tasks and the avalanche of legacy models becomes a barrier to decision making and high-quality research. In this paper, we present a semantic approach to modeling and managing simulation-based computation processes in the transportation domain. The workflow representation of processes, semantic descriptions of the underlying models and applications, and the processing of this information form the basis of our approach. We have implemented this approach in a prototype system and used it in real-world applications.

  • Improving query suggestion by utilizing user intent

    Publication Year: 2010, Page(s): 25 - 30
    Cited by: Papers (1)
    PDF (373 KB) | HTML

    In this paper, we introduce a query suggestion approach that reuses users' search context and search logs. For a given search log, we integrate two pieces of wisdom embedded in the search context: consecutive queries and the reformulation patterns between them. When providing suggestions online, we extract concepts that represent the user's intent and associate these concepts with the wisdom attained from past users who had similar search intents. Finally, customized suggestions are provided according to the current user's search pattern. The experimental results demonstrate that the proposed approach outperforms existing query suggestion methods and effectively provides users with more accurate suggestions, helping them reach the required information faster.

  • Evaluating the impact of data quality on sampling

    Publication Year: 2010, Page(s): 31 - 36
    PDF (482 KB) | HTML

    Three important data characteristics that can substantially impact a data mining project are class imbalance, poor data quality, and the size of the training dataset. Data sampling is a commonly used method for improving learner performance when data is imbalanced. However, little effort has been put forth to investigate the performance of data sampling techniques when data is both noisy and imbalanced. In this work, we present a comprehensive empirical investigation of how data sampling techniques react to changes in four training dataset characteristics: dataset size, class distribution, noise level, and noise distribution. We present the performance of four common data sampling techniques using 11 learning algorithms. The results, based on an extensive suite of experiments in which over 15 million models were trained and evaluated, show that data sampling can be very effective at dealing with the combined problems of noise and imbalance. In addition, the dataset characteristics that have the greatest impact on each data sampling technique are identified.
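    The four sampling techniques studied are not named in this abstract; as one hedged illustration of the general idea, random undersampling of the majority class (a standard imbalance remedy, not necessarily one of the paper's four) can be sketched as follows, where the function name and `ratio` parameter are hypothetical:

    ```python
    import random

    def random_undersample(majority, minority, ratio=1.0, seed=0):
        # Keep every minority example and draw a random subset of the
        # majority class so that majority:minority ~= ratio afterwards.
        rng = random.Random(seed)
        k = min(len(majority), int(len(minority) * ratio))
        return rng.sample(majority, k) + list(minority)
    ```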

  • Heuristic based approach to clustering and its time critical applications

    Publication Year: 2010, Page(s): 37 - 42
    PDF (385 KB) | HTML

    Clustering is one of the earliest data-analysis techniques studied by the research community, dating back to the 1960s. However, as databases continue to grow in size, numerous research studies have been undertaken to develop more efficient clustering algorithms and to improve the performance of existing ones. This paper demonstrates a general optimization technique applicable to clustering algorithms that need to calculate distances and check them against some minimum-distance condition. The optimization is a simple calculation that finds the minimum possible distance between two points and checks this bound against the minimum-distance condition, thus reusing already computed values and reducing how often a more expensive distance function must be evaluated. The proposed optimization has been applied to agglomerative hierarchical clustering, k-means clustering, and DBSCAN with successful results: runtimes for all three algorithms were reduced, and the clusters they returned were verified to be identical to those of the original algorithms. The optimization also shows potential for substantial runtime reductions that grow as databases become larger.
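    The abstract does not give the exact bound used; one plausible sketch, assuming Euclidean distance, uses a cheap per-coordinate lower bound so the full distance is computed only when the bound cannot already rule the pair out (both helper names are illustrative):

    ```python
    import math

    def lower_bound(p, q):
        # The largest per-coordinate difference never exceeds the true
        # Euclidean distance, so it is a valid cheap lower bound.
        return max(abs(a - b) for a, b in zip(p, q))

    def within(p, q, eps):
        # Skip the expensive distance whenever the bound already fails.
        if lower_bound(p, q) > eps:
            return False
        return math.dist(p, q) <= eps
    ```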

  • A comparative study of filter-based feature ranking techniques

    Publication Year: 2010, Page(s): 43 - 48
    Cited by: Papers (4)
    PDF (354 KB) | HTML

    One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training dataset. Filter-based feature ranking techniques (rankers) rank the features according to their relevance to the target attribute, and the most relevant features are then chosen to build classification models. To evaluate the effectiveness of different feature ranking techniques, a commonly used method is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Since a given performance metric usually captures only one specific aspect of classification performance, it may be unable to evaluate that performance from different perspectives. Also, there is no general consensus among researchers and practitioners regarding which performance metrics should be used. In this study, we investigated six filter-based feature ranking techniques and built classification models using five different classifiers. The models were evaluated using eight different performance metrics. All experiments were conducted on four imbalanced data sets from a telecommunications software system. The experimental results demonstrate that the choice of performance metric may significantly influence the classification evaluation conclusion: one ranker may outperform another for a given metric, while for a different metric the results may be reversed. We found five distinct patterns when utilizing the eight performance metrics to order the six feature selection techniques.
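    The six rankers studied are not listed in the abstract; as a generic illustration of what a filter-based ranker does, features can be ordered by the absolute Pearson correlation of each feature column with the target (all names below are illustrative, not the paper's):

    ```python
    def pearson(xs, ys):
        # Plain Pearson correlation between one feature column and the target.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs) ** 0.5
        vy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (vx * vy)

    def rank_features(columns, target):
        # Rank feature names by |correlation| with the target, best first.
        return sorted(columns,
                      key=lambda name: abs(pearson(columns[name], target)),
                      reverse=True)
    ```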

  • Active learning with neural networks for intrusion detection

    Publication Year: 2010, Page(s): 49 - 54
    Cited by: Papers (1)
    PDF (379 KB) | HTML

    This paper presents a neural-network-based active learning procedure for computer network intrusion detection. Applying data mining and machine learning techniques to network intrusion detection often faces the problem of very large training datasets. For example, the training dataset commonly used for the DARPA KDD-1999 offline intrusion detection project contained approximately five hundred thousand observations (a 10% sample of the original five million), which were used to build intrusion detection classification models. The practical problems associated with such a large dataset include very long model training times, redundant information, and increased complexity in understanding the domain-specific data. We demonstrate that a simple active learning procedure can dramatically reduce the size of the training data without significantly sacrificing the classification accuracy of the intrusion detection model. A case study of the DARPA KDD-1999 intrusion detection project is used in our work. The network traffic instances are classified into one of two categories, normal and attack. A comparison of the actively trained neural network model with a C4.5 decision tree indicated that the actively learned model had better generalization accuracy. In addition, the training data classification performance of the actively learned model was comparable to that of the C4.5 decision tree.
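    The abstract does not detail the procedure beyond calling it "simple"; a common form of active learning is uncertainty sampling, sketched here for a binary classifier whose scores lie in [0, 1] (the helper names and batching scheme are hypothetical):

    ```python
    def uncertainty(score):
        # For a binary classifier emitting P(attack) in [0, 1],
        # scores near 0.5 are the least certain.
        return 1.0 - abs(score - 0.5) * 2.0

    def select_batch(pool, score_fn, batch_size):
        # Pick the pool instances the current model is least sure about;
        # each round these are labeled and added to the training set.
        ranked = sorted(pool, key=lambda x: uncertainty(score_fn(x)),
                        reverse=True)
        return ranked[:batch_size]
    ```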

  • Dynamic multicast height balance key agreements for secure multicast communication in ad hoc networks

    Publication Year: 2010, Page(s): 55 - 58
    PDF (395 KB) | HTML

    Because the topology of an ad hoc network changes frequently and unpredictably, securing multicast routing is more challenging than in traditional networks. This paper describes a dynamic multicast height-balanced group key agreement that allows a user in a multicast group to dynamically compose the group key and securely deliver multicast data from a multicast source to the other group members in wireless ad hoc networks. The proposed hierarchical structure partitions group members into location-based clusters, reducing the cost of communication and key management when members join or leave the network. Moreover, based on Diffie-Hellman key management, the proposed scheme not only provides efficient and rapid dynamic group key reconstruction and secure multicast data transmission, but also suits the robustness of wireless networks and lowers the overhead of security management.

  • Secure information exchange

    Publication Year: 2010, Page(s): 59 - 62
    PDF (532 KB) | HTML

    Ad hoc routing protocols have been designed to be ever more efficient, but without keeping security in mind. This makes them vulnerable to a variety of attacks that affect the reliability of data transmission. In this paper we propose a new secure protocol based on hash chains to provide a high level of security. The proposed protocol is analyzed using the NS-2 simulator.
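    The protocol itself is not specified in the abstract, but the underlying hash-chain primitive is standard: commit to the last element of a chain of repeated hashes, then release earlier elements, which anyone can verify by re-hashing forward to the trusted anchor. A minimal sketch using SHA-256 (function names are illustrative):

    ```python
    import hashlib

    def make_chain(seed: bytes, n: int):
        # h_0 = seed, h_i = H(h_{i-1}); chain[n] is published as the anchor,
        # and earlier elements are disclosed one at a time.
        chain = [seed]
        for _ in range(n):
            chain.append(hashlib.sha256(chain[-1]).digest())
        return chain

    def verify(anchor: bytes, value: bytes, steps: int) -> bool:
        # Hashing `value` forward `steps` times must reproduce the anchor.
        for _ in range(steps):
            value = hashlib.sha256(value).digest()
        return value == anchor
    ```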

  • Model based detection of implied scenarios in multi agent systems

    Publication Year: 2010, Page(s): 63 - 68
    Cited by: Papers (5)
    PDF (488 KB) | HTML

    Multi-agent systems (MAS) are efficient solutions for commercial applications such as information retrieval and search. In a MAS, agents are usually designed with distributed functionality and control. The lack of central control implies that the quality of service of a MAS may be degraded by unwanted behavior at runtime, commonly known as emergent behavior. Detecting and removing emergent behavior during the design phase of a MAS leads to large savings in the deployment costs of such systems. An effective approach to MAS design is to describe system requirements using scenarios. A scenario, commonly expressed as a message sequence chart or a sequence diagram, is a temporal sequence of messages sent between agents. In this paper, a method for detecting emergent behavior in MAS by detecting incompleteness and partial description of scenarios is proposed. The method is explained along with a prototype MAS for semantic search that blends search with ontological concept learning.

  • k-NN based LS-SVM framework for long-term time series prediction

    Publication Year: 2010, Page(s): 69 - 74
    Cited by: Papers (6)
    PDF (593 KB) | HTML

    Long-term time series prediction forecasts future values multiple steps ahead. It has received growing attention due to its applications in predicting stock prices, traffic status, power consumption, etc. In this paper, a k-nearest neighbors (k-NN) based least squares support vector machine (LS-SVM) framework is proposed for long-term time series prediction. A new distance function, which integrates the Euclidean distance and the dissimilarity of the trend of a time series, is defined for the k-NN approach. By selecting similar instances (i.e., nearest neighbors) from the training dataset for each testing instance using the k-NN approach, the complexity of training an LS-SVM regressor is reduced significantly. Experiments on two types of datasets were conducted to compare the prediction performance of the proposed framework with the traditional LS-SVM approach and the LL-MIMO (Multi-Input Multi-Output Local Learning) approach at a prediction horizon of 20. The experimental results demonstrate that the proposed framework outperforms both approaches. Furthermore, the results also show promising long-term prediction ability even when the prediction horizon is large (up to 180).
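    The exact form of the combined distance is not given in the abstract; one plausible sketch blends the Euclidean distance with a trend-disagreement rate over the signs of successive differences (the blending weight `alpha` and both function names are hypothetical):

    ```python
    import math

    def trend(series):
        # Signs of successive differences capture the shape of the series.
        return [1 if b > a else -1 if b < a else 0
                for a, b in zip(series, series[1:])]

    def combined_distance(x, y, alpha=0.5):
        # Hypothetical blend: Euclidean distance plus the fraction of
        # positions where the two series' trends disagree.
        eucl = math.dist(x, y)
        t_x, t_y = trend(x), trend(y)
        dissim = sum(a != b for a, b in zip(t_x, t_y)) / len(t_x)
        return alpha * eucl + (1 - alpha) * dissim
    ```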

  • Aggregating music recommendation Web APIs by artist

    Publication Year: 2010, Page(s): 75 - 79
    PDF (455 KB) | HTML

    Through user accounts, music recommendations are refined by user-supplied genre and artist preferences. Music recommendation is further complicated by multi-genre artists, artist collaborations, and artist similarity identification. We focus primarily on artist similarity, for which we propose a rank fusion solution: we aggregate the most-similar-artist rankings from Idiomag, Last.fm, and Echo Nest. Through an experimental evaluation of 300 artist queries, we compare five rank fusion algorithms and examine how each fusion method could impact the retrieval of established, new, or cross-genre music artists.
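    The five fusion algorithms compared are not named in the abstract; a classic instance of rank fusion, sketched here as an illustration rather than the paper's method, is the Borda count over per-service similar-artist lists:

    ```python
    from collections import defaultdict

    def borda_fuse(rankings):
        # Each ranking is a list of artists, best first. An item earns
        # (list length - position) points per list; totals decide the
        # fused order.
        scores = defaultdict(int)
        for ranking in rankings:
            n = len(ranking)
            for pos, artist in enumerate(ranking):
                scores[artist] += n - pos
        return sorted(scores, key=scores.get, reverse=True)
    ```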

  • A novel feature selection technique for highly imbalanced data

    Publication Year: 2010, Page(s): 80 - 85
    PDF (375 KB) | HTML

    Two challenges often encountered in data mining are the presence of excessive features in a dataset and unequal numbers of examples in the two classes of a binary classification problem. In this paper, we propose a novel approach to feature selection for imbalanced data in the context of software quality engineering. The technique consists of a repetitive process of data sampling followed by feature ranking, and finally aggregates the results generated across the repetitions. This repetitive feature selection method is compared with two other approaches: one uses a filter-based feature ranking technique alone on the original data, while the other applies data sampling and feature ranking together only once. The empirical validation is carried out on two groups of software datasets. The results demonstrate that our proposed repetitive feature selection method performs on average significantly better than the other two approaches, especially when the dataset is highly imbalanced.

  • Class compactness for data clustering

    Publication Year: 2010, Page(s): 86 - 91
    PDF (396 KB) | HTML

    In this paper we introduce a compactness-based clustering algorithm. The compactness of a data class is measured by comparing its inter-subset and intra-subset distances, and the class compactness of a subset is defined as the ratio of the two. A subset is called an isolated cluster (or icluster) if its class compactness is greater than 1. All iclusters form a containment tree. We introduce monotonic sequences of iclusters to simplify the structure of the icluster tree, and design a clustering algorithm based on it. The algorithm is effective on data sets whose clusters are nonlinearly separable, of arbitrary shape, or of different densities. Its effectiveness is demonstrated by experiments.
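    The paper's precise inter- and intra-subset distance definitions are in the full text; as a hedged sketch, one mean-distance variant of class compactness for a candidate subset could look like this (the function name and the mean-based definitions are assumptions):

    ```python
    import math
    from itertools import combinations

    def class_compactness(subset, rest):
        # Intra: mean pairwise distance within the candidate subset.
        intra = [math.dist(p, q) for p, q in combinations(subset, 2)]
        # Inter: mean distance from subset points to all remaining points.
        inter = [math.dist(p, q) for p in subset for q in rest]
        ratio = (sum(inter) / len(inter)) / (sum(intra) / len(intra))
        return ratio  # > 1 would mark the subset as an icluster
    ```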
