Data Mining (ICDM), 2011 IEEE 11th International Conference on

Date 11-14 Dec. 2011

  • [Front cover]

    Publication Year: 2011 , Page(s): C1
    Freely Available from IEEE
  • [Title page i]

    Publication Year: 2011 , Page(s): i
    Freely Available from IEEE
  • [Title page iii]

    Publication Year: 2011 , Page(s): iii
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2011 , Page(s): iv
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2011 , Page(s): v - xiv
    Freely Available from IEEE
  • Message from the Conference General Chairs

    Publication Year: 2011 , Page(s): xv - xvi
    Freely Available from IEEE
  • Message from the Program Co-Chairs

    Publication Year: 2011 , Page(s): xvii - xviii
    Freely Available from IEEE
  • Organizing Committee

    Publication Year: 2011 , Page(s): xix - xx
    Freely Available from IEEE
  • Steering Committee

    Publication Year: 2011 , Page(s): xxi
    Freely Available from IEEE
  • Program Committee

    Publication Year: 2011 , Page(s): xxii - xxix
    Freely Available from IEEE
  • List of Reviewers

    Publication Year: 2011 , Page(s): xxx - xxxiii
    Freely Available from IEEE
  • Sponsors

    Publication Year: 2011 , Page(s): xxxiv - xxxv
    Freely Available from IEEE
  • Algorithms for Mining the Evolution of Conserved Relational States in Dynamic Networks

    Publication Year: 2011 , Page(s): 1 - 10
    Cited by:  Papers (2)

    Dynamic networks have recently been recognized as a powerful abstraction to model and represent the temporal changes and dynamic aspects of the data underlying many complex systems. Significant insights regarding the stable relational patterns among the entities can be gained by analyzing the temporal evolution of the complex entity relations. This can help identify the transitions from one conserved state to the next and may provide evidence for the existence of external factors that are responsible for changing the stable relational patterns in these networks. This paper presents a new data mining method that analyzes the time-persistent relations or states between the entities of the dynamic networks and captures all maximal non-redundant evolution paths of the stable relational states. Experimental results based on multiple datasets from real-world applications show that the method is efficient and scalable.

  • Infrastructure Pattern Discovery in Configuration Management Databases via Large Sparse Graph Mining

    Publication Year: 2011 , Page(s): 11 - 20

    A configuration management database (CMDB) can be considered to be a large graph representing the IT infrastructure entities and their inter-relationships. Mining such graphs is challenging because they are large, complex, and multi-attributed, and have many repeated labels. These characteristics pose challenges for graph mining algorithms, due to the increased cost of subgraph isomorphism (for support counting) and graph isomorphism (for eliminating duplicate patterns). The notion of pattern frequency or support is also more challenging in a single graph, since it has to be defined in terms of the number of its (potentially exponentially many) embeddings. We present CMDB-Miner, a novel two-step method for mining infrastructure patterns from CMDB graphs. It first samples the set of maximal frequent patterns, and then clusters them to extract the representative infrastructure patterns. We demonstrate the effectiveness of CMDB-Miner on real-world CMDB graphs.

  • Role-Behavior Analysis from Trajectory Data by Cross-Domain Learning

    Publication Year: 2011 , Page(s): 21 - 30

    Behavior analysis using trajectory data presents a practical and interesting challenge for KDD. Conventional analyses address discriminative tasks of behaviors, e.g., classification and clustering, typically using the subsequences extracted from the trajectory of an object as a numerical feature representation. In this paper, we go further to identify differences in the high-level semantics of behaviors, such as roles, and address the task with a cross-domain learning approach. The trajectory, from which the features are sampled, is intuitively viewed as a domain, and we assume that its intrinsic structure is characterized by the underlying role associated with the tracked object. We propose a novel hybrid method of spectral clustering and density approximation for comparing the clustering structures of two independently sampled trajectory datasets and identifying patterns of behaviors unique to a role. We present empirical evaluations of the proposed method in two practical settings using real-world robotic trajectories.

  • Semi-supervised Feature Importance Evaluation with Ensemble Learning

    Publication Year: 2011 , Page(s): 31 - 40

    We consider the problem of using a large amount of unlabeled data to improve the efficiency of feature selection in high-dimensional datasets when only a small set of labeled examples is available. We propose a new semi-supervised feature importance evaluation method (SSFI for short) that combines ideas from co-training and random forests with a new permutation-based out-of-bag feature importance measure. We provide empirical results on several benchmark datasets indicating that SSFI can lead to significant improvement over state-of-the-art semi-supervised and supervised algorithms.

  • COMET: A Recipe for Learning and Using Large Ensembles on Massive Data

    Publication Year: 2011 , Page(s): 41 - 50
    Cited by:  Papers (3)

    COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when learning from massive-scale data that is too large to fit on a single machine. To get the best accuracy, IVoting should be used instead of bagging to generate the training subset for each decision tree in the random forest. Experiments with two large datasets (5GB and 50GB compressed) show that COMET compares favorably (in both accuracy and training time) to learning on a subsample of data using a serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble evaluation which dynamically decides how many ensemble members to evaluate per data point; this can reduce evaluation cost by 100X or more.

  • Overlapping Correlation Clustering

    Publication Year: 2011 , Page(s): 51 - 60
    Cited by:  Papers (2)

    We introduce a new approach to the problem of overlapping clustering. The main idea is to formulate overlapping clustering as an optimization problem in which each data point is mapped to a small set of labels, representing membership in different clusters. The objective is to find a mapping so that the distances between data points agree as much as possible with distances taken over their label sets. To define distances between label sets, we consider two measures: a set-intersection indicator function and the Jaccard coefficient. To solve the main optimization problem we propose a local-search algorithm. The iterative step of our algorithm requires solving non-trivial optimization subproblems, which, for the measures of set-intersection and Jaccard, we solve using a greedy method and non-negative least squares, respectively. Since our framework uses pairwise similarities of objects as the input, it lends itself naturally to the task of clustering structured objects for which feature vectors can be difficult to obtain. As a proof of concept we show how easily our framework can be applied in two different complex application domains. Firstly, we develop overlapping clustering of animal trajectories, obtaining zoologically meaningful results. Secondly, we apply our framework for overlapping clustering of proteins based on pairwise similarities of amino acid sequences, outperforming the state-of-the-art method in matching a ground truth taxonomy.
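
The objective described in this abstract can be illustrated with a minimal sketch. It assumes the input is a table of pairwise similarities in [0, 1] and that the cost of a labeling is the sum of absolute differences between each input similarity and the similarity of the two label sets. The function names, the absolute-difference cost, and the convention for two empty label sets are assumptions made for illustration, not details taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two label sets."""
    if not a and not b:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(a | b)


def intersection_indicator(a, b):
    """1.0 if the two label sets share at least one label, else 0.0."""
    return 1.0 if a & b else 0.0


def labeling_cost(labels, similarity, set_sim):
    """Sum over all pairs of |set_sim(labels[u], labels[v]) - similarity(u, v)|.

    labels:     dict mapping each object to its set of cluster labels
    similarity: dict mapping frozenset({u, v}) to a similarity in [0, 1]
    set_sim:    function comparing two label sets (hypothetical parameter)
    """
    total = 0.0
    for u, v in combinations(sorted(labels), 2):
        total += abs(set_sim(labels[u], labels[v]) - similarity[frozenset((u, v))])
    return total


# Toy input: "b" belongs to both clusters, "a" and "c" to one each.
labels = {"a": {1}, "b": {1, 2}, "c": {2}}
similarity = {
    frozenset(("a", "b")): 0.5,
    frozenset(("b", "c")): 0.5,
    frozenset(("a", "c")): 0.0,
}

print(labeling_cost(labels, similarity, jaccard))
print(labeling_cost(labels, similarity, intersection_indicator))
```

On this toy input the Jaccard measure reproduces the similarities exactly, while the indicator measure can only output 0 or 1 and so incurs a cost on the two pairs with similarity 0.5. A local search would repeatedly change one object's label set to lower this cost.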

  • Learning with Minimum Supervision: A General Framework for Transductive Transfer Learning

    Publication Year: 2011 , Page(s): 61 - 70
    Cited by:  Papers (1)

    Transductive transfer learning is a special type of transfer learning problem in which abundant labeled examples are available in the source domain and only unlabeled examples are available in the target domain. It finds applications in spam filtering, microblog mining, and so on. In this paper, we propose a general framework to solve the problem by mapping the input features in both the source domain and target domain into a shared latent space and simultaneously minimizing the feature reconstruction loss and prediction loss. We develop one specific example of the framework, namely the latent large-margin transductive transfer learning (LATTL) algorithm, and analyze its theoretical bound on classification loss via Rademacher complexity. We also provide a unified view of several popular transfer learning algorithms under our framework. Experimental results on one synthetic dataset and three application datasets demonstrate the advantages of the proposed algorithm over other state-of-the-art ones.

  • Confidence in Predictions from Random Tree Ensembles

    Publication Year: 2011 , Page(s): 71 - 80
    Cited by:  Papers (4)

    Obtaining an indication of confidence of predictions is desirable for many data mining applications. Such confidence levels, together with the predicted value, can indicate the certainty or extent of reliability that may be associated with the prediction. This can be useful, for example, where model outputs are used in making potentially costly decisions, so that one may focus on the higher-confidence predictions, and in general across risk-sensitive applications. The conformal prediction framework presents a novel approach for complementing predictions from machine learning algorithms with valid confidence measures. Confidence levels are obtained from the underlying algorithm, using a non-conformity measure which indicates how 'atypical' a given example set is. The non-conformity measure is key to determining the usefulness and efficiency of the approach. This paper considers inductive conformal prediction in the context of random tree ensembles like random forests, which have been noted to perform favorably across problems. Focusing on classification tasks, and considering realistic data contexts including class imbalance, we develop non-conformity measures for assessing the confidence of predicted class labels from random forests. We examine the performance of these measures on multiple datasets. Results demonstrate the usefulness and validity of the measures and their relative differences, and highlight the effectiveness of conformal prediction random forests for obtaining predictions with associated confidence.
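
As a rough illustration of inductive conformal prediction, the sketch below computes the standard conformal p-value of a candidate label from non-conformity scores on a held-out calibration set. The scores themselves are taken as given. One plausible choice for a random forest, assumed here only for the example values, is one minus the fraction of trees voting for the label. The non-conformity measures the paper actually develops are not reproduced, and the function names are hypothetical.

```python
def conformal_p_value(calibration_scores, test_score):
    """Fraction of scores (calibration plus the test one) at least as atypical as the test score."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_scores, candidate_scores, significance):
    """Keep every candidate label whose p-value exceeds the significance level.

    candidate_scores maps each candidate label to the non-conformity score
    the test example would receive under that label (assumed precomputed).
    """
    return {
        label
        for label, score in candidate_scores.items()
        if conformal_p_value(calibration_scores, score) > significance
    }


# Hypothetical non-conformity scores from a calibration set of five examples.
calibration = [0.1, 0.2, 0.3, 0.4, 0.9]
# Hypothetical scores of one test example under each candidate label.
candidates = {"pos": 0.35, "neg": 0.95}

print(conformal_p_value(calibration, candidates["pos"]))
print(prediction_set(calibration, candidates, significance=0.2))
```

A label with a high non-conformity score looks atypical relative to the calibration set, gets a small p-value, and is dropped from the prediction set at the chosen significance level.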

  • Mining Heavy Subgraphs in Time-Evolving Networks

    Publication Year: 2011 , Page(s): 81 - 90
    Cited by:  Papers (4)

    Networks from different genres are not static entities, but exhibit dynamic behavior. The congestion level of links in transportation networks varies in time depending on the traffic. Similarly, social and communication links are employed at varying rates as information cascades unfold. In recent years there has been an increase of interest in modeling and mining dynamic networks. However, limited attention has been paid to high-scoring subgraph discovery in time-evolving networks. We define the problem of finding the highest-scoring temporal subgraph in a dynamic network, termed Heaviest Dynamic Subgraph (HDS). We show that HDS is NP-hard even with edge weights in {-1,1} and devise an efficient approach for large graph instances that evolve over long time periods. While a naive approach would enumerate all O(t²) sub-intervals, our solution performs an effective pruning of the sub-interval space by considering O(t·log(t)) groups of sub-intervals and computing an aggregate of each group in logarithmic time. We also define a fast heuristic and a tight upper bound for approximating the static version of HDS, and use them for further pruning the sub-interval space and quickly verifying candidate sub-intervals. We perform an extensive experimental evaluation of our algorithm on transportation, communication and social media networks for discovering subgraphs that correspond to traffic congestion, communication overflow and localized social discussions. Our method is two orders of magnitude faster than a naive approach and scales well with network size and time length.
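
The naive baseline mentioned in this abstract can be sketched as follows: enumerate every one of the t(t+1)/2 sub-intervals, aggregate the edge weights over each, and score the aggregated static graph. The scorer used here (the sum of positive aggregated edge weights) is only a placeholder so the example runs. It is not the static HDS solver, heuristic, or upper bound from the paper, and all names are hypothetical.

```python
def sub_intervals(t):
    """All (start, end) pairs with 0 <= start <= end < t, i.e. t*(t+1)/2 intervals."""
    return [(i, j) for i in range(t) for j in range(i, t)]


def aggregate(snapshots, start, end):
    """Sum each edge's weight over snapshots[start..end] inclusive."""
    totals = {}
    for snapshot in snapshots[start:end + 1]:
        for edge, weight in snapshot.items():
            totals[edge] = totals.get(edge, 0) + weight
    return totals


def positive_weight_sum(edge_totals):
    """Placeholder static scorer (an assumption): total weight of positive edges."""
    return sum(w for w in edge_totals.values() if w > 0)


def naive_best_interval(snapshots, static_score):
    """Return (score, start, end) of the best-scoring sub-interval by brute force."""
    best = None
    for start, end in sub_intervals(len(snapshots)):
        score = static_score(aggregate(snapshots, start, end))
        if best is None or score > best[0]:
            best = (score, start, end)
    return best


# Four time steps over two edges with weights in {-1, 1}.
ab, bc = ("a", "b"), ("b", "c")
snapshots = [
    {ab: -1, bc: -1},
    {ab: 1, bc: 1},
    {ab: 1, bc: 1},
    {ab: -1, bc: -1},
]

print(len(sub_intervals(len(snapshots))))
print(naive_best_interval(snapshots, positive_weight_sum))
```

Because every sub-interval is aggregated and scored separately, the cost grows quadratically in the number of time steps, which is the overhead the pruning over O(t·log(t)) groups is meant to avoid.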

  • Multi-Class L2,1-Norm Support Vector Machine

    Publication Year: 2011 , Page(s): 91 - 100
    Cited by:  Papers (1)

    Feature selection is an essential component of data mining. In many data analysis tasks where the number of data points is much smaller than the number of features, efficient feature selection approaches are desired to extract meaningful features and to eliminate redundant ones. In previous studies, many data mining techniques have been applied to tackle this challenging problem. In this paper, we propose a new ℓ2,1-norm SVM, that is, a multi-class hinge loss with a structured regularization term over all the classes, to naturally select features for multi-class problems without resorting to further heuristic strategies. Rather than directly solving the multi-class hinge loss minimization with ℓ2,1-norm regularization, which has not been solved before due to its optimization difficulty, we are the first to give an efficient algorithm that bridges the new problem with a previously solvable optimization problem to do multi-class feature selection. A global convergence proof for our method is also presented. Via the proposed efficient algorithm, we select features across multiple classes with joint sparsity, i.e., each feature has either a small or a large score over all classes. Comprehensive experiments have been performed on six bioinformatics data sets to show that our method can obtain better or competitive performance compared with existing state-of-the-art multi-class feature selection approaches.
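
To make the regularizer concrete, here is a small sketch of the ℓ2,1 norm of a weight matrix, assuming one row per feature and one column per class (the layout is an assumption, as are the function names). The norm sums the Euclidean norms of the rows, so driving it down tends to zero out whole rows, which corresponds to discarding a feature for all classes at once.

```python
import math


def row_norm(row):
    """Euclidean (L2) norm of one feature's weights across all classes."""
    return math.sqrt(sum(w * w for w in row))


def l21_norm(W):
    """L2,1 norm: sum over features (rows) of the L2 norm across classes (columns)."""
    return sum(row_norm(row) for row in W)


def selected_features(W, tol=1e-12):
    """Indices of features whose weight row is not (numerically) all zero."""
    return [i for i, row in enumerate(W) if row_norm(row) > tol]


# Hypothetical weights: 3 features (rows) by 2 classes (columns).
W = [
    [3.0, 4.0],  # used by both classes
    [0.0, 0.0],  # dropped for every class
    [0.0, 1.0],  # used by one class only
]

print(l21_norm(W))
print(selected_features(W))
```

The second feature contributes nothing to the norm and is excluded for every class, which is the joint sparsity pattern the abstract describes.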

  • SolarMap: Multifaceted Visual Analytics for Topic Exploration

    Publication Year: 2011 , Page(s): 101 - 110
    Cited by:  Papers (1)

    Documents in rich text corpora often contain multiple facets of information. For example, an article from a medical document collection might consist of multifaceted information about symptoms, treatments, causes, diagnoses, prognoses, and preventions. Thus, documents in the collection may have different relations across each of these various facets. Topic analysis and exploration for such multi-relational corpora is a challenging visual analytic task. This paper presents SolarMap, a multifaceted visual analytic technique for visually exploring topics in multi-relational data. SolarMap simultaneously visualizes the topic distribution of the underlying entities from one facet together with keyword distributions that convey the semantic definition of each cluster along a secondary facet. SolarMap combines several visual techniques including 1) topic contour clusters and interactive multifaceted keyword topic rings, 2) a global layout optimization algorithm that aligns each topic cluster with its corresponding keywords, and 3) an optimal temporal network segmentation and layout method that renders the temporal evolution of clusters. Finally, the paper concludes with two case studies and a quantitative user evaluation which show the power of the SolarMap technique.

  • Efficiently Mining Unordered Trees

    Publication Year: 2011 , Page(s): 111 - 120

    Frequent tree patterns have many applications in different domains such as XML document mining, user web log analysis, network routing and bioinformatics. In this paper, we first introduce three new tree encodings and accordingly present an efficient algorithm for finding frequent patterns from rooted unordered trees with the assumption that children of every node in database trees are identically labeled. Then, we generalize the method and propose the UITree algorithm to find frequent patterns from rooted unordered trees without any restriction. Compared to other algorithms in the literature, UITree manages occurrences of a candidate tree in database trees more efficiently. Our extensive experiments on both real and synthetic datasets show that UITree significantly outperforms the most efficient existing works on mining unordered trees.

  • CEMiner -- An Efficient Algorithm for Mining Closed Patterns from Time Interval-Based Data

    Publication Year: 2011 , Page(s): 121 - 130
    Cited by:  Papers (3)

    The mining of closed sequential patterns has attracted researchers for its capability of using compact results to preserve the same expressive power as conventional mining. However, existing studies focus only on time point-based data. Few research efforts have elaborated on discovering closed sequential patterns from time interval-based data, where each data item persists for a period of time. Mining closed time interval-based patterns, also called closed temporal patterns, is an arduous problem since the pairwise relationships between two interval-based events are intrinsically complex. In this paper, an efficient algorithm, CEMiner, is developed to discover closed temporal patterns from interval-based data. CEMiner employs several optimization techniques to effectively reduce the search space. The experimental results on both synthetic and real datasets indicate that CEMiner not only significantly outperforms prior interval-based mining algorithms in terms of execution time but also scales gracefully. The experiment conducted on a real dataset shows the practicability of time interval-based closed pattern mining.
