
IEEE Transactions on Knowledge and Data Engineering

Issue 8 • Aug. 2014

Displaying Results 1 - 21 of 21
  • A Review on Multi-Label Learning Algorithms

    Publication Year: 2014 , Page(s): 1819 - 1837
    Cited by:  Papers (4)

    Multi-label learning studies the problem where each example is represented by a single instance while associated with a set of labels simultaneously. During the past decade, a significant amount of progress has been made toward this emerging machine learning paradigm. This paper aims to provide a timely review of this area, with emphasis on state-of-the-art multi-label learning algorithms. First, fundamentals of multi-label learning, including its formal definition and evaluation metrics, are given. Second, and primarily, eight representative multi-label learning algorithms are scrutinized under common notation, with relevant analyses and discussions. Third, several related learning settings are briefly summarized. In conclusion, online resources and open research problems in multi-label learning are outlined for reference purposes.
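
    The fundamentals the review covers include evaluation metrics. As a minimal sketch (not the paper's own code), two standard multi-label metrics can be computed over binary label-indicator vectors:

```python
def hamming_loss(Y_true, Y_pred):
    # Fraction of label slots predicted incorrectly, averaged over all
    # examples and all labels; 0 is perfect, 1 is maximally wrong.
    n, q = len(Y_true), len(Y_true[0])
    wrong = sum(t != p for ts, ps in zip(Y_true, Y_pred)
                for t, p in zip(ts, ps))
    return wrong / (n * q)

def subset_accuracy(Y_true, Y_pred):
    # Stricter metric: fraction of examples whose whole label vector is exact.
    return sum(ts == ps for ts, ps in zip(Y_true, Y_pred)) / len(Y_true)
```

    For Y_true = [[1,0,1],[0,1,0]] and Y_pred = [[1,0,0],[0,1,0]], one of six label slots is wrong (Hamming loss 1/6) while one of two examples matches exactly (subset accuracy 0.5).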

  • An Evolutionary Multiobjective Approach for Community Discovery in Dynamic Networks

    Publication Year: 2014 , Page(s): 1838 - 1852

    The discovery of evolving communities in dynamic networks is an important research topic that poses challenging tasks. Evolutionary clustering is a recent framework for clustering dynamic networks that introduces the concept of temporal smoothness into the community structure detection method. Evolutionary-based clustering approaches try to maximize cluster accuracy with respect to incoming data of the current time step, and to minimize clustering drift from one time step to the next. To optimize both of these competing objectives, an input parameter that controls the preference degree of a user towards either the snapshot quality or the temporal quality is needed. In this paper, the detection of communities with temporal smoothness is formulated as a multiobjective problem and a method based on genetic algorithms is proposed. The main advantage of the algorithm is that it automatically provides a solution representing the best trade-off between the accuracy of the clustering obtained and the deviation from one time step to the next. Experiments on synthetic data sets show the very good performance of the method when compared with state-of-the-art approaches.

  • Close Dominance Graph: An Efficient Framework for Answering Continuous Top-k Dominating Queries

    Publication Year: 2014 , Page(s): 1853 - 1865

    There are two preference-based queries commonly used in database systems: (1) the top-k query and (2) the skyline query. By combining the ranking rule used in the top-k query with the notion of dominance relationships utilized in the skyline query, the top-k dominating query emerges, providing a new perspective on data processing. This query returns the k records with the highest domination scores from the dataset. However, processing the top-k dominating query is complex when the dataset operates under a streaming model. With new data continuously generated and stale data removed from the database, a continuous top-k dominating query (cTKDQ) requires that updated results can be returned to users at any time. This work explores the cTKDQ problem and proposes a unique indexing structure, called a Close Dominance Graph (CDG), to support the processing of a cTKDQ. The CDG provides comprehensive information regarding the dominance relationships between records, which is vital in answering a cTKDQ with a limited search space. The update process for a cTKDQ is then converted to a simple update affecting a small portion of the CDG. Experimental results show that this scheme offers much better performance than existing solutions.
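
    For intuition, the score the query ranks by can be sketched with a brute-force baseline (no CDG, no streaming; a record's domination score is the number of records it dominates, assuming smaller values are preferred in every dimension):

```python
def dominates(a, b):
    # a dominates b if a is no worse in every dimension
    # and strictly better in at least one.
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def top_k_dominating(records, k):
    # Brute-force O(n^2) scoring; the paper's CDG index exists precisely
    # to avoid recomputing this under a streaming model.
    scored = [(sum(dominates(r, s) for s in records), r) for r in records]
    scored.sort(key=lambda t: -t[0])
    return scored[:k]
```

    For records [(1,1), (2,2), (3,1), (2,3)], (1,1) dominates the other three, so the top-1 answer is (1,1) with score 3.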

  • Collaborative Online Multitask Learning

    Publication Year: 2014 , Page(s): 1866 - 1876

    We study the problem of online multitask learning for solving multiple related classification tasks in parallel, aiming at classifying every sequence of data received by each task accurately and efficiently. One practical example of online multitask learning is micro-blog sentiment detection on a group of users, which classifies micro-blog posts generated by each user into emotional or non-emotional categories. This particular online learning task is challenging for a number of reasons. First of all, to meet the critical requirements of online applications, a highly efficient and scalable classification solution that can make immediate predictions with low learning cost is needed. This requirement leaves conventional batch learning algorithms out of consideration. Second, classical classification methods, be they batch or online, often encounter a dilemma when applied to a group of tasks: on one hand, a single classification model trained on the entire collection of data from all tasks may fail to capture characteristics of individual tasks; on the other hand, a model trained independently on individual tasks may suffer from insufficient training data. To overcome these challenges, in this paper, we propose a collaborative online multitask learning method, which learns a global model over the entire data of all tasks. At the same time, individual models for multiple related tasks are jointly inferred by leveraging the global model through a collaborative online learning approach. We illustrate the efficacy of the proposed technique on a synthetic dataset. We also evaluate it on three real-life problems: spam email filtering, bioinformatics data classification, and micro-blog sentiment detection. Experimental results show that our method is effective and scalable at the online classification of multiple related tasks.
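
    The global-plus-local idea can be illustrated with a small, hedged sketch (a perceptron-style stand-in, not the paper's actual update rules): every task predicts with the sum of a shared global model and its own local model, and a mistake updates both, so data-poor tasks borrow strength from the global model.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class CollabOnline:
    # Illustrative sketch: task t's effective model is w_global + w_local[t];
    # a mistake-driven update touches both, sharing strength across tasks.
    def __init__(self, n_tasks, dim, lr_global=0.5, lr_local=1.0):
        self.g = [0.0] * dim
        self.w = [[0.0] * dim for _ in range(n_tasks)]
        self.lg, self.ll = lr_global, lr_local

    def predict(self, task, x):
        return 1 if dot(self.g, x) + dot(self.w[task], x) >= 0 else -1

    def update(self, task, x, y):
        if self.predict(task, x) != y:  # update only on mistakes
            for i, xi in enumerate(x):
                self.g[i] += self.lg * y * xi
                self.w[task][i] += self.ll * y * xi
```

    After a few passes over linearly separable data shared by two tasks, both tasks classify correctly even though each saw only its own stream.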

  • Determining Process Model Precision and Generalization with Weighted Artificial Negative Events

    Publication Year: 2014 , Page(s): 1877 - 1889

    Process mining is the research area concerned with knowledge discovery from event logs. One common process mining task focuses on conformance checking: comparing discovered or designed process models with actual real-life behavior as captured in event logs in order to assess the “goodness” of the process model. This paper introduces a novel conformance checking method to measure how well a process model performs in terms of precision and generalization with respect to the actual executions of a process as recorded in an event log. Our approach differs from related work in that we apply the concept of so-called weighted artificial negative events to conformance checking, leading to more robust results, especially when dealing with less complete event logs that contain only a subset of all possible process execution behavior. In addition, our technique offers a novel way to estimate a process model's ability to generalize. Existing literature has focused mainly on the fitness (recall) and precision (appropriateness) of process models, whereas generalization has been much more difficult to estimate. The described algorithms are implemented in a number of ProM plugins, and a Petri net conformance checking tool was developed to inspect process model conformance in a visual manner.

  • Geometric Monitoring of Heterogeneous Streams

    Publication Year: 2014 , Page(s): 1890 - 1903
    Cited by:  Papers (1)

    Interest in stream monitoring is shifting toward the distributed case. In many applications the data is high volume, dynamic, and distributed, making it infeasible to collect the distinct streams at a central node for processing. Often, the monitoring problem consists of determining whether the value of a global function, defined on the union of all streams, has crossed a certain threshold. We wish to reduce communication by transforming the global monitoring into the testing of local constraints, checked independently at the nodes. Geometric monitoring (GM) has proved useful for constructing such local constraints for general functions. Alas, in GM the constraints at all nodes share an identical structure and are thus unsuitable for handling heterogeneous streams. Therefore, we propose a general approach for monitoring heterogeneous streams (HGM), which defines constraints tailored to fit the data distributions at the nodes. While we prove that optimally selecting the constraints is NP-hard, we provide a practical solution, which reduces the running time by hierarchically clustering nodes with similar data distributions and then solving simpler optimization problems. We also present a method for efficiently recovering from local violations at the nodes. Experiments yield an improvement of over an order of magnitude in communication relative to GM.
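
    The core GM construction can be sketched as follows, using the Euclidean norm as an assumed example function f and the classical bounding-ball test: if the ball spanned by the last shared estimate and a node's current local vector stays below the threshold surface of f, the node can stay silent; otherwise it reports a local violation.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def local_violation(e, local, threshold):
    # Ball with center c = (e + local) / 2 and radius r = |local - e| / 2.
    # The maximum of f(v) = |v| over this ball is |c| + r; if that stays
    # below the threshold, f(global average) provably has not crossed it.
    c = [(a + b) / 2 for a, b in zip(e, local)]
    r = norm([a - b for a, b in zip(e, local)]) / 2
    return norm(c) + r >= threshold
```

    A node whose data drifts only slightly from the shared estimate passes the test silently; a large drift triggers a synchronization.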

  • Influence Spreading Path and Its Application to the Time Constrained Social Influence Maximization Problem and Beyond

    Publication Year: 2014 , Page(s): 1904 - 1917

    Influence maximization is a fundamental research problem in social networks. Viral marketing, one of its applications, is to get a small number of users to adopt a product, which subsequently triggers a large cascade of further adoptions by utilizing the “word-of-mouth” effect in social networks. Time plays an important role in the influence spread from one user to another, and the time needed for a user to influence another varies. In this paper, we propose the time constrained influence maximization problem. We show that the problem is NP-hard, and prove the monotonicity and submodularity of the time constrained influence spread function. Based on this, we develop a greedy algorithm. To improve the algorithm's scalability, we propose the concept of the Influence Spreading Path in social networks and develop a set of new algorithms for the time constrained influence maximization problem. We further parallelize the algorithms for achieving more time savings. Additionally, we generalize the proposed algorithms for the conventional influence maximization problem without time constraints. All of the algorithms are evaluated over four publicly available datasets. The experimental results demonstrate the efficiency and effectiveness of the algorithms for both the conventional influence maximization problem and its time-constrained version.
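
    A hedged sketch of the greedy baseline: Monte Carlo estimation under a time-constrained independent cascade, where the edge probabilities, delays, and deadline below are illustrative assumptions. The paper's Influence Spreading Path machinery exists precisely to avoid this expensive simulation.

```python
import random

def simulate_spread(graph, seeds, deadline, trials=200, rng=None):
    # graph: {u: [(v, prob, delay), ...]}. Activations propagate with a
    # per-edge delay and count only if they land before the deadline.
    rng = rng or random.Random(0)
    total = 0
    for _ in range(trials):
        active = {s: 0 for s in seeds}      # node -> activation time
        frontier = list(active)
        while frontier:
            nxt = []
            for u in frontier:
                for v, p, d in graph.get(u, []):
                    t = active[u] + d
                    if v not in active and t <= deadline and rng.random() < p:
                        active[v] = t
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

def greedy_seeds(graph, k, deadline):
    # Monotone + submodular spread makes this greedy a (1 - 1/e) approximation.
    seeds = set()
    nodes = set(graph) | {v for es in graph.values() for v, _, _ in es}
    for _ in range(k):
        best = max(nodes - seeds,
                   key=lambda n: simulate_spread(graph, seeds | {n}, deadline))
        seeds.add(best)
    return seeds
```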

  • Large-Scale Pattern Search Using Reduced-Space On-Disk Suffix Arrays

    Publication Year: 2014 , Page(s): 1918 - 1931

    The suffix array is an efficient data structure for in-memory pattern search. Suffix arrays can also be used for external-memory pattern search, via two-level structures that use an internal index to identify the correct block of suffix pointers. In this paper, we describe a new two-level suffix array-based index structure that requires significantly less disk space than previous approaches. Key to the saving is the use of disk blocks that are based on prefixes rather than the more usual uniform-sampling approach, allowing reductions between blocks and subparts of other blocks. We also describe a new in-memory structure, the condensed BWT, and show that it allows common patterns to be resolved without access to the text. Experiments using 64 GB of English web text on a computer with 4 GB of main memory demonstrate the speed and versatility of the new approach. For this data, the index is around one-third the size of previous two-level mechanisms; and the memory footprint of as little as 1% of the text size means that queries can be processed more quickly than is possible with a compact FM-INDEX.
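
    The in-memory case the two-level structures build on can be sketched in a few lines (naive O(n² log n) construction, suitable only for toy inputs, nothing like the paper's 64 GB setting): all occurrences of a pattern form one contiguous run of the suffix array, found by binary search.

```python
def suffix_array(text):
    # Sort suffix start positions by the suffixes they begin.
    return sorted(range(len(text)), key=lambda i: text[i:])

def search(text, sa, pattern):
    # Binary search for the contiguous run of suffixes starting with
    # `pattern`; returns all match positions in sorted order.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                     # leftmost suffix with prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                     # end of the run of matching prefixes
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

    For "banana", the suffix array is [5, 3, 1, 0, 4, 2] and searching "ana" returns positions [1, 3].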

  • Learning Predictive Choice Models for Decision Optimization

    Publication Year: 2014 , Page(s): 1932 - 1945

    Probabilistic predictive models are often used in decision optimization applications. Optimal decision making in these applications critically depends on the performance of the predictive models, especially the accuracy of their probability estimates. In this paper, we propose a probabilistic model for revenue maximization and cost minimization across applications in which a decision making agent is faced with a group of possible customers and either offers a variable discount on a product or service or expends a variable cost to attract positive responses. The model is based directly on optimizing expected revenue and makes explicit the relationship between revenue and the customer's response behavior. We derive an expectation maximization (EM) procedure for learning the parameters of the model from historical data, prove that the model is asymptotically insensitive to selection bias in historical decisions, and demonstrate in a series of experiments the method's utility for optimizing financial aid decisions at an international institute of higher learning.
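
    The revenue objective can be illustrated with a hedged toy sketch: assuming a logistic response model p(accept | discount) with made-up coefficients, the agent offers the discount maximizing expected revenue (price - d) · p(d). The paper learns such response parameters from historical data via EM; here they are fixed by hand.

```python
import math

def response_prob(d, a=-2.0, b=0.08):
    # Assumed logistic response model: a larger discount d raises the
    # acceptance probability (coefficients are illustrative, not learned).
    return 1 / (1 + math.exp(-(a + b * d)))

def best_discount(price, candidates):
    # Expected revenue of offering discount d is (price - d) * P(accept | d).
    return max(candidates, key=lambda d: (price - d) * response_prob(d))
```

    With price 100 and discounts 0, 10, ..., 100, the expected-revenue curve rises while acceptance grows faster than margin shrinks, peaking at a discount of 40.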

  • Meta-Blocking: Taking Entity Resolution to the Next Level

    Publication Year: 2014 , Page(s): 1946 - 1960

    Entity Resolution is an inherently quadratic task that typically scales to large data collections through blocking. In the context of highly heterogeneous information spaces, blocking methods rely on redundancy in order to ensure high effectiveness at the cost of lower efficiency (i.e., more comparisons). This effect is partially ameliorated by coarse-grained block processing techniques that discard entire blocks either a priori or during the resolution process. In this paper, we introduce meta-blocking as a generic procedure that intervenes between the creation and the processing of blocks, transforming an initial set of blocks into a new one with substantially fewer comparisons and equally high effectiveness. In essence, meta-blocking aims at extracting the most similar pairs of entities by leveraging the information that is encapsulated in the block-to-entity relationships. To this end, it first builds an abstract graph representation of the original set of blocks, with the nodes corresponding to entity profiles and the edges connecting the co-occurring ones. During the creation of this structure all redundant comparisons are discarded, while the superfluous ones can be removed by pruning the edges with the lowest weights. We analytically examine both procedures, proposing a multitude of edge weighting schemes, graph pruning algorithms, as well as pruning criteria. Our approaches are schema-agnostic, thus accommodating any type of blocks. We evaluate their performance through a thorough experimental study over three large-scale, real-world data sets, with the outcomes verifying significant efficiency enhancements at a negligible cost in effectiveness.
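
    A minimal sketch of the graph-building and pruning steps, using common-block counts as the edge weighting (one of several schemes the paper studies) and mean-weight edge pruning as the criterion; an illustrative simplification, not the paper's full method:

```python
from collections import defaultdict
from itertools import combinations

def meta_block(blocks):
    # blocks: list of sets of entity ids. Build the co-occurrence graph,
    # weighting each edge by the number of blocks the pair shares (so each
    # repeated comparison is counted once, not re-executed), then prune
    # edges whose weight falls at or below the mean weight.
    weight = defaultdict(int)
    for block in blocks:
        for u, v in combinations(sorted(block), 2):
            weight[(u, v)] += 1
    if not weight:
        return []
    mean = sum(weight.values()) / len(weight)
    return sorted(e for e, w in weight.items() if w > mean)
```

    For blocks [{1,2,3}, {1,2}, {3,4}], the pair (1, 2) co-occurs in two blocks while every other pair co-occurs in one, so only (1, 2) survives pruning.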

  • On the Influence Propagation of Web Videos

    Publication Year: 2014 , Page(s): 1961 - 1973

    We propose a novel approach to analyze how a popular video propagates in cyberspace, to identify whether it originated from a certain sharing site, and to identify how it reached its current popularity through propagation. In addition, we also estimate its influence across different websites outside the major hosting website. Web video is gaining significance due to its rich and eye-ball-grabbing content. This phenomenon is evidently amplified and accelerated by the advance of Web 2.0. When a video receives some degree of popularity, it tends to appear on various websites, including not only video-sharing websites but also news websites, social networks, or even Wikipedia. Numerous video-sharing websites have hosted videos that reached a phenomenal level of visibility and popularity in the entire cyberspace. As a result, it is becoming more difficult to determine how the propagation took place: was the video a piece of original work that was intentionally uploaded to its major hosting site by the authors, did the video originate from some small site and then reach the sharing site after already gaining a good level of popularity, or did it originate from other places in cyberspace while the sharing site made it popular? Existing studies regarding this flow of influence are lacking. Literature that discusses the problem of estimating a video's influence in the whole cyberspace also remains rare. In this article we introduce a novel framework to identify the propagation of popular videos from the major hosting site's perspective, and to estimate their influence. We define a Unified Virtual Community Space (UVCS) to model the propagation and influence of a video, and devise a novel learning method called Noise-reductive Local-and-Global Learning (NLGL) to effectively estimate a video's origin and influence. Without losing generality, we conduct experiments on an annotated dataset collected from a major video-sharing site to evaluate the effectiveness of the framework. Surrounding the collected videos and their ranks, some interesting discussions regarding the propagation and influence of videos as well as user behavior are also presented.

  • Online Discovery of Gathering Patterns over Trajectories

    Publication Year: 2014 , Page(s): 1974 - 1988

    The increasing pervasiveness of location-acquisition technologies has enabled the collection of huge amounts of trajectory data for almost any kind of moving object. Discovering useful patterns from these movement behaviors can convey valuable knowledge to a variety of critical applications. In this light, we propose a novel concept, called gathering, which is a trajectory pattern modeling various group incidents such as celebrations, parades, protests, traffic jams, and so on. A key observation is that these incidents typically involve large congregations of individuals, which form durable and stable areas with high density. In this work, we first develop a set of novel techniques to tackle the challenge of efficient discovery of gathering patterns on an archived trajectory dataset. Afterwards, since trajectory databases are inherently dynamic in many real-world scenarios such as traffic monitoring, fleet management, and battlefield surveillance, we further propose an online discovery solution by applying a series of optimization schemes, which can keep track of gathering patterns while new trajectory data arrive. Finally, the effectiveness of the proposed concepts and the efficiency of the approaches are validated by extensive experiments based on a real taxicab trajectory dataset.
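
    The "durable and stable areas with high density" intuition can be sketched with a toy grid-density stand-in (not the paper's actual pattern definition): report cells that stay dense for enough consecutive snapshots.

```python
def durable_dense_cells(snapshots, min_objects, min_duration):
    # snapshots: list of {cell: set_of_object_ids}, one dict per time step.
    # A cell is "dense" when it holds at least min_objects objects; report
    # cells that remain dense for min_duration consecutive snapshots.
    run = {}          # cell -> length of its current dense streak
    result = set()
    for snap in snapshots:
        dense = {c for c, objs in snap.items() if len(objs) >= min_objects}
        run = {c: run.get(c, 0) + 1 for c in dense}  # streaks reset elsewhere
        result |= {c for c, r in run.items() if r >= min_duration}
    return result
```

    Because only the per-cell streak counters are kept between steps, the same loop also works in the online setting the paper targets, consuming one snapshot at a time.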

  • PASS: A Parallel Activity-Search System

    Publication Year: 2014 , Page(s): 1989 - 2001

    Given a set A of activities expressed via temporal stochastic automata, and a set O of observations (detections of low level events), we study the problem of identifying instances of activities from A in O. While past work has developed algorithms to solve this problem, in this paper, we develop methods to significantly scale these algorithms. Our PASS architecture consists of three parts: (i) leveraging past work to represent all activities in A via a single “merged” graph, (ii) partitioning the graph into a set of C subgraphs, where (C + 1) is the number of compute nodes in a cluster, and (iii) developing a parallel activity detection algorithm that uses a different compute node in the cluster to intensively process each subgraph. We propose three possible partitioning methods and a parallel activity-search detection (PASS_Detect) algorithm that coordinates computations across nodes in the cluster. We report on experiments showing that our algorithms enable us to handle both large numbers of observations per second as well as large merged graphs. In particular, on a cluster with 9 compute nodes, PASS can reliably handle between 400K and 569K observations per second and merged graphs with as many as 50K vertices.

  • Probabilistic Aspect Mining Model for Drug Reviews

    Publication Year: 2014 , Page(s): 2002 - 2013

    Recent findings show that online reviews, blogs, and discussion forums on chronic diseases and drugs are becoming important supporting resources for patients. Extracting information from these substantial bodies of text is useful and challenging. We developed a generative probabilistic aspect mining model (PAMM) for identifying the aspects/topics relating to class labels or categorical meta-information of a corpus. Unlike many other unsupervised or supervised approaches, PAMM has a unique feature in that it focuses on finding aspects relating to one class only, rather than finding aspects for all classes simultaneously in each execution. This reduces the chance of having aspects formed from mixing concepts of different classes; hence the identified aspects are easier for people to interpret. The aspects found also have the property that they are class distinguishing: they can be used to distinguish a class from other classes. An efficient EM algorithm is developed for parameter estimation. Experimental results on reviews of four different drugs show that PAMM is able to find better aspects than other common approaches, when measured with mean pointwise mutual information and classification accuracy. In addition, the derived aspects were also assessed by humans based on different specified perspectives, and PAMM was found to be rated highest.

  • Right-Protected Data Publishing with Provable Distance-Based Mining

    Publication Year: 2014 , Page(s): 2014 - 2028

    Protection of one's intellectual property is a topic with important technological and legal facets. We provide mechanisms for establishing the ownership of a dataset consisting of multiple objects. The algorithms also preserve key properties of the dataset, which are important for mining operations, and so guarantee both right protection and utility preservation. We consider a right-protection scheme based on watermarking. Watermarking may distort the original distance graph. Our watermarking methodology preserves important distance relationships, such as the Nearest Neighbors (NN) of each object and the Minimum Spanning Tree (MST) of the original dataset. This leads to preservation of any mining operation that depends on the ordering of distances between objects, such as NN-search and classification, as well as many visualization techniques. We prove fundamental lower and upper bounds on the distance between objects post-watermarking. In particular, we establish a restricted isometry property, i.e., tight bounds on the contraction/expansion of the original distances. We use this analysis to design fast algorithms for NN-preserving and MST-preserving watermarking that drastically prune the vast search space. We observe two orders of magnitude speedup over the exhaustive schemes, without any sacrifice in NN or MST preservation.
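
    The NN-preservation guarantee is easy to state as a checkable invariant. A brute-force sketch, for illustration only (the paper's contribution is finding watermarks that satisfy this check without exhaustive search):

```python
def nearest(points, i):
    # Index of the nearest neighbor of points[i] (squared Euclidean distance).
    d = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: d(points[i], points[j]))

def nn_preserved(original, watermarked):
    # True iff every object keeps the same nearest neighbor after
    # watermarking -- the invariant an NN-preserving scheme must maintain.
    return all(nearest(original, i) == nearest(watermarked, i)
               for i in range(len(original)))
```

    A small perturbation of every point typically passes the check; moving one point close to a different neighbor fails it.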

  • Scalable Evaluation of Trajectory Queries over Imprecise Location Data

    Publication Year: 2014 , Page(s): 2029 - 2044

    Trajectory queries, which retrieve nearby objects for every point of a given route, can be used to identify alerts of potential threats along a vessel route, or to monitor the rescuers adjacent to a travel path. However, the locations of these objects (e.g., threats, succours) may not be precisely obtained due to hardware limitations of measuring devices, as well as the complex nature of the surroundings. For such data, we consider a common model, where the possible locations of an object are bounded by a closed region, called an “imprecise region”. Ignoring or coarsely wrapping imprecision can result in low query quality, and cause undesirable consequences such as missed threat alerts and poor rescue response times. Also, the query is quite time-consuming, since all points on the trajectory are considered. In this paper, we study how to efficiently evaluate trajectory queries over imprecise objects by proposing a novel concept, the u-bisector, which extends the bisector to imprecise data. Based on the u-bisector, we provide an efficient and versatile solution that supports different shapes of commonly used imprecise regions (e.g., rectangles, circles, and line segments). Extensive experiments on real datasets show that our proposal achieves better efficiency, quality, and scalability than its competitors.
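
    The pruning such queries rely on needs distance bounds rather than exact distances. A minimal sketch for rectangular imprecise regions (one of the shapes mentioned): the minimum and maximum possible distance from a query point to any location inside the rectangle, which lets a query prune an object or confirm it without knowing its exact position.

```python
import math

def dist_interval_to_rect(q, rect):
    # rect = (xmin, ymin, xmax, ymax) bounding an object's imprecise region.
    # Returns (min_dist, max_dist) over all possible object locations.
    (x, y), (x1, y1, x2, y2) = q, rect
    dx_min = max(x1 - x, 0, x - x2)       # 0 when q is within the x-range
    dy_min = max(y1 - y, 0, y - y2)
    dx_max = max(abs(x - x1), abs(x - x2))
    dy_max = max(abs(y - y1), abs(y - y2))
    return (math.hypot(dx_min, dy_min), math.hypot(dx_max, dy_max))
```

    If the minimum distance already exceeds a query radius the object is pruned; if the maximum distance is within it, the object qualifies regardless of its true location. Only the in-between cases need refinement.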

  • Scaling Up Synchronization-Inspired Partitioning Clustering

    Publication Year: 2014 , Page(s): 2045 - 2057

    Based on the extensive Kuramoto model, a synchronization-inspired partitioning clustering algorithm was recently proposed and is attracting more and more attention, because it simulates synchronization phenomena in clustering: each data object is regarded as a phase oscillator and the dynamic behavior of the objects is simulated over time. To circumvent the serious limitation that the existing version can only be carried out effectively on fairly small or medium-sized datasets, a novel scalable synchronization-inspired partitioning clustering algorithm termed LSSPC, based on the center-constrained minimal enclosing ball and the reduced set density estimator, is proposed for large dataset applications. LSSPC first condenses a large-scale dataset into a reduced dataset by using a fast minimal-enclosing-ball-based approximation for the reduced set density estimator, achieving an asymptotic time complexity that is linear in the size of the dataset and a space complexity that is independent of this size. It then clusters the reduced dataset adaptively by using Sync with the Davies-Bouldin clustering criterion and a new order parameter that helps observe the degree of local synchronization. Finally, it finishes clustering by using the proposed algorithm CRD on the remaining objects in the large dataset, which can capture outliers and isolated clusters effectively. The effectiveness of the proposed clustering algorithm LSSPC for large datasets is theoretically analyzed and experimentally verified on artificial and real datasets.
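
    The synchronization simulation at the heart of such algorithms can be sketched with a Kuramoto-style update (illustrative only; the paper's Sync variant, order parameter, and CRD step are more involved): each object is a phase oscillator pulled toward its neighbors' phases, and clusters emerge as groups whose phases lock together.

```python
import math

def kuramoto_step(phases, neighbors, k=0.5):
    # One synchronization sweep: oscillator i moves toward the phases of its
    # neighbors with coupling strength k; mutually coupled oscillators
    # converge to a common phase.
    new = []
    for i, p in enumerate(phases):
        nbrs = neighbors[i]
        coupling = sum(math.sin(phases[j] - p) for j in nbrs) / max(len(nbrs), 1)
        new.append(p + k * coupling)
    return new
```

    Two mutually coupled oscillators starting one radian apart lock phases within a few dozen sweeps.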

  • Trip Planner Over Probabilistic Time-Dependent Road Networks

    Publication Year: 2014 , Page(s): 2058 - 2071

    Recently, the management of transportation systems has become increasingly important in many real applications such as location-based services, supply chain management, traffic control, and so on. These applications usually involve queries over spatial road networks with dynamically changing and complicated traffic conditions. In this paper, we model such a network by a probabilistic time-dependent graph (PT-Graph), whose edges are associated with uncertain delay functions. We propose a useful query in the PT-Graph, namely the trip planner query (TPQ), which retrieves trip plans that traverse a set of query points in the PT-Graph with minimum traveling time at high confidence. To tackle the efficiency issue, we present two pruning methods, time interval pruning and probabilistic pruning, to effectively rule out false alarms of trip plans. Furthermore, we design a pre-computation technique based on the cost model and construct an index structure over the pre-computed data to enable pruning via the index. We integrate our proposed pruning methods into an efficient query procedure to answer TPQs. Through extensive experiments, we demonstrate the efficiency and effectiveness of our TPQ query answering approach.
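
    A deterministic simplification of the routing core can be sketched with time-dependent Dijkstra: edge travel times are functions of departure time (standing in for the PT-Graph's uncertain delay functions), and nodes are expanded by earliest arrival. Waiting at nodes is not modeled, and the probabilistic confidence dimension of the paper is dropped entirely.

```python
import heapq

def td_shortest(graph, src, dst, t0):
    # graph: {u: [(v, travel_time_fn), ...]} where travel_time_fn(t) gives
    # the edge's travel time when departing u at time t. Returns the
    # earliest arrival time at dst when leaving src at t0.
    best = {src: t0}
    heap = [(t0, src)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == dst:
            return t
        if t > best.get(u, float('inf')):
            continue                      # stale heap entry
        for v, tt in graph.get(u, []):
            arrive = t + tt(t)
            if arrive < best.get(v, float('inf')):
                best[v] = arrive
                heapq.heappush(heap, (arrive, v))
    return float('inf')
```

    With a rush-hour edge that takes 10 units before time 8 and 2 units after, leaving at time 0 versus time 9 yields different arrival times, which is exactly what a time-independent shortest path cannot express.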

  • Pruning Incremental Linear Model Trees with Approximate Lookahead

    Publication Year: 2014 , Page(s): 2072 - 2076

    Incremental linear model trees with approximate lookahead are fast, but produce overly large trees. This is due to non-optimal splitting decisions boosted by a possibly unlimited number of examples obtained from a data source. To keep the processing speed high and the tree complexity low, appropriate incremental pruning techniques are needed. In this paper, we introduce a pruning technique for the class of incremental linear model trees with approximate lookahead on stationary data sources. Experimental results show that the advantage of approximate lookahead in terms of processing speed can be further improved by producing much smaller and consequently more explanatory, less memory-consuming trees on high-dimensional data. This is done at the expense of only a small increase in prediction error. Additionally, the pruning algorithm can be tuned to either produce less accurate model trees at a much higher processing speed or, alternatively, more accurate trees at the expense of higher processing times.

  • Open Access

    Publication Year: 2014 , Page(s): 2077
    Freely Available from IEEE
  • Rock Stars of Cybersecurity Conference [Advertisement]

    Publication Year: 2014 , Page(s): 2078
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Knowledge and Data Engineering (TKDE) informs researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Jian Pei
Simon Fraser University

Associate Editor-in-Chief
Xuemin Lin
University of New South Wales

Associate Editor-in-Chief
Lei Chen
Hong Kong University of Science and Technology