IEEE Transactions on Knowledge and Data Engineering

Issue 10 • Oct. 2004

  • [Front cover]

    Publication Year: 2004, Page(s): c1
    PDF (147 KB)
    Freely Available from IEEE
  • [Inside front cover]

    Publication Year: 2004, Page(s): c2
    PDF (77 KB)
    Freely Available from IEEE
  • An efficient cost model for optimization of nearest neighbor search in low and medium dimensional spaces

    Publication Year: 2004, Page(s): 1169 - 1184
    Cited by: Papers (19) | Patents (1)
    PDF (1464 KB) | HTML

    Existing models for nearest neighbor search in multidimensional spaces are not appropriate for query optimization because they either lead to erroneous estimation or involve complex equations that are expensive to evaluate in real-time. This article proposes an alternative method that captures the performance of nearest neighbor queries using approximation. For uniform data, our model involves closed formulae that are very efficient to compute and accurate for up to 10 dimensions. Further, the proposed equations can be applied on nonuniform data with the aid of histograms. We demonstrate the effectiveness of the model by using it to solve several optimization problems related to nearest neighbor search.

    (A short illustrative sketch follows this entry.)

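    The paper's own closed formulae are not reproduced here. As a flavor of this style of cost model, the sketch below compares the textbook expected-volume estimate of nearest-neighbor distance for uniform data in the unit hypercube against a Monte Carlo measurement; all function names are illustrative.

        import math, random

        def unit_ball_volume(d):
            # Volume of the unit L2 ball in d dimensions.
            return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

        def estimated_nn_distance(n, d):
            # Radius r solving n * V_d * r^d = 1: the ball around a query
            # expected to contain exactly one data point.
            return (1.0 / (n * unit_ball_volume(d))) ** (1.0 / d)

        def empirical_nn_distance(n, d, trials=50):
            # Monte Carlo: average distance from a random query to its
            # nearest neighbor among n uniform points (boundary effects included).
            total = 0.0
            for _ in range(trials):
                pts = [[random.random() for _ in range(d)] for _ in range(n)]
                q = [random.random() for _ in range(d)]
                total += min(math.dist(q, p) for p in pts)
            return total / trials

        for d in (2, 5, 10):
            print(d, round(estimated_nn_distance(1000, d), 3),
                  round(empirical_nn_distance(1000, d), 3))
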
  • Content-based image retrieval based on a fuzzy approach

    Publication Year: 2004, Page(s): 1185 - 1199
    Cited by: Papers (28) | Patents (13)
    PDF (1184 KB) | HTML

    A typical content-based image retrieval (CBIR) system would need to handle the vagueness in user queries as well as the inherent uncertainty in image representation, similarity measure, and relevance feedback. We discuss how fuzzy set theory can be effectively used for this purpose and describe an image retrieval system called FIRST (Fuzzy Image Retrieval SysTem) which incorporates many of these ideas. FIRST can handle exemplar-based, graphical-sketch-based, as well as linguistic queries involving region labels, attributes, and spatial relations. FIRST uses fuzzy attributed relational graphs (FARGs) to represent images, where each node in the graph represents an image region and each edge represents a relation between two regions. The given query is converted to a FARG, and a low-complexity fuzzy graph matching algorithm is used to compare the query graph with the FARGs in the database. The use of an indexing scheme based on a leader clustering algorithm avoids an exhaustive search of the FARG database. We quantify the retrieval performance of the system in terms of several standard measures.

    (A toy sketch of a fuzzy region match follows this entry.)

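    FIRST itself is not reproduced here; the fragment below only illustrates the flavor of a fuzzy region match such a system might use. The membership vectors, label names, and the min/max overlap measure are all assumptions for illustration.

        def fuzzy_sim(a, b):
            # Similarity of two fuzzy membership vectors over a label set:
            # sum of pairwise minima (overlap) normalized by sum of maxima.
            keys = set(a) | set(b)
            overlap = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
            union = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
            return overlap / union if union else 0.0

        # Image regions as fuzzy attribute memberships; in a FARG these would
        # be graph nodes, with edges carrying fuzzy spatial relations.
        query_region = {"sky": 0.9, "blue": 0.8}
        db_region = {"sky": 0.7, "blue": 0.6, "cloud": 0.3}
        print(round(fuzzy_sim(query_region, db_region), 3))  # 0.65
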
  • Managing deadline miss ratio and sensor data freshness in real-time databases

    Publication Year: 2004, Page(s): 1200 - 1216
    Cited by: Papers (50)
    PDF (1584 KB) | HTML

    The demand for real-time data services is increasing in many applications including e-commerce, agile manufacturing, and telecommunications network management. In these applications, it is desirable to execute transactions within their deadlines, i.e., before the real-world status changes, using fresh (temporally consistent) data. However, meeting these fundamental requirements is challenging due to dynamic workloads and data access patterns in these applications. Further, transaction timeliness and data freshness requirements may conflict. We define average/transient deadline miss ratio and new data freshness metrics to let a database administrator specify the desired quality of real-time data services for a specific application. We also present a novel QoS management architecture for real-time databases to support the desired QoS even in the presence of unpredictable workloads and access patterns. To prevent overload and support the desired QoS, the presented architecture applies feedback control, admission control, and flexible freshness management schemes. A simulation study shows that our QoS-aware approach can achieve a near-zero miss ratio and perfect freshness, meeting basic requirements for real-time transaction processing. In contrast, baseline approaches fail to support the desired miss ratio and/or freshness in the presence of unpredictable workloads and data access patterns. View full abstract»

    (A small feedback-control sketch follows this entry.)

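    A minimal sketch of the feedback-control idea described above, assuming a made-up proportional gain and a toy workload model; the paper's architecture also combines admission control and freshness management, which are omitted here.

        import random

        TARGET_MISS = 0.01     # desired deadline miss ratio
        KP = 0.5               # proportional gain (assumed)
        admit_fraction = 1.0

        random.seed(1)
        for period in range(10):
            # Toy plant: the miss ratio grows once admitted load nears saturation.
            load = admit_fraction * random.uniform(0.8, 1.2)
            miss_ratio = max(0.0, (load - 0.9) / 2.0)
            # Admit more when under the target, shed load when over it.
            error = TARGET_MISS - miss_ratio
            admit_fraction = min(1.0, max(0.1, admit_fraction + KP * error))
            print(f"period={period} miss={miss_ratio:.3f} admit={admit_fraction:.2f}")
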
  • Recovery of PTUIE handling from source codes through recognizing its probable properties

    Publication Year: 2004, Page(s): 1217 - 1231
    Cited by: Papers (7)
    PDF (1096 KB) | HTML

    Automated recovery of system features and their designs from program source code is important in reverse engineering and system comprehension, and also helps in the testing of software. An error that is made by a user in an input to an execution of a transaction and discovered only after the completion of the execution is called a posttransaction user-input error (PTUIE) of the transaction. For a transaction in any database application, it is usually essential to provide transactions for correcting the effects that could result from any PTUIE of the transaction. We discover some probable properties that hold between the control flow graph of a transaction and the control flow graphs of the transactions that correct its PTUIE. Through recognizing these properties, we present a novel approach for the automated approximate recovery of provisions and designs for transactions to correct PTUIE of transactions in a database application. The approach recognizes these properties by statically analyzing the source code of the transactions in the database application.

    (A hypothetical toy check follows this entry.)

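    The paper's control-flow-graph properties are not reproduced here; the hypothetical check below captures only the simplest underlying intuition, using write sets as an assumed stand-in representation for the transactions.

        def may_correct(original_writes, candidate_writes):
            # A transaction correcting a posttransaction user-input error must
            # at least be able to touch everything the original one changed.
            return set(original_writes) <= set(candidate_writes)

        print(may_correct({"order.qty", "stock.level"},
                          {"order.qty", "stock.level", "audit.log"}))  # True
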
  • Specifying mining algorithms with iterative user-defined aggregates

    Publication Year: 2004, Page(s): 1232 - 1246
    Cited by: Papers (3)
    PDF (960 KB) | HTML

    We present a way of exploiting domain knowledge in the design and implementation of data mining algorithms, with special attention to frequent pattern discovery, within a deductive framework. In our framework, domain knowledge is represented by deductive rules, and data mining algorithms are specified by means of iterative user-defined aggregates and implemented by means of user-defined predicates. This choice allows us to exploit the full expressive power of deductive rules without losing performance. Iterative user-defined aggregates have a fixed scheme into which user-defined predicates are plugged. This feature allows the modularization of data mining algorithms, thus providing a way to integrate the proper exploitation of domain knowledge at the right point. As a case study, we show how user-defined aggregates can be exploited to specify and implement a version of the Apriori algorithm. Some performance analyses and comparisons are discussed to show the effectiveness of the approach.

    (A plain-Python version of the case-study algorithm follows this entry.)

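    The deductive framework and iterative aggregates are not shown here; as a reference point for the case study the abstract mentions, below is a plain-Python sketch of the classic Apriori level-wise search. The toy transaction set is illustrative.

        from itertools import combinations

        def apriori(transactions, min_support):
            # Classic level-wise Apriori: count candidate itemsets, keep the
            # frequent ones, join them to form the next level's candidates.
            transactions = [frozenset(t) for t in transactions]
            items = {i for t in transactions for i in t}
            freq = {}
            candidates = [frozenset([i]) for i in sorted(items)]
            while candidates:
                counts = {c: sum(1 for t in transactions if c <= t)
                          for c in candidates}
                level = {c: n for c, n in counts.items() if n >= min_support}
                freq.update(level)
                # Join step: two frequent k-sets differing in one item.
                candidates = {a | b for a, b in combinations(level, 2)
                              if len(a | b) == len(a) + 1}
            return freq

        print(apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}],
                      min_support=2))
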
  • An efficient subspace sampling framework for high-dimensional data reduction, selectivity estimation, and nearest-neighbor search

    Publication Year: 2004, Page(s): 1247 - 1262
    Cited by: Papers (1)
    PDF (1168 KB) | HTML

    Data reduction can improve the storage, transfer time, and processing requirements of very large data sets. One of the challenges in designing effective data reduction techniques is to preserve the ability to use the reduced format directly for a wide range of database and data mining applications. We propose the novel idea of hierarchical subspace sampling to create a reduced representation of the data. The method naturally estimates the local implicit dimensionality of each point very effectively and thereby creates a variable-dimensionality reduced representation of the data, adapting its representation to the behavior of the immediate locality of each data point. An important property of the subspace sampling technique is that the overall efficiency of compression improves with increasing database size. Because of its sampling approach, the procedure is extremely fast and scales linearly with both data set size and dimensionality. We propose new and effective solutions to problems such as selectivity estimation and approximate nearest-neighbor search, achieved by utilizing the locality-specific subspace characteristics of the data that the subspace sampling technique reveals.

    (A toy estimator sketch follows this entry.)

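    The paper's sampling-based estimator is not reproduced; the sketch below only illustrates the notion of local implicit dimensionality it relies on, using a PCA-style variance count over a point's neighborhood as an assumed stand-in (requires numpy).

        import numpy as np

        def local_dim(points, idx, k=20, var_kept=0.95):
            # Estimate local implicit dimensionality at one point: take its k
            # nearest neighbors and count the principal components needed to
            # retain var_kept of the neighborhood variance.
            x = points[idx]
            d2 = ((points - x) ** 2).sum(axis=1)
            nbrs = points[np.argsort(d2)[1:k + 1]]
            centered = nbrs - nbrs.mean(axis=0)
            s = np.linalg.svd(centered, compute_uv=False)
            ratios = np.cumsum(s ** 2) / np.sum(s ** 2)
            return int(np.searchsorted(ratios, var_kept) + 1)

        rng = np.random.default_rng(0)
        # 3-dimensional data embedded in 10 dimensions: the estimate should
        # come out close to 3.
        base = rng.normal(size=(500, 3))
        data = base @ rng.normal(size=(3, 10))
        print(local_dim(data, idx=0))
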
  • Selective and authentic third-party distribution of XML documents

    Publication Year: 2004, Page(s): 1263 - 1278
    Cited by: Papers (29) | Patents (2)
    PDF (1256 KB) | HTML

    Third-party architectures for data publishing over the Internet are receiving growing attention, due to their scalability and their ability to efficiently manage large numbers of subjects and great amounts of data. In a third-party architecture, there is a distinction between the Owner and the Publisher of information. The Owner is the producer of the information, whereas Publishers are responsible for managing (a portion of) the Owner's information and for answering subject queries. A relevant issue in this architecture is how the Owner can ensure secure and selective publishing of its data, even though the data are managed by a third party, which can prune some of the nodes of the original document on the basis of subject queries and access control policies. One approach is to require the Publisher to be trusted with regard to the considered security properties. However, the serious drawback of this solution is that large Web-based systems cannot easily be verified to be secure and can be easily penetrated. For these reasons, we propose an alternative approach, based on digital signature techniques, that does not require the Publisher to be trusted. The security properties we consider are authenticity and completeness of a query response, where completeness is intended with regard to the access control policies stated by the information Owner. In particular, we show that, by embedding in the query response one digital signature generated by the Owner and some hash values, a subject is able to locally verify the authenticity of a query response. Moreover, we present an approach that, for a wide range of queries, allows a subject to verify the completeness of query results.

    (A hash-tree sketch follows this entry.)

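    The "one digital signature plus hash values" idea in the abstract is in the spirit of Merkle hash trees; the sketch below shows the hash side only, with the actual signature step elided. The document shape and helper names are illustrative.

        import hashlib

        def h(*parts):
            # Hash a mix of strings and byte strings.
            m = hashlib.sha256()
            for p in parts:
                m.update(p if isinstance(p, bytes) else p.encode())
            return m.digest()

        def merkle(node):
            # node = (tag, text, [children]); the digest covers the node's
            # content plus its children's digests, so one signature on the
            # root authenticates every subtree.
            tag, text, children = node
            return h(tag, text, *[merkle(c) for c in children])

        public = ("item", "public part", [])
        secret = ("item", "secret part", [])
        root = merkle(("catalog", "", [public, secret]))  # Owner signs this once

        # The Publisher prunes the secret item but ships its digest, so a
        # subject can recompute the root and check the Owner's signature.
        shipped = merkle(secret)
        print(h("catalog", "", merkle(public), shipped) == root)  # True
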
  • Efficient phrase-based document indexing for Web document clustering

    Publication Year: 2004, Page(s): 1279 - 1296
    Cited by: Papers (68) | Patents (12)
    PDF (1712 KB) | HTML

    Document clustering techniques mostly rely on single-term analysis of the document set, such as the vector space model. To achieve more accurate clustering, more informative features, such as phrases and their weights, are needed. Document clustering is particularly useful in many applications, including automatic categorization of documents, grouping search engine results, and building a taxonomy of documents. This article presents two key parts of successful document clustering. The first part is a novel phrase-based document index model, the document index graph, which allows for incremental construction of a phrase-based index of the document set with an emphasis on efficiency, rather than relying on single-term indexes only. It provides efficient phrase matching that is used to judge the similarity between documents, and it is flexible in that it can revert to a compact representation of the vector space model if we choose not to index phrases. The second part is an incremental document clustering algorithm based on maximizing the tightness of clusters by carefully watching the pair-wise document similarity distribution inside clusters. The combination of these two components creates an underlying model for robust and accurate document similarity calculation that leads to much improved results in Web document clustering over traditional methods.

    (A miniature index sketch follows this entry.)

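    A hypothetical miniature in the spirit of a phrase-based index graph: nodes are words, and each edge records the documents in which one word immediately follows another. The paper matches phrases along shared paths; the edge-intersection test below is a deliberate simplification that can over-approximate.

        from collections import defaultdict

        # edge (w1, w2) -> set of documents where w2 immediately follows w1
        edges = defaultdict(set)

        def index(doc_id, text):
            words = text.lower().split()
            for a, b in zip(words, words[1:]):
                edges[(a, b)].add(doc_id)

        def docs_with_phrase(phrase):
            # Intersect the posting sets of the phrase's consecutive word
            # pairs; necessary (not sufficient) for containing the phrase.
            words = phrase.lower().split()
            pairs = list(zip(words, words[1:]))
            result = edges[pairs[0]].copy()
            for pair in pairs[1:]:
                result &= edges[pair]
            return result

        index(1, "river bank of the nile")
        index(2, "bank of america branch")
        print(docs_with_phrase("bank of"))     # {1, 2}
        print(docs_with_phrase("river bank"))  # {1}
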
  • Making the threshold algorithm access cost aware

    Publication Year: 2004, Page(s): 1297 - 1301
    Cited by: Papers (1)
    PDF (648 KB) | HTML

    Assume a database storing N objects with d numerical attributes or feature values. All objects in the database can be assigned an overall score that is derived from their single feature values (and the feature values of a user-defined query). The problem considered here is then to efficiently retrieve the k objects with minimum (or maximum) overall score. The well-known threshold algorithm (TA) was proposed as a solution to this problem. TA views the database as a set of d sorted lists storing the feature values. Even though TA is optimal with regard to the number of accesses, its overall access cost can be high since, in practice, some list accesses may be more expensive than others. We therefore propose to make TA access cost aware by choosing the next list to access such that the overall cost is minimized. Our experimental results show that this overall cost is close to the optimal cost and significantly lower than the cost of prior approaches.

    (A runnable sketch follows this entry.)

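    A runnable sketch of a cost-aware TA in the spirit described above. The next-list heuristic (largest last-seen value per unit cost) and the precomputed score table standing in for random accesses are assumptions, not the paper's exact policy.

        import heapq

        def cost_aware_ta(lists, scores, costs, k):
            # lists[i]: (value, obj) pairs sorted descending by value;
            # scores[obj]: the object's full overall score; costs[i]: price
            # of one sorted access to list i.
            pos = [0] * len(lists)
            last = [lst[0][0] for lst in lists]
            seen, total_cost, topk = set(), 0.0, []   # topk: min-heap
            while True:
                live = [j for j in range(len(lists)) if pos[j] < len(lists[j])]
                if not live:
                    break
                # Heuristic: biggest potential threshold drop per unit cost.
                i = max(live, key=lambda j: last[j] / costs[j])
                value, obj = lists[i][pos[i]]
                pos[i] += 1
                last[i] = value
                total_cost += costs[i]
                if obj not in seen:
                    seen.add(obj)
                    heapq.heappush(topk, (scores[obj], obj))
                    if len(topk) > k:
                        heapq.heappop(topk)
                # No unseen object can beat the sum of last-seen values.
                if len(topk) == k and topk[0][0] >= sum(last):
                    break
            return sorted(topk, reverse=True), total_cost

        lists = [[(0.9, "a"), (0.8, "b"), (0.1, "c")],
                 [(0.9, "b"), (0.7, "c"), (0.2, "a")]]
        scores = {"a": 1.1, "b": 1.7, "c": 0.8}
        print(cost_aware_ta(lists, scores, costs=[1.0, 5.0], k=1))
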
  • Efficient time-bound hierarchical key assignment scheme

    Publication Year: 2004, Page(s): 1301 - 1304
    Cited by: Papers (21)
    PDF (208 KB) | HTML

    The access privileges in distributed systems can be effectively organized as a hierarchical tree that consists of distinct classes. A hierarchical time-bound key assignment scheme assigns distinct cryptographic keys to distinct classes according to their privileges, so that users from a higher class can use their class key to derive the keys of lower classes, and the keys differ for each time period; key derivation is therefore constrained by both the class relation and the time period. We propose, based on a tamper-resistant device, a new time-bound key assignment scheme that greatly improves the computational performance and reduces the implementation cost.

    (A hash-chain sketch follows this entry.)

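    The scheme itself relies on a tamper-resistant device and a class hierarchy, neither of which is modeled here. The sketch below shows only the classic dual-hash-chain construction for the time-bound part; the seeds and period counts are made up.

        import hashlib

        def hchain(seed, n):
            # Apply SHA-256 n times.
            x = seed
            for _ in range(n):
                x = hashlib.sha256(x).digest()
            return x

        Z = 100                                     # number of time periods
        a, b = b"forward-seed", b"backward-seed"    # assumed system secrets

        def period_key(t):
            # Key for period t combines the forward chain at t with the
            # backward chain at Z - t.
            return hashlib.sha256(hchain(a, t) + hchain(b, Z - t)).digest()

        # A user cleared for periods [t1, t2] receives hchain(a, t1) and
        # hchain(b, Z - t2); walking each chain forward yields any period
        # key inside the window, but none outside it.
        t1, t2, t = 10, 20, 15
        fa, bb = hchain(a, t1), hchain(b, Z - t2)
        derived = hashlib.sha256(hchain(fa, t - t1) + hchain(bb, t2 - t)).digest()
        print(derived == period_key(t))  # True
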
  • Blocking reduction strategies in hierarchical text classification

    Publication Year: 2004, Page(s): 1305 - 1308
    Cited by: Papers (13)
    PDF (432 KB) | HTML

    One common approach in hierarchical text classification involves associating classifiers with nodes in the category tree and classifying text documents in a top-down manner. Classification methods using this top-down approach can scale well and cope with changes to the category trees. However, all these methods suffer from blocking: documents wrongly rejected by the classifiers at higher levels can never be passed to the classifiers at lower levels. We propose a classifier-centric performance measure known as the blocking factor to determine the extent of blocking. Three methods are proposed to address the blocking problem, namely, threshold reduction, restricted voting, and extended multiplicative. Our experiments using support vector machine (SVM) classifiers on the Reuters collection show that all three reduce blocking and improve classification accuracy, with restricted voting delivering the best performance.

    (A toy illustration follows this entry.)

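    A toy illustration of blocking and of the threshold-reduction remedy, under an assumed setup where each document carries one classifier score per level of its true path. The paper's blocking factor is classifier-centric; this per-document version only conveys the effect.

        def reaches_leaf(scores_along_path, thresholds):
            # A document is blocked at the first level where its score falls
            # below that level's acceptance threshold.
            return all(s >= t for s, t in zip(scores_along_path, thresholds))

        docs = [[0.9, 0.55, 0.8], [0.7, 0.45, 0.9], [0.95, 0.65, 0.4]]
        strict = [0.5, 0.5, 0.5]
        reduced = [0.5, 0.4, 0.35]   # lower thresholds at deeper levels

        for th in (strict, reduced):
            blocked = sum(not reaches_leaf(d, th) for d in docs)
            print(f"thresholds={th} blocked fraction={blocked / len(docs):.2f}")
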
  • Comments on "A practical (t, n) threshold proxy signature scheme based on the RSA cryptosystem"

    Publication Year: 2004, Page(s): 1309 - 1311
    Cited by: Papers (10)
    PDF (87 KB)

    In a (t, n) threshold proxy signature scheme, the original signer can delegate his/her signing capability to n proxy signers such that any t or more proxy signers can sign messages on behalf of the former, but t-1 or fewer of them cannot do the same. Such schemes have been suggested for use in a number of applications, particularly in distributed computing, where delegation of rights is quite common. M.-S. Hwang et al. (2003) recently proposed an efficient (t, n) threshold proxy signature scheme based on the RSA cryptosystem. We identify several security weaknesses in their scheme and show that their scheme is insecure.

    (A generic threshold-sharing sketch follows this entry.)

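    The Hwang et al. scheme and the attacks on it are RSA-specific and not reproduced here. To make the (t, n) property itself concrete, below is generic Shamir secret sharing over a prime field: any t shares reconstruct the secret, fewer do not.

        import random

        P = 2 ** 127 - 1   # a Mersenne prime; field size for the demo

        def make_shares(secret, t, n):
            # Degree t-1 polynomial with the secret as the constant term.
            coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
            def f(x):
                return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
            return [(x, f(x)) for x in range(1, n + 1)]

        def reconstruct(shares):
            # Lagrange interpolation at x = 0 over the prime field.
            secret = 0
            for i, (xi, yi) in enumerate(shares):
                num = den = 1
                for j, (xj, _) in enumerate(shares):
                    if i != j:
                        num = num * (-xj) % P
                        den = den * (xi - xj) % P
                secret = (secret + yi * num * pow(den, P - 2, P)) % P
            return secret

        shares = make_shares(123456789, t=3, n=5)
        print(reconstruct(shares[:3]) == 123456789)  # True: any 3 suffice
        print(reconstruct(shares[:2]) == 123456789)  # False (almost surely)
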
  • Call For Papers

    Publication Year: 2004, Page(s): 1312
    PDF (30 KB)
    Freely Available from IEEE
  • TKDE Information for authors

    Publication Year: 2004, Page(s): c3
    PDF (77 KB)
    Freely Available from IEEE
  • [Back cover]

    Publication Year: 2004, Page(s): c4
    PDF (147 KB)
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Knowledge and Data Engineering (TKDE) informs researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.


Meet Our Editors

Editor-in-Chief
Jian Pei
Simon Fraser University