
IEEE Transactions on Knowledge and Data Engineering

Issue 4 • July-Aug. 2003

  • Guest editors' introduction

    Page(s): 769 - 770
  • The Subgraph Bisimulation Problem

    Page(s): 1055 - 1056

    We study the complexity of the Subgraph Bisimulation Problem, which relates to Graph Bisimulation as Subgraph Isomorphism relates to Graph Isomorphism, and we prove its NP-Completeness. Our analysis is motivated by its applications to semistructured databases.

  • Image representations and feature selection for multimedia database search

    Page(s): 911 - 920

    The success of a multimedia information system depends heavily on the way the data is represented. Although there are "natural" ways to represent numerical data, it is not clear what is a good way to represent multimedia data, such as images, video, or sound. We investigate various image representations where the quality of the representation is judged based on how well a system for searching through an image database can perform, although the same techniques and representations can be used for other types of object detection tasks or multimedia data analysis problems. The system is based on a machine learning method used to develop object detection models from example images; these models can subsequently be used to detect, i.e., search for, images of a particular object in an image database. As a base classifier for the detection task, we use support vector machines (SVM), a kernel-based learning method. Within the framework of kernel classifiers, we investigate new image representations/kernels derived from probabilistic models of the class of images considered and present a new feature selection method which can be used to reduce the dimensionality of the image representation without significant loss in the performance of the detection (search) system.
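    To make the setup concrete, here is a minimal sketch of the pipeline the abstract describes: a kernel SVM trained on labeled example-image feature vectors, preceded by a simple filter-style feature selection step. The feature vectors, the RBF kernel, and the univariate selection score below are generic placeholders, not the representations, kernels, or selection method proposed in the paper.

```python
# Minimal sketch: kernel-SVM object detection over image feature vectors,
# with a simple filter-style feature selection step. The features, kernel,
# and selection scores are generic placeholders, not the paper's methods.
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline

def train_detector(X, y, n_features=50):
    """X: (n_samples, n_dims) image feature vectors; y: 1 = object, 0 = background."""
    model = make_pipeline(
        SelectKBest(f_classif, k=n_features),   # keep the k most discriminative dimensions
        SVC(kernel="rbf", probability=True),    # kernel-based base classifier
    )
    model.fit(X, y)
    return model

def search_database(model, database_features, top_n=10):
    """Rank database images by the detector's confidence that the object is present."""
    scores = model.predict_proba(database_features)[:, 1]
    return np.argsort(scores)[::-1][:top_n]
```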

  • Using optimistic atomic broadcast in transaction processing systems

    Page(s): 1018 - 1032

    Atomic broadcast primitives are often proposed as a mechanism to allow fault-tolerant cooperation between sites in a distributed system. Unfortunately, the delay incurred before a message can be delivered makes it difficult to implement high performance, scalable applications on top of atomic broadcast primitives. Recently, a new approach has been proposed for atomic broadcast which, based on optimistic assumptions about the communication system, reduces the average delay for message delivery to the application. We develop this idea further and show how applications can take even more advantage of the optimistic assumption by overlapping the coordination phase of the atomic broadcast algorithm with the processing of delivered messages. In particular, we present a replicated database architecture that employs the new atomic broadcast primitive in such a way that communication and transaction processing are fully overlapped, providing high performance without relaxing transaction correctness.

  • Stability analysis of regional and national voting schemes by a continuous model

    Page(s): 1037 - 1042

    The previous discrete-model-based stability analysis of regional and national voting has been extended to a continuous-model-based analysis in the simultaneous presence of white and concentrated noise components. The analysis reconfirms the previous conclusion that regional voting with smaller-sized regions always demonstrates improved stability over voting with larger-sized regions, including national voting as the limiting case. The conclusion remains valid as long as the weak distribution assumption holds.

  • The Yin/Yang Web: a unified model for XML syntax and RDF semantics

    Page(s): 797 - 812

    XML is the W3C standard document format for writing and exchanging information on the Web. RDF is the W3C standard model for describing the semantics and reasoning about information on the Web. Unfortunately, RDF and XML, although very close to each other, are based on two different paradigms. We argue that, in order to lead the Semantic Web to its full potential, the syntax and the semantics of information need to work together. To this end, we develop a model theory for the XML XQuery 1.0 and XPath 2.0 Data Model, which provides a unified model for both XML and RDF. This unified model can serve as the basis for Web applications that deal with both data and semantics. We illustrate the use of this model on a concrete information integration scenario. Our approach enables each side of the fence to benefit from the other; notably, we show how the RDF world can take advantage of XML Schema description and XML query languages, and how the XML world can take advantage of the reasoning capabilities available for RDF. Our approach can also serve as a foundation for the next layer of the Semantic Web, the ontology layer, and we present a layering of an ontology language on top of our approach.

  • On-demand forecasting of stock prices using a real-time predictor

    Page(s): 1033 - 1037

    This paper presents a fuzzy stochastic prediction method for real-time prediction of stock prices. In complete contrast to the crisp stochastic method, it requires a fuzzy linguistic summary approach to computing parameters. This approach, which is found to be better than the gray prediction method, can eliminate outliers and limit the data to a normal condition for prediction, with a comparatively small deviation of 4.5 percent.

  • Temporal probabilistic object bases

    Page(s): 921 - 939

    There are numerous applications where we have to deal with temporal uncertainty associated with objects. The ability to automatically store and manipulate time, probabilities, and objects is important. We propose a data model and algebra for temporal probabilistic object bases (TPOBs), which allows us to specify the probability with which an event occurs at a given time point. In explicit TPOB-instances, the sets of time points along with their probability intervals are explicitly enumerated. In implicit TPOB-instances, sets of time points are expressed by constraints and their probability intervals by probability distribution functions. Thus, implicit object base instances are succinct representations of explicit ones; they allow for an efficient implementation of algebraic operations, while their explicit counterparts make defining algebraic operations easy. We extend the relational algebra to both explicit and implicit instances and prove that the operations on implicit instances correctly implement their counterpart on explicit instances.

  • Query expansion by mining user logs

    Page(s): 829 - 839

    Queries to search engines on the Web are usually short. They do not provide sufficient information for an effective selection of relevant documents. Previous research has proposed the utilization of query expansion to deal with this problem. However, expansion terms are usually determined from term co-occurrences within documents. In this study, we propose a new method for query expansion based on user interactions recorded in user logs. The central idea is to extract correlations between query terms and document terms by analyzing user logs. These correlations are then used to select high-quality expansion terms for new queries. Compared to previous query expansion methods, ours takes advantage of the user judgments implied in user logs. The experimental results show that the log-based query expansion method can produce much better results than both the classical search method and the other query expansion methods.
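    As an illustration of the core idea, the sketch below mines (query, clicked document) pairs for correlations between query terms and document terms and expands a new query with the most strongly correlated terms. The plain conditional-probability weighting is a stand-in; it is not the weighting scheme used in the paper.

```python
from collections import Counter, defaultdict

# Toy sketch: mine (query terms, clicked-document terms) log records for
# correlations between query terms and document terms, then expand new queries
# with the most strongly correlated document terms. The weighting below is a
# plain conditional probability, a stand-in for the paper's actual measure.

def build_correlations(log):
    """log: iterable of (query_terms, clicked_doc_terms) pairs."""
    pair_counts = defaultdict(Counter)   # query term -> Counter of doc terms
    term_counts = Counter()              # query term -> number of log records containing it
    for q_terms, d_terms in log:
        for qt in set(q_terms):
            term_counts[qt] += 1
            pair_counts[qt].update(set(d_terms))
    # estimate P(document term | query term) from the log
    return {qt: {dt: c / term_counts[qt] for dt, c in docs.items()}
            for qt, docs in pair_counts.items()}

def expand_query(query_terms, correlations, k=3):
    scores = Counter()
    for qt in query_terms:
        for dt, w in correlations.get(qt, {}).items():
            if dt not in query_terms:
                scores[dt] += w
    return list(query_terms) + [t for t, _ in scores.most_common(k)]

log = [(["atomic", "broadcast"], ["total", "order", "broadcast", "replication"]),
       (["atomic", "broadcast"], ["consensus", "total", "order"])]
print(expand_query(["atomic", "broadcast"], build_correlations(log)))
```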

  • Progressive partition miner: an efficient algorithm for mining general temporal association rules

    Page(s): 1004 - 1017

    We explore a new problem of mining general temporal association rules in publication databases. In essence, a publication database is a set of transactions where each transaction T is a set of items and each item carries an individual exhibition period. The current model of association rule mining cannot handle publication databases because of two fundamental problems: 1) it does not consider the exhibition period of each individual item and 2) it lacks an equitable support-counting basis for each item. To remedy this, we propose an innovative algorithm, progressive-partition-miner (abbreviated as PPM), to discover general temporal association rules in a publication database. The basic idea of PPM is to first partition the publication database in light of the exhibition periods of items and then progressively accumulate the occurrence count of each candidate 2-itemset based on the intrinsic partitioning characteristics. Algorithm PPM is also designed to employ a filtering threshold in each partition to prune out cumulatively infrequent 2-itemsets early. Because the number of candidate 2-itemsets generated by PPM is very close to the number of frequent 2-itemsets, we can employ the scan reduction technique to effectively reduce the number of database scans. Explicitly, the execution time of PPM is orders of magnitude smaller than that required by competitive schemes directly extended from existing methods. The correctness of PPM is proven and some of its theoretical properties are derived. Sensitivity analysis of various parameters is conducted to provide many insights into Algorithm PPM.
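    A rough sketch of the partition-and-accumulate idea follows: transactions are processed partition by partition, counts of candidate 2-itemsets are accumulated progressively, and itemsets whose cumulative count falls below a per-partition filtering threshold are pruned early. Exhibition periods and the item-specific support basis that PPM actually uses are deliberately simplified away.

```python
from collections import Counter
from itertools import combinations

# Rough sketch of progressive, partition-based counting of 2-itemsets with a
# filtering threshold, in the spirit of PPM. Exhibition periods and the exact
# per-item support basis of the algorithm are simplified away; once a pair is
# pruned, it does not come back (mirroring the early-pruning idea).

def progressive_2itemsets(partitions, min_support):
    """partitions: lists of transactions (each a set of items), ordered by time.
    min_support: required fraction of the transactions seen so far."""
    counts, seen = Counter(), 0
    for part in partitions:
        for txn in part:
            for pair in combinations(sorted(txn), 2):
                counts[pair] += 1
        seen += len(part)
        threshold = min_support * seen
        # filtering step: drop pairs that are already cumulatively infrequent
        counts = Counter({p: c for p, c in counts.items() if c >= threshold})
    return counts

parts = [[{"a", "b", "c"}, {"a", "b"}], [{"a", "b"}, {"b", "c"}]]
print(progressive_2itemsets(parts, min_support=0.5))
```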

  • Applying automatically derived gene-groups to automatically predict and refine metabolic pathways

    Page(s): 883 - 894

    This paper describes an automated technique to predict integrated pathways and refine existing metabolic pathways using the information of automatically derived, functionally similar gene-groups and orthologs (functionally equivalent genes) derived by the comparison of complete microbial genomes archived in GenBank. The described method integrates automatically derived orthologous and homologous gene-groups (http://www.mcs.kent.edu/∼arvind/orthos.html) with the biochemical pathway template available at the KEGG database (http://www.genome.ad.jp), the enzyme information derived from the SwissProt enzyme database (http://expasys.hcuge.ch/), and the Ligand database (http://www.genome.ad.jp). The technique refines existing pathways (based upon the network of reactions of enzymes) by associating corresponding nonenzymatic and regulatory proteins to enzymes and operons and by identifying substituting homologs. The technique is suitable for building and refining integrated pathways using evolutionarily diverse organisms. A methodology and the corresponding algorithm are presented. The technique is illustrated by comparing the genomes of E. coli and B. subtilis with M. tuberculosis. The findings about integrated pathways are briefly discussed.

  • Peculiarity oriented multidatabase mining

    Page(s): 952 - 960

    Peculiarity rules are a new class of rules which can be discovered by searching relevance among a relatively small number of peculiar data. Peculiarity oriented mining in multiple data sources is different from, and complementary to, existing approaches for discovering new, surprising, and interesting patterns hidden in data. A theoretical framework for peculiarity oriented mining is presented. Within the proposed framework, we give a formal interpretation and comparison of three classes of rules, namely, association rules, exception rules, and peculiarity rules, as well as describe how to mine interesting peculiarity rules in multiple databases.
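    The flavor of a peculiarity measure can be illustrated with a toy example: an attribute value is peculiar when it is far, in aggregate, from the other values of the same attribute. The formula and thresholding below (a sum of distances raised to a power, flagged when it exceeds the mean plus a multiple of the standard deviation) follow the general form used in peculiarity-oriented mining, but the exponent and threshold parameters are illustrative assumptions.

```python
import statistics

# Illustrative sketch of a peculiarity factor: PF(x_i) = sum_j |x_i - x_j|**alpha.
# Values whose factor exceeds mean(PF) + beta * stdev(PF) are flagged as peculiar.
# The exponent alpha and threshold multiplier beta are illustrative choices.

def peculiarity_factors(values, alpha=0.5):
    return [sum(abs(x - y) ** alpha for y in values) for x in values]

def peculiar_values(values, alpha=0.5, beta=1.0):
    pf = peculiarity_factors(values, alpha)
    cutoff = statistics.mean(pf) + beta * statistics.stdev(pf)
    return [x for x, f in zip(values, pf) if f > cutoff]

print(peculiar_values([10, 11, 12, 10, 95]))   # 95 stands out as peculiar
```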

  • External sorting: run formation revisited

    Page(s): 961 - 972

    External mergesort begins with a run formation phase creating the initial sorted runs. Run formation can be done by a load-sort-store algorithm or by replacement selection. A load-sort-store algorithm repeatedly fills available memory with input records, sorts them, and writes the result to a run file. Replacement selection produces longer runs than load-sort-store algorithms and completely overlaps sorting and I/O, but it has poor locality of reference resulting in frequent cache misses and the classical algorithm works only for fixed-length records. This paper introduces batched replacement selection: a cache-conscious version of replacement selection that works also for variable-length records. The new algorithm resembles AlphaSort in the sense that it creates small in-memory runs and merges them to form the output runs. Its performance is experimentally compared with three other run formation algorithms: classical replacement selection, Quicksort, and AlphaSort. The experiments show that batched replacement selection is considerably faster than classic replacement selection. For small records (average 100 bytes), CPU time was reduced by about 50 percent and elapsed time by 47-63 percent. It was also consistently faster than Quicksort, but it did not always outperform AlphaSort. Replacement selection produces fewer runs than Quicksort and AlphaSort. The experiments confirmed that this reduces the merge time whereas the effect on the overall sort time depends on the number of disks available.
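    For reference, classical replacement selection (the baseline the paper improves on, not the batched variant it introduces) can be sketched with a heap: a record is written to the current run as long as it is not smaller than the last record written; otherwise it is tagged for the next run.

```python
import heapq
from itertools import islice

# Sketch of classical replacement selection (the baseline, not the paper's
# batched, cache-conscious variant). Records are tagged with a run number; a
# record smaller than the last one written must wait for the next run.

def replacement_selection(records, memory_capacity):
    """Yield sorted runs (lists) from an iterable of comparable records."""
    it = iter(records)
    heap = [(0, r) for r in islice(it, memory_capacity)]
    heapq.heapify(heap)
    runs, current, run_no = [], [], 0
    while heap:
        r_no, rec = heapq.heappop(heap)
        if r_no != run_no:                 # the heap top belongs to the next run
            runs.append(current)
            current, run_no = [], r_no
        current.append(rec)                # "write" rec to the current run
        nxt = next(it, None)
        if nxt is not None:
            tag = run_no if nxt >= rec else run_no + 1
            heapq.heappush(heap, (tag, nxt))
    if current:
        runs.append(current)
    return runs

print(replacement_selection([5, 3, 8, 1, 9, 2, 7, 4, 6], memory_capacity=3))
# -> [[3, 5, 8, 9], [1, 2, 4, 6, 7]]
```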

  • Managing and sharing servants' reputations in P2P systems

    Page(s): 840 - 854

    Peer-to-peer information sharing environments are increasingly gaining acceptance on the Internet as they provide an infrastructure in which the desired information can be located and downloaded while preserving the anonymity of both requestors and providers. As recent experience with P2P environments such as Gnutella shows, anonymity opens the door to possible misuses and abuses by resource providers exploiting the network as a way to spread tampered-with resources, including malicious programs such as Trojan horses and viruses. We propose an approach to P2P security where servants can keep track of, and share with others, information about the reputation of their peers. Reputation sharing is based on a distributed polling algorithm by which resource requestors can assess the reliability of prospective providers before initiating the download. The approach complements existing P2P protocols and has a limited impact on current implementations. Furthermore, it preserves the current level of anonymity of requestors and providers, as well as that of the parties sharing their view on others' reputations.
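    The polling idea can be caricatured as follows: before downloading, a requestor asks its peers to vote on a prospective provider based on their past experience and aggregates the votes. Vote weighting, vote verification, and the anonymity machinery of the actual protocol are omitted; the class names and thresholds here are illustrative.

```python
# Caricature of reputation polling before a download: peers that have dealt
# with a provider vote on it, and the requestor aggregates the votes. Vote
# verification, weighting, and the anonymity machinery of the real protocol
# are omitted; names and thresholds are illustrative.

class Servant:
    def __init__(self, name):
        self.name = name
        self.experience = {}          # provider name -> list of True/False outcomes

    def record(self, provider, ok):
        self.experience.setdefault(provider, []).append(ok)

    def vote(self, provider):
        """Return +1 (trust), -1 (distrust), or None (no opinion)."""
        history = self.experience.get(provider)
        if not history:
            return None
        return 1 if sum(history) / len(history) >= 0.5 else -1

def poll(peers, provider, accept_threshold=0):
    votes = [v for p in peers if (v := p.vote(provider)) is not None]
    return sum(votes) > accept_threshold, votes

a, b, c = Servant("a"), Servant("b"), Servant("c")
a.record("provider-x", True); b.record("provider-x", False); c.record("provider-x", True)
print(poll([a, b, c], "provider-x"))   # (True, [1, -1, 1])
```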

  • Specifying and enforcing application-level Web security policies

    Page(s): 771 - 783

    Application-level Web security refers to vulnerabilities inherent in the code of a Web-application itself (irrespective of the technologies in which it is implemented or the security of the Web-server/back-end database on which it is built). In the last few months, application-level vulnerabilities have been exploited with serious consequences: Hackers have tricked e-commerce sites into shipping goods for no charge, usernames and passwords have been harvested, and confidential information (such as addresses and credit-card numbers) has been leaked. We investigate new tools and techniques which address the problem of application-level Web security. We 1) describe a scalable structuring mechanism facilitating the abstraction of security policies from large Web-applications developed in heterogeneous multiplatform environments; 2) present a set of tools which assist programmers in developing secure applications which are resilient to a wide range of common attacks; and 3) report results and experience arising from our implementation of these techniques.

  • Image retrieval based on regions of interest

    Page(s): 1045 - 1049

    Query-by-example is the most popular query model in recent content-based image retrieval (CBIR) systems. A typical query image includes relevant objects (e.g., Eiffel Tower), but also irrelevant image areas (including background). The irrelevant areas limit the effectiveness of existing CBIR systems. To overcome this limitation, the system must be able to determine similarity based on relevant regions alone. We call this class of queries region-of-interest (ROI) queries and propose a technique for processing them in a sampling-based matching framework. A new similarity model is presented and an indexing technique for this new environment is proposed. Our experimental results confirm that traditional approaches, such as Local Color Histogram and Correlogram, suffer from the involvement of irrelevant regions. Our method can handle ROI queries and provide significantly better performance. We also assessed the performance of the proposed indexing technique. The results clearly show that our retrieval procedure is effective for large image data sets.

  • Scalable consistency maintenance in content distribution networks using cooperative leases

    Page(s): 813 - 828

    We argue that cache consistency mechanisms designed for stand-alone proxies do not scale to the large number of proxies in a content distribution network and are not flexible enough to allow consistency guarantees to be tailored to object needs. To meet the twin challenges of scalability and flexibility, we introduce the notion of cooperative consistency along with a mechanism, called cooperative leases, to achieve it. By supporting Δ-consistency semantics and by using a single lease for multiple proxies, cooperative leases allow the notion of leases to be applied in a flexible, scalable manner to CDNs. Further, the approach employs application-level multicast to propagate server notifications to proxies in a scalable manner. We implement our approach in the Apache Web server and the Squid proxy cache and demonstrate its efficacy using a detailed experimental evaluation. Our results show a factor of 2.5 reduction in server message overhead and a 20 percent reduction in server state space overhead when compared to original leases albeit at an increased interproxy communication overhead.

  • Buffer queries

    Page(s): 895 - 910

    A class of commonly asked queries in a spatial database is known as buffer queries. An example of such a query is to "find house-power line pairs that are within 50 meters of each other." A buffer query involves two spatial data sets and a distance d. The answer to this query consists of pairs of objects, one from each input set, that are within distance d of each other. Given nonpoint spatial objects, evaluation of buffer queries could be a costly operation, even when the numbers of objects in the input data sets are relatively small. This paper addresses the problem of how to evaluate this class of queries efficiently. A fundamental problem in buffer query evaluation is to find an efficient algorithm for solving the minimum distance (minDist) problem for lines and regions. An efficient minDist algorithm, which only requires a subsequence of segments from each object to be examined, is derived. Finding a fast minDist algorithm is the first step in evaluating a buffer query efficiently. It is observed that many, and sometimes even most, candidates can be proven to be in the answer without resorting to the relatively expensive minDist operation. A candidate is first evaluated with the least expensive technique, called 0-object filtering. If it fails, a more costly operation, called 1-object filtering, is applied. Finally, if both filterings fail, the most expensive minDist algorithm is invoked. To show the effectiveness of these techniques, they are incorporated into the well-known tree join algorithm and tested with real-life as well as artificial data sets. Extensive experiments show that the proposed algorithm outperforms existing techniques by a wide margin in both execution time and I/O accesses. More importantly, the performance gain improves drastically as the distance value increases.
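    The filter-and-refine structure can be sketched as follows: a cheap bounding-rectangle distance test prunes candidate pairs before an exact distance computation is attempted. The paper's 0-object and 1-object filters and its subsequence-based minDist algorithm are replaced here by a plain MBR-distance filter and a brute-force vertex-distance check, so this is only a structural illustration.

```python
import math

# Structural sketch of buffer query evaluation as filter-and-refine: a cheap
# minimum-bounding-rectangle distance test prunes pairs before a costlier
# distance computation. The paper's 0-object/1-object filters and its
# subsequence-based minDist algorithm are replaced by simpler stand-ins.

def mbr(points):
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)

def mbr_distance(a, b):
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return math.hypot(dx, dy)

def vertex_min_dist(obj1, obj2):
    # brute-force refinement over the objects' vertices (a stand-in for the
    # paper's segment-based minDist algorithm)
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) for p in obj1 for q in obj2)

def buffer_query(set1, set2, d):
    answer = []
    for o1 in set1:
        for o2 in set2:
            if mbr_distance(mbr(o1), mbr(o2)) > d:
                continue                      # filtered out cheaply
            if vertex_min_dist(o1, o2) <= d:
                answer.append((o1, o2))
    return answer

houses = [[(0, 0), (1, 0), (1, 1)]]
lines = [[(3, 0), (10, 0)]]
print(buffer_query(houses, lines, d=2.5))
```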

  • Effectively finding relevant Web pages from linkage information

    Page(s): 940 - 951

    This paper presents two hyperlink analysis-based algorithms to find relevant pages for a given Web page (URL). The first algorithm comes from the extended cocitation analysis of the Web pages. It is intuitive and easy to implement. The second one takes advantage of linear algebra theories to reveal deeper relationships among the Web pages and to identify relevant pages more precisely and effectively. The experimental results show the feasibility and effectiveness of the algorithms. These algorithms could be used for various Web applications, such as enhancing Web search. The ideas and techniques in this work would be helpful to other Web-related research.
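    The cocitation flavor of the first algorithm can be sketched as: collect the pages that link to the target URL, gather the other pages those parents link to, and rank them by how many parents they share with the target. The paper's extended cocitation rules and its second, linear-algebra-based algorithm are not reproduced here.

```python
from collections import Counter

# Sketch of a cocitation-style search for pages related to a target URL: pages
# that share many "parents" (pages linking to both) with the target are ranked
# as relevant. The paper's extended cocitation rules and its linear-algebra
# algorithm are not reproduced.

def relevant_pages(target, links, top_n=5):
    """links: dict mapping page -> set of pages it links to."""
    parents = {p for p, outs in links.items() if target in outs}
    cocited = Counter()
    for p in parents:
        for sibling in links[p]:
            if sibling != target:
                cocited[sibling] += 1
    return cocited.most_common(top_n)

web = {
    "p1": {"target", "a", "b"},
    "p2": {"target", "a"},
    "p3": {"b", "c"},
}
print(relevant_pages("target", web))   # 'a' shares two parents with the target
```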

  • A polynomial algorithm for optimal univariate microaggregation

    Page(s): 1043 - 1044

    Microaggregation is a technique used by statistical agencies to limit disclosure of sensitive microdata. Noting that no polynomial algorithms are known to microaggregate optimally, Domingo-Ferrer and Mateo-Sanz have presented heuristic microaggregation methods. This paper is the first to present an efficient polynomial algorithm for optimal univariate microaggregation. Optimal partitions are shown to correspond to shortest paths in a network.
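    For sorted univariate data, the shortest-path formulation can be restated as a small dynamic program: every feasible group covers between k and 2k-1 consecutive sorted values, the cost of a group is its within-group sum of squared errors, and the optimal partition is the cheapest path from 0 to n. The sketch below is a compact restatement of that idea, not the paper's exact construction.

```python
# Compact dynamic-programming restatement of optimal univariate
# microaggregation as a shortest-path problem: after sorting, every feasible
# group spans between k and 2k-1 consecutive values, and the edge cost is the
# group's within-group sum of squared errors.

def group_sse(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def optimal_microaggregation(data, k):
    xs = sorted(data)
    n = len(xs)
    INF = float("inf")
    cost = [INF] * (n + 1)      # cost[i] = minimal SSE for the first i sorted values
    back = [0] * (n + 1)
    cost[0] = 0.0
    for i in range(1, n + 1):
        for size in range(k, 2 * k):            # group sizes k .. 2k-1
            j = i - size
            if j < 0:
                break
            c = cost[j] + group_sse(xs[j:i])
            if c < cost[i]:
                cost[i], back[i] = c, j
    groups, i = [], n
    while i > 0:                                # backtrack the optimal partition
        groups.append(xs[back[i]:i])
        i = back[i]
    return list(reversed(groups)), cost[n]

print(optimal_microaggregation([1, 2, 3, 10, 11, 12, 30], k=3))
```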

  • An approach for measuring semantic similarity between words using multiple information sources

    Page(s): 871 - 882

    Semantic similarity between words is becoming a generic problem for many applications of computational linguistics and artificial intelligence. This paper explores the determination of semantic similarity by a number of information sources, which consist of structural semantic information from a lexical taxonomy and information content from a corpus. To investigate how information sources could be used effectively, a variety of strategies for using various possible information sources are implemented. A new measure is then proposed which combines information sources nonlinearly. Experimental evaluation against a benchmark set of human similarity ratings demonstrates that the proposed measure significantly outperforms traditional similarity measures.
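    As an illustration of combining two information sources nonlinearly, the sketch below multiplies an exponential decay in the taxonomy path length between two words by a saturating function of the depth of their deepest common subsumer. The functional form and the parameter values are illustrative of this family of measures and are not claimed to be the paper's exact formula or tuned parameters.

```python
import math

# Illustrative nonlinear combination of two information sources for word
# similarity: shortest path length l between the words in a lexical taxonomy
# and depth h of their deepest common subsumer. The functional form (an
# exponential decay times a hyperbolic-tangent saturation) and the parameters
# are illustrative of this family of measures, not the paper's exact values.

def word_similarity(path_length, subsumer_depth, alpha=0.2, beta=0.6):
    path_part = math.exp(-alpha * path_length)      # decays with longer paths
    depth_part = math.tanh(beta * subsumer_depth)   # saturates with deeper subsumers
    return path_part * depth_part

# identical words: zero path length, deep common subsumer -> similarity near 1
print(word_similarity(0, 10))
# distant words: long path, shallow subsumer -> small similarity
print(word_similarity(12, 1))
```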

  • On agent-mediated electronic commerce

    Page(s): 985 - 1003

    This paper surveys and analyzes the state of the art of agent-mediated electronic commerce (e-commerce), concentrating particularly on the business-to-consumer (B2C) and business-to-business (B2B) aspects. From the consumer buying behavior perspective, agents are being used in the following activities: need identification, product brokering, buyer coalition formation, merchant brokering, and negotiation. The roles of agents in B2B e-commerce are discussed through the business-to-business transaction model that identifies agents as being employed in partnership formation, brokering, and negotiation. Having identified the roles for agents in B2C and B2B e-commerce, some of the key underpinning technologies of this vision are highlighted. Finally, we conclude by discussing the future directions and potential impediments to the wide-scale adoption of agent-mediated e-commerce.

  • Security of Tzeng's time-bound key assignment scheme for access control in a hierarchy

    Page(s): 1054 - 1055

    Tzeng (2002) proposed a time-bound cryptographic key assignment scheme for access control in a partial-order hierarchy. In this paper, we show that Tzeng's scheme is insecure against the collusion attack whereby three users conspire to access some secret class keys that they should not know according to Tzeng's scheme.

  • In-place reconstruction of version differences

    Page(s): 973 - 984

    In-place reconstruction of differenced data allows information on devices with limited storage capacity to be updated efficiently over low-bandwidth channels. Differencing encodes a version of data compactly as a set of changes from a previous version. Transmitting updates to data as a version difference saves both time and bandwidth. In-place reconstruction rebuilds the new version of the data in the storage or memory the current version occupies; no scratch space is needed for a second version. By combining these technologies, we support highly mobile applications on space-constrained hardware. We present an algorithm that modifies a differentially encoded version to be in-place reconstructible. The algorithm trades a small amount of compression to achieve this property. Our treatment includes experimental results that show our implementation to be efficient in space and time and verify that compression losses are small. Also, we give results on the computational complexity of performing this modification while minimizing lost compression.
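    To make the mechanism concrete, a version difference can be modeled as a sequence of copy (from the old version) and add (literal data) commands; applying it in place is safe only if no copy command reads bytes that an earlier command has already overwritten. The sketch below applies such a delta in place and detects write-before-read conflicts; the paper's algorithm for rewriting the delta to eliminate those conflicts is not shown.

```python
# Sketch: a version difference as a list of copy/add commands, applied in place
# over a bytearray. A copy command is unsafe if it reads bytes that an earlier
# command already overwrote (a write-before-read conflict); the paper's
# algorithm for rewriting the delta to remove such conflicts is not shown.

def apply_in_place(buf, delta):
    """buf: bytearray holding the old version (assumed large enough).
    delta: list of ("copy", src_off, length, dst_off) or ("add", data, dst_off)."""
    written = []                                 # intervals already overwritten
    for cmd in delta:
        if cmd[0] == "copy":
            _, src, length, dst = cmd
            if any(src < e and s < src + length for s, e in written):
                raise ValueError("write-before-read conflict: delta is not "
                                 "in-place reconstructible as ordered")
            buf[dst:dst + length] = buf[src:src + length]
            written.append((dst, dst + length))
        else:                                    # ("add", data, dst_off)
            _, data, dst = cmd
            buf[dst:dst + len(data)] = data
            written.append((dst, dst + len(data)))
    return buf

old = bytearray(b"hello world....")
delta = [("copy", 6, 5, 0), ("add", b" & hello!", 5)]
print(apply_in_place(old, delta))   # bytearray(b'world & hello!.')
```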

  • Searching with numbers

    Page(s): 855 - 870

    A large fraction of the useful Web is made up of specification documents that largely consist of (attribute name, numeric value) pairs embedded in text. Examples include product information, classified advertisements, resumes, etc. The approach taken in the past to search these documents, by first establishing correspondences between values and their names, has achieved limited success because of the difficulty of extracting this information from free text. We propose a new approach that does not require this correspondence to be accurately established. Provided the data has "low reflectivity", we can do effective search even if the values in the data have not been assigned attribute names and the user has omitted attribute names in the query. We give algorithms and indexing structures for implementing the search. We also show how hints (i.e., imprecise, partial correspondences) from automatic data extraction techniques can be incorporated into our approach for better accuracy on high reflectivity data sets. Finally, we validate our approach by showing that we get high precision in our answers on real data sets from a variety of domains.
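    The attribute-name-free matching idea can be sketched as: score a document by how closely its numbers can be matched, one-to-one, to the numbers in the query, using relative deviation as the distance. The paper's reflectivity analysis, indexing structures, and extraction hints are omitted, and the greedy matching below is a simplification.

```python
# Sketch of attribute-name-free number search: a document is scored by how
# closely its numbers can be matched one-to-one to the query's numbers, using
# relative deviation as the distance. The paper's reflectivity analysis,
# indexing structures, and extraction "hints" are omitted; greedy matching is
# a simplification.

def relative_deviation(q, d):
    return abs(q - d) / max(abs(q), abs(d), 1e-9)

def match_score(query_numbers, doc_numbers):
    remaining = list(doc_numbers)
    total = 0.0
    for q in query_numbers:
        if not remaining:
            return float("inf")
        best = min(remaining, key=lambda d: relative_deviation(q, d))
        total += relative_deviation(q, best)
        remaining.remove(best)
    return total

def search(query_numbers, documents, top_n=3):
    scored = sorted(documents.items(), key=lambda kv: match_score(query_numbers, kv[1]))
    return scored[:top_n]

docs = {"ad1": [512, 40, 1500], "ad2": [256, 60, 900], "ad3": [512, 45, 1450]}
print(search([512, 1500], docs))    # ad1 and ad3 match the query numbers best
```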


Aims & Scope

IEEE Transactions on Knowledge and Data Engineering (TKDE) informs researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.


Meet Our Editors

Editor-in-Chief
Jian Pei
Simon Fraser University