Advanced Language Processing and Web Information Technology, 2007. ALPIT 2007. Sixth International Conference on

Date: 22-24 Aug. 2007

  • [Title page i]

    Publication Year: 2007, Page(s): i
  • Sponsors

    Publication Year: 2007, Page(s): ii
  • [Title page iii]

    Publication Year: 2007, Page(s): iii
  • [Copyright notice]

    Publication Year: 2007, Page(s): iv
  • Table of contents

    Publication Year: 2007, Page(s): v - xii
  • Message from General Chairs

    Publication Year: 2007, Page(s): xiii
  • Message from Program Chairs

    Publication Year: 2007, Page(s): xiv
  • Conference Chairs

    Publication Year: 2007, Page(s): xv
  • Committees

    Publication Year: 2007, Page(s): xvi - xviii
  • Keynotes and Tutorials

    Publication Year: 2007, Page(s): xix
  • E-mail Clustering Based on Profile and Multi-attribute Values

    Publication Year: 2007, Page(s): 3 - 8

    Although people today gather large amounts of data from the network, users want only the information they need. Using this technology, users can extract only the data that satisfy the query. Because previous studies use a single attribute of the document, such as term frequency, they cannot be considered effective data clustering methods. What is needed is the effective clustering technolog...
  • An Efficient Document Categorization Model Based on LSA and BPNN

    Publication Year: 2007, Page(s): 9 - 14
    Cited by: Papers (1) | Patents (1)

    This paper proposes a new document categorization model using latent semantic analysis (LSA) and a back-propagation neural network (BPNN). In traditional word-matching-based document categorization systems, the most popular and straightforward way to represent a document is the vector space model (VSM). However, this approach has drawbacks. Firstly, because it needs a large number o...
  • A Dynamic SOM Algorithm for Clustering Large-Scale Document Collection

    Publication Year: 2007, Page(s): 15 - 20

    A dynamic SOM algorithm of incremental gradient descent to cluster large-scale document collection is proposed in this paper. In comparison with other SOM algorithms (e.g. GHSOM), the size of output layer in our algorithm can be gradually reduced and dynamically by inserting suitable number of neurons, thus the number of underutilized neurons can be reduced greatly and the training results of this...
  • Analysis of Web Clustering Based on Genetic Algorithm with Latent Semantic Indexing Technology

    Publication Year: 2007, Page(s): 21 - 26
    Cited by: Papers (1)

    This paper constructs a latent semantic text model using a genetic algorithm (GA) for web clustering. The main difficulty in applying GA to text clustering is the thousands or even tens of thousands of dimensions in the feature space. Latent semantic indexing (LSI) is a successful technology which attempts to explore the latent semantic structure in textual data, and furthermore, it reduces...
  • Kernel-based Sentiment Classification for Chinese Sentence

    Publication Year: 2007, Page(s): 27 - 32
    Cited by: Papers (2)

    Recent years have seen a large growth in online customer reviews. Classifying these reviews as positive or negative would be helpful in business intelligence applications and recommender systems. This paper aims to address sentiment classification at a fine-grained level, i.e. the sentence level. The challenging aspect of this problem that distinguishes it from the traditional classif...
  • Leveraging World Knowledge in Chinese Text Classification

    Publication Year: 2007, Page(s): 33 - 38

    In state-of-the-art Text Classification (TC) approaches, only features explicitly mentioned in the training set are taken into consideration, but after several decades of effort, these approaches seem to have reached a plateau. In this paper, we propose an automatic taxonomy mapping algorithm to map from the original flat taxonomy to a hierarchical, human-edited online taxonomy (ODP), from which ...
  • Structure Analysis and Computation-Based Chinese Question Classification

    Publication Year: 2007, Page(s): 39 - 44
    Cited by: Papers (1)

    Question classification is a key element of a question answering (QA) system. It governs the answer extraction range and method, and further affects overall system performance. Most question classification methods are based on pattern sets or knowledge databases. Their disadvantages are that the set or database grows ever larger and requires much human work. By comparing question clas...
  • Chinese Text Classification without Automatic Word Segmentation

    Publication Year: 2007, Page(s): 45 - 50

    Due to the lack of word boundaries in Asian systems of writing, machine processing of these languages often involves segmenting text into word units. This paper tests the assumption that this segmentation is a necessary step for authorship attribution and topic classification tasks in Chinese, and demonstrates that it is not. We show extensive results for both tasks, considering both single words ...
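
    As a rough illustration of classifying Chinese text without a word segmenter, the sketch below counts overlapping character n-grams. The truncated abstract does not say which features the authors use, so character n-grams are an assumption here, and the function name `char_ngrams` and its `n_values` parameter are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n_values=(1, 2)):
    """Count overlapping character n-grams, ignoring whitespace.

    No word segmentation is performed: every run of n adjacent
    characters becomes a feature (an assumed feature set, not
    necessarily the one used in the paper).
    """
    chars = [c for c in text if not c.isspace()]
    counts = Counter()
    for n in n_values:
        # Slide a window of width n over the character sequence.
        for i in range(len(chars) - n + 1):
            counts["".join(chars[i:i + n])] += 1
    return counts
```

    The resulting counts can be fed to any bag-of-features classifier in place of word counts.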
  • The Formalization of Four Types of "ZAI" Viewpoint Aspect Sentences

    Publication Year: 2007, Page(s): 51 - 56

    The "ZAI" viewpoint aspect has long been a burning problem in linguistic research. However, up to now, few studies of the derivations of the "ZAI" viewpoint aspect and their formalizations have been done. This paper addresses the formalization of the "ZAI" progressive and its three types of derived sentences through automatic parsing with CTT (Copenhagen Tree Tracer). This paper first advances that "ZAI"...
  • A Feature Space Expression to Analyze Dependency of Korean Clauses with a Composite Kernel

    Publication Year: 2007, Page(s): 57 - 62

    Analyzing dependency relations among clauses is one of the most critical parts of parsing Korean sentences because it generates severe ambiguities. To analyze dependency relations successfully, this task has been the target of various machine learning methods, including SVM. In particular, kernel methods are commonly used to analyze dependency relations, and it is reported that they show ...
  • A Multilayered Bilingual Word-Alignment Algorithm

    Publication Year: 2007, Page(s): 63 - 68
    Cited by: Patents (1)

    Bilingual word alignment finds the corresponding word-level translations between source and target language sentences. It is widely used in natural language processing, for example in machine translation, cross-language information retrieval, and bilingual dictionary compilation. However, bilingual word alignment is a very difficult task involving many challenges such as morphology, synt...
  • A Template-Based English-Chinese Translation System Using FOPA and UAMRT

    Publication Year: 2007, Page(s): 69 - 74

    This paper presents a template-based English-Chinese translation system characterized by two important features: Fast Optimal Parsing Algorithm (FOPA) and Universal Algorithm of Matching and Replacing Templates (UAMRT). First, the FOPA parses an English sentence into an optimal parse tree or template structure quickly. Second, the UAMRT matches each source template with the optimal structure and r...
  • Korean Spacing by Improving Viterbi Segmentation

    Publication Year: 2007, Page(s): 75 - 80

    This paper presents a Korean spacing approach which employs an improved Viterbi segmentation model. Traditional Viterbi segmentation using the word unigram language model is simple and fast, but has two problems: data sparseness and improper preference of fewer segments. To overcome these limitations, the segmentation model is extended by employing a split probability based on character bigram. Co...
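
    The traditional baseline mentioned in this abstract, Viterbi segmentation under a word unigram model, can be sketched as below. This is a minimal illustration and not the authors' improved model: the function name `viterbi_segment`, the `max_word_len` limit, and the fixed `unknown_logprob` penalty for unseen single characters are all assumptions. Because every word adds a negative log probability to the path score, paths with fewer words tend to win, which is the bias toward fewer segments that the abstract describes.

```python
import math


def viterbi_segment(text, unigram_logprob, max_word_len=4, unknown_logprob=-20.0):
    """Split `text` into the word sequence with the highest unigram score.

    `unigram_logprob` maps known words to log probabilities. Unknown
    single characters fall back to `unknown_logprob` (an assumed
    smoothing choice) so that every input has at least one segmentation.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in unigram_logprob:
                score = unigram_logprob[word]
            elif len(word) == 1:
                score = unknown_logprob
            else:
                continue  # unknown multi-character strings are not words
            candidate = best[start] + score
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]
```

    With a toy vocabulary containing "the", "cat" and "sat", calling `viterbi_segment("thecatsat", vocab)` returns `['the', 'cat', 'sat']`.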
  • A Divide-Conquer Strategy for Both English and Chinese Text Chunking

    Publication Year: 2007, Page(s): 81 - 86

    The traditional English text chunking approach identifies all phrases with a single model and the same types of features. Using only one model has been shown to have two limitations: the same types of features are not suitable for all phrases, and data sparseness may result. In this paper, a divide-conquer strategy is proposed and applied in the identificati...
  • A Heuristic Approach for Segmentation Granularity Problem in Chinese Information Retrieval

    Publication Year: 2007, Page(s): 87 - 91

    In Chinese information retrieval, documents are usually segmented into words and then indexed by these words. However, the segmentation granularity problem (SDP) should be considered, because small granularity may lead to low precision and efficiency while large granularity may cause low recall. To solve the problem, this paper proposes an intuitive heuristic approach. Two-level index for the segment...