Measurement of Turkish word semantic similarity and text categorization application

2 Author(s)
M. Fatih Amasyali ; Bilgisayar Mühendisliği Bölümü, Yildiz Teknik Üniversitesi, Turkey ; Aytunc Beken

In the literature, texts to be classified are generally represented in a high-dimensional bag-of-words space in which every dimension corresponds to a word or n-gram. In this study, the words are first placed in a semantic space. Computing a word's coordinates in the semantic space requires a measure of how similar the words are in meaning. Harris states that the semantic similarity of two words is related to the number of documents in which both words occur. We applied this hypothesis to Turkish words. First, we obtained a word co-occurrence matrix from a Web corpus. Then, the numerical coordinates of the words were calculated using multidimensional scaling. The coordinates of a text are obtained from the coordinates of the words that occur in it. In our experiments, Turkish news texts were classified into 5 classes. We obtained more successful results than with the traditional bag-of-words space. Our approach is not limited to Turkish words and texts; it also applies to all other languages.
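
As a rough illustration of two of the steps described in the abstract, the following Python sketch counts document-level co-occurrences and then derives a text's coordinates from word coordinates. It is not the authors' implementation. The function names, the toy corpus, and the word coordinates are all hypothetical, and averaging the word coordinates is an assumed aggregation rule, since the abstract does not say how text coordinates are computed. The multidimensional scaling step is not shown; the coordinates below simply stand in for its output.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered word pair, the documents containing both words."""
    counts = Counter()
    for doc in documents:
        # Use a set so each word counts once per document; sort so that
        # every pair is stored under a single, consistent key order.
        vocab = sorted(set(doc.lower().split()))
        for a, b in combinations(vocab, 2):
            counts[(a, b)] += 1
    return counts


def text_coordinates(text, word_coords):
    """Average the coordinates of the known words in a text.

    Averaging is an assumption made for this sketch, not a rule
    taken from the paper.
    """
    vectors = [word_coords[w] for w in text.lower().split() if w in word_coords]
    if not vectors:
        return None  # no word of the text has coordinates
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


# Toy corpus (hypothetical data, not from the paper).
docs = ["futbol maç gol", "maç gol hakem", "borsa dolar faiz"]
counts = cooccurrence_counts(docs)

# Hypothetical 2-D word coordinates standing in for the MDS output.
coords = {"gol": [1.0, 0.0], "maç": [0.8, 0.2], "borsa": [-1.0, 0.5]}
centroid = text_coordinates("gol maç", coords)
```

In this toy corpus, "gol" and "maç" appear together in two documents, while "gol" and "borsa" never share a document, so the first pair would be treated as the more similar one under the hypothesis quoted above. The resulting low-dimensional text vectors could then be passed to any standard classifier in place of bag-of-words vectors.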

Published in:

2009 IEEE 17th Signal Processing and Communications Applications Conference

Date of Conference:

9-11 April 2009