
2012 4th Conference on Data Mining and Optimization (DMO)

Date: 2-4 Sept. 2012


Displaying Results 1 - 25 of 34
  • [Cover]

    Page(s): 1
    PDF (1445 KB)
    Freely Available from IEEE
  • [Title page]

    Page(s): 1
    PDF (179 KB)
    Freely Available from IEEE
  • [Copyright notice]

    Page(s): 1
    PDF (43 KB)
    Freely Available from IEEE
  • Table of contents

    Page(s): iii - iv
    PDF (37 KB)
    Freely Available from IEEE
  • Preface

    Page(s): v
    PDF (31 KB)
    Freely Available from IEEE
  • Committee

    Page(s): vi - vii
    PDF (23 KB)
    Freely Available from IEEE
  • Optimization: What does it actually mean?

    Page(s): viii
    PDF (20 KB)
    Freely Available from IEEE
  • Spatial and temporal analysis of deforestation and forest degradation in Selangor: Implication to carbon stock above ground

    Page(s): ix
    PDF (24 KB)
    Freely Available from IEEE
  • Spatial and temporal analysis of deforestation and forest degradation in Selangor: Implication to carbon stock above ground

    Page(s): 1 - 5
    PDF (2414 KB) | HTML

    This paper aims to develop an operational methodology for monitoring spatial and temporal changes due to deforestation in Selangor over a 22-year period. The driving forces behind the changes were also analysed. Overall, the results show that the causes of deforestation were economic factors, namely agricultural intensification, and population dynamics related to the process of urbanization. However, deforestation statistics show only a 10 percent decrease in total; it is the degradation of the remaining forest that is the major concern. Knowledge of deforestation and its driving forces in Selangor is very important, as it provides the basis for calculating the total amount of above-ground carbon stock. It also gives insight into the appropriate intervention measures that can be taken to increase carbon stock, thus reducing carbon dioxide emissions to the atmosphere.

  • Topic detections in Arabic Dark websites using improved Vector Space Model

    Page(s): 6 - 12
    PDF (940 KB)

    Terrorist groups' forums remain a threat to all web users, and algorithms are still needed to detect their informative content. In this paper, we investigate the most discussed topics on Arabic Dark Web forums. Arabic textual content was extracted from selected Arabic Dark Web forums. The Vector Space Model (VSM) was used as the text representation, with two different term-weighting schemes: Term Frequency (TF) and Term Frequency - Inverse Document Frequency (TF-IDF). The pre-processing phase, consisting of filtering, tokenization, and stemming, plays a significant role in processing the extracted terms; the stemming step is based on a proposed stemmer that requires no root dictionary. The well-known k-means clustering algorithm was used to cluster the terms. The experimental results are presented and show the most shared terms between the selected forums.

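The TF and TF-IDF weighting schemes named in this abstract are standard. As a rough illustration (the paper's own stemmer and corpus are not reproduced here; the toy documents below are hypothetical stand-ins for stemmed forum posts), the weights can be computed as:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is the raw term count normalised by document length; IDF is
    log(N / df), where df is the number of documents containing the term.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

# Toy corpus (illustrative only, not the paper's data).
docs = [["attack", "plan", "forum"],
        ["forum", "post", "plan"],
        ["weather", "post"]]
w = tf_idf(docs)
```

A term that appears in every document gets weight 0 (log 1 = 0), which is exactly why TF-IDF downweights terms shared across all forums while TF alone does not.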
  • Multiobjective genetic algorithm-based method for job shop scheduling problem: Machines under preventive and corrective maintenance activities

    Page(s): 13 - 17
    PDF (620 KB) | HTML

    In this paper we consider a multiobjective job shop scheduling problem in which the machines are subject to availability constraints due to preventive maintenance, machine breakdowns, or tool replacement. Two optimization criteria are considered: the makespan of the jobs and the total cost of the maintenance activities. The job shop scheduling problem is known to be NP-hard even without availability constraints. Because of the complexity of the problem, we develop a two-phase genetic-algorithm-based heuristic. The first phase produces a set of Pareto-optimal solutions; since this set contains a relatively large number of solutions, choosing the most suitable one is difficult, so the second phase filters the set to reduce its size. The performance of the proposed heuristic is evaluated through computational experiments on the 6×6 Muth & Thompson benchmark mt06 and ten Lawrence benchmarks of different sizes. The results show that the heuristic gives solutions close to those obtained for the classic job shop scheduling problem.

  • Hair data model: A new data model for Spatio-Temporal data mining

    Page(s): 18 - 22
    PDF (742 KB) | HTML

    Spatio-temporal data underlies many of the phenomena around us, such as satellite images, weather maps, and transportation systems. Furthermore, this information is commonly not static and can change over time. Because such data is huge by nature, analysing it is a complex task. This research aims to propose an intermediate data model that provides a representation suitable for spatio-temporal data and allows data mining tasks to be performed easily even when the data changes frequently. In order to propose a suitable data model, this research also investigates the analytical parameters, the structure, and the specifications required for spatio-temporal data. The concept of the proposed data model is inspired by the nature of hair, which has specific properties and grows over time; to maintain its appearance and quality, hair must be maintained over time by combing, cutting, colouring, covering, cleaning, and so on. The proposed data model is expressed as a mathematical model, and data model tools are then developed from it. The model builds on the existing relational and object-oriented models. This paper deals with the problems of available spatio-temporal data models for utilizing data mining technology and defines a new model based on analytical attributes and functions.

  • A hybrid model using genetic algorithm and neural network for predicting dengue outbreak

    Page(s): 23 - 27
    PDF (427 KB) | HTML

    Prediction of dengue outbreaks is crucial in Malaysia because this infectious disease remains one of the main health issues in the country. Malaysia has a good surveillance system, but there have been insufficient findings on a suitable model to predict future outbreaks. While there are previous studies on dengue prediction models in Malaysia, some of these models still have difficulty finding good parameters with high accuracy. The aim of this paper is to design a more promising model for predicting dengue outbreaks: a hybrid model in which a genetic algorithm determines the weights of a neural network model. Several model architectures are designed, and the parameters are adjusted to achieve optimal prediction performance. Sample data covering dengue and rainfall in five districts of Selangor, collected from the State Health Department of Selangor (SHD) and the Malaysian Meteorological Department, is used as a case study to evaluate the proposed model. However, because the collection of real data was incomplete, sample data with similar behaviour was created for the preliminary experiment. The results show that the hybrid model produces better predictions than the standalone models.

  • An algorithm for the selection of planting lining technique towards optimizing land Area: An algorithm for planting lining technique selection

    Page(s): 28 - 34
    PDF (651 KB) | HTML

    This paper presents the design of an algorithm for selecting a planting lining technique. The three techniques, each with a different planting lining direction, lead to different numbers of trees; the technique that yields the highest number of trees is therefore the optimal one. Optimization here refers to maximizing the number of trees for better area utilization. The huge solution space and uncertain results make the problem complex, so it requires an intelligent approach. The algorithm is designed around two basic tasks: calculating the number of trees and dividing an area into blocks. The algorithm generates coordinate-based datasets of areas in order to analyse the techniques. The results show that for small areas the chosen technique is inconsistent, but for large areas technique-3 is preferred. The series of results generated by the algorithm is also reported in this paper.

  • A feature selection model for binary classification of imbalanced data based on preference for target instances

    Page(s): 35 - 42
    PDF (770 KB) | HTML

    Telemarketers at online job advertising firms face significant challenges in understanding the advertising demands of small enterprises. The effective use of a data mining approach can offer e-recruitment companies an improved understanding of customer patterns and greater insight into purchasing trends. However, prior studies on classifiers built with data mining approaches provide limited insight into the customer targeting problem of job advertising companies. In this paper we develop a single-feature evaluator and propose an approach that selects a desired feature subset by setting a threshold. The proposed feature evaluator demonstrates its stability and outstanding performance through empirical experiments using real-world customer data from an e-recruitment firm. Practically, the findings, together with the model, may help telemarketers better understand their customers. Theoretically, this paper extends existing research on feature selection for binary classification of imbalanced data.

  • K-means clustering pre-analysis for fault diagnosis in an aluminium smelting process

    Page(s): 43 - 46
    PDF (853 KB) | HTML

    Developing a fault detection and diagnosis system for complex processes usually involves large volumes of highly correlated data. In the complex aluminium smelting process, it is difficult to separate historical data into different classes of faults for developing a fault diagnostic model. This paper presents a new application of a data mining tool, k-means clustering, to determine precisely how the data corresponds to different classes of faults in the aluminium smelting process. The results of applying the clustering technique to real data sets show that the boundary of each fault class can be identified. This means the faulty data can be isolated accurately, enabling the development of a fault diagnostic model that can diagnose faults effectively.

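k-means itself is standard. A minimal, dependency-free sketch of how it could separate two classes of sensor readings (the toy points below are hypothetical, not the paper's smelting data) is:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [tuple(sum(d) / len(c) for d in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two well-separated synthetic "operating condition" groups (illustrative).
normal = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0)]
faulty = [(5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
centres, groups = kmeans(normal + faulty, k=2)
```

The recovered cluster boundaries are what would let faulty records be isolated into labelled classes before training a diagnostic model.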
  • Edge preserving image enhancement via harmony search algorithm

    Page(s): 47 - 52
    PDF (900 KB) | HTML

    Population-based metaheuristic algorithms have been providing efficient solutions to problems posed by various domains, including image processing. In this contribution we address the problem of image enhancement, with a specific focus on preserving the edges inherent in images, with the aid of the musically inspired harmony search metaheuristic. We demonstrate the significance of our intuitive approach, which combines efficient techniques from both the image processing and optimization domains. We further compare our results on the problem under consideration with the state-of-the-art histogram equalization approach.

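The image-quality objective this paper optimizes is not given in the abstract, so the following is only a generic sketch of the harmony search metaheuristic itself, minimising a simple sphere function as a hypothetical stand-in for the enhancement objective:

```python
import random

def harmony_search(f, bounds, hms=10, hmcr=0.9, par=0.3, iters=2000, seed=1):
    """Minimal harmony search minimising f over box bounds.

    hms  : harmony memory size
    hmcr : probability of picking a value from memory (vs. random)
    par  : probability of pitch-adjusting a memorised value
    """
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    memory.sort(key=f)
    for _ in range(iters):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                x = rng.choice(memory)[d]         # recall from memory
                if rng.random() < par:            # pitch adjustment
                    x += rng.uniform(-1, 1) * 0.05 * (hi - lo)
                x = min(max(x, lo), hi)
            else:
                x = rng.uniform(lo, hi)           # fresh random value
            new.append(x)
        if f(new) < f(memory[-1]):                # replace the worst harmony
            memory[-1] = new
            memory.sort(key=f)
    return memory[0]

best = harmony_search(lambda v: sum(x * x for x in v), [(-5, 5)] * 2)
```

In the paper's setting, the decision vector would presumably encode enhancement parameters and `f` an edge-preserving quality measure; both are assumptions here.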
  • Evolutionary-based feature construction with substitution for data summarization using DARA

    Page(s): 53 - 58
    PDF (629 KB) | HTML

    The representation of the input data set is important for a learning task. In data summarization, the representation of the multiple instances stored in non-target tables that have a many-to-one relationship with records stored in the target table influences the descriptive accuracy of the summarized data. If the summarized data is fed into a classifier as one of the input features, the predictive accuracy of the classifier is also affected. This paper proposes an evolutionary-based feature construction approach, namely Fixed-Length Feature Construction with Substitution (FLFCWS), that optimizes feature construction for relational data summarization. The approach allows initial features to be used more than once when constructing new features, in order to exploit all possible interactions among attributes, and applies a genetic algorithm to find a relevant set of features. The constructed features are used to generate relevant patterns that characterize the non-target records associated with each target record, as an input representation for the data summarization process. Several feature scoring measures are used as fitness functions to find the best set of constructed features. The experimental results show an improvement in predictive accuracy when classifying data summarized with the FLFCWS approach, which indirectly improves the descriptive accuracy of the summarized data. This shows that FLFCWS can generate a promising set of constructed features to describe the characteristics of non-target records for data summarization.

  • A Direct Ensemble Classifier for Imbalanced Multiclass Learning

    Page(s): 59 - 66
    PDF (669 KB) | HTML

    Researchers have shown that although traditional direct classifier algorithms can easily be applied to multiclass classification, the performance of a single classifier decreases in the presence of imbalanced data in multiclass classification tasks. Thus, ensembles of classifiers have emerged as one of the hot topics in multiclass classification for imbalance problems in the data mining and machine learning domains. Ensemble learning is an effective technique that has increasingly been adopted to combine multiple learning algorithms to improve overall prediction accuracy, and it may outperform any single sophisticated classifier. In this paper, an ensemble learner called the Direct Ensemble Classifier for Imbalanced Multiclass Learning (DECIML), which combines simple nearest neighbour and Naive Bayes algorithms, is proposed. A combiner method called OR-tree is used to combine the decisions obtained from the ensemble classifiers. The DECIML framework has been tested on several benchmark datasets and shows promising results.

  • ABC algorithm as feature selection for biomarker discovery in mass spectrometry analysis

    Page(s): 67 - 72
    PDF (725 KB) | HTML

    The mass spectrometry technique is gradually gaining momentum among the recent techniques deployed by analytical research labs that study the biological or chemical properties of complex structures such as protein sequences. The literature reveals that reasoning over voluminous mass spectrometry data with sophisticated computational techniques, inspired by natural processes in biological life, has been yielding fruitful results for fields including bioinformatics and proteomics. Such approaches provide efficient ways to mine mass spectrometry data in order to extract discriminating features that aid in discovering vital information, specifically disease-related protein patterns in complex protein sequences. This study presents the artificial bee colony (ABC) algorithm as a new feature selection technique incorporated with an SVM classifier. The results achieved 96% sensitivity and 100% specificity in discriminating cirrhosis and liver cancer cases.

  • Solving flexible manufacturing system distributed scheduling problem subject to maintenance using harmony search algorithm

    Page(s): 73 - 79
    PDF (977 KB) | HTML

    Flexible manufacturing is one of the industrial branches that is highly competitive and rapidly expanding. The globalization of industrial systems has encouraged the development of distributed manufacturing, including flexible manufacturing systems. The complexity of the problems faced in this new environment has prompted researchers to develop various approaches to optimizing production scheduling. Approaches such as Petri nets, ant colony optimization, genetic algorithms, intelligent agents, particle swarm optimization, and tabu search are used to tackle these optimization issues. In reality, maintenance is a core concern for manufacturing scheduling, as machine breakdowns greatly affect the schedule. Unfortunately, most approaches disregard preventive maintenance in the production scheduling problem. In this paper, a harmony search algorithm is introduced to address the problem including maintenance. The problem is successfully represented, and the algorithm's performance is studied under several parameter tunings.

  • Opposition based Particle Swarm Optimization with student T mutation (OSTPSO)

    Page(s): 80 - 85
    PDF (661 KB) | HTML

    Particle swarm optimization (PSO) is a stochastic algorithm for optimization problems proposed by Kennedy [1] in 1995. PSO is a well-established algorithm but suffers from premature convergence. This paper presents an Opposition-based PSO (OPSO) that accelerates the convergence of PSO while avoiding premature convergence. The proposed OPSO method is coupled with a Student's t mutation. Results from experiments performed on standard benchmark functions show an improvement in the performance of PSO.

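The abstract does not give the OPSO update rules, so the following is only a hedged sketch of the opposition idea it builds on: for a candidate x in [a, b], the opposite point is a + b - x, and keeping the fitter of each pair gives the swarm a better start. The full PSO loop and the Student's t mutation are omitted, and the sphere function is a hypothetical stand-in for a benchmark objective.

```python
import random

def opposition_init(f, bounds, n, seed=0):
    """Opposition-based initialisation: for each random candidate x in
    [a, b], also evaluate its opposite a + b - x and keep the fitter one."""
    rng = random.Random(seed)
    swarm = []
    for _ in range(n):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        opp = [lo + hi - xi for (lo, hi), xi in zip(bounds, x)]
        swarm.append(min(x, opp, key=f))   # keep the lower-cost point
    return swarm

sphere = lambda v: sum(xi * xi for xi in v)
bounds = [(-10, 10)] * 3
swarm = opposition_init(sphere, bounds, n=20)
```

Since evaluating the opposite point costs one extra fitness call per candidate, the technique trades a doubled initialisation cost for a swarm that starts closer to the optimum.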
  • Fuzzy rule-based for predicting machining performance for SNTR carbide in milling titanium alloy (Ti-6Al-4v)

    Page(s): 86 - 90
    PDF (641 KB) | HTML

    Rule-based reasoning and fuzzy logic are used to develop a model for predicting the surface roughness of a milling process. The process parameters considered in this study are cutting speed, feed rate, and radial rake angle, each with five linguistic values. The fuzzy rule-based model is developed using the MATLAB fuzzy logic toolbox. Nine linguistic values and twenty-four IF-THEN rules are created for model development. The predicted results of the proposed model were compared with the experimental results and showed good agreement, with a correlation of 0.9845. The difference between the experimental and predicted results was shown to have an estimation error of 0.0008. The best predicted surface roughness value using the fuzzy rule base occurs at the combination of High cutting speed, VeryLow feed rate, and High radial rake angle.

  • Meaningless to meaningful Web log data for generation of Web pre-caching decision rules using Rough Set

    Page(s): 91 - 98
    PDF (907 KB) | HTML

    Web caching and pre-fetching are vital technologies that can increase the speed of Web loading processes. Since speed and memory are crucial to the performance of mobile applications and websites, better techniques for the Web loading process should be investigated. The weaknesses of conventional Web caching policies include meaningless information and uncertainty in the knowledge representation of Web log data passed from the proxy cache to the mobile client. The organisation and learning tasks of knowledge processing for Web log data require an explicit representation to deal with these uncertainties, owing to the exponential growth of rules when searching for a suitable knowledge representation from the proxy cache to the mobile client. Consequently, Rough Set theory is chosen in this research to generate Web pre-caching decision rules, ensuring that meaningless Web log data can be transformed into meaningful information.

  • A Differential Evolution Algorithm for the University course timetabling problem

    Page(s): 99 - 102
    PDF (714 KB) | HTML

    The university course timetabling problem is known to be NP-hard. It is a complex problem whose size can become huge due to limited resources (e.g. the number of rooms, their capacities, and the availability of lecturers) and the requirements placed on these resources. The problem involves assigning a given number of events to a limited number of timeslots and rooms under a given set of constraints; the objective is to satisfy the hard constraints and minimize the violation of soft constraints. In this paper, a Differential Evolution (DE) algorithm is proposed. The DE algorithm relies on the mutation operation to reduce the convergence time while reducing the penalty cost of the solution. The proposed algorithm is tested on eleven benchmark datasets (one large, five medium, and five small problems). Experimental results show that our approach generates competitive results compared with previously available approaches. Possible extensions of this simple approach are also discussed.

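The abstract names the DE mutation operation without detailing it. A hedged sketch of the classic DE/rand/1/bin scheme it presumably builds on is below, applied to a toy continuous function rather than the discrete timetabling encoding and penalty costs the paper would use:

```python
import random

def de_step(pop, f, rng, F=0.5, CR=0.9):
    """One generation of classic DE/rand/1/bin: for each target vector,
    build a mutant from three distinct random vectors, cross it over with
    the target, and keep whichever scores lower (greedy selection)."""
    dim = len(pop[0])
    nxt = []
    for i, target in enumerate(pop):
        a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
        mutant = [a[d] + F * (b[d] - c[d]) for d in range(dim)]
        jrand = rng.randrange(dim)           # guarantees one mutant gene
        trial = [mutant[d] if (rng.random() < CR or d == jrand) else target[d]
                 for d in range(dim)]
        nxt.append(trial if f(trial) <= f(target) else target)
    return nxt

sphere = lambda v: sum(x * x for x in v)    # stand-in for a penalty cost
rng = random.Random(1)
pop = [[rng.uniform(-5, 5) for _ in range(4)] for _ in range(12)]
start = min(map(sphere, pop))
for _ in range(100):
    pop = de_step(pop, sphere, rng)
```

Because selection is greedy, the best penalty cost in the population never increases from one generation to the next, which is what drives the steady reduction the abstract describes.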