
2005 IEEE Workshop on Machine Learning for Signal Processing

Date: 28 Sept. 2005


Displaying Results 1 - 25 of 74
  • Local Linear ICA for Mutual Information Estimation in Feature Selection

    Publication Year: 2005 , Page(s): 3 - 8

    Mutual information (MI) is an important tool in many applications. Specifically, in classification systems, feature selection based on MI estimation between features and class labels helps to identify the features most directly related to classification performance. MI estimation is extremely difficult and imprecise in high-dimensional feature spaces with arbitrary distributions. We propose a framework using ICA and sample-spacing-based entropy estimators to estimate MI. In this framework, the high-dimensional MI estimation is reduced to multiple independent one-dimensional MI estimation problems. This approach is computationally efficient; however, its precision relies heavily on the results of the ICA. In our previous work, we assumed the feature space had linear structure, and hence adopted linear ICA. Nevertheless, this assumption may not hold in many applications. Although nonlinear ICA can in theory solve any ICA problem, its complexity and its demand for data samples restrict its application. A better trade-off between linear and nonlinear ICA is local linear ICA, which uses piecewise-linear ICA to approximate the nonlinear relationships among the data. In this paper, we propose that by replacing linear ICA with local linear ICA, we can obtain precise MI estimation without greatly increasing the computational complexity. The experimental results substantiate this claim.
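    The sample-spacing entropy estimator at the heart of this framework is simple enough to sketch. The following one-sided m-spacing (Vasicek-style) estimator is a minimal illustration under the usual m ≈ √N rule, not the authors' implementation:

```python
import math
import random

def spacing_entropy(samples, m=None):
    """One-sided m-spacing entropy estimator:
    H ~ mean_i log((N + 1) * (x_(i+m) - x_(i)) / m) over the sorted sample."""
    x = sorted(samples)
    n = len(x)
    if m is None:
        m = max(1, int(round(math.sqrt(n))))   # common bias/variance trade-off
    total = 0.0
    for i in range(n - m):
        total += math.log((n + 1) * (x[i + m] - x[i]) / m)
    return total / (n - m)

random.seed(0)
u = [random.random() for _ in range(4000)]         # uniform(0,1): true H = 0
g = [random.gauss(0.0, 1.0) for _ in range(4000)]  # N(0,1): true H ~ 1.42
```

    In the paper's framework, a one-dimensional estimator of this kind is applied separately to each (local linear) ICA output.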

  • A Proposal for Blind FIR Equalization of Time-Varying Channels

    Publication Year: 2005 , Page(s): 9 - 14
    Cited by:  Papers (1)

    The multimodal and time-varying aspects of blind equalization problems in communication systems are treated here by means of an immune-inspired strategy capable of estimating the coefficients of the FIR equalization filter in an unsupervised manner. The associated optimization problem is solved with a population-based search technique characterized by dynamic control of the population size and diversity maintenance. Static and time-varying channels are considered in simulated scenarios, aiming to demonstrate the tracking capability derived from the adaptive adjustment of the blind equalizer's coefficients.
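    For context, the classic gradient-based baseline for blind FIR equalization is the constant modulus algorithm (CMA); the immune-inspired strategy above replaces this local update with a global population search. The sketch below is that textbook baseline on a toy BPSK channel, not the paper's method:

```python
import random

random.seed(3)

def cma_equalize(rx, n_taps=5, mu=0.01, modulus=1.0):
    # constant modulus algorithm: w <- w - mu * e * x, with e = y * (y^2 - R)
    w = [0.0] * n_taps
    w[n_taps // 2] = 1.0                  # centre-spike initialisation
    for i in range(n_taps, len(rx)):
        x = rx[i - n_taps:i]
        y = sum(wi * xi for wi, xi in zip(w, x))
        e = y * (y * y - modulus)
        w = [wi - mu * e * xi for wi, xi in zip(w, x)]
    return w

# BPSK symbols through a mild two-tap FIR channel
sym = [random.choice([-1.0, 1.0]) for _ in range(4000)]
rx = [sym[i] + 0.4 * sym[i - 1] for i in range(1, len(sym))]
w = cma_equalize(rx)
```

    The CMA cost surface is multimodal, which is precisely why a population-based search with diversity maintenance is attractive here.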

  • Overcomplete Blind Source Separation by Combining ICA and Binary Time-Frequency Masking

    Publication Year: 2005 , Page(s): 15 - 20
    Cited by:  Papers (3)

    A limitation in many source separation tasks is that the number of source signals has to be known in advance. Further, in order to achieve good performance, the number of sources cannot exceed the number of sensors. In many real-world applications these limitations are too strict. We propose a novel method for overcomplete blind source separation that combines two powerful source separation techniques: independent component analysis and binary time-frequency masking. This makes it possible to iteratively extract each speech signal from the mixture. Using merely two microphones, we can separate up to six mixed speech signals under anechoic conditions. The number of source signals is not assumed to be known in advance, and it is possible to maintain the extracted signals as stereo signals.
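    The masking half of the method can be illustrated on a toy mixture. Here oracle source spectrograms stand in for the ICA-derived estimates, so this shows only the flavour of the binary time-frequency mask, not the paper's iterative extraction scheme:

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    # short-time Fourier transform with a Hann window
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 400 * t)     # low-frequency source
s2 = np.sin(2 * np.pi * 1500 * t)    # high-frequency source

X = stft(s1 + s2)                    # mixture spectrogram
R1, R2 = stft(s1), stft(s2)          # stand-ins for the separated estimates
mask = np.abs(R1) > np.abs(R2)       # binary mask: each TF bin goes to the dominant source
S1_est = X * mask                    # masked mixture approximates source 1
```

    Because speech is sparse in the time-frequency plane, assigning each bin to a single dominant source recovers intelligible signals even when sources outnumber sensors.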

  • ICA by Maximization of Nongaussianity using Complex Functions

    Publication Year: 2005 , Page(s): 21 - 26
    Cited by:  Papers (10)

    We use complex, hence analytic, functions to achieve independent component analysis (ICA) by maximization of non-Gaussianity, and introduce the complex maximization of non-Gaussianity (CMN) algorithm. We show that CMN converges to the principal component of the source distribution and that the algorithm provides robust performance for both circular and non-circular sources.

  • Blind Source Separation and Sparse Bump Modelling of Time Frequency Representation of EEG Signals: New Tools for Early Detection of Alzheimer's Disease

    Publication Year: 2005 , Page(s): 27 - 32
    Cited by:  Papers (40)

    The early detection of Alzheimer's disease (AD) is an important challenge. In this paper, we propose a novel method for early detection of AD using only electroencephalographic (EEG) recordings from patients with mild cognitive impairment (MCI), without any clinical symptoms of the disease, who later developed AD. In our method, a blind source separation algorithm is first applied to extract the most significant spatiotemporal uncorrelated components; these components are then wavelet transformed; subsequently the wavelet, or more generally time-frequency, representation (TFR) is approximated with a sparse bump modeling approach. Finally, reliable and discriminant features are selected and reduced with orthogonal forward regression and random probe methods. The proposed features are fed to a simple neural network classifier. The presented method leads to substantially improved performance (93% correctly classified, with improved sensitivity and specificity) over classification results previously published on the same data set. We hope that these new computational and machine learning tools will provide new insights in a wide range of clinical settings, both diagnostic and predictive.

  • Implementing Nonlinear Algorithm in Multimicrophone Signal Processing

    Publication Year: 2005 , Page(s): 33 - 39

    We address in this paper a method for blind source separation of multi-microphone signals. The multi-microphone system is modelled as a nonlinear mapping; the nonlinear characteristic takes into consideration the sensor effect and natural phenomena. The observations (recorded signals) are modelled as post-nonlinear mixtures. The proposed nonlinear algorithm is a generalization of the serial gradient algorithm, cross-correlations, and the Gram-Charlier series, extended in two ways: (1) to deal with nonlinear mapping, and (2) to adapt to the actual statistical distributions of the sources by estimating the kernel density distribution of the output signals. The theory of the proposed learning algorithm is discussed. Simulations show that the algorithm is able to recover the underlying sources from the post-nonlinear mixture observations.

  • Multi-Scale Kernel Methods for Classification

    Publication Year: 2005 , Page(s): 43 - 48
    Cited by:  Papers (5)

    We propose the enhancement of support vector machines for classification by the use of multi-scale kernel structures (based on the wavelet philosophy) which can be linearly combined in a spatially varying way. This provides a good trade-off between the ability to generalize well in areas of sparse training vectors and the ability to fit fine detail of the decision surface in areas where the training vector density is sufficient to provide this information. Our algorithm is a sequential machine learning method in that progressively finer kernel functions are incorporated in successive stages of the learning process. Its key advantage is the ability to find the appropriate kernel scale for every local region of the input space.
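    A fixed-weight version of the idea — a convex combination of Gaussian RBF kernels at several scales, without the spatially varying weights described above — might look like this (all names and parameters are illustrative):

```python
import numpy as np

def rbf(X, Y, gamma):
    # Gaussian RBF Gram matrix between row-sample matrices X and Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def multiscale_kernel(X, Y, gammas, weights):
    # convex combination of RBF kernels; each gamma is one scale
    return sum(w * rbf(X, Y, g) for g, w in zip(gammas, weights))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
K = multiscale_kernel(X, X, gammas=[0.1, 1.0, 10.0], weights=[0.5, 0.3, 0.2])
```

    Any non-negative combination of valid kernels is itself a valid (positive semidefinite) kernel, which is what licenses this construction inside an SVM.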

  • A Robust Linear Programming Based Boosting Algorithm

    Publication Year: 2005 , Page(s): 49 - 54

    AdaBoost has been successfully used in many signal processing systems for data classification, but it has been observed that on highly noisy data AdaBoost leads to overfitting. In this paper, a new regularized boosting algorithm, LPnorm2-AdaBoost (LPNA), arising from the close connection between AdaBoost and linear programming, is proposed to mitigate the overfitting problem. By introducing a smooth convex penalty function (the l2 norm) into the objective of the minimax problem, the algorithm controls the skewness of the data distribution during learning and prevents outliers from spoiling the decision boundaries. A stabilized column generation technique transforms the optimization problem into a simple linear programming problem. The effectiveness of the proposed algorithm is demonstrated through experiments on a wide variety of datasets.

  • Automatic Determination of the Number of Clusters Using Spectral Algorithms

    Publication Year: 2005 , Page(s): 55 - 60
    Cited by:  Papers (6)  |  Patents (1)

    We introduce a novel spectral clustering algorithm that allows us to automatically determine the number of clusters in a dataset. The algorithm is based on a theoretical analysis of the spectral properties of block diagonal affinity matrices; in contrast to established methods, we do not normalise the rows of the matrix of eigenvectors, and we argue that the non-normalised data contain key information that allows the automatic determination of the number of clusters present. We present several examples of datasets, both artificial and real, successfully clustered by our algorithm, obtaining good results even without refined feature extraction techniques.
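    The paper's criterion analyses the non-normalised eigenvectors themselves; the closely related eigengap heuristic below shows, in sketch form, how a cluster count can be read off the spectrum of a near-block-diagonal affinity matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
# three well-separated 2-D blobs of 30 points each
pts = np.vstack([rng.normal(c, 0.1, size=(30, 2))
                 for c in [(0, 0), (5, 0), (0, 5)]])

d2 = ((pts[:, None] - pts[None, :]) ** 2).sum(-1)
A = np.exp(-d2 / (2 * 0.5 ** 2))                      # Gaussian affinity
D = A.sum(axis=1)
L = np.eye(len(pts)) - A / np.sqrt(np.outer(D, D))    # normalised Laplacian
evals = np.sort(np.linalg.eigvalsh(L))

# number of clusters = position of the largest gap among the leading eigenvalues
k = int(np.argmax(np.diff(evals[:10]))) + 1
```

    For an exactly block-diagonal affinity matrix the Laplacian has one zero eigenvalue per block, so the gap after the k-th small eigenvalue reveals k.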

  • An Extension of Iterative Scaling for Joint Decision-Level and Feature-Level Fusion in Ensemble Classification

    Publication Year: 2005 , Page(s): 61 - 66
    Cited by:  Papers (1)  |  Patents (1)

    Improved iterative scaling (IIS) is a simple, powerful algorithm for learning maximum entropy (ME) conditional probability models that has found great utility in natural language processing and related applications. Nearly all prior work on IIS considers discrete-valued feature functions, depending on the data observations and class label, and encodes statistical constraints on these discrete-valued random variables. Most significantly for our purposes, the (ground-truth) constraints are measured from frequency counts based on hard (0-1) training-set instances of feature values. Here, we extend IIS to the case where the training (and test) set consists of instances of probability mass functions on the features, rather than instances of hard feature values. We show that the IIS methodology extends in a natural way to this case. This extension has applications 1) to ME aggregation of soft classifier outputs in ensemble classification and 2) to ME classification on mixed discrete-continuous feature spaces. Moreover, we combine these methods, yielding an ME method that jointly performs (soft) decision-level fusion and feature-level fusion in making ensemble decisions. We demonstrate favorable comparisons against both standard boosting and bagging on UC Irvine benchmark data sets, and discuss some of our continuing research directions.

  • Supervised Neural Network Training using the Minimum Error Entropy Criterion with Variable-Size and Finite-Support Kernel Estimates

    Publication Year: 2005 , Page(s): 67 - 72

    The insufficiency of mere second-order statistics in many application areas has been recognized, and more advanced concepts including higher-order statistics, especially those stemming from information theory such as error entropy minimization, are now being studied and applied by researchers in machine learning and signal processing. The main drawback of minimizing output error entropy for adaptive system training is the computational load when fixed-size kernel estimates are employed. Entropy estimators based on sample spacings, on the other hand, have lower computational cost; however, they are not differentiable, which makes them unsuitable for adaptive learning. In this paper, a nonparametric entropy estimator is presented that blends the desirable properties of both techniques in a variable-size, finite-support kernel estimation methodology. This yields an estimator suitable for adaptation, yet with computational complexity similar to sample-spacing techniques. The estimator is illustrated in supervised adaptive system training using the minimum error entropy criterion.

  • Spectral Clustering with Mean Shift Preprocessing

    Publication Year: 2005 , Page(s): 73 - 78

    Clustering is a fundamental problem in machine learning, with numerous important applications in statistical signal processing, pattern recognition, and computer vision, where unsupervised analysis of data classification structure is required. The current state of the art in clustering is widely accepted to be spectral clustering, which, being based on pairwise affinities of samples, imposes very large computational requirements. In this paper, we propose a vector quantization preprocessing stage for spectral clustering, similar to the classical mean-shift principle for clustering. This preprocessing reduces the dimensionality of the matrix on which the spectral techniques are applied, resulting in significant computational savings.
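    The quantisation effect such preprocessing exploits can be sketched in one dimension with a flat-kernel mean shift: every point migrates to a nearby density mode, and only the handful of distinct modes then needs to enter the expensive spectral stage (a toy version, not the paper's implementation):

```python
def mean_shift_1d(points, bandwidth=1.0, iters=20):
    # flat-kernel mean shift: repeatedly move each mode estimate to the
    # mean of the original points lying within one bandwidth of it
    modes = list(points)
    for _ in range(iters):
        new_modes = []
        for m in modes:
            near = [p for p in points if abs(p - m) <= bandwidth]
            new_modes.append(sum(near) / len(near))
        modes = new_modes
    return modes

pts = [0.1, 0.2, -0.1, 0.0, 5.1, 4.9, 5.0, 5.2]
modes = mean_shift_1d(pts)
centres = sorted(set(round(m, 3) for m in modes))   # eight points collapse to two modes
```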

  • Hierarchical Feature Subset Selection for Features Computed from the Continuous Wavelet Transform

    Publication Year: 2005 , Page(s): 81 - 86
    Cited by:  Papers (1)  |  Patents (1)

    An algorithm for feature subset selection is proposed in which the correlation structure of the features is exploited. The algorithm is shown to perform particularly well in pattern recognition applications where features are computed from the continuous wavelet transform, since such features are highly correlated. The algorithm is a hybrid filter/wrapper approach: the filter removes irrelevant and redundant features, while the wrapper part can be conceived as a hierarchical search for features, a search at the cluster level followed by a search at the within-cluster level. It is shown that a significant increase in performance is obtained for both the ACO (ant colony optimization) and GA (genetic algorithm) optimization algorithms, two examples of meta-heuristic optimization. However, our approach is not limited to meta-heuristic search algorithms; essentially any search algorithm can be plugged into the proposed framework.

  • Dimensionality Reduction using a Mixed Norm Penalty Function

    Publication Year: 2005 , Page(s): 87 - 92

    The dimensionality of a problem addressed by a neural network is related to the number of hidden neurons in the network. Pruning a neural network to reduce the number of hidden neurons reduces the dimensionality of the system, produces a more efficient computation, and yields a network better able to generalize beyond the training data. This work introduces a novel penalty function that is shown to reduce the number of active neurons; its performance is superior to other known penalty functions. To best implement this function, we use bi-level optimization, which enables us to reduce dimensionality while maintaining good classification performance.

  • A Very Fast and Efficient Linear Classification Algorithm

    Publication Year: 2005 , Page(s): 93 - 98

    We present a new, very fast and efficient learning algorithm for binary linear classification, derived from an earlier neural model developed by one of the authors. The original method was based on describing the solution cone, i.e., the convex region containing the separating vectors for a given set of patterns, and updating this region every time a new pattern is introduced. The drawback of that method was the high memory and computational cost required to compute and store the edges that describe the cone. In the modification presented here, we avoid these problems by obtaining just one solution vector inside the cone using an iterative rule, greatly simplifying and accelerating the process at the cost of very few misclassification errors. Even these errors can be corrected, to a large extent, using various techniques. Our method was tested on the real-world application of named entity recognition, obtaining results comparable to other state-of-the-art classification methods.
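    The classical perceptron is the simplest example of reaching a single vector inside the solution cone by an iterative rule; the sketch below is that textbook update on a toy separable set, not the authors' algorithm:

```python
def perceptron_train(samples, labels, epochs=100, lr=1.0):
    # labels must be in {-1, +1}; a bias input of 1.0 is appended to each sample
    w = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            xa = list(x) + [1.0]
            if y * sum(wi * xi for wi, xi in zip(w, xa)) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, xa)]
                mistakes += 1
        if mistakes == 0:   # converged: every pattern lies inside the solution cone
            break
    return w

def predict(w, x):
    xa = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xa)) > 0 else -1

X = [(0.0, 0.0), (0.3, 0.2), (0.1, 0.4), (2.0, 2.0), (1.8, 1.2), (1.1, 2.1)]
y = [-1, -1, -1, 1, 1, 1]
w = perceptron_train(X, y)
```

    For linearly separable data this update is guaranteed to terminate with a separating vector, at the cost of offering no control over which vector inside the cone is found.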

  • Performance Analysis of the Warped Discrete Cosine Transform Cepstrum with MFCC Using Different Classifiers

    Publication Year: 2005 , Page(s): 99 - 104
    Cited by:  Papers (2)

    In this paper, we continue our investigation of the warped discrete cosine transform cepstrum (WDCTC), which was earlier introduced as a new speech processing feature (Muralishankar et al., 2005). Here, we study the statistical properties of the WDCTC and compare them with those of the mel-frequency cepstral coefficients (MFCC). We report some interesting properties of the WDCTC compared to the MFCC: its statistical distribution is more Gaussian-like with lower variance, it obtains better vowel cluster separability, it forms tighter vowel clusters, and it generates better codebooks. Further, we employ the WDCTC and MFCC features in a 5-vowel recognition task using vector quantization (VQ), 1-nearest neighbour (1-NN), probabilistic neural network (PNN) and Gaussian discriminant analysis (GDA) classifiers. Finally, we discuss the vowel recognition results in the context of the statistical properties of the WDCTC and MFCC. In our experiments, the WDCTC consistently outperforms the MFCC.

  • Improved Proposal Distribution with Gradient Measures for Tracking

    Publication Year: 2005 , Page(s): 105 - 110

    Particle filters have become a useful tool for object tracking due to their applicability to a wide range of situations. To obtain an accurate estimate from a particle filter, a large number of particles is usually necessary. A crucial step in the design of a particle filter is the choice of the proposal distribution; a common choice is the transition distribution, which models the dynamics of the system but takes no account of the current measurements. We present a particle filter for tracking rigid objects in video sequences that makes use of image gradients in the current frame to improve the proposal distribution. The gradient information is incorporated in the filter efficiently to minimise the computational cost. Results on synthetic and natural sequences show that the gradient information improves the accuracy and reduces the number of particles required.
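    The baseline this paper improves on — a bootstrap filter whose proposal is the transition prior — can be sketched on a 1-D toy model; the gradient-informed proposal would replace the predict step. All model parameters here are illustrative:

```python
import math
import random

random.seed(1)

def bootstrap_filter(obs, n=500, q=0.3, r=0.5):
    # toy model: x_t = x_{t-1} + 1 + N(0, q),  y_t = x_t + N(0, r)
    parts = [random.gauss(0.0, 1.0) for _ in range(n)]
    est = []
    for y in obs:
        parts = [p + 1.0 + random.gauss(0.0, q) for p in parts]   # predict (the proposal)
        w = [math.exp(-0.5 * ((y - p) / r) ** 2) for p in parts]  # weight by likelihood
        tot = sum(w)
        w = [wi / tot for wi in w]
        est.append(sum(wi * p for wi, p in zip(w, parts)))
        # systematic resampling
        u0, j, c, new = random.random() / n, 0, w[0], []
        for k in range(n):
            u = u0 + k / n
            while u > c:
                j += 1
                c += w[j]
            new.append(parts[j])
        parts = new
    return est

truth = [float(t) for t in range(1, 31)]
obs = [x + random.gauss(0.0, 0.5) for x in truth]
est = bootstrap_filter(obs)
```

    Because this proposal ignores the current measurement, many particles land in low-likelihood regions; measurement-informed proposals, such as the gradient-based one above, concentrate particles where they are needed and so reduce the particle count.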

  • Threshold Learning from Samples Drawn from the Null Hypothesis for the Generalized Likelihood Ratio CUSUM Test

    Publication Year: 2005 , Page(s): 111 - 116

    Although the optimality of sequential tests for detecting a change in the parameter of a model has been widely discussed, tuning the test parameters is still an issue. In this communication, we propose a learning strategy to set the threshold of the GLR CUSUM statistic so as to make decisions with a desired false alarm probability. Only data from before the change point are required to perform the learning process. Extensive simulations are performed to assess the validity of the proposed method. The paper concludes by opening a path toward a new approach to multi-modal, feature-based event detection for video parsing.
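    The threshold-learning strategy can be sketched for the simplest CUSUM with a known post-change mean (the GLR version estimates this parameter instead of fixing it): run the statistic on null-hypothesis data only, then set the threshold at a quantile of its peak. All parameters below are illustrative:

```python
import random

random.seed(2)

def cusum(xs, mu0=0.0, mu1=1.0, sigma=1.0):
    # one-sided CUSUM for a known Gaussian mean shift mu0 -> mu1
    g, path = 0.0, []
    for x in xs:
        llr = (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)
        g = max(0.0, g + llr)
        path.append(g)
    return path

# learn the threshold from null-hypothesis (pre-change) runs only:
# 95th percentile of the statistic's peak over 300 simulated runs
peaks = sorted(max(cusum([random.gauss(0, 1) for _ in range(200)]))
               for _ in range(300))
h = peaks[int(0.95 * len(peaks))]

# a run whose mean shifts from 0 to 1 at t = 100 should cross the threshold
data = ([random.gauss(0, 1) for _ in range(100)] +
        [random.gauss(1, 1) for _ in range(100)])
alarm = next((t for t, g in enumerate(cusum(data)) if g > h), None)
```

    Choosing the quantile fixes the false alarm probability over runs of this length, which is exactly the tuning knob the paper proposes to learn from pre-change data.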

  • Design of Multiple-Level Hybrid Classifier for Intrusion Detection System

    Publication Year: 2005 , Page(s): 117 - 122
    Cited by:  Papers (3)

    As the number of networked computers grows, intrusion detection is an essential component in keeping networks secure. However, constructing and maintaining a misuse detection system is very labor-intensive, since attack scenarios and patterns need to be analyzed and categorized, and the corresponding rules and patterns need to be carefully hand-coded. Data mining can be used to ease this burden. This paper proposes a multiple-level hybrid classifier, an intrusion detection system that uses a combination of tree classifiers and clustering algorithms to detect intrusions. The performance of the new algorithm is compared to other popular approaches such as MADAM ID and 3-level tree classifiers, and significant improvement is achieved in terms of both a high intrusion detection rate and a reasonably low false alarm rate.

  • Feature Extraction Using Recursive Cluster-Based Linear Discriminant with Application to Face Recognition

    Publication Year: 2005 , Page(s): 123 - 128

    Two new recursive procedures for extracting discriminant features, termed the recursive modified linear discriminant (RMLD) and the recursive cluster-based linear discriminant (RCLD), are proposed in this paper. The two new methods overcome two major shortcomings of the Fisher linear discriminant (FLD): they can fully exploit all information available for discrimination, and they remove the constraint on the total number of features that can be extracted. Extensive experiments comparing the new algorithms with the traditional FLD and some of its variations (LDA based on the null space of SW, modified FLD (MFLD), and recursive FLD (RFLD)) have been carried out on various face recognition problems using the Yale and JAFFE databases, in which the performance improvement from the new feature extraction scheme is significant.

  • Hypotheses Control-Based Strategies for the Simplification of Bayesian Multiuser Detectors

    Publication Year: 2005 , Page(s): 129 - 134

    This paper deals with the development of several simplification strategies that can be applied to communication problems analyzed using the Bayesian formulation. Specifically, we focus on the multiuser detection problem in wireless DS/CDMA environments, where the complexity of the theoretical algorithm grows exponentially with the number of active users as well as with the number of symbols received. Implementing these algorithms is, most of the time, infeasible, and reduced-complexity suboptimal algorithms must then be developed. This paper continues the author's previous work on Bayesian single- and multi-user detectors for wireless communications.

  • Cascade Jump Support Vector Machine Classifiers

    Publication Year: 2005 , Page(s): 135 - 139

    In this paper we present a new support vector machine (SVM) based classifier that achieves better generalization than the standard SVM. Better generalization is achieved by using a cascade of modified proximal SVMs to remove simpler examples before presenting the difficult examples to a more complex SVM. The cascade structure uses the discrimination afforded by different feature spaces (through different kernels) to simplify the classification task.
