The goal of machine learning is to build automated systems that can classify and recognize complex patterns in data. The representation of the data plays an important role in determining what types of patterns can be automatically discovered. Many algorithms for machine learning assume that the data are represented as elements in a metric space, and their performance can depend sensitively on the manner in which distances are measured. When data are represented as points in a multidimensional vector space, simple Euclidean distances are often used to measure the dissimilarity between different examples. However, such distances often do not yield reliable judgments; in addition, they cannot highlight the distinctive features that play a role in certain types of classification but not in others. Naturally, different types of clustering call for different ways of measuring dissimilarity and, in particular, for different metrics for computing distances between feature vectors. This paper describes two algorithms for learning such distance metrics based on recent developments in convex optimization.
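As a rough illustration of why the choice of metric matters, the following sketch compares nearest-neighbor judgments under plain Euclidean distance and under a distance that weights one feature more heavily. The function names, the toy data, and the per-feature weighting are assumptions made for illustration only; they are not the metrics or the algorithms described in this paper.

```python
import math

def weighted_distance(x, y, weights):
    """Distance between x and y with each squared feature difference
    scaled by a non-negative weight (all weights 1.0 gives Euclidean)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

def nearest_index(query, points, weights):
    """Index of the point closest to query under the weighted distance."""
    dists = [weighted_distance(query, p, weights) for p in points]
    return min(range(len(points)), key=dists.__getitem__)

# Hypothetical toy data: a query and two candidate neighbors.
query = [0.0, 0.0]
points = [[1.0, 0.0], [0.0, 0.9]]

# Plain Euclidean distance: every feature counts equally.
euclidean_choice = nearest_index(query, points, weights=[1.0, 1.0])

# Same data, but the second feature is treated as more important.
reweighted_choice = nearest_index(query, points, weights=[1.0, 4.0])

print(euclidean_choice, reweighted_choice)
```

With these toy values the nearest neighbor changes from the second point to the first once the second feature is weighted more heavily, which is the kind of sensitivity to the distance measure noted above.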