During a music performance, the musician adds expressiveness to the musical message by changing the timing, dynamics, and timbre of the musical events to communicate an expressive intention. Traditionally, the analysis of music expression is based on measurements of the deviations of the acoustic parameters from the written score. In this paper, we employ machine learning techniques to understand expressive communication and to derive audio features at an intermediate level, between music understood as a structured language and notes understood as sound at a more physical level. We start by extracting audio features from expressive performances, recorded by asking musicians to play so as to convey different expressive intentions. We use a sequential forward selection procedure to rank and select two sets of features: one for a general description of the expressions, and a second one specific to each instrument. We show that higher recognition ratings are achieved by using a set of four features, which can be related to qualitative descriptions of the sound by physical metaphors. These audio features can be used to retrieve expressive content from audio data and to design the next generation of search engines for music information retrieval.
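As an illustration of the sequential forward selection step mentioned above, the following is a minimal sketch and not the procedure used in the paper. It assumes a leave-one-out nearest-centroid classifier as the scoring criterion (the actual classifier and criterion are not specified here), and the toy data, the function names `loo_accuracy` and `sequential_forward_selection`, and the feature indices are all hypothetical.

```python
def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier,
    using only the feature columns listed in `feats`.
    (Assumed scoring criterion, chosen for simplicity.)"""
    n = len(X)
    correct = 0
    for i in range(n):
        sums, counts = {}, {}
        # Build per-class centroids from every sample except sample i.
        for j in range(n):
            if j == i:
                continue
            s = sums.setdefault(y[j], [0.0] * len(feats))
            for k, f in enumerate(feats):
                s[k] += X[j][f]
            counts[y[j]] = counts.get(y[j], 0) + 1
        # Assign sample i to the class with the closest centroid.
        best_label, best_dist = None, float("inf")
        for label, s in sums.items():
            d = sum((X[i][f] - s[k] / counts[label]) ** 2
                    for k, f in enumerate(feats))
            if d < best_dist:
                best_label, best_dist = label, d
        correct += (best_label == y[i])
    return correct / n


def sequential_forward_selection(X, y, n_select, score=loo_accuracy):
    """Greedily add, one at a time, the feature that most improves `score`.
    Returns the selected feature indices in ranked order and the score
    reached after each addition."""
    remaining = list(range(len(X[0])))
    selected, history = [], []
    while remaining and len(selected) < n_select:
        best_f, best_score = None, -1.0
        for f in remaining:
            s = score(X, y, selected + [f])
            if s > best_score:  # ties keep the lowest feature index
                best_f, best_score = f, s
        selected.append(best_f)
        remaining.remove(best_f)
        history.append(best_score)
    return selected, history


# Hypothetical toy data: 20 "performances", two expressive intentions.
# Feature 0 separates the classes; features 1 and 2 carry no class information.
X, y = [], []
for i in range(20):
    label = "a" if i < 10 else "b"
    offset = 0.0 if label == "a" else 10.0
    X.append([offset + 0.1 * (i % 3), float(i % 2), float(i % 5)])
    y.append(label)

selected, history = sequential_forward_selection(X, y, n_select=2)
print("ranked features:", selected)
print("accuracy after each step:", history)
```

The order in which features enter `selected` gives the ranking, and truncating that list at a chosen length gives the selected set. Running the same routine once on data pooled across instruments and once per instrument would yield a general set and instrument-specific sets, under the assumptions stated above.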