Representing an image by MPEG-7 descriptors alone may lose local details. In this work, we combine MPEG-7 descriptors with local feature key points to capture both global and local image characteristics. Local feature key points are extracted by the Scale Invariant Feature Transform (SIFT), and four color and three texture MPEG-7 descriptors are extracted as global features. Images are classified by a Radial Basis Function Neural Network (RBFNN) trained by minimizing the Localized Generalization Error Model (L-GEM). Experimental results show that introducing local feature key points improves the testing accuracy of image classification.
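As a rough illustration of the pipeline described above, the following sketch concatenates a global descriptor vector with a local-feature vector and classifies the result with a small Gaussian RBF network. It rests on several assumptions that are not taken from the paper: the features are synthetic random vectors standing in for real MPEG-7 and SIFT outputs, the SIFT key points are assumed to be already summarized into a fixed-length vector (the abstract does not say how), the vector lengths are placeholders, and the output weights are fitted by ordinary least squares rather than by L-GEM minimization. All function names are hypothetical.

```python
import numpy as np

def combine_features(global_desc, local_desc):
    """Concatenate global and local feature vectors for one image."""
    return np.concatenate([global_desc, local_desc])

def rbf_hidden(X, centers, width):
    """Gaussian hidden-layer activations exp(-||x - c||^2 / (2 * width^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_output_weights(X, Y, centers, width):
    """Least-squares output weights (a stand-in; NOT L-GEM training)."""
    H = rbf_hidden(X, centers, width)
    W, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return W

def predict(X, centers, width, W):
    """Return the index of the largest network output for each sample."""
    return (rbf_hidden(X, centers, width) @ W).argmax(axis=1)

# Synthetic stand-ins: 7 "global" and 8 "local" dimensions (placeholder sizes).
rng = np.random.default_rng(0)
n_per_class, n_global, n_local = 40, 7, 8
samples, labels = [], []
for cls, mean in enumerate([0.0, 3.0]):
    for _ in range(n_per_class):
        g = rng.normal(mean, 0.5, n_global)   # placeholder global descriptor
        l = rng.normal(mean, 0.5, n_local)    # placeholder local summary
        samples.append(combine_features(g, l))
        labels.append(cls)
X = np.array(samples)
y = np.array(labels)
Y = np.eye(2)[y]                              # one-hot class targets

# One hidden unit per class, centered on the class mean (an assumption).
centers = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
W = fit_output_weights(X, Y, centers, width=2.0)
acc = (predict(X, centers, 2.0, W) == y).mean()
print(f"training accuracy on synthetic data: {acc:.2f}")
```

The printed accuracy only reflects this toy data and says nothing about the experimental results reported in the paper. In a real setting, the placeholder vectors would be replaced by actual descriptor outputs and the network would be selected with the L-GEM criterion.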