Cloud Computing has been increasingly accepted in industry as a promising computing paradigm, with Infrastructure as a Service (IaaS) being one of its most common delivery models. A growing number of providers now supply public IaaS services, each with its own terminology, definitions, and goals. Understanding the full scope of performance evaluation of candidate services is therefore crucial for both service customers (e.g., for cost-benefit analysis) and providers (e.g., for identifying directions of improvement). Given the numerous and diverse IaaS service features to be evaluated, a natural strategy is to implement different types of evaluation experiments separately. Unfortunately, it can be hard to compare such experiments fairly, because individual evaluators may adopt different environments and techniques. To overcome these obstacles, we first established a novel taxonomy to profile and clarify the nature of IaaS service performance evaluation, and then built a three-layer conceptual model to generalize existing performance evaluation practices. Using the relevant elements/classifiers in the taxonomy and conceptual model, evaluators can construct natural-language-style descriptions and experimental design blueprints that outline the evaluation scope and guide new evaluation implementations. In essence, the generated descriptions and blueprints abstractly define and characterize the actual evaluation work, which enables relatively fair and rational comparisons between different performance evaluations according to their abstract characteristics.