To obtain high performance from the IBM 3090 Vector Facility, we must investigate vector instruction constructs in terms of the loop context of the application algorithm. We illustrate the method with linear algebra subroutines for basic matrix operations and a linear equation solver. In these examples, we clarify the mathematical meaning of what each loop computes by analyzing the loops in terms of a generic algorithm. This analysis helps us to achieve optimal loop selection. We then obtain additional performance gain by considering cache capacity. These procedures suggest that there are three levels of performance classification. They also show that program structure yields great benefits in terms of performance and generality of the program.
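As a generic illustration of what loop selection and cache-aware blocking can mean for a basic matrix operation, the following minimal sketch writes matrix multiplication in three loop arrangements. It is not taken from the article and does not reproduce its subroutines or its performance levels; the function names, the choice of matrix multiplication as the example, and the `block` parameter are all assumptions made for illustration. Pure Python is used only to make the loop structure explicit, not to demonstrate vector performance.

```python
def _zeros(rows, cols):
    """Return a rows-by-cols matrix of zeros as a list of lists."""
    return [[0.0] * cols for _ in range(rows)]


def matmul_ijk(a, b):
    """Inner-product form: the innermost loop is a dot product over k."""
    n, m, p = len(a), len(b), len(b[0])
    c = _zeros(n, p)
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_jki(a, b):
    """Column-update form: the innermost loop adds a scalar multiple of
    column k of a to column j of c, a pattern that maps naturally onto
    a vector multiply-and-add over i."""
    n, m, p = len(a), len(b), len(b[0])
    c = _zeros(n, p)
    for j in range(p):
        for k in range(m):
            bkj = b[k][j]
            for i in range(n):
                c[i][j] += a[i][k] * bkj
    return c


def matmul_blocked(a, b, block=2):
    """Same column-update inner loop, but k and j are traversed in tiles
    of size `block` (a hypothetical tuning parameter) so that a limited
    set of columns of a is reused before moving on."""
    n, m, p = len(a), len(b), len(b[0])
    c = _zeros(n, p)
    for kk in range(0, m, block):
        for jj in range(0, p, block):
            for j in range(jj, min(jj + block, p)):
                for k in range(kk, min(kk + block, m)):
                    bkj = b[k][j]
                    for i in range(n):
                        c[i][j] += a[i][k] * bkj
    return c
```

All three functions compute the same product; they differ only in which loop is innermost and in how the index ranges are tiled, which is the kind of choice the abstract refers to as loop selection and cache consideration.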
Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.