We present a practical approach to optimizing the whole-system performance of throughput-constrained embedded devices. The proposed methodology combines systematic profiling with system-level analysis techniques that draw on knowledge of the hardware, the application, and the tools. On our target platform, the Intel XScale iXP425 processor, an initial performance assessment identified instruction cache behaviour as the main cause of performance degradation. By taking into account the configuration and coding style of the operating system, the structure of the required device drivers, the architecture of the application software stack, and the choice and configuration of the compiler, the optimization process achieved a six-fold increase in the system's processing capacity, measured in packets per second.