We explore processor-cache affinity scheduling of parallel network protocol processing, in a setting in which protocol processing executes on a shared-memory multiprocessor concurrently with a general workload of non-protocol activity. We find that affinity-based scheduling can significantly reduce the communication delay associated with protocol processing, enabling the host to support a greater number of concurrent streams and to provide higher maximum throughput to individual streams. In addition, we compare the performance of two parallelization alternatives with very different caching behaviors: locking and independent protocol stacks (IPS). We find that IPS (which maximizes cache affinity) delivers much lower message latency and significantly higher message throughput capacity, yet responds less robustly to intra-stream burstiness and offers limited intra-stream scalability.
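As a rough illustration of the scheduling idea, the sketch below shows one way an affinity-based dispatcher might pick a processor for a stream's next packet. It is not the paper's scheduler or simulation model. It assumes, hypothetically, that each processor's cache still holds the protocol state of the last stream it processed, so an idle processor that last ran the same stream is preferred over any other idle one. The class `AffinityScheduler` and the function `ips_processor` are illustrative names only. `ips_processor` models IPS crudely as pinning each stream to one stack, which maximizes affinity but serializes all of a stream's packets on one processor.

```python
from typing import List, Optional


class AffinityScheduler:
    """Toy model of cache-affinity scheduling (hypothetical, for illustration).

    Assumption: a processor's cache is "warm" for the stream it last processed.
    """

    def __init__(self, num_procs: int) -> None:
        self.idle: List[bool] = [True] * num_procs
        self.last_stream: List[Optional[int]] = [None] * num_procs

    def pick(self, stream: int) -> Optional[int]:
        """Return an idle processor for `stream`, preferring a warm cache."""
        # First choice: an idle processor that last ran this stream.
        for p, free in enumerate(self.idle):
            if free and self.last_stream[p] == stream:
                return p
        # Fallback: any idle processor (its cache is cold for this stream).
        for p, free in enumerate(self.idle):
            if free:
                return p
        return None  # all processors busy; the caller must queue the packet

    def dispatch(self, stream: int) -> Optional[int]:
        """Mark the chosen processor busy and record its new cache affinity."""
        p = self.pick(stream)
        if p is not None:
            self.idle[p] = False
            self.last_stream[p] = stream
        return p

    def release(self, p: int) -> None:
        """Processor p finished its packet and becomes idle again."""
        self.idle[p] = True


def ips_processor(stream: int, num_procs: int) -> int:
    """Crude IPS-style placement (hypothetical): each stream is pinned to one
    stack/processor, so its packets never run in parallel with each other."""
    return stream % num_procs
```

For example, with two processors, dispatching stream 7 and then stream 3 uses processors 0 and 1; after both are released, a new packet for stream 3 goes back to processor 1 even though processor 0 is also idle and has the lower index. The static mapping in `ips_processor` hints at the trade-off described above: affinity is perfect, but a burst on one stream queues behind a single processor.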
Published in: Proceedings of the Fourth IEEE International Symposium on High Performance Distributed Computing, 1995
Date of Conference: 2-4 Aug 1995