Skip to Main Content
This paper explores the hardware and software mechanisms necessary for an efficient programmable 10 Gigabit Ethernet network interface card. Network interface processing requires support for the following characteristics: a large volume of frame data, frequently accessed frame metadata, and high frame rate processing. This paper proposes three mechanisms to improve programmable network interface efficiency. First, a partitioned memory organization enables low-latency access to control data and high-bandwidth access to frame contents from a high-capacity memory. Second, a distributed task-queue mechanism enables parallelization of frame processing across many low-frequency cores, while using software to maintain total frame ordering. Finally, the addition of two new atomic read-modify-write instructions reduces frame ordering overheads by 50%. Combining these hardware and software mechanisms enables a network interface card to saturate a full-duplex 10 Gb/s Ethernet link by utilizing 6 processor cores and 4 banks of on-chip SRAM operating at 166 MHz, along with external 500 MHz GDDR SDRAM.