To increase both the capacity and the processing speed of input-queued (IQ) switches, a fair scalable architecture (FSA) has been proposed. FSA comprises several cascaded sub-scheduler chips, so a large-scale, high-performance network scheduler can be realized without the capacity limitation of a monolithic device. In addition, each sub-scheduler in an FSA system can be configured with any existing dynamic scheduling algorithm to realize best-effort matching. In this paper, we present the detailed design of FSA and its FPGA implementation on Xilinx Virtex-4 devices. The simulation and synthesis results indicate that the solution achieves a good tradeoff between performance and hardware complexity. The design also supports multicast traffic.