High-performance routers for next-generation networks must scale with the increasing port number brought about by the development of various network access technologies. In circuit switching, it has been common practice to build scalable multi-stage multi-layer switches. For packet switches, however, researchers have so far reached no consensus on how to employ the multi-stage multi-layer architecture. The underlying reason is that the performance of packet switching is strongly affected by the buffering strategy. For single-stage crossbar routers, the well-understood combined-input-output-queuing (CIOQ) scheme, which has input queuing at one extreme and output queuing at the other, is supported by a rich and growing body of theory on performance analysis. For multi-stage multi-layer switches, however, it has remained unclear whether, and how, buffers should be introduced into the middle stages. In this study, the authors investigate this problem and propose to buffer all packets in the central stage instead of at the input side. By making all buffers fully shared (in a distributed way), the authors show that routers with multi-stage multi-layer switching and single-stage shared buffering (MMS-SSB) are scalable in terms of not only hardware cost but also efficient scheduling algorithms. In addition, both analysis and simulations show that the performance of the MMS-SSB switch with the proposed scheduling algorithms is independent of outside traffic.