Architectures for high-speed switches have been intensely studied over the past few years due to the need to rapidly scale switch capacities to match the large capacity increases in optical transmission and to handle the fast growth in Internet traffic. Several key research results (McKeown, 1996), (Dai, 2000), showing how fundamental throughput limits can be overcome, have made input-buffered architectures popular for implementing high-speed switches. These architectures also have the inherent advantage of not needing a large speed-up at the output ports. However, a major bottleneck in scaling input-buffered switches is the scheduler needed to keep the switch stable and achieve high-throughput. Several approaches have been proposed to solve this problem including pipelining the implementation, reducing the frequency of schedule computations, etc. An alternate approach which eliminates the scheduler has been proposed but this requires switch speed-up and mechanisms to re-sequence out-of-order packets. In this paper, we develop a scheme that virtually eliminates the scheduling bottleneck. This is achieved by an intelligent combination of parallel implementation and a new scheduling algorithm that works well with partial information regarding arrivals and inexact knowledge of current backlog between various input-output ports. We show that our scheme is stable, achieves very low delays in comparison to other schemes, performs very well even at very high utilization and for various traffic patterns.