Recently, high-end reconfigurable computing systems have been built that employ Field Programmable Gate Arrays (FPGAs) as hardware accelerators for general-purpose processors. These systems provide new opportunities for high-performance computing but also pose new challenges to application developers. In this paper, we develop a design model for hybrid designs that utilize both the processors and the FPGAs. The model characterizes a reconfigurable computing system by a set of parameters. Based on the model, we propose a design methodology for hardware/software co-design. The methodology partitions the workload between the processors and the FPGAs, maintains load balance in the system, and scales over multiple nodes. We propose designs for several computationally intensive applications: matrix multiplication, matrix factorization, and the Floyd-Warshall algorithm for the all-pairs shortest-paths problem. To illustrate our ideas, we implement the proposed hybrid designs on a Cray XD1. Experimental results show that our designs utilize both the processors and the FPGAs efficiently and overlap most of the data transfer overhead and network communication cost with computation. Our designs achieve up to 90% of the total performance of the nodes and 90% of the performance predicted by the design model. In addition, our designs scale over a large number of nodes.
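
As a rough illustration of the load-balancing idea, the following minimal sketch splits a block of work between a processor and an FPGA in proportion to their sustained rates, so that both finish at about the same time. It is not the design model itself: the function names `split_rows` and `finish_times` and the rate values are hypothetical, and the sketch ignores the data transfer and network communication parameters that the model accounts for.

```python
def split_rows(n_rows, cpu_rate, fpga_rate):
    """Split n_rows between a processor and an FPGA.

    cpu_rate and fpga_rate are assumed sustained rates (rows per unit time).
    They are placeholders for illustration, not measured values.
    Returns (cpu_rows, fpga_rows).
    """
    if n_rows < 0 or cpu_rate <= 0 or fpga_rate <= 0:
        raise ValueError("n_rows must be >= 0 and both rates must be > 0")
    # Give each device a share proportional to its rate.
    fpga_rows = round(n_rows * fpga_rate / (cpu_rate + fpga_rate))
    cpu_rows = n_rows - fpga_rows
    return cpu_rows, fpga_rows


def finish_times(cpu_rows, fpga_rows, cpu_rate, fpga_rate):
    """Time each device needs for its share, ignoring transfer overhead."""
    return cpu_rows / cpu_rate, fpga_rows / fpga_rate


if __name__ == "__main__":
    # Hypothetical case: the FPGA sustains three times the processor's rate.
    cpu_rows, fpga_rows = split_rows(1000, cpu_rate=1.0, fpga_rate=3.0)
    print(cpu_rows, fpga_rows)  # 250 750
    print(finish_times(cpu_rows, fpga_rows, 1.0, 3.0))  # (250.0, 250.0)
```

With equal finish times neither device idles while the other completes its share, which is the condition a load-balanced partition aims for.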