The Message Passing Interface (MPI) is a standard in parallel computing and can also serve as a high-performance programming model for Grid application development. Executing MPI applications efficiently over a computational Grid is a major challenge for developers, because Grid resources are distributed and Grid links form complex hierarchies. In this paper, we present three techniques for improving the performance of MPI applications over a computational Grid. First, we introduce a multithreaded model into the implementation of MPI point-to-point operations, to overlap communication with computation and speed up these operations. Second, to enable the porting of MPI applications to a Grid composed of multiple private-IP clusters, we design a cross-subnet communication mechanism based on network address translation (NAT). Third, to improve the performance of MPI collective operations over a computational Grid, we implement topology-aware collective communication algorithms based on a local communicator creation mechanism. These three techniques are adopted in FiTMPI, an ongoing Grid-enabled MPI implementation.
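
To illustrate the general idea behind topology-aware collectives, the following minimal sketch plans a two-level broadcast in plain Python: the root sends one message to a leader in each remote cluster, and each leader then forwards the data inside its own cluster. This is not FiTMPI's algorithm, and it does not call MPI. The function names (`plan_two_level_broadcast`, `count_wan_flat`) and the `cluster_of` mapping are hypothetical and introduced only for illustration. We assume cluster membership is already known and that cluster labels are sortable strings.

```python
from collections import defaultdict


def plan_two_level_broadcast(root, cluster_of):
    """Plan a two-level broadcast and return (wan_sends, lan_sends).

    cluster_of maps each rank to a cluster label. This mapping is a
    hypothetical input; a real implementation would have to discover it
    from the Grid topology.
    """
    # Group ranks by cluster, which plays the role of a local communicator.
    members = defaultdict(list)
    for rank in sorted(cluster_of):
        members[cluster_of[rank]].append(rank)

    # One leader per cluster; the root leads its own cluster.
    leaders = {}
    for cluster, ranks in members.items():
        leaders[cluster] = root if root in ranks else ranks[0]

    # Phase 1: the root sends once to each remote leader (wide-area links).
    wan_sends = [
        (root, leader)
        for _, leader in sorted(leaders.items())
        if leader != root
    ]

    # Phase 2: each leader forwards inside its own cluster (local links).
    lan_sends = [
        (leaders[cluster], rank)
        for cluster, ranks in sorted(members.items())
        for rank in ranks
        if rank != leaders[cluster]
    ]
    return wan_sends, lan_sends


def count_wan_flat(root, cluster_of):
    """Count wide-area messages when the root sends to every rank directly."""
    return sum(1 for rank in cluster_of if cluster_of[rank] != cluster_of[root])


if __name__ == "__main__":
    # Eight ranks spread over three clusters (illustrative layout only).
    layout = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "C", 7: "C"}
    wan, lan = plan_two_level_broadcast(0, layout)
    print("wide-area sends:", wan)
    print("local sends:", lan)
    print("wide-area sends in a flat broadcast:", count_wan_flat(0, layout))
```

For the layout in the usage example, the two-level plan crosses wide-area links twice, whereas a flat broadcast from rank 0 would cross them five times. The sketch uses a flat fan-out inside each cluster for brevity; a tree-based local phase could be substituted without changing the wide-area message count.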