According to Moore's Law, the number of transistors on a chip, and with it computing performance, is expected to double approximately every two years. However, manufacturers now face serious obstacles, such as rising processor temperatures, in doubling the speed of individual processors, so multiprocessor architectures have become increasingly popular. This shift has, in turn, increased interest in standards for writing parallel applications, and the Message Passing Interface (MPI) has become the de facto standard for doing so. Our growing need for the latest high performance computing solutions at Saudi Aramco, the world's largest oil producing company, gave us the opportunity to evaluate three of the most commonly used MPI implementations, MVAPICH, Open MPI, and Intel MPI, on Intel's latest Nehalem processor. In this paper, we describe our test bed environment and the evaluations we performed using the High Performance Linpack (HPL) benchmark, the standard benchmark for ranking the world's TOP500 supercomputers. We also discuss a Web tool we developed to suggest tuned input values for the HPL benchmark. We report performance in GFLOPS, along with run times and system efficiencies, for runs on 32, 64, and 128 nodes of an InfiniBand Nehalem Linux cluster using each of the three MPI implementations. Finally, we discuss our results in terms of performance and scalability, and we share our interpretations and plans for future work.
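The Python sketch below illustrates what "tuned input values" and "system efficiency" can mean for HPL. It is not the authors' Web tool, whose logic is not described here. It only applies widely used HPL rules of thumb, under these assumptions:

- The N × N matrix of 8-byte doubles should fill about 80% of aggregate memory.
- N is rounded down to a multiple of the block size NB (192 is an arbitrary example value).
- The P × Q process grid should be as square as possible, with P ≤ Q.
- Each Nehalem core retires 4 double-precision FLOPs per cycle.

System efficiency is then Rmax / Rpeak. All function names, as well as the node, core, clock, and memory figures in the usage example, are hypothetical.

```python
import math


def hpl_problem_size(nodes, mem_gib_per_node, mem_fraction=0.8, nb=192):
    """Rule-of-thumb N: an N x N matrix of 8-byte doubles occupying about
    mem_fraction of total cluster memory, rounded down to a multiple of NB.
    (Hypothetical helper; the 80% fraction and NB=192 are assumptions.)"""
    total_bytes = nodes * mem_gib_per_node * 1024**3
    n = int(math.sqrt(mem_fraction * total_bytes / 8))
    return (n // nb) * nb


def process_grid(num_procs):
    """Choose P x Q == num_procs with P <= Q and P as close to
    sqrt(num_procs) as possible (a common HPL tuning heuristic)."""
    if num_procs < 1:
        raise ValueError("num_procs must be positive")
    p = math.isqrt(num_procs)
    while num_procs % p != 0:  # terminates at p == 1 at the latest
        p -= 1
    return p, num_procs // p


def peak_gflops(nodes, cores_per_node, ghz, flops_per_cycle=4):
    """Theoretical peak Rpeak in GFLOPS; 4 DP FLOPs/cycle per core is
    assumed for Nehalem (one 2-wide SSE add plus one 2-wide SSE multiply)."""
    return nodes * cores_per_node * ghz * flops_per_cycle


def efficiency(rmax_gflops, rpeak_gflops):
    """System efficiency: measured HPL result divided by theoretical peak."""
    return rmax_gflops / rpeak_gflops


if __name__ == "__main__":
    # Hypothetical cluster: 8 cores per node, 2.93 GHz, 24 GiB per node.
    for nodes in (32, 64, 128):
        procs = nodes * 8
        print(nodes, "nodes:",
              "N =", hpl_problem_size(nodes, 24),
              "P x Q =", process_grid(procs),
              "Rpeak =", round(peak_gflops(nodes, 8, 2.93), 2), "GFLOPS")
```

For example, `process_grid(256)` returns `(16, 16)` and `process_grid(512)` returns `(16, 32)`. Keeping Q slightly larger than P follows common HPL tuning advice. Any real tuning tool would still need to validate these starting values against measured runs.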