In this paper we develop a graphics processing unit (GPU)-based massively parallel approach for efficiently computing electromagnetic scattering with a proposed double-layer vegetation model composed of a vegetation layer and a ground layer. The proposed vector radiative transfer (VRT) model for vegetation scattering accounts for different leaf sizes and orientations. It uses the Monte Carlo method to calculate the backscattering coefficients of the rough ground and the vegetation, where the leaves are approximated as a large number of randomly oriented flat ellipsoids and the ground is treated as a Gaussian random rough surface. In the original sequential CPU code, the Monte Carlo simulation of vegetation scattering accounts for 97.2% of the total execution time. We exploit the massively parallel compute capability of the NVIDIA Fermi GTX 480 with the Compute Unified Device Architecture (CUDA) to compute the multiple scattering of all leaf groups simultaneously. Our parallel design uses registers for faster memory access, shared memory for parallel reduction, pipelined asynchronous transfers over multiple streams, a parallel random number generator, and heterogeneous CPU-GPU computation. With these techniques we achieved speedups of 213-fold on the NVIDIA GTX 480 GPU and 291-fold on the NVIDIA GTX 590 GPU over the single-core CPU implementation.
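The core pattern described above, in which each leaf group runs as an independent Monte Carlo task with its own random-number stream and the partial results are then summed by a tree-style reduction, can be sketched on the host as follows. This is a minimal Python emulation under stated assumptions, not the authors' CUDA code. The names `leaf_group_contribution`, `tree_reduce`, and `mean_contribution` are hypothetical. The per-sample quantity (cos² of a uniformly drawn leaf tilt angle) is a placeholder, not the VRT scattering kernel. String seeding of `random.Random` stands in for a GPU generator such as cuRAND. On a GPU, each leaf group would map to one thread, and each pass of the `stride` loop would run concurrently in shared memory, with `__syncthreads()` between passes.

```python
import math
import random
from typing import Sequence


def leaf_group_contribution(group_id: int, n_samples: int, base_seed: int) -> float:
    """Monte Carlo estimate for one hypothetical leaf group.

    Placeholder physics (assumption): each sample contributes cos^2 of a leaf
    tilt angle drawn uniformly from [0, pi/2]. This is NOT the VRT kernel.
    """
    # Independent, reproducible stream per group, mimicking one RNG state per GPU thread.
    rng = random.Random(f"{base_seed}-{group_id}")
    total = 0.0
    for _ in range(n_samples):
        tilt = rng.uniform(0.0, math.pi / 2)
        total += math.cos(tilt) ** 2
    return total / n_samples


def tree_reduce(values: Sequence[float]) -> float:
    """Sum by pairwise halving, the access pattern of a shared-memory block reduction."""
    buf = [float(v) for v in values]
    # Pad with zeros to a power of two, as a fixed-size thread block would.
    size = 1
    while size < len(buf):
        size *= 2
    buf.extend([0.0] * (size - len(buf)))
    stride = size // 2
    while stride > 0:
        # On a GPU, threads with tid < stride execute this step concurrently.
        for tid in range(stride):
            buf[tid] += buf[tid + stride]
        stride //= 2  # a __syncthreads() barrier would sit here
    return buf[0] if buf else 0.0


def mean_contribution(n_groups: int, n_samples: int, base_seed: int = 7) -> float:
    """Average the per-group estimates (hypothetical driver)."""
    partials = [leaf_group_contribution(g, n_samples, base_seed) for g in range(n_groups)]
    return tree_reduce(partials) / n_groups
```

Because the placeholder integrand averages to 1/2 over [0, π/2], `mean_contribution(64, 500)` should return a value close to 0.5. The halving loop performs the same additions a block reduction would, so its result matches `sum()` up to floating-point rounding.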