In this paper, we propose a depth-map merging based multi-view stereo method for large-scale scenes that takes both accuracy and efficiency into account. In the proposed method, an efficient patch-based stereo matching process generates a depth-map for each image with acceptable errors, followed by a depth-map refinement process that enforces consistency over neighboring views. Compared to state-of-the-art methods, the proposed method reconstructs accurate and dense point clouds with high computational efficiency. Moreover, the proposed method can be easily parallelized at the image level, i.e., each depth-map is computed individually, which makes it suitable for large-scale scene reconstruction with high-resolution images. The accuracy and efficiency of the proposed method are evaluated quantitatively on benchmark data and qualitatively on large data sets.
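
To make the two ideas of consistency over neighboring views and image-level parallelism concrete, the following is a minimal sketch and not the method's actual implementation. It assumes that the neighboring depth-maps have already been reprojected into the reference view's pixel grid and that a depth of 0.0 means "no estimate". The names `refine_depth` and `refine_all` and the parameters `rel_tol` and `min_agree` are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def refine_depth(ref_depth, neighbor_depths, rel_tol=0.01, min_agree=2):
    """Keep a reference depth only if enough neighboring views agree with it.

    ref_depth: list of rows of depths for the reference view (0.0 = none).
    neighbor_depths: depth-maps of neighboring views, ASSUMED to be already
        reprojected into the reference view's pixel grid.
    rel_tol, min_agree: hypothetical thresholds, not taken from the paper.
    """
    refined = []
    for y, row in enumerate(ref_depth):
        out_row = []
        for x, d in enumerate(row):
            # Count neighbors whose depth lies within a relative tolerance.
            agree = sum(
                1
                for nd in neighbor_depths
                if d > 0 and nd[y][x] > 0 and abs(nd[y][x] - d) <= rel_tol * d
            )
            # Inconsistent depths are discarded (set to 0.0).
            out_row.append(d if agree >= min_agree else 0.0)
        refined.append(out_row)
    return refined


def refine_all(depth_maps, neighbors):
    """Refine every depth-map independently, one task per image.

    neighbors maps an image index to the indices of its neighboring views.
    Each task only reads the input depth-maps, so tasks do not interact.
    """
    def task(i):
        return refine_depth(depth_maps[i], [depth_maps[j] for j in neighbors[i]])

    with ThreadPoolExecutor() as pool:
        return list(pool.map(task, range(len(depth_maps))))


# Toy input: three 1x2 depth-maps that are each other's neighbors.
depth_maps = [[[1.0, 2.0]], [[1.005, 2.5]], [[0.995, 2.0]]]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
refined = refine_all(depth_maps, neighbors)
```

Because every task reads only the unrefined inputs, the per-image work is independent, which is the property that makes image-level parallelization straightforward. For CPU-bound work in CPython a process pool or a distributed job queue would likely replace the thread pool used here.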