I. Introduction
Building extraction from remote sensing imagery is of great significance for geographic applications such as urban planning [1], [2], population estimation [3], and land cover mapping [4], and the rapid development of high-resolution satellites and remote sensing technology has made such imagery increasingly available. Building extraction is a binary segmentation task: its goal is to assign each pixel in a remote sensing image a building or nonbuilding label. With the growing demand for building extraction and the growing number of high-resolution remote sensing images, it is crucial to find an efficient way to extract buildings accurately and automatically.