Building footprint maps are vital to many remote sensing (RS) applications, such as 3-D building modeling, urban planning, and disaster management. Due to the complexity of buildings, the accurate and reliable generation of the building footprint from RS imagery is still a challenging task. In this article, an end-to-end building footprint generation approach that integrates convolution neural network (CNN) and graph model is proposed. CNN serves as the feature extractor, while the graph model can take spatial correlation into consideration. Moreover, we propose to implement the feature pairwise conditional random field (FPCRF) as a graph model to preserve sharp boundaries and fine-grained segmentation. Experiments are conducted on four dif...