A benchmark GaoFen-7 dataset for building extraction from satellite images

Sci Data. 2024 Feb 10;11(1):187. doi: 10.1038/s41597-024-03009-5.

Abstract

Accurate building extraction is crucial for urban understanding, but it typically requires a large number of building samples. Although several building datasets are available for model training, high-quality building datasets covering both urban and rural areas of China are still lacking. To fill this gap, this study creates a high-resolution GaoFen-7 (GF-7) Building dataset from Chinese GF-7 imagery of six Chinese cities. The dataset comprises 5,175 pairs of 512 × 512 image tiles covering 573.17 km². It contains 170,015 buildings, 84.8% of which are in urban areas and 15.2% in rural areas. The usability of the GF-7 Building dataset has been validated with seven convolutional neural networks, all of which achieve an overall accuracy (OA) above 93%. Experiments show that the GF-7 Building dataset can be used for building extraction in both urban and rural scenarios. The proposed dataset offers high quality and high diversity. It supplements existing building datasets and should help promote new building extraction algorithms and intelligent building interpretation in China.
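Two figures in the abstract can be made concrete with a short sketch. The first function estimates the ground sample distance implied by the tile count, tile size, and total area, under the assumption (not stated in the abstract) that the tiles do not overlap and together cover exactly 573.17 km². The second shows the usual pixel-wise definition of overall accuracy (OA) for a binary building mask, i.e. the fraction of pixels whose predicted label matches the reference. The function names `implied_gsd_m` and `overall_accuracy` are hypothetical helpers for illustration, not part of the dataset's tooling.

```python
import math


def implied_gsd_m(n_tiles, tile_px, area_km2):
    """Metres per pixel implied by the tile count and total area.

    Assumption (hypothetical): tiles are non-overlapping and together
    cover exactly area_km2, so area / pixel count gives one pixel's footprint.
    """
    total_pixels = n_tiles * tile_px * tile_px
    area_m2 = area_km2 * 1e6  # 1 km^2 = 1,000,000 m^2
    return math.sqrt(area_m2 / total_pixels)


def overall_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the reference label.

    pred and truth are equally sized 2-D lists of 0 (background) / 1 (building).
    """
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("prediction and reference masks must have the same shape")
    correct = 0
    total = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            correct += int(p == t)  # count pixels where labels agree
            total += 1
    return correct / total


if __name__ == "__main__":
    # 5,175 tiles of 512 x 512 pixels over 573.17 km^2
    print(round(implied_gsd_m(5175, 512, 573.17), 3))  # 0.65
    pred = [[1, 1, 0], [0, 0, 0]]
    truth = [[1, 0, 0], [0, 0, 0]]
    print(overall_accuracy(pred, truth))  # 5 of 6 pixels agree
```

Under the non-overlap assumption, the tiles imply roughly 0.65 m per pixel. Because background pixels usually dominate building masks, OA can be high even when building outlines are poorly delineated, which is why building extraction studies commonly report it alongside metrics such as IoU or F1.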

Publication types

  • Dataset