Automatic Building Extraction from Google Earth Images under Complex Backgrounds Based on Deep Instance Segmentation Network

Qi Wen; Kaiyu Jiang; Wei Wang; Qingjie Liu; Qing Guo; Lingling Li; Ping Wang

doi:10.3390/s19020333

Sensors (Basel). 2019 Jan 15;19(2):333. doi: 10.3390/s19020333.

Authors

Qi Wen¹, Kaiyu Jiang², Wei Wang³, Qingjie Liu⁴ ⁵, Qing Guo⁶, Lingling Li⁷, Ping Wang⁸

Affiliations

¹ National Disaster Reduction Center of China, Beijing 100124, China. whistlewen@aliyun.com.
² School of Computer Science and Engineering, Beihang University, Beijing 100191, China. kyjiang@buaa.edu.cn.
³ National Disaster Reduction Center of China, Beijing 100124, China. wangwei@ndrcc.gov.cn.
⁴ School of Computer Science and Engineering, Beihang University, Beijing 100191, China. qingjie.liu@buaa.edu.cn.
⁵ The State Key Laboratory of Virtual Reality of Technology and Systems, Beihang University, Beijing 100191, China. qingjie.liu@buaa.edu.cn.
⁶ The People's Insurance Company of China, Beijing 100022, China. guoqing@picc.com.cn.
⁷ National Disaster Reduction Center of China, Beijing 100124, China. lilingling@ndrcc.gov.cn.
⁸ National Disaster Reduction Center of China, Beijing 100124, China. wangping@ndrcc.gov.cn.

Abstract

Building damage makes up a large part of post-disaster assessment for natural disasters, so extracting buildings from optical remote sensing images is of great significance for natural disaster reduction and assessment. Traditional methods are mainly semi-automatic: they require human-computer interaction or rely entirely on human interpretation. In this paper, inspired by recently developed deep learning techniques, we propose an improved Mask Region Convolutional Neural Network (Mask R-CNN) method that simultaneously detects the rotated bounding boxes of buildings and segments them from very complex backgrounds. The proposed method has two major improvements that make it well suited to the building extraction task. First, instead of predicting horizontal rectangular bounding boxes of objects as many other detectors do, we obtain the minimum enclosing rectangles of buildings by adding a new term: the principal direction of the rectangle, θ. Second, a new layer that integrates the advantages of both atrous convolution and the inception block is designed and inserted into the segmentation branch of Mask R-CNN so that the branch learns more representative features. We test the proposed method on a newly collected, large Google Earth remote sensing dataset with diverse buildings and very complex backgrounds. Experiments demonstrate that it obtains promising results.

Keywords: Mask R-CNN; building extraction; deep learning; instance segmentation; receptive field block; rotation bounding box.
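
The rotated bounding box described in the abstract can be illustrated with a minimal sketch. It assumes a box is parameterized as (cx, cy, w, h, θ), with θ in radians measured counter-clockwise in an x-right, y-up frame. The abstract only states that a principal direction θ is added to the box, so this parameterization and the function names below are illustrative assumptions, not the paper's implementation.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta):
    """Corners of a w-by-h rectangle centred at (cx, cy), rotated by theta.

    Assumption: theta is in radians, counter-clockwise, x-right / y-up frame.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets expressed in the rectangle's own (unrotated) frame.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta, then translate to the centre.
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]


def horizontal_box(corners):
    """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # A hypothetical 4 x 2 building footprint rotated by 45 degrees.
    corners = rotated_box_corners(0.0, 0.0, 4.0, 2.0, math.pi / 4)
    x0, y0, x1, y1 = horizontal_box(corners)
    print("rotated box area:   ", 4.0 * 2.0)
    print("horizontal box area:", (x1 - x0) * (y1 - y0))
```

For this hypothetical footprint, the horizontal box covers about 18 square units while the rotated rectangle covers 8. This shows why a horizontal box around an obliquely oriented building includes a large amount of background.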
