3DMGNet: 3D Model Generation Network Based on Multi-Modal Data Constraints and Multi-Level Feature Fusion

Sensors (Basel). 2020 Aug 28;20(17):4875. doi: 10.3390/s20174875.

Abstract

Because a single image carries limited information, generating a high-precision 3D model from an image alone is very difficult. Existing 3D voxel model generation also suffers from problems such as information loss in the upper layers of a network. To address these problems, we design a 3D model generation network based on multi-modal data constraints and multi-level feature fusion, named 3DMGNet. 3DMGNet is trained in a self-supervised manner to generate a 3D voxel model from an image. An image feature extraction network (2DNet) and a 3D feature extraction network (3D auxiliary network) extract features from the image and the 3D voxel model, respectively. Feature fusion then integrates low-level features into high-level features in the 3D auxiliary network. To extract more effective features, each feature map in the feature extraction networks is processed by an attention network. Finally, a 3D deconvolution network generates the 3D model from the extracted features. The 3D feature extraction and voxel generation branches play an auxiliary role in training the image-based 3D model generation network. Additionally, a multi-view contour constraint is proposed to further improve the quality of the generated 3D models. Experiments on the ShapeNet dataset verify the effectiveness and robust performance of the proposed method.
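The multi-view contour constraint mentioned in the abstract can be illustrated with a minimal sketch: project a binary voxel grid onto orthogonal planes to obtain silhouettes, then penalize their disagreement with target contours. This is an illustrative NumPy sketch only; the function names, the use of max-projection silhouettes, and the squared-error loss are assumptions, not the paper's exact formulation:

```python
import numpy as np

def voxel_silhouettes(voxels):
    """Project a binary voxel grid onto three orthogonal planes.

    Max-projection along each axis yields a 2D silhouette that can be
    compared against a contour extracted from a corresponding view.
    """
    return [voxels.max(axis=a) for a in range(3)]

def contour_loss(voxels, target_silhouettes):
    """Mean squared disagreement between projected and target silhouettes.

    `target_silhouettes` is a list of three 2D arrays, one per axis,
    in the same order as the projections returned above.
    """
    projections = voxel_silhouettes(voxels)
    return float(np.mean([np.mean((p - t) ** 2)
                          for p, t in zip(projections, target_silhouettes)]))
```

A generated voxel grid whose projections match the target contours from every view incurs zero loss; mismatched regions contribute a penalty that can supervise the generator without requiring a ground-truth 3D model.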

Keywords: 3D model generation; attention mechanism; deep learning; feature fusion; multi-modal data constraints.

Publication types

  • Letter