The 12-lead electrocardiogram (ECG) is widely used in the diagnosis of cardiovascular disease (CVD). As the number of CVD patients grows, accurate automatic diagnosis from ECG has become an active research topic. Deep learning-based methods can reduce the influence of human subjectivity and improve diagnostic accuracy. In this paper, we propose an automatic 12-lead ECG diagnosis method based on the fusion of channel features and temporal features. Specifically, we design a gated CNN-Transformer network in which a CNN block extracts signal embeddings to reduce data complexity. A dual-branch Transformer structure then extracts channel features in one branch and temporal features in the other from these low-dimensional embeddings. Finally, a gating unit fuses the features from the two branches to achieve automatic CVD diagnosis from the 12-lead ECG. The proposed end-to-end approach compares favorably with other deep learning algorithms, achieving an overall diagnostic accuracy of 85.3% on the CPSC-2018 12-lead ECG dataset.
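
The gating step can be illustrated with a minimal sketch. The abstract does not specify the exact form of the gating unit, so the code below assumes one common formulation: a sigmoid gate computed from the concatenated branch features, used to form an element-wise convex combination of the two branches. The function name `gated_fusion`, the weight layout, and the toy feature values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    channel_feat: Sequence[float],
    temporal_feat: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Fuse two feature vectors of equal length d with a learned gate.

    Assumed form (not taken from the paper):
        g     = sigmoid(W [c; t] + b)      # W has shape d x 2d
        fused = g * c + (1 - g) * t        # element-wise
    """
    d = len(channel_feat)
    if len(temporal_feat) != d or len(weights) != d or len(bias) != d:
        raise ValueError("feature, weight, and bias dimensions must agree")

    concat = list(channel_feat) + list(temporal_feat)
    fused = []
    for i in range(d):
        if len(weights[i]) != 2 * d:
            raise ValueError("each weight row must have length 2 * d")
        # Pre-activation of the gate for output dimension i.
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)
        # g near 1 favors the channel branch; g near 0 favors the temporal branch.
        fused.append(g * channel_feat[i] + (1.0 - g) * temporal_feat[i])
    return fused


# Toy usage: zero weights and bias give g = 0.5, i.e. a plain average.
channel = [1.0, 2.0]
temporal = [3.0, 4.0]
zero_w = [[0.0] * 4 for _ in range(2)]
print(gated_fusion(channel, temporal, zero_w, [0.0, 0.0]))  # -> [2.0, 3.0]
```

In a trained network the gate weights would be learned jointly with the CNN and Transformer branches, so the model can decide per feature dimension how much to rely on channel versus temporal information.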