Bridging the Gap of AutoGraph Between Academia and Industry: Analyzing AutoGraph Challenge at KDD Cup 2020

Zhen Xu; Lanning Wei; Huan Zhao; Rex Ying; Quanming Yao; Wei-Wei Tu; Isabelle Guyon

doi:10.3389/frai.2022.905104

Front Artif Intell. 2022 Jun 16:5:905104. doi: 10.3389/frai.2022.905104. eCollection 2022.

Authors

Zhen Xu¹, Lanning Wei¹ ², Huan Zhao¹, Rex Ying³, Quanming Yao⁴, Wei-Wei Tu¹, Isabelle Guyon⁵ ⁶

Affiliations

¹ 4Paradigm, Beijing, China.
² Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.
³ Department of Computer Science, Stanford University, Stanford, CA, United States.
⁴ Department of Electronic Engineering, Tsinghua University, Beijing, China.
⁵ ChaLearn, Stanford, CA, United States.
⁶ Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Institut National de Recherche en Informatique et en Automatique (INRIA), Centre National de la Recherche Scientifique (CNRS), University Paris-Saclay, Gif-sur-Yvette, France.

Abstract

Graph-structured data is ubiquitous in daily life and scientific research and has attracted increasing attention. Graph Neural Networks (GNNs) have proven effective at modeling graph-structured data, and many variants of GNN architectures have been proposed. However, considerable human effort is often needed to tune the architecture for each dataset. Researchers have naturally applied Automated Machine Learning (AutoML) to graph learning, aiming to reduce human effort and obtain generally top-performing GNNs, but their methods focus mainly on architecture search. To understand GNN practitioners' automated solutions, we organized the AutoGraph Challenge at KDD Cup 2020, emphasizing automated graph neural networks for node classification. We received top solutions, notably from industrial technology companies such as Meituan, Alibaba, and Twitter, which have already been open-sourced on GitHub. After detailed comparisons with solutions from academia, we quantify the gaps between academia and industry in modeling scope, effectiveness, and efficiency, and show that (1) academic AutoML for Graph solutions focus on GNN architecture search, while industrial solutions, especially the winning ones in the KDD Cup, tend to obtain an overall solution; (2) with only neural architecture search, academic solutions achieve on average 97.3% of the accuracy of industrial solutions; and (3) academic solutions are cheap to obtain, requiring several GPU hours, while industrial solutions take a few months of labor. Academic solutions also contain far fewer parameters.

Keywords: Automated Machine Learning; Graph Neural Networks; data challenge; graph machine learning; node classification.