Rethink reporting of evaluation results in AI
Science. 2023 Apr 14;380(6641):136-138. doi: 10.1126/science.adf6369. Epub 2023 Apr 13.
Authors
Ryan Burnell (1)
Wout Schellaert (2)
John Burden (1, 3)
Tomer D Ullman (4)
Fernando Martinez-Plumed (2)
Joshua B Tenenbaum (5)
Danaja Rutar (1)
Lucy G Cheke (1, 6)
Jascha Sohl-Dickstein (7)
Melanie Mitchell (8)
Douwe Kiela (9)
Murray Shanahan (10, 11)
Ellen M Voorhees (12)
Anthony G Cohn (13, 14, 15, 16)
Joel Z Leibo (10)
Jose Hernandez-Orallo (1, 2, 3)
Affiliations
1. Leverhulme Centre for the Future of Intelligence, University of Cambridge, Cambridge, UK.
2. Valencian Research Institute for Artificial Intelligence, Universitat Politècnica de València, València, Spain.
3. Centre for the Study of Existential Risk, University of Cambridge, Cambridge, UK.
4. Department of Psychology, Harvard University, Cambridge, MA, USA.
5. Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
6. Department of Psychology, University of Cambridge, Cambridge, UK.
7. Brain team, Google, Mountain View, CA, USA.
8. Santa Fe Institute, Santa Fe, NM, USA.
9. Stanford University, Stanford, CA, USA.
10. DeepMind, London, UK.
11. Department of Computing, Imperial College London, London, UK.
12. National Institute of Standards and Technology (Retired), Gaithersburg, MD, USA.
13. School of Computing, University of Leeds, Leeds, UK.
14. Alan Turing Institute, London, UK.
15. Tongji University, Shanghai, China.
16. Shandong University, Jinan, China.
PMID: 37053341
DOI: 10.1126/science.adf6369
Abstract
Aggregate metrics and lack of access to results limit understanding.