STELA: a community-centred approach to norm elicitation for AI alignment

Stevie Bergman; Nahema Marchal; John Mellor; Shakir Mohamed; Iason Gabriel; William Isaac

doi:10.1038/s41598-024-56648-4

Sci Rep. 2024 Mar 19;14(1):6616. doi: 10.1038/s41598-024-56648-4.

Authors

Stevie Bergman¹, Nahema Marchal², John Mellor¹, Shakir Mohamed¹, Iason Gabriel¹, William Isaac¹

Affiliations

¹ Google DeepMind, London, UK.
² Google DeepMind, London, UK. nahemamarchal@google.com.

Abstract

Value alignment, the process of ensuring that artificial intelligence (AI) systems are aligned with human values and goals, is a critical issue in AI research. Existing scholarship has mainly studied how to encode moral values into agents to guide their behaviour. Less attention has been given to the normative questions of whose values and norms AI systems should be aligned with, and how these choices should be made. To tackle these questions, this paper presents the STELA process (SocioTEchnical Language agent Alignment), a methodology resting on sociotechnical traditions of participatory, inclusive, and community-centred processes. For STELA, we conduct a series of deliberative discussions with four historically underrepresented groups in the United States in order to understand their diverse priorities and concerns when interacting with AI systems. The results of our research suggest that community-centred deliberation on the outputs of large language models is a valuable tool for eliciting latent normative perspectives directly from differently situated groups. In addition to having the potential to engender an inclusive process that is robust to the needs of communities, this methodology can provide rich contextual insights for AI alignment.

MeSH terms

Artificial Intelligence*
Humans
Language*
Morals
Rest