PROGpedia: Collection of source-code submitted to introductory programming assignments

Data Brief. 2023 Jan 10;46:108887. doi: 10.1016/j.dib.2023.108887. eCollection 2023 Feb.

Abstract

Learning how to program is a difficult task. To acquire the required skills, novice programmers must solve a broad range of programming activities, always supported by timely, rich, and accurate feedback. Automated assessment tools play a major role in meeting these needs and are commonly used in introductory programming courses. Because programming exercises are hard to produce, and those loaded into these tools must adhere to specific format requirements, teachers often reuse them for several years. Consequently, most automated assessment tools, Mooshak in particular, store hundreds of submissions to the same programming exercises, since submissions must be kept after automatic processing for possible subsequent manual review. Our dataset consists of the submissions to 16 programming exercises in Mooshak, proposed in multiple years within the 2003-2020 timespan to undergraduate Computer Science students at the Faculty of Sciences of the University of Porto. In particular, we extract their code property graphs and store them as CSV files. Analyzing this data can enable, for instance, the generation of more concise and personalized feedback based on similar submissions accepted in the past, the identification of different strategies for solving a problem, and a better understanding of a student's thought process, among many other findings.
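To illustrate how code property graphs stored as CSV might be used to find similar accepted submissions, the sketch below loads one graph from a node file and an edge file, summarizes it as a histogram of node labels, and picks the accepted submission with the highest cosine similarity. The CSV layout (`id,label,code` for nodes, `src,dst,label` for edges), the node labels, and the helper names `load_cpg`, `label_histogram`, `cosine`, and `most_similar` are all hypothetical assumptions made for illustration; they are not the dataset's documented schema. The bag-of-labels cosine measure is likewise a simple stand-in, not the authors' method.

```python
import csv
import io
import math
from collections import Counter, defaultdict

# ASSUMED CSV layouts (hypothetical, not PROGpedia's documented schema):
#   nodes: id,label,code
#   edges: src,dst,label   (edge label such as AST or CFG)

def load_cpg(nodes_csv, edges_csv):
    """Read one code property graph from two CSV text streams.

    Returns (nodes, edges): nodes maps node id -> node label, and
    edges maps edge label -> list of (src, dst) id pairs.
    """
    nodes = {row["id"]: row["label"] for row in csv.DictReader(nodes_csv)}
    edges = defaultdict(list)
    for row in csv.DictReader(edges_csv):
        edges[row["label"]].append((row["src"], row["dst"]))
    return nodes, dict(edges)

def label_histogram(nodes):
    """Count how often each node label occurs in the graph."""
    return Counter(nodes.values())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(query, accepted):
    """Return the id of the accepted submission closest to the query histogram."""
    return max(accepted, key=lambda sid: cosine(query, accepted[sid]))

# Tiny in-memory example standing in for one submission's CSV files.
NODES = "id,label,code\n1,METHOD,main\n2,CONTROL_STRUCTURE,for\n3,CALL,printf\n"
EDGES = "src,dst,label\n1,2,AST\n2,3,AST\n2,3,CFG\n"

nodes, edges = load_cpg(io.StringIO(NODES), io.StringIO(EDGES))
query = label_histogram(nodes)

# Hypothetical histograms of previously accepted submissions.
accepted = {
    "sub_a": Counter({"METHOD": 1, "CONTROL_STRUCTURE": 1, "CALL": 1}),
    "sub_b": Counter({"METHOD": 1, "CALL": 3}),
}
print(most_similar(query, accepted))  # sub_a
```

A label histogram discards the graph's edges, so structurally different programs with the same mix of node types look identical; a real analysis would likely exploit the AST, control-flow, and data-flow edges as well.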

Keywords: Abstract Syntax Tree (AST); Automated assessment; Code Property Graph (CPG); Command-Line Interface (CLI); Comma-Separated Values (CSV); Computer Science (CS); Control-flow; Data-flow; Programming learning; Semantic representation; Source code.