The MongoDB injection dataset: A comprehensive collection of MongoDB - NoSQL injection attempts and vulnerabilities

Data Brief. 2024 Mar 6;54:110289. doi: 10.1016/j.dib.2024.110289. eCollection 2024 Jun.

Abstract

We present the 'NoSQL Injection Dataset for MongoDB', a comprehensive collection of data obtained from diverse projects that focus on NoSQL attacks on MongoDB databases. Databases today can be classified into three main types: structured, semi-structured, and unstructured. While structured databases have played a prominent role in the past, unstructured databases such as MongoDB are currently experiencing remarkable growth, and the vulnerabilities associated with these databases are increasing accordingly. We have therefore gathered a dataset of 400 NoSQL injection commands, divided into two categories: 221 malicious commands and 179 benign commands. The dataset was meticulously curated by combining manually authored commands with commands collected through web scraping from reputable sources. It serves as a resource for studying and analysing NoSQL injection vulnerabilities, offering insights into potential security threats and aiding the development of robust protection mechanisms against such attacks. The dataset includes a blend of complex and simple commands that have been enhanced. It is well suited to machine learning and data analysis, especially for security enthusiasts. Security professionals can use it to train or fine-tune AI models or large language models (LLMs) to achieve higher attack-detection accuracy, and security enthusiasts can augment it to generate more NoSQL commands and build robust security tools.
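To show why a labelled set of both malicious and benign commands is useful for detection, here is a minimal sketch of a naive keyword baseline. It is an illustrative assumption, not the authors' labelling method or a tool shipped with the dataset. The operator list and the example command strings are hypothetical and were written for this sketch, not taken from the dataset. The sketch also computes the class balance from the reported counts (221 malicious, 179 benign).

```python
import re

# Class counts as reported for the dataset.
MALICIOUS, BENIGN = 221, 179


def class_balance(malicious: int, benign: int) -> dict[str, float]:
    """Return the fraction of each class in the dataset."""
    total = malicious + benign
    return {"malicious": malicious / total, "benign": benign / total}


# Hypothetical list of MongoDB query operators that often appear in injection
# payloads. This is an assumption for illustration, not the dataset's labelling rule.
SUSPICIOUS_OPERATOR = re.compile(
    r"\$(ne|gt|gte|lt|lte|regex|where|in|nin|or|exists|expr)\b"
)


def naive_flag(command: str) -> bool:
    """Flag a command string if it contains any operator from the list above."""
    return SUSPICIOUS_OPERATOR.search(command) is not None


if __name__ == "__main__":
    print(class_balance(MALICIOUS, BENIGN))
    # Classic operator injection in a login query: flagged.
    print(naive_flag('{"username": "admin", "password": {"$ne": null}}'))
    # Legitimate range query: also flagged (a false positive).
    print(naive_flag('db.users.find({"age": {"$gt": 18}})'))
```

With these counts, malicious commands make up 55.25% of the 400 entries and benign commands 44.75%. The baseline also flags the legitimate `$gt` range query. Operator presence alone therefore does not separate attacks from normal use, and this is the gap a model trained on both classes is meant to close.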

Keywords: Database vulnerabilities; Datasets; MongoDB; No-SQL databases; No-SQL injection; No-SQL injection commands; Non-structured databases.