Packaged foods with pulse ingredients in Europe: A dataset of text-mined product formulations

Data Brief. 2022 Apr 14:42:108173. doi: 10.1016/j.dib.2022.108173. eCollection 2022 Jun.

Abstract

There is a lack of methods and tools to reveal robust information on the ingredients used in packaged foods. To tackle this challenge, we developed an original method to parse ingredient lists of packaged foods. We built a dataset of food product innovations with their parsed ingredient lists. We explain the parser algorithm used to provide this dataset; and a benchmark method assessing the performance of the parsing techniques applied on those food ingredient lists. The primary data we used to test and apply this method were retrieved from MINTEL-GNPD. These data cover new food products containing pulse ingredients launched on European markets over the last decade. This work brings original results informing on the diversity of pulse species used in food products, and on the technological features of these ingredients from whole-grain to ultra-processed uses (such as protein isolates). The parsing techniques we developed can be reused to analyse other ingredient lists. This method also makes it possible to assess marketed crop biodiversity in relation to how species diversity is represented in food products, as well as the level of complexity of food formulations. Hence, this work contributes towards providing more complete information on the characteristics of foodstuffs supplied on markets for both private and public stakeholders.

Keywords: Food ingredient parsing; Food science and technology; Market biodiversity; Processed food; Pulses - legumes; Text mining.