A hotel's customers personal, behavioral, demographic, and geographic dataset from Lisbon, Portugal (2015-2018)

Data Brief. 2020 Nov 24:33:106583. doi: 10.1016/j.dib.2020.106583. eCollection 2020 Dec.

Abstract

This data article describes a hotel customer dataset with 31 variables describing a total of 83,590 instances (customers). It covers three full years of customer behavioral data. In addition to personal and behavioral information, the dataset also contains demographic and geographical information. This dataset helps address the shortage of real-world business data available for educational and research purposes. It can be used in data mining, machine learning, and other analytical problems within the scope of data science. Because its unit of analysis is the customer, the dataset is especially suitable for building customer segmentation models, including clustering and RFM (Recency, Frequency, and Monetary value) models, but it can also be used in classification and regression problems.
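As an illustration of the RFM segmentation mentioned above, the following is a minimal sketch rather than the authors' method. It assumes each customer has three values: days since the last stay, number of stays, and total revenue. The variable names and the six sample customers are hypothetical and are not taken from the dataset. Each dimension is scored from 1 to 3 by rank, with 3 being the most valuable.

```python
from typing import List, Tuple


def quantile_scores(values: List[float], n_bins: int = 3,
                    higher_is_better: bool = True) -> List[int]:
    """Score each value from 1 to n_bins by its rank (n_bins is best).

    Ties are broken by input order, so equal values may land in
    neighbouring bins. This is a simplification for illustration.
    """
    # Sort indices so that the least valuable customer comes first.
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = 1 + (rank * n_bins) // len(values)
    return scores


def rfm_table(recency_days: List[float], frequency: List[float],
              monetary: List[float]) -> List[Tuple[int, int, int]]:
    """Return one (R, F, M) score triple per customer."""
    # A recent stay means few days elapsed, so lower recency is better.
    r = quantile_scores(recency_days, higher_is_better=False)
    f = quantile_scores(frequency)
    m = quantile_scores(monetary)
    return list(zip(r, f, m))


# Hypothetical sample of six customers (not real data).
days_since_last_stay = [10, 200, 45, 400, 5, 90]
stays = [5, 1, 3, 1, 8, 2]
revenue = [1200.0, 150.0, 640.0, 90.0, 2300.0, 310.0]

rfm = rfm_table(days_since_last_stay, stays, revenue)
for customer_id, triple in enumerate(rfm):
    print(customer_id, triple)
```

Customers with a triple such as (3, 3, 3) are recent, frequent, and high-spending, while (1, 1, 1) marks lapsed, low-value customers. On a dataset of this size, a library quantile routine would likely replace the rank-based helper.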

Keywords: Classification; Clustering; Data mining; Data science; Hospitality; Machine learning; RFM modeling; Regression.