WLCD: a dataset of lifestyle in relation with women's cancer

BMC Res Notes. 2023 Aug 22;16(1):179. doi: 10.1186/s13104-023-06458-0.

Abstract

Objectives: Social media text mining has been widely used to extract information about the experiences and needs of patients regarding various diseases, especially cancer. Understanding these experiences and needs is necessary for further management in primary care. Researchers have identified that lifestyle factors such as diet, exercise, alcohol, and smoking are associated with cancer risk, particularly for women's cancers. Considering the growing global burden of women's cancer, it is essential to monitor up-to-date data sources using text mining.

Data description: We have prepared six datasets regarding lifestyle components and women's cancer. Five are independent: (1) a dataset of nutrition containing 10,161 tweets; (2) a dataset of exercise containing 9,412 tweets; (3) a dataset of alcohol containing 2,132 tweets; (4) a dataset of smoking containing 4,316 tweets; and (5) a dataset of lifestyle (term) containing 1,861 tweets. The sixth is an additional dataset: (6) a combined dataset formed by summing the five components, containing 27,882 tweets. These data are provided to discover people's perspectives, knowledge, and experiences regarding lifestyle and women's cancer. Hence, they should be valuable for healthcare providers in developing more efficient patient management approaches.
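
The size of the combined dataset can be checked against the five component counts reported above. The following is a minimal sketch of that arithmetic only; the dictionary keys are illustrative labels chosen here, not file or column names from the published dataset.

```python
# Tweet counts per component, as reported in the data description.
# The key names are hypothetical labels, not identifiers from the dataset.
component_counts = {
    "nutrition": 10161,
    "exercise": 9412,
    "alcohol": 2132,
    "smoking": 4316,
    "lifestyle_term": 1861,
}

# The combined dataset (6) is described as the sum of the five components.
combined_total = sum(component_counts.values())

# Share of each component in the combined dataset, in percent.
shares = {
    name: round(100 * count / combined_total, 1)
    for name, count in component_counts.items()
}

print(combined_total)  # 27882
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The computed total agrees with the 27,882 tweets reported for the combined dataset.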

Keywords: Cancer; Lifestyle; Text-mining; Twitter; Women.

MeSH terms

  • Data Mining
  • Ethanol
  • Female
  • Humans
  • Life Style
  • Neoplasms*
  • Smoking
  • Tobacco Smoking

Substances

  • Ethanol