Big Data Analysis of Sports and Physical Activities among Korean Adolescents

Int J Environ Res Public Health. 2020 Aug 2;17(15):5577. doi: 10.3390/ijerph17155577.

Abstract

The Korean government (Ministry of Culture, Sports and Tourism, Ministry of Health and Welfare, and Ministry of Education) has framed policies and conducted many projects to encourage adolescents to be more physically active. Despite these efforts, the physical activity participation rate among Korean adolescents continues to decline. The purpose of this study was therefore to analyze perceptions of sports and physical activity among Korean adolescents through big data analysis covering the last 10 years, and to provide research data and statistical direction for sports and physical activity participation in this group. Data from 1 January 2010 to 31 December 2019 were collected from Naver (NAVER Corp., Seongnam, Korea), Daum (Kakao Corp., Jeju, Korea), and Google (Alphabet Inc., Mountain View, CA, USA), the most widely used search engines in Korea, using TEXTOM 4.0 (The Imc Inc., Daegu, Korea), a big data collection and analysis solution. Keywords such as "adolescent + sports + physical activity" were used; TEXTOM 4.0 can generate multiple collection lists from keywords at once. The collected data were processed through text mining (frequency analysis and term frequency-inverse document frequency (TF-IDF) analysis) and social network analysis (SNA) (degree centrality and convergence of iterated correlations analysis) using TEXTOM 4.0 and UCINET 6 social network analysis software (Analytic Technologies Corp., Lexington, KY, USA). A total of 9278 data items (10.36 MB) were analyzed. Frequency analysis of the top 50 terms showed exercise (872), mind (851), health (824), program (782), and burden (744) in descending order. TF-IDF analysis revealed exercise (2108.070), health (1961.843), program (1928.765), mind (1861.837), and burden (1722.687) in descending order.
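The frequency and TF-IDF rankings above differ (for example, mind falls from second to fourth place). The minimal Python sketch below shows how TF-IDF can reorder a plain frequency list by down-weighting terms that occur in many documents. It assumes the common weighting tf × log(N / df), summed over all documents to give one score per term; TEXTOM 4.0's exact formula is not stated in the abstract, and the toy documents are invented for illustration, not taken from the study's data.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Corpus-level TF-IDF per term.

    Assumption: classic tf(t, d) * log(N / df(t)), summed over documents.
    This is an illustrative variant, not necessarily TEXTOM's formula.
    """
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    scores = Counter()
    for doc in documents:
        tf = Counter(doc)  # raw term counts within this document
        for term, count in tf.items():
            scores[term] += count * math.log(n_docs / df[term])
    return scores

# Hypothetical tokenized documents (not the study's corpus).
docs = [
    ["exercise", "health", "program"],
    ["exercise", "mind", "burden"],
    ["health", "program", "program"],
]

# Plain frequency ranking.
frequency = Counter(term for doc in docs for term in doc)
print(frequency.most_common(2))  # [('program', 3), ('exercise', 2)]

# TF-IDF ranking: rarer terms such as "mind" move up.
print({t: round(s, 3) for t, s in tf_idf(docs).most_common(3)})
# {'program': 1.216, 'mind': 1.099, 'burden': 1.099}
```

In this toy corpus, "exercise" is the second most frequent term but appears in two of the three documents, so its IDF weight is low and it drops below "mind", which appears only once but in a single document.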
SNA showed that the terms with the highest degree centrality were exercise (0.02857), program (0.02406), mind (0.02079), health (0.02062), and activity (0.01872) in descending order. Convergence of iterated correlations (CONCOR) analysis identified five clusters: exercise and health, child to adult, sociocultural development, therapy, and program. However, the terms female gender, sports for all, stress, and wholesome were not correlated strongly enough to form a cluster. By drawing significant implications from these terms and clusters through big data analysis, this study provides basic data and statistical direction for increasing physical activity participation among Korean adolescents.
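Degree centrality and CONCOR can be illustrated on a small term co-occurrence network. The Python (NumPy) sketch below is not the UCINET 6 implementation. For centrality it assumes binary degree normalized by n - 1. For CONCOR it fills the empty diagonal with each row's maximum before repeatedly correlating rows until the entries approach ±1; the diagonal choice is an illustrative assumption. Only the first split into two blocks is shown; the five clusters reported above would come from further splits of each block. The terms and documents are hypothetical, so the scores will not match the values reported in the study.

```python
import itertools
import numpy as np

# Hypothetical vocabulary and documents (not the study's data).
TERMS = ["exercise", "health", "program", "therapy", "child", "adult"]

def cooccurrence(documents, terms):
    """Symmetric matrix: number of documents in which two terms co-occur."""
    index = {t: i for i, t in enumerate(terms)}
    m = np.zeros((len(terms), len(terms)))
    for doc in documents:
        present = sorted({index[t] for t in doc if t in index})
        for i, j in itertools.combinations(present, 2):
            m[i, j] += 1
            m[j, i] += 1
    return m

def degree_centrality(adj):
    """Normalized binary degree: distinct neighbours divided by (n - 1)."""
    n = adj.shape[0]
    return (adj > 0).sum(axis=1) / (n - 1)

def concor_split(adj, max_iter=100, tol=1e-6):
    """One CONCOR split: iterate row correlations until entries reach +/-1."""
    m = adj.copy()
    # Illustrative assumption: replace the zero diagonal with each row's maximum.
    np.fill_diagonal(m, m.max(axis=1))
    c = np.corrcoef(m)
    for _ in range(max_iter):
        if np.allclose(np.abs(c), 1.0, atol=tol):
            break
        c = np.corrcoef(c)
    # Terms positively correlated with the first term form one block.
    return c[0] > 0

docs = [
    ["exercise", "health", "program"],
    ["exercise", "health"],
    ["therapy", "child", "adult"],
    ["child", "adult"],
    ["program", "therapy"],
]
adj = cooccurrence(docs, TERMS)

for term, score in zip(TERMS, degree_centrality(adj)):
    print(f"{term:10s} {score:.2f}")  # program and therapy score 0.60

block = concor_split(adj)
print([t for t, b in zip(TERMS, block) if b])      # ['exercise', 'health', 'program']
print([t for t, b in zip(TERMS, block) if not b])  # ['therapy', 'child', 'adult']
```

Here "program" and "therapy" have the highest degree centrality because they bridge the two groups of terms, while CONCOR still separates the network into an exercise/health block and a therapy/life-stage block.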

Keywords: Korean adolescents; physical activities; sports.

MeSH terms

  • Adolescent
  • Big Data*
  • Child
  • Data Analysis
  • Exercise*
  • Female
  • Humans
  • Male
  • Republic of Korea
  • Sports*