Machine learning and statistical models for analyzing multilevel patent data

Sci Rep. 2023 Aug 7;13(1):12783. doi: 10.1038/s41598-023-37922-3.

Abstract

A recent surge in patent applications among public hospitals in China has attracted considerable research interest. A country's healthcare innovation capacity can be measured by its number of patents. This paper explores the relationship between the number of patents and ten independent variables. Multicollinearity was detected and removed using a variable selection method and LASSO regression, respectively. The Poisson model and the negative binomial model were used to analyze the patent data, and three goodness-of-fit tests (the Pearson test, the deviance test, and the DHARMa non-parametric dispersion test) were conducted to assess model fit. After agglomerative hierarchical clustering revealed four clusters, both models were replaced by a negative binomial mixed model. The likelihood ratio test was used to determine which model is more appropriate, and the results show that the negative binomial mixed model outperforms both the Poisson model and the negative binomial model. Three variables, namely the number of health technicians per 10,000 people, financial expenditure on science and technology, and the number of patent applications per 10,000 health personnel, have a significantly positive relationship with the number of patents in Chinese tertiary public hospitals.
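To make the Poisson versus negative binomial comparison more concrete, the following is a minimal, standard-library-only sketch. It is not the authors' code. It assumes the NB2 parameterization (variance μ + αμ²), fits intercept-only models to simulated, hypothetical counts instead of using the paper's ten covariates, and leaves out the DHARMa simulation test, the clustering step and the mixed model. For the Poisson fit it computes the Pearson and deviance dispersion ratios, and it then runs a likelihood ratio test of Poisson against negative binomial. Because α = 0 lies on the boundary of the parameter space, the p-value is taken as half the χ²₁ tail probability. The helper names (`sample_nb`, `golden_max`, `compare_poisson_nb`) are hypothetical and were chosen only for this illustration.

```python
import math
import random


def sample_nb(rng, mu, alpha):
    """Hypothetical helper: one NB2 draw as a gamma-Poisson mixture (mean mu, var mu + alpha*mu^2)."""
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)  # shape 1/alpha, scale alpha*mu -> mean mu
    # Knuth's multiplication method for a Poisson(lam) draw; adequate for moderate lam.
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def poisson_loglik(ys, mu):
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in ys)


def nb_loglik(ys, mu, alpha):
    r = 1.0 / alpha  # NB2 "size" parameter
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
        for y in ys
    )


def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    invphi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2


def compare_poisson_nb(ys):
    """Intercept-only Poisson vs NB2: dispersion ratios and boundary-corrected LR test."""
    n = len(ys)
    mu = sum(ys) / n  # MLE of the mean under both intercept-only models (requires mu > 0)
    pearson = sum((y - mu) ** 2 / mu for y in ys) / (n - 1)
    deviance = 2 * sum((y * math.log(y / mu) if y > 0 else 0.0) - (y - mu) for y in ys) / (n - 1)
    ll_pois = poisson_loglik(ys, mu)
    # Profile the NB2 likelihood over log(alpha) to keep alpha positive.
    log_alpha = golden_max(lambda la: nb_loglik(ys, mu, math.exp(la)),
                           math.log(1e-6), math.log(50.0))
    alpha_hat = math.exp(log_alpha)
    lr = max(0.0, 2 * (nb_loglik(ys, mu, alpha_hat) - ll_pois))
    # P(chi2_1 > x) = erfc(sqrt(x/2)); halve it for the 50:50 chi2_0/chi2_1 mixture.
    p_value = 0.5 * math.erfc(math.sqrt(lr / 2))
    return {"pearson_dispersion": pearson, "deviance_dispersion": deviance,
            "alpha_hat": alpha_hat, "lr_stat": lr, "p_value": p_value}


# Usage on simulated (hypothetical) overdispersed counts.
rng = random.Random(2023)
ys = [sample_nb(rng, mu=5.0, alpha=0.8) for _ in range(500)]
res = compare_poisson_nb(ys)
print(res)
```

A Pearson or deviance ratio well above 1 points to overdispersion relative to the Poisson model. A small likelihood ratio p-value favors the negative binomial model. The same likelihood ratio logic carries over to comparing nested fixed-effect and mixed models, but fitting a negative binomial mixed model requires a dedicated library and is not shown here.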

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • China
  • Cluster Analysis
  • Humans
  • Likelihood Functions
  • Machine Learning*
  • Models, Statistical*