A training set selection strategy for a universal near-infrared quantitative model

Yan-Hua Jia; Xu-Ping Liu; Yan-Chun Feng; Chang-Qin Hu

doi:10.1208/s12249-011-9638-6


AAPS PharmSciTech. 2011 Jun;12(2):738-45. doi: 10.1208/s12249-011-9638-6. Epub 2011 Jun 4.

Authors

Yan-Hua Jia¹, Xu-Ping Liu, Yan-Chun Feng, Chang-Qin Hu

Affiliation

¹ Institute of Medicinal Biotechnology, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, People's Republic of China.

Abstract

The purpose of this article is to propose an empirical solution to the problem of how many clusters of complex samples should be selected to construct the training set for a universal near-infrared quantitative model based on the Naes method. The sample spectra were hierarchically clustered using Ward's algorithm and Euclidean distance. When the spectra were divided into two clusters, 1/50 of the largest heterogeneity value in the cluster with the larger variation was set as the threshold for determining the total number of clusters. One sample was then randomly selected from each cluster to construct the training set, so the number of training samples equaled the number of clusters. The strategy was applied to 98 batches of rifampicin capsules with API contents ranging from 50.1% to 99.4%. For the rifampicin capsule model, the root mean square errors of cross-validation and prediction were 2.54% and 2.31%, respectively. The model was then evaluated in terms of outlier diagnostics, accuracy, precision, and robustness. The same training set selection strategy was also used to revalidate the models for cefradine capsules, roxithromycin tablets, and erythromycin ethylsuccinate tablets, with satisfactory results. In conclusion, all results showed that this training set selection strategy helps construct near-infrared quantitative models quickly and accurately.
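The abstract describes the selection procedure only in outline. The Python sketch below shows one possible reading of it, using only the standard library. Several details are assumptions rather than statements from the source. "Heterogeneity" is taken to be the dendrogram merge height. "The cluster with larger variation" is taken to be the one with the larger within-cluster sum of squares. The number of clusters is obtained by cutting the Ward tree at the threshold. The function names `ward_tree`, `within_ss`, and `select_training_set` are hypothetical, and the Naes calibration step itself is not shown.

```python
import math
import random


def _key(a, b):
    # Distances are stored once per unordered pair of node ids.
    return (a, b) if a < b else (b, a)


def ward_tree(X):
    """Naive O(n^3) agglomerative clustering with Ward's criterion on Euclidean distance.

    Returns (merges, members, heights): merges lists (s, t, u, height) in merge order,
    members maps every node id to its sample indices, heights maps node id to merge height.
    """
    n = len(X)
    members = {i: [i] for i in range(n)}
    heights = {i: 0.0 for i in range(n)}
    active = set(range(n))
    # Squared Euclidean distances between the initial single-sample clusters.
    d2 = {(i, j): sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
          for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(active) > 1:
        (s, t), dst = min(d2.items(), key=lambda kv: kv[1])
        del d2[(s, t)]
        u, next_id = next_id, next_id + 1
        ns, nt = len(members[s]), len(members[t])
        active -= {s, t}
        for v in active:
            nv = len(members[v])
            dvs, dvt = d2.pop(_key(v, s)), d2.pop(_key(v, t))
            # Lance-Williams update for Ward's method on squared distances.
            d2[_key(v, u)] = ((nv + ns) * dvs + (nv + nt) * dvt - nv * dst) / (nv + ns + nt)
        active.add(u)
        members[u] = members[s] + members[t]
        heights[u] = math.sqrt(max(dst, 0.0))  # guard against tiny negative rounding
        merges.append((s, t, u, heights[u]))
    return merges, members, heights


def within_ss(X, idx):
    """Sum of squared Euclidean deviations of the samples in idx from their centroid."""
    dim = len(X[0])
    centroid = [sum(X[i][k] for i in idx) / len(idx) for k in range(dim)]
    return sum((X[i][k] - centroid[k]) ** 2 for i in idx for k in range(dim))


def select_training_set(X, fraction=1 / 50, seed=None):
    """Hypothetical helper: cluster spectra, set the cut threshold, draw one sample per cluster."""
    if len(X) < 2:
        raise ValueError("need at least two spectra")
    merges, members, heights = ward_tree(X)
    s, t, _, _ = merges[-1]  # the two nodes joined last form the two-cluster solution
    # Assumption: "larger variation" means larger within-cluster sum of squares.
    big = max((s, t), key=lambda node: within_ss(X, members[node]))
    # Assumption: "heterogeneity" is the merge height; the largest one inside a cluster
    # is the height at which that cluster was formed (Ward heights are monotone).
    threshold = heights[big] * fraction
    # Replay only the merges at or below the threshold to obtain the flat clusters.
    clusters = {i: [i] for i in range(len(X))}
    for a, b, u, h in merges:
        if h <= threshold and a in clusters and b in clusters:
            clusters[u] = clusters.pop(a) + clusters.pop(b)
    rng = random.Random(seed)
    # One randomly chosen sample per cluster, so training set size == cluster count.
    return threshold, [rng.choice(sorted(c)) for c in clusters.values()]
```

For example, `select_training_set(spectra, seed=0)` returns the threshold and a list of sample indices, one per cluster. Here `spectra` is assumed to be a list of equal-length lists of absorbance values. The naive O(n³) merge loop is adequate for about a hundred batches, such as the 98 used here. For larger sets, a library routine such as SciPy's Ward linkage would be the usual choice.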

Publication types

Comparative Study

MeSH terms

Cluster Analysis
Models, Chemical*
Quantitative Structure-Activity Relationship
Random Allocation
Rifampin / chemistry*
Rifampin / standards*
Spectroscopy, Near-Infrared / methods
Spectroscopy, Near-Infrared / standards*

Substances

Rifampin