Crowd-Sourced Reliability of an Assessment of Lower Facial Aging Using a Validated Visual Scale

Jason D Kelly; Bryan Comstock; Timothy M Kowalewski; James M Smartt

doi:10.1097/GOX.0000000000003315

Plast Reconstr Surg Glob Open. 2021 Jan 25;9(1):e3315. doi: 10.1097/GOX.0000000000003315. eCollection 2021 Jan.

Authors

Jason D Kelly¹, Bryan Comstock², Timothy M Kowalewski¹, James M Smartt³

Affiliations

¹ Department of Mechanical Engineering, University of Minnesota, Minneapolis, Minn.
² Department of General Internal Medicine, University of Washington, Seattle, Wa.
³ Bucky Plastic Surgery, Philadelphia, Pa.

Abstract

Background: Reliable and valid assessments of the visual endpoints of aesthetic surgery procedures are needed. Currently, most assessments are based on the opinion of patients and their plastic surgeons. The objective of this research was to analyze the reliability of crowdworkers assessing de-identified photographs using a validated scale that depicts lower facial aging.

Methods: Twenty photographs of the nasolabial region of non-identifiable faces, showing various degrees of facial aging, were obtained. Independent crowds of 100 crowdworkers were tasked with assessing the degree of aging using a photograph numeric scale. Independent groups of crowdworkers were surveyed at 4 different times (weekday daytime, weekday nighttime, weekend daytime, weekend nighttime), once a week for 2 weeks.

Results: Crowds assessing midface region photographs had an overall correlation of R = 0.979 (weekday daytime R = 0.991; weekday nighttime R = 0.985; weekend daytime R = 0.997; weekend nighttime R = 0.985). The Bland-Altman test for test-retest agreement showed a normal distribution of assessments over the various times tested, with the rating differences for the majority of photographs falling within 1 SD of the average difference in ratings.
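As an illustration only, the kind of test-retest comparison summarized in the Results might be sketched as follows. This is a minimal sketch and not the authors' analysis code. It assumes that R is a Pearson correlation between the mean crowd ratings from the two weeks, which the abstract does not state, and the function names and rating values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bland_altman(x, y):
    """Bland-Altman summary of the paired differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)   # average difference between the two sessions
    sd = stdev(diffs)    # sample SD of the differences
    # Fraction of photographs whose difference lies within 1 SD of the bias
    within_1sd = sum(abs(d - bias) <= sd for d in diffs) / len(diffs)
    return {
        "bias": bias,
        "sd": sd,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
        "within_1sd": within_1sd,
    }


if __name__ == "__main__":
    # Hypothetical mean crowd ratings per photograph (not data from the study)
    week1 = [1.2, 2.5, 3.1, 4.0, 4.8, 2.0]
    week2 = [1.3, 2.4, 3.3, 3.9, 4.9, 2.1]
    print("R =", round(pearson_r(week1, week2), 3))
    print(bland_altman(week1, week2))
```

Here each list holds one mean rating per photograph for a given session, so the same comparison could be repeated for each of the 4 time slots.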

Conclusions: Crowd assessments of facial aging in de-identified photographs displayed very strong concordance with each other, regardless of time of day or week. This finding shows promise for obtaining reliable assessments of pre- and postoperative results for aesthetic surgery procedures. More work must be done to quantify the reliability of assessments for other pretreatment states or the corresponding results following treatment.