Contrastive language and vision learning of general fashion concepts

Patrick John Chia; Giuseppe Attanasio; Federico Bianchi; Silvia Terragni; Ana Rita Magalhães; Diogo Goncalves; Ciro Greco; Jacopo Tagliabue

doi:10.1038/s41598-022-23052-9


Sci Rep. 2022 Nov 8;12(1):18958. doi: 10.1038/s41598-022-23052-9.

Authors

Patrick John Chia¹, Giuseppe Attanasio², Federico Bianchi³, Silvia Terragni⁴,⁵, Ana Rita Magalhães⁶, Diogo Goncalves⁶, Ciro Greco⁷, Jacopo Tagliabue⁷

Affiliations

¹ Coveo, Montreal, Canada. pchia@coveo.com.
² Bocconi University, Milan, Italy.
³ Stanford University, Stanford, CA, USA.
⁴ Telepathy Labs, Zurich, Switzerland.
⁵ University of Milano-Bicocca, Milan, Italy.
⁶ Farfetch, Porto, Portugal.
⁷ South Park Commons, New York, USA.

Abstract

The steady rise of online shopping goes hand in hand with the development of increasingly complex ML and NLP models. While most use cases are cast as specialized supervised learning problems, we argue that practitioners would greatly benefit from general and transferable representations of products. In this work, we build on recent developments in contrastive learning to train FashionCLIP, a CLIP-like model adapted for the fashion industry. We demonstrate the effectiveness of the representations learned by FashionCLIP with extensive tests across a variety of tasks, datasets and generalization probes. We argue that adaptations of large pre-trained models such as CLIP offer new perspectives in terms of scalability and sustainability for certain types of players in the industry. Finally, we detail the costs and environmental impact of training, and release the model weights and code as an open-source contribution to the community.
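CLIP-like models are trained with a symmetric contrastive (InfoNCE) objective: in a batch of N matched image-text pairs, each image should be most similar to its own caption and each caption to its own image, with every other pair in the batch acting as a negative. The sketch below is a minimal, dependency-free illustration of that objective and of zero-shot labelling by nearest text embedding, assuming the image and text encoders have already produced embedding vectors. It is not the FashionCLIP training code; the function names (`clip_loss`, `zero_shot_classify`), the toy vectors and the fixed temperature of 0.07 are illustrative assumptions (in CLIP the temperature is a learned parameter).

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale a vector to unit length so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax_at(row: List[float], idx: int) -> float:
    # Numerically stable log-softmax of `row`, evaluated at position `idx`.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - log_z


def clip_loss(image_emb: List[Vector], text_emb: List[Vector],
              temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; pair i (image_emb[i], text_emb[i]) is the positive."""
    assert len(image_emb) == len(text_emb), "need one caption per image"
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine(image i, text j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: row i should peak at column i.
    loss_i2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text -> image: column j should peak at row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


def zero_shot_classify(image_vec: Vector, label_vecs: List[Vector],
                       labels: List[str]) -> str:
    # Pick the label whose text embedding is closest (cosine) to the image embedding.
    img = l2_normalize(image_vec)
    scores = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in label_vecs]
    return labels[scores.index(max(scores))]


# Toy usage: captions that match their images give a lower loss than swapped captions.
images = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[0.9, 0.1], [0.1, 0.9]]
texts_swapped = [[0.1, 0.9], [0.9, 0.1]]
print(clip_loss(images, texts_aligned) < clip_loss(images, texts_swapped))  # True
print(zero_shot_classify([1.0, 0.0], [[0.1, 0.9], [0.9, 0.1]], ["shirt", "shoe"]))  # shoe
```

Because both directions of the loss are averaged, the image and text encoders are pushed into one shared space, which is what allows the same embeddings to be reused across tasks such as retrieval and zero-shot classification without task-specific heads.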

MeSH terms

Generalization, Psychological
Language*
Natural Language Processing*
Spatial Learning