Exploring the Synergy Between Vision-Language Pretraining and ChatGPT for Artwork Captioning: A Preliminary Study

Giovanna Castellano, Nicola Fanelli, Raffaele Scaringi, Gennaro Vessio

FAPER Workshop - ICIAP 2023

This paper explores the complex task of generating textual descriptions of artwork images with neural networks. A novel synthetic dataset of captions for the images in ArtGraph/WikiArt is generated with ChatGPT and refined with CLIP. A vision-language pretraining (VLP) model, GIT, and a vision transformer are then fine-tuned with instance weighting and multi-task learning to generate rich and expressive artwork descriptions.
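As a rough illustration of how instance weighting could interact with CLIP-based refinement, the sketch below weights each caption's training loss by an image-text similarity score. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the weights are computed, so deriving them from cosine similarity between embeddings is an assumption, and the function names `cosine_similarity`, `instance_weights`, and `weighted_caption_loss` are hypothetical. The embeddings and per-caption losses are plain lists of floats standing in for CLIP outputs and captioning-model losses.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def instance_weights(similarities: Sequence[float]) -> List[float]:
    """Turn similarity scores into non-negative weights that sum to 1.

    Assumption: captions that match their image poorly (low or negative
    similarity) should contribute less to training.
    """
    clipped = [max(s, 0.0) for s in similarities]
    total = sum(clipped)
    if total == 0.0:
        # No caption has positive similarity: fall back to uniform weights.
        return [1.0 / len(clipped)] * len(clipped)
    return [c / total for c in clipped]


def weighted_caption_loss(per_caption_loss: Sequence[float],
                          weights: Sequence[float]) -> float:
    """Weighted sum of per-caption losses (e.g. negative log-likelihoods)."""
    if len(per_caption_loss) != len(weights):
        raise ValueError("per_caption_loss and weights must have the same length")
    return sum(w * l for w, l in zip(weights, per_caption_loss))


if __name__ == "__main__":
    # Hypothetical image/caption embeddings standing in for CLIP outputs.
    image_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    caption_embeddings = [[1.0, 0.1], [1.0, 0.0], [1.0, 1.0]]
    sims = [cosine_similarity(i, c)
            for i, c in zip(image_embeddings, caption_embeddings)]
    weights = instance_weights(sims)
    # Hypothetical per-caption losses from a captioning model.
    print(weighted_caption_loss([2.0, 4.0, 1.0], weights))
```

In an actual fine-tuning loop, the same idea would typically be applied to per-example losses inside each mini-batch, so that noisy synthetic captions are down-weighted rather than discarded outright.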