IMPLICIT TEXT-IMAGE FINE-GRAINED MATCHING VIA VISUAL CONTRASTIVE ATTENTION
-
Abstract
The text-image fine-grained matching task aims to align fine-grained entities across images and text (e.g., aligning target objects in an image with the phrases that refer to them in the text). Unlike previous studies, this paper proposes a novel implicit-scene-oriented text-image fine-grained matching task, which focuses on fine-grained matching relationships that can only be identified by drawing on context or additional external knowledge. For this new task, this paper formulates a corresponding corpus annotation specification and annotates a text-image fine-grained matching dataset for implicit scenes. Building on this dataset, this paper proposes a method based on visual contrastive attention to alleviate the sparsity of semantic matching information in the new task. Experimental results show that the proposed visual contrastive attention method achieves significant performance improvements on the implicit matching task.
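The abstract does not specify how visual contrastive attention is computed. Purely as an illustration of the general idea, the sketch below shows one generic way a contrastive attention score could be formed for a single text phrase and a set of image-region feature vectors. Ordinary scaled dot-product softmax attention focuses on regions similar to the phrase. A complementary attention (weights 1 - a_i, renormalised) covers the remaining regions. The score is the difference between the phrase's cosine similarity to the two attended vectors. The function names, the complement formulation, and the scoring rule are all hypothetical assumptions, not the method proposed in this paper.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], regions: List[Vector]) -> Vector:
    dim = len(regions[0])
    return [sum(w * r[k] for w, r in zip(weights, regions)) for k in range(dim)]


def contrastive_attention_score(phrase: Vector, regions: List[Vector]) -> float:
    """Hypothetical contrastive-attention matching score (illustration only)."""
    d = len(phrase)
    # Positive attention: scaled dot-product scores between phrase and regions.
    pos = softmax([dot(phrase, r) / math.sqrt(d) for r in regions])
    # Contrastive attention: complement of the positive weights, renormalised.
    comp = [1.0 - a for a in pos]
    z = sum(comp)
    neg = [c / z for c in comp] if z > 0 else [1.0 / len(regions)] * len(regions)
    v_pos = weighted_sum(pos, regions)  # view of the attended regions
    v_neg = weighted_sum(neg, regions)  # view of the remaining regions
    # High when the phrase matches what it attends to more than the background.
    return cosine(phrase, v_pos) - cosine(phrase, v_neg)


regions = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(contrastive_attention_score([1.0, 0.0], regions))  # phrase matching a unique region
print(contrastive_attention_score([0.0, 1.0], regions))  # phrase matching repeated regions
```

In this toy setup, a phrase that matches a single distinctive region receives a larger contrast than one whose matching content is spread across several regions. This hints at why contrasting attended and unattended evidence might help when matching signals are sparse.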
-
-