---
language: en
license: mit
tags:
  - clip
  - multimodal
  - contrastive-learning
  - cultural-heritage
  - reevaluate
  - information-retrieval
datasets:
  - xuemduan/reevaluate-image-text-pairs
model-index:
  - name: REEVALUATE CLIP Fine-tuned Models
    results:
      - task:
          type: image-text-retrieval
          name: Image-Text Retrieval
        dataset:
          name: Cultural Heritage Hybrid Dataset
          type: xuemduan/reevaluate-image-text-pairs
        metrics:
          - name: I2T R@1
            type: recall@1
            value: <TOBE_FILL_IN>
          - name: I2T R@5
            type: recall@5
            value: <TOBE_FILL_IN>
          - name: T2I R@1
            type: recall@1
            value: <TOBE_FILL_IN>
---

# Domain-Adaptive CLIP for Multimodal Retrieval

The fine-tuned CLIP (ViT-L/14) model used in Knowledge-Enhanced Multimodal Retrieval.


## 📦 Available Models

| Model | Description | Data Type |
| --- | --- | --- |
| reevaluate-clip | Fine-tuned on images, query texts, and description texts | Image+Text |
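
The training script is not published here, so purely as an illustration, the sketch below shows the symmetric contrastive (InfoNCE) objective commonly used to fine-tune CLIP on paired image-text batches. The function name and all details (temperature handling, which text field is paired with each image) are assumptions, not the released setup.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embeds: torch.Tensor,
                          text_embeds: torch.Tensor,
                          logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    NOTE: illustrative only -- the actual REEVALUATE training configuration
    is not documented in this card.
    """
    # L2-normalize so the dot products below are cosine similarities
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # Pairwise similarity matrix, scaled by the (learnable) temperature
    logits_per_image = logit_scale * (image_embeds @ text_embeds.T)
    logits_per_text = logits_per_image.T

    # Matching pairs sit on the diagonal
    targets = torch.arange(image_embeds.size(0), device=image_embeds.device)
    loss_i2t = F.cross_entropy(logits_per_image, targets)
    loss_t2i = F.cross_entropy(logits_per_text, targets)
    return (loss_i2t + loss_t2i) / 2
```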

## 🧾 Dataset

The models were trained and evaluated on the REEVALUATE Image-Text Pair Dataset, which contains 43,500 image–text pairs derived from Wikidata and Pilot Museums.

Each artefact is described by:

- Image: artefact image
- Description text: BLIP-generated natural-language portion + metadata portion
- Query text: user query-like text

Dataset: [xuemduan/reevaluate-image-text-pairs](https://huggingface.co/datasets/xuemduan/reevaluate-image-text-pairs)
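
The pairs can be loaded from the Hub with the `datasets` library. A minimal sketch follows; the split and column names (`image`, `description`, `query`) are assumptions here, so check the dataset card for the actual schema.

```python
from datasets import load_dataset

# Load the REEVALUATE image-text pairs from the Hub.
# NOTE: split and column names below are assumptions -- see the dataset card.
ds = load_dataset("xuemduan/reevaluate-image-text-pairs", split="train")

example = ds[0]
image = example["image"]              # artefact image (assumed column name)
description = example["description"]  # BLIP caption + metadata text (assumed)
query = example["query"]              # user query-like text (assumed)
```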


## 🚀 Usage

```python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

model = CLIPModel.from_pretrained("xuemduan/reevaluate-clip")
processor = CLIPProcessor.from_pretrained("xuemduan/reevaluate-clip")

image = Image.open("artefact.jpg")
text = "yellow flower paintings"

# Encode the image and the text into the shared embedding space
image_embeds = model.get_image_features(**processor(images=image, return_tensors="pt"))
text_embeds = model.get_text_features(**processor(text=[text], return_tensors="pt"))

# L2-normalize so the dot product below is cosine similarity
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

similarity = image_embeds @ text_embeds.T
print(similarity)
```
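
For the image-text retrieval setting the model card reports (I2T/T2I Recall@K), ranking reduces to sorting the rows of the similarity matrix. The helper below is an illustrative sketch, not part of the released code; it assumes N images and N texts embedded with the calls above, with matching pairs aligned on the diagonal.

```python
import torch

def recall_at_k(similarity: torch.Tensor, k: int) -> float:
    """Fraction of queries whose true match (the diagonal) ranks in the top-k.

    `similarity[i, j]` is the cosine similarity between query i and candidate j.
    Illustrative helper, not part of the released code.
    """
    topk = similarity.topk(k, dim=-1).indices                 # (N, k) candidate ids
    targets = torch.arange(similarity.size(0)).unsqueeze(-1)  # (N, 1) true ids
    return (topk == targets).any(dim=-1).float().mean().item()

# Given N aligned rows of normalized image/text embeddings:
# sim = image_embeds @ text_embeds.T
# print("I2T R@1:", recall_at_k(sim, 1))    # image -> text retrieval
# print("T2I R@1:", recall_at_k(sim.T, 1))  # text -> image retrieval
```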