Papers
arxiv:2511.18519

CHIPS: Efficient CLIP Adaptation via Curvature-aware Hybrid Influence-based Data Selection

Published on Nov 23
Authors:
,
,
,
,
,
,
,
,
,
,
,
,
,

Abstract

CHIPS, a data-centric approach for domain adaptation of CLIP, uses a utility score to optimize image-text pair selection, achieving state-of-the-art performance with minimal data.

AI-generated summary

Adapting CLIP to vertical domains is typically approached by novel fine-tuning strategies or by continual pre-training (CPT) on large domain-specific datasets. Yet, data itself remains an underexplored factor in this process. We revisit this task from a data-centric perspective: Can effective data selection substitute for large-scale datasets in CPT? We introduce CHIPS (Curvature-aware Hybrid Influence in Projection Subspace), which assigns each image-text pair a utility score that integrates three complementary factors aligned with three goals: faithfulness via a curvature-aware, Newton-style alignment computed in CLIP's end-point subspace; scalability via an InfoNCE-aware curvature estimator with Johnson-Lindenstrauss (JL) sketching; and retention via a selection-aware relevance weight combined with learnability to balance target adaptation against general-domain preservation. We justify this design theoretically by proving a lower-bound guarantee on the proxy's correlation with full-parameter alignment and by characterizing the bias-variance trade-offs introduced by curvature mixing and JL sketching. We evaluate CHIPS empirically across various settings: 1) CHIPS attains state-of-the-art performance among selection baselines on 17 medical benchmarks, matches full-dataset CPT with 30% of the data, and outperforms half-dataset CPT using only 10%; 2) on 31 general-domain benchmarks, CHIPS yields the smallest performance drop under 10-30% data-retention budgets. Code, data, and checkpoints will be released.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2511.18519 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2511.18519 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2511.18519 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.