RegionRet

RegionRet is a LoRA adapter model for region-level vision-language retrieval, fine-tuned from ColQwen2.5-Base using Parameter-Efficient Fine-Tuning (PEFT).
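ColQwen-family retrievers usually score a query against a page with ColBERT-style late interaction (MaxSim) over multi-vector embeddings. The following is a minimal pure-Python sketch of that scoring rule, assuming RegionRet keeps the base model's scoring scheme, which this card does not state. The toy vectors and the names `maxsim_score`, `page_a`, and `page_b` are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token, take the best
    dot product against any document token, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-D, unit-length embeddings (hypothetical values for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
page_b = [[0.6, 0.8], [0.8, 0.6]]

scores = {
    "page_a": maxsim_score(query, page_a),
    "page_b": maxsim_score(query, page_b),
}
best = max(scores, key=scores.get)  # page with the highest MaxSim score
```

Because each query token picks its own best-matching document token, a page can rank highly when only a small region of it matches the query.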

Model Details

  • Model Type: LoRA Adapter (PEFT)
  • Base Model: ColQwen2.5-Base
  • Task Type: Feature Extraction
  • Framework: PEFT 0.14.0

LoRA Configuration

  • Rank (r): 32
  • LoRA Alpha: 32
  • LoRA Dropout: 0.1
  • Target Modules: MLP projections (down_proj, gate_proj, up_proj) and attention projections (k_proj, q_proj, v_proj, o_proj), plus custom_text_proj
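To make the numbers above concrete, here is a small sketch of what this configuration implies, assuming standard LoRA scaling (`lora_alpha / r`) rather than a rank-stabilized variant. The `LoraSettings` class and the 2048-wide layer are hypothetical illustrations, not the adapter's actual config object or the base model's real dimensions.

```python
from dataclasses import dataclass, field


@dataclass
class LoraSettings:
    """Plain-Python mirror of the values listed in this card (hypothetical class)."""
    r: int = 32
    lora_alpha: int = 32
    lora_dropout: float = 0.1
    target_modules: list = field(default_factory=lambda: [
        "down_proj", "gate_proj", "up_proj",
        "k_proj", "q_proj", "v_proj", "o_proj",
        "custom_text_proj",
    ])

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / r.
        return self.lora_alpha / self.r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer:
    A has shape (r, d_in) and B has shape (d_out, r)."""
    return r * (d_in + d_out)


cfg = LoraSettings()

# Hypothetical 2048 -> 2048 projection, chosen only for the arithmetic.
added = lora_param_count(2048, 2048, cfg.r)
full = 2048 * 2048
fraction = added / full  # share of the frozen layer's size that is trainable
```

With rank and alpha both at 32, the scaling factor is 1.0, so the low-rank update is applied without extra amplification or damping.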

Model Architecture

  • Processor: ColQwen2_5_Processor
  • Max Visual Tokens: 1536
  • Attention: Flash Attention 2
  • Precision: bfloat16
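The visual-token cap translates into a pixel budget per image. The sketch below assumes the Qwen2.5-VL convention of 14-pixel patches merged 2x2, so that one visual token covers a 28x28 pixel area, and uses a simplified downscale-only resize modelled on that family's preprocessing. The function names are hypothetical and the real processor may differ in detail.

```python
import math

TOKEN_SIDE = 28            # assumption: 14 px patches merged 2x2
MAX_VISUAL_TOKENS = 1536   # value listed in this card


def max_pixel_budget(max_tokens=MAX_VISUAL_TOKENS, side=TOKEN_SIDE):
    """Largest image area (in pixels) that fits within the token cap."""
    return max_tokens * side * side


def estimate_visual_tokens(height, width,
                           max_tokens=MAX_VISUAL_TOKENS, side=TOKEN_SIDE):
    """Estimate the visual token count for an image of the given size.

    Rounds both sides to multiples of `side`; if the result exceeds the
    pixel budget, shrinks the image while keeping its aspect ratio.
    """
    budget = max_pixel_budget(max_tokens, side)
    h = max(side, round(height / side) * side)
    w = max(side, round(width / side) * side)
    if h * w > budget:
        beta = math.sqrt((height * width) / budget)
        h = max(side, math.floor(height / beta / side) * side)
        w = max(side, math.floor(width / beta / side) * side)
    return (h // side) * (w // side)


budget = max_pixel_budget()
small_page = estimate_visual_tokens(1008, 1008)   # fits without resizing
large_page = estimate_visual_tokens(2339, 1654)   # scaled down to fit the cap
```

Under these assumptions the cap corresponds to roughly 1.2 megapixels, so high-resolution document scans are downscaled before encoding.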

Uses

For usage instructions and inference code, see the RegionRAG repository: https://github.com/Aeryn666/RegionRAG.

Training Details

Training Data

  • VisRAG-Ret-Train-In-domain-data
  • Visual-CoT (DocVQA, TextCap, TextVQA, InfographicsVQA)

Training Configuration

  • Loss Function: RegionContraLoss (global_tau=0.02, local_tau=0.25, local_coef=0.01)
  • Epochs: 5
  • Batch Size: 80 per device
  • Learning Rate: 2e-4
  • Precision: bfloat16
  • Gradient Checkpointing: Enabled
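The two temperatures in the loss settings control how sharply similarities are contrasted. The sketch below uses a generic temperature-scaled InfoNCE term to show that effect. It is not the RegionContraLoss implementation, and the additive weighting in `combined_loss` is an assumption about how `local_coef` is applied, which this card does not specify. The similarity values are hypothetical.

```python
import math


def info_nce(similarities, positive_index, tau):
    """Cross-entropy of the positive candidate after dividing
    all similarities by the temperature tau."""
    logits = [s / tau for s in similarities]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[positive_index]


def combined_loss(global_loss, local_loss, local_coef=0.01):
    """ASSUMPTION: local term added with weight local_coef (hypothetical)."""
    return global_loss + local_coef * local_loss


# Hypothetical cosine similarities; the positive candidate is at index 0.
sims = [0.80, 0.70, 0.10]

loss_sharp = info_nce(sims, positive_index=0, tau=0.02)  # global_tau from the card
loss_soft = info_nce(sims, positive_index=0, tau=0.25)   # local_tau from the card
total = combined_loss(loss_sharp, loss_soft)
```

A small temperature such as 0.02 turns a modest similarity gap into a large logit gap, so the positive dominates the softmax. A larger temperature such as 0.25 keeps the distribution softer for the same similarities.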

Limitations

  • Requires the ColQwen2.5-Base model to function
  • Optimized for region-level vision-language retrieval tasks
  • GPU with bfloat16 and Flash Attention 2 support recommended

Citation

If you use this model, please cite:

@misc{li2025regionragregionlevelretrievalaugmentedgeneration,
      title={RegionRAG: Region-level Retrieval-Augmented Generation for Visual Document Understanding}, 
      author={Yinglu Li and Zhiying Lu and Zhihang Liu and Yiwei Sun and Chuanbin Liu and Hongtao Xie},
      year={2025},
      eprint={2510.27261},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.27261}, 
}

License

Please refer to the license of the base model ColQwen2.5.
