sbintuitions/sarashina2.2-vision-3b

@@ -22,24 +22,24 @@ This model is based on [Sarashina2.2-3B-Instruct](https://huggingface.co/sbintui
 |Model|Params(B)|[BusinessSlide VQA](https://github.com/stockmarkteam/business-slide-questions)<sup>*1</sup>|[Heron-Bench](https://arxiv.org/abs/2404.07824)<sup>*1</sup>|[JDocQA](https://arxiv.org/abs/2403.19454)<sup>*1</sup>|[JMMMU](https://arxiv.org/abs/2410.17250)|
 |-|-|-|-|-|-|
-|[Sarashina2.2-Vision-3B](https://huggingface.co/sbintuitions/sarashina2.2-vision-3b)|3.8|3.932|**3.214**|<u>3.327</u>|<u>0.486</u>|
 |[Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)|3.8|3.516|2.000|3.019|0.450|
-|[Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct)|4.4|**4.105**|2.330|**3.596**|**0.493**|
 |[InternVL3_5-4B](https://huggingface.co/OpenGVLab/InternVL3_5-4B)|4.7|3.311|1.893|2.626|0.437|
 |[Sarashina2-Vision-14B](https://huggingface.co/sbintuitions/sarashina2-vision-14b)|14.4|3.110|2.184|-<sup>*2</sup>|0.432|
 |[Stockmark-2-VL-100B-beta](https://huggingface.co/stockmark/Stockmark-2-VL-100B-beta)|96.5|<u>3.973</u>|<u>2.563</u>|3.168|-<sup>*2</sup>|
 *1. [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) was used for LLM-as-a-Judge.
-*2. Score cannot be measured because some input data exceeds the model's `max_position_embeddings`.
 ### English Performance
 |Model|Params(B)|[DocVQA](https://arxiv.org/abs/2007.00398)|[InfoVQA](https://arxiv.org/abs/2104.12756)|[RealworldQA](https://huggingface.co/datasets/xai-org/RealworldQA)|
 |-|-|-|-|-|
-|[Sarashina2.2-Vision-3B](https://huggingface.co/sbintuitions/sarashina2.2-vision-3b)|3.8|0.831|0.567|<u>0.625</u>|
-|[Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)|3.8|<u>0.924</u>|<u>0.750</u>|0.586|
-|[Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct)|4.4|**0.948**|**0.798**|**0.712**|
 |[InternVL3_5-4B](https://huggingface.co/OpenGVLab/InternVL3_5-4B)|4.7|0.823|0.541|0.553|
 |[Sarashina2-Vision-14B](https://huggingface.co/sbintuitions/sarashina2-vision-14b)|14.4|0.729|0.490|0.519|

 |Model|Params(B)|[BusinessSlide VQA](https://github.com/stockmarkteam/business-slide-questions)<sup>*1</sup>|[Heron-Bench](https://arxiv.org/abs/2404.07824)<sup>*1</sup>|[JDocQA](https://arxiv.org/abs/2403.19454)<sup>*1</sup>|[JMMMU](https://arxiv.org/abs/2410.17250)|
 |-|-|-|-|-|-|
+|[Sarashina2.2-Vision-3B](https://huggingface.co/sbintuitions/sarashina2.2-vision-3b)|3.8|3.932|3.214|3.327|0.486|
 |[Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)|3.8|3.516|2.000|3.019|0.450|
+|[Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct)|4.4|4.105|2.330|3.596|0.493|
 |[InternVL3_5-4B](https://huggingface.co/OpenGVLab/InternVL3_5-4B)|4.7|3.311|1.893|2.626|0.437|
 |[Sarashina2-Vision-14B](https://huggingface.co/sbintuitions/sarashina2-vision-14b)|14.4|3.110|2.184|-<sup>*2</sup>|0.432|
 |[Stockmark-2-VL-100B-beta](https://huggingface.co/stockmark/Stockmark-2-VL-100B-beta)|96.5|<u>3.973</u>|<u>2.563</u>|3.168|-<sup>*2</sup>|
 *1. [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) was used for LLM-as-a-Judge.
+*2. These scores cannot be measured because some input data exceeds the model's `max_position_embeddings`.
 ### English Performance
 |Model|Params(B)|[DocVQA](https://arxiv.org/abs/2007.00398)|[InfoVQA](https://arxiv.org/abs/2104.12756)|[RealworldQA](https://huggingface.co/datasets/xai-org/RealworldQA)|
 |-|-|-|-|-|
+|[Sarashina2.2-Vision-3B](https://huggingface.co/sbintuitions/sarashina2.2-vision-3b)|3.8|0.831|0.567|0.625|
+|[Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)|3.8|0.924|0.750|0.586|
+|[Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct)|4.4|0.948|0.798|0.712|
 |[InternVL3_5-4B](https://huggingface.co/OpenGVLab/InternVL3_5-4B)|4.7|0.823|0.541|0.553|
 |[Sarashina2-Vision-14B](https://huggingface.co/sbintuitions/sarashina2-vision-14b)|14.4|0.729|0.490|0.519|