|Model|Params(B)|[BusinessSlide VQA](https://github.com/stockmarkteam/business-slide-questions)<sup>*1</sup>|[Heron-Bench](https://arxiv.org/abs/2404.07824)<sup>*1</sup>|[JDocQA](https://arxiv.org/abs/2403.19454)<sup>*1</sup>|[JMMMU](https://arxiv.org/abs/2410.17250)|
|-|-|-|-|-|-|
|[Sarashina2.2-Vision-3B](https://huggingface.co/sbintuitions/sarashina2.2-vision-3b)|3.8|3.932|3.214|3.327|0.486|
|[Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)|3.8|3.516|2.000|3.019|0.450|
|[Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct)|4.4|4.105|2.330|3.596|0.493|
|[InternVL3_5-4B](https://huggingface.co/OpenGVLab/InternVL3_5-4B)|4.7|3.311|1.893|2.626|0.437|
|[Sarashina2-Vision-14B](https://huggingface.co/sbintuitions/sarashina2-vision-14b)|14.4|3.110|2.184|-<sup>*2</sup>|0.432|
|[Stockmark-2-VL-100B-beta](https://huggingface.co/stockmark/Stockmark-2-VL-100B-beta)|96.5|<u>3.973</u>|<u>2.563</u>|3.168|-<sup>*2</sup>|

*1. [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) was used as the judge model for LLM-as-a-Judge scoring.

*2. These scores could not be measured because the tokenized length of some inputs exceeds the model's `max_position_embeddings`.
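
Footnote *2 refers to inputs whose tokenized length is greater than the model's `max_position_embeddings`. The check below is only an illustration. It is not the evaluation code used to produce these tables.

- `exceeds_context` and `unmeasurable_samples` are hypothetical helpers.
- The token counts and the 4096-position limit in the example are made-up values.
- The counts are assumed to be computed with the model's own processor and to include any image tokens.

```python
from typing import List


def exceeds_context(num_tokens: int, max_position_embeddings: int) -> bool:
    """Return True if a sequence is longer than the model's position limit.

    Positions run from 0 to max_position_embeddings - 1, so a sequence of
    exactly max_position_embeddings tokens still fits.
    """
    return num_tokens > max_position_embeddings


def unmeasurable_samples(token_counts: List[int], max_position_embeddings: int) -> List[int]:
    """Return the indices of samples that cannot be scored (hypothetical helper).

    token_counts is assumed to hold the full tokenized length of each sample,
    including any image tokens produced by the processor.
    """
    return [
        i for i, n in enumerate(token_counts)
        if exceeds_context(n, max_position_embeddings)
    ]


# Hypothetical example: a 4096-position model and three benchmark samples.
print(unmeasurable_samples([1200, 5000, 4096], 4096))  # [1]
```

With Hugging Face `transformers`, the limit is often available as `AutoConfig.from_pretrained(...).max_position_embeddings`. Multimodal configs may instead keep it on a nested text config, so check the model's `config.json`.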

### English Performance

|Model|Params(B)|[DocVQA](https://arxiv.org/abs/2007.00398)|[InfoVQA](https://arxiv.org/abs/2104.12756)|[RealworldQA](https://huggingface.co/datasets/xai-org/RealworldQA)|
|-|-|-|-|-|
|[Sarashina2.2-Vision-3B](https://huggingface.co/sbintuitions/sarashina2.2-vision-3b)|3.8|0.831|0.567|0.625|
|[Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)|3.8|0.924|0.750|0.586|
|[Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct)|4.4|0.948|0.798|0.712|
|[InternVL3_5-4B](https://huggingface.co/OpenGVLab/InternVL3_5-4B)|4.7|0.823|0.541|0.553|
|[Sarashina2-Vision-14B](https://huggingface.co/sbintuitions/sarashina2-vision-14b)|14.4|0.729|0.490|0.519|