Pre release update.
Browse files- README.md +60 -19
- added_tokens.json +1 -36
- chat_template.jinja +21 -2
- config.json +15 -48
- generation_config.json +4 -3
- model.safetensors +2 -2
- special_tokens_map.json +2 -8
- tokenizer.json +51 -366
- tokenizer_config.json +28 -308
- vocab.json +0 -0
README.md
CHANGED
|
@@ -17,26 +17,63 @@ Granite Docling 258M builds upon the IDEFICS3 architecture, but introduces two k
|
|
| 17 |
Granite-docling-258M is fully integrated into the Docling pipelines, carrying over existing [features](https://huggingface.co/ds4sd/SmolDocling-256M-preview) while introducing a number of powerful new features, including:
|
| 18 |
|
| 19 |
- 🔢 Enhanced Equation Recognition: More accurate detection and formatting of mathematical formulas
|
| 20 |
-
-
|
| 21 |
-
- 📸 Robust OCR for Documents in the wild: Accurately extracts text from handheld scans, photos, and low-quality images
|
| 22 |
-
- 🗝️ Key-Value Pair Extraction: Identifies structured key-value relationships (e.g., forms, receipts)
|
| 23 |
- 🧘 Improved Stability: Tends to avoid infinite loops more effectively
|
|
|
|
|
|
|
| 24 |
- 🌍 Japanese, Arabic and Chinese support (_experimental_)
|
| 25 |
|
| 26 |
|
| 27 |
## Evaluations
|
| 28 |
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 39 |
|
|
|
|
| 40 |
|
| 41 |
## Getting started
|
| 42 |
|
|
@@ -134,7 +171,7 @@ from pathlib import Path
|
|
| 134 |
MODEL_PATH = "ibm-granite/granite-docling-258M"
|
| 135 |
IMAGE_DIR = "img/" # Place your page images here
|
| 136 |
OUTPUT_DIR = "out/"
|
| 137 |
-
PROMPT_TEXT = "Convert page to
|
| 138 |
|
| 139 |
# Ensure output directory exists
|
| 140 |
os.makedirs(OUTPUT_DIR, exist_ok=True)
|
|
@@ -154,9 +191,13 @@ image_names = []
|
|
| 154 |
for img_file in sorted(os.listdir(IMAGE_DIR)):
|
| 155 |
if img_file.lower().endswith((".png", ".jpg", ".jpeg")):
|
| 156 |
img_path = os.path.join(IMAGE_DIR, img_file)
|
| 157 |
-
|
|
|
|
| 158 |
|
| 159 |
-
prompt =
|
|
|
|
|
|
|
|
|
|
| 160 |
batched_inputs.append({"prompt": prompt, "multi_modal_data": {"image": image}})
|
| 161 |
image_names.append(os.path.splitext(img_file)[0])
|
| 162 |
|
|
@@ -242,14 +283,14 @@ print(f"Total time: {time.time() - start_time:.2f} sec")
|
|
| 242 |
|
| 243 |
The architecture of granite-docling-258m consists of the following components:
|
| 244 |
|
| 245 |
-
(1) Vision encoder: siglip2-base-patch16-512
|
| 246 |
|
| 247 |
(2) Vision-language connector: pixel shuffle projector (as in idefics3)
|
| 248 |
|
| 249 |
(3) Large language model: Granite 165M.
|
| 250 |
|
| 251 |
-
We built upon Idefics3
|
| 252 |
-
|
| 253 |
# Training Data:
|
| 254 |
|
| 255 |
Overall, our training data is largely comprised of two key sources: (1) publicly available datasets (2) internally created synthetic data targeting specific capabilities.
|
|
|
|
| 17 |
Granite-docling-258M is fully integrated into the Docling pipelines, carrying over existing [features](https://huggingface.co/ds4sd/SmolDocling-256M-preview) while introducing a number of powerful new features, including:
|
| 18 |
|
| 19 |
- 🔢 Enhanced Equation Recognition: More accurate detection and formatting of mathematical formulas
|
| 20 |
+
- 🧩 Flexible Inference Modes: Choose between full-page inference, bbox-guided region inference
|
|
|
|
|
|
|
| 21 |
- 🧘 Improved Stability: Tends to avoid infinite loops more effectively
|
| 22 |
+
- 🧮 Enhanceed Inline Equations: Better inline math recognition
|
| 23 |
+
- 🧾 Document Element QA: Answer questions about a document’s structure such as the presence and order of document elements
|
| 24 |
- 🌍 Japanese, Arabic and Chinese support (_experimental_)
|
| 25 |
|
| 26 |
|
| 27 |
## Evaluations
|
| 28 |
|
| 29 |
+
<table>
|
| 30 |
+
<thead>
|
| 31 |
+
<tr>
|
| 32 |
+
<th></th>
|
| 33 |
+
<th><b>smoldocling-256m-preview</b></th>
|
| 34 |
+
<th><b>granite-docling-258m</b></th>
|
| 35 |
+
</tr>
|
| 36 |
+
</thead>
|
| 37 |
+
<tbody>
|
| 38 |
+
<tr><td colspan="3"><b>Layout</b></td></tr>
|
| 39 |
+
<tr><td>MAP ↑</td><td>0.21</td><td><b>0.28</b></td></tr>
|
| 40 |
+
<tr><td>F1 ↑</td><td>0.79</td><td><b>0.85</b></td></tr>
|
| 41 |
+
<tr><td>Precision ↑</td><td>0.86</td><td><b>0.87</b></td></tr>
|
| 42 |
+
<tr><td>Recall ↑</td><td>0.82</td><td><b>0.89</b></td></tr>
|
| 43 |
+
|
| 44 |
+
<tr><td colspan="3"><b>Full Page OCR</b></td></tr>
|
| 45 |
+
<tr><td>Edit-distance ↓</td><td>0.48 (0.46)</td><td><b>0.46</b> (<b>0.44</b>)</td></tr>
|
| 46 |
+
<tr><td>F1 ↑</td><td><b>0.80</b> (0.76)</td><td>0.75 (<b>0.78</b>)</td></tr>
|
| 47 |
+
<tr><td>Precision ↑</td><td><b>0.89</b> (0.85)</td><td>0.81 (0.85)</td></tr>
|
| 48 |
+
<tr><td>Recall ↑</td><td><b>0.79</b> (0.74)</td><td>0.73 (<b>0.77</b>)</td></tr>
|
| 49 |
+
<tr><td>BLEU ↑</td><td><b>0.58</b> (0.54)</td><td>0.56 (<b>0.59</b>)</td></tr>
|
| 50 |
+
<tr><td>Meteor ↑</td><td>0.67 (0.67)</td><td>0.67 (<b>0.70</b>)</td></tr>
|
| 51 |
+
|
| 52 |
+
<tr><td colspan="3"><b>Code Recognition</b></td></tr>
|
| 53 |
+
<tr><td>Edit-distance ↓</td><td>0.114</td><td><b>0.013</b></td></tr>
|
| 54 |
+
<tr><td>F1 ↑</td><td>0.915</td><td><b>0.988</b></td></tr>
|
| 55 |
+
<tr><td>Precision ↑</td><td>0.94</td><td><b>0.99</b></td></tr>
|
| 56 |
+
<tr><td>Recall ↑</td><td>0.909</td><td><b>0.988</b></td></tr>
|
| 57 |
+
<tr><td>BLEU ↑</td><td>0.875</td><td><b>0.983</b></td></tr>
|
| 58 |
+
<tr><td>Meteor ↑</td><td>0.889</td><td><b>0.986</b></td></tr>
|
| 59 |
+
|
| 60 |
+
<tr><td colspan="3"><b>Equation Recognition</b></td></tr>
|
| 61 |
+
<tr><td>Edit-distance ↓</td><td>0.119</td><td><b>0.073</b></td></tr>
|
| 62 |
+
<tr><td>F1 ↑</td><td>0.947</td><td><b>0.968</b></td></tr>
|
| 63 |
+
<tr><td>Precision ↑</td><td>0.959</td><td><b>0.968</b></td></tr>
|
| 64 |
+
<tr><td>Recall ↑</td><td>0.941</td><td><b>0.969</b></td></tr>
|
| 65 |
+
<tr><td>BLEU ↑</td><td>0.824</td><td><b>0.893</b></td></tr>
|
| 66 |
+
<tr><td>Meteor ↑</td><td>0.878</td><td><b>0.927</b></td></tr>
|
| 67 |
+
|
| 68 |
+
<tr><td colspan="3"><b>Table Recognition (FinTabNet 150dpi)</b></td></tr>
|
| 69 |
+
<tr><td>TEDS (structure) ↑</td><td>0.82</td><td><b>0.97</b></td></tr>
|
| 70 |
+
<tr><td>TEDS (w/content) ↑</td><td>0.76</td><td><b>0.96</b></td></tr>
|
| 71 |
+
<tr><td colspan="3"><b>Other Benchmarks</b></td></tr>
|
| 72 |
+
<tr><td>MMStar ↑</td><td>0.17</td><td><b>0.3</b></td></tr>
|
| 73 |
+
<tr><td>OCRBench ↑</td><td>338</td><td><b>500</b></td></tr>
|
| 74 |
+
|
| 75 |
|
| 76 |
+
</table>
|
| 77 |
|
| 78 |
## Getting started
|
| 79 |
|
|
|
|
| 171 |
MODEL_PATH = "ibm-granite/granite-docling-258M"
|
| 172 |
IMAGE_DIR = "img/" # Place your page images here
|
| 173 |
OUTPUT_DIR = "out/"
|
| 174 |
+
PROMPT_TEXT = "Convert page to docling."
|
| 175 |
|
| 176 |
# Ensure output directory exists
|
| 177 |
os.makedirs(OUTPUT_DIR, exist_ok=True)
|
|
|
|
| 191 |
for img_file in sorted(os.listdir(IMAGE_DIR)):
|
| 192 |
if img_file.lower().endswith((".png", ".jpg", ".jpeg")):
|
| 193 |
img_path = os.path.join(IMAGE_DIR, img_file)
|
| 194 |
+
with Image.open(img_path) as im:
|
| 195 |
+
image = im.convert("RGB")
|
| 196 |
|
| 197 |
+
prompt = (
|
| 198 |
+
f"<|start_of_role|>user<|end_of_role|><image>{PROMPT_TEXT}<|end_of_text|>\n"
|
| 199 |
+
f"<|start_of_role|>assistant<|end_of_role|>"
|
| 200 |
+
)
|
| 201 |
batched_inputs.append({"prompt": prompt, "multi_modal_data": {"image": image}})
|
| 202 |
image_names.append(os.path.splitext(img_file)[0])
|
| 203 |
|
|
|
|
| 283 |
|
| 284 |
The architecture of granite-docling-258m consists of the following components:
|
| 285 |
|
| 286 |
+
(1) Vision encoder: [siglip2-base-patch16-512](https://huggingface.co/google/siglip2-base-patch16-512).
|
| 287 |
|
| 288 |
(2) Vision-language connector: pixel shuffle projector (as in idefics3)
|
| 289 |
|
| 290 |
(3) Large language model: Granite 165M.
|
| 291 |
|
| 292 |
+
We built upon [Idefics3](https://huggingface.co/docs/transformers/en/model_doc/idefics3) to train our model. We incorporated DocTags into our LLM’s supervised fine-tuning (SFT) data to help the model become familiar with the format, enabling faster convergence and mitigating issues previously observed with SmolDocling.
|
| 293 |
+
The model was trained using the [nanoVLM](https://github.com/huggingface/nanoVLM) framework, which provides a lightweight and efficient training setup for vision-language models
|
| 294 |
# Training Data:
|
| 295 |
|
| 296 |
Overall, our training data is largely comprised of two key sources: (1) publicly available datasets (2) internally created synthetic data targeting specific capabilities.
|
added_tokens.json
CHANGED
|
@@ -1,38 +1,3 @@
|
|
| 1 |
{
|
| 2 |
-
"<
|
| 3 |
-
"<row_1_col_2>": 100353,
|
| 4 |
-
"<row_1_col_3>": 100354,
|
| 5 |
-
"<row_1_col_4>": 100355,
|
| 6 |
-
"<row_1_col_5>": 100356,
|
| 7 |
-
"<row_1_col_6>": 100357,
|
| 8 |
-
"<row_2_col_1>": 100358,
|
| 9 |
-
"<row_2_col_2>": 100359,
|
| 10 |
-
"<row_2_col_3>": 100360,
|
| 11 |
-
"<row_2_col_4>": 100361,
|
| 12 |
-
"<row_2_col_5>": 100362,
|
| 13 |
-
"<row_2_col_6>": 100363,
|
| 14 |
-
"<row_3_col_1>": 100364,
|
| 15 |
-
"<row_3_col_2>": 100365,
|
| 16 |
-
"<row_3_col_3>": 100366,
|
| 17 |
-
"<row_3_col_4>": 100367,
|
| 18 |
-
"<row_3_col_5>": 100368,
|
| 19 |
-
"<row_3_col_6>": 100369,
|
| 20 |
-
"<row_4_col_1>": 100370,
|
| 21 |
-
"<row_4_col_2>": 100371,
|
| 22 |
-
"<row_4_col_3>": 100372,
|
| 23 |
-
"<row_4_col_4>": 100373,
|
| 24 |
-
"<row_4_col_5>": 100374,
|
| 25 |
-
"<row_4_col_6>": 100375,
|
| 26 |
-
"<row_5_col_1>": 100376,
|
| 27 |
-
"<row_5_col_2>": 100377,
|
| 28 |
-
"<row_5_col_3>": 100378,
|
| 29 |
-
"<row_5_col_4>": 100379,
|
| 30 |
-
"<row_5_col_5>": 100380,
|
| 31 |
-
"<row_5_col_6>": 100381,
|
| 32 |
-
"<row_6_col_1>": 100382,
|
| 33 |
-
"<row_6_col_2>": 100383,
|
| 34 |
-
"<row_6_col_3>": 100384,
|
| 35 |
-
"<row_6_col_4>": 100385,
|
| 36 |
-
"<row_6_col_5>": 100386,
|
| 37 |
-
"<row_6_col_6>": 100387
|
| 38 |
}
|
|
|
|
| 1 |
{
|
| 2 |
+
"<end_of_utterance>": 100352
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 3 |
}
|
chat_template.jinja
CHANGED
|
@@ -1,2 +1,21 @@
|
|
| 1 |
-
|
| 2 |
-
{
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{%- for message in messages -%}
|
| 2 |
+
{{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' -}}
|
| 3 |
+
{%- if message['content'] is string -%}
|
| 4 |
+
{{- message['content'] -}}
|
| 5 |
+
{%- else -%}
|
| 6 |
+
{%- for part in message['content'] -%}
|
| 7 |
+
{%- if part['type'] == 'text' -%}
|
| 8 |
+
{{- part['text'] -}}
|
| 9 |
+
{%- elif part['type'] == 'image' -%}
|
| 10 |
+
{{- '<image>' -}}
|
| 11 |
+
{%- endif -%}
|
| 12 |
+
{%- endfor -%}
|
| 13 |
+
{%- endif -%}
|
| 14 |
+
{{- '<|end_of_text|>
|
| 15 |
+
' -}}
|
| 16 |
+
{%- endfor -%}
|
| 17 |
+
{%- if add_generation_prompt -%}
|
| 18 |
+
{{- '<|start_of_role|>assistant' -}}
|
| 19 |
+
{%- if controls -%}{{- ' ' + controls | tojson() -}}{%- endif -%}
|
| 20 |
+
{{- '<|end_of_role|>' -}}
|
| 21 |
+
{%- endif -%}
|
config.json
CHANGED
|
@@ -1,52 +1,24 @@
|
|
| 1 |
{
|
| 2 |
-
"_flash_attn_2_enabled": true,
|
| 3 |
"architectures": [
|
| 4 |
"Idefics3ForConditionalGeneration"
|
| 5 |
],
|
| 6 |
-
"attention_bias": false,
|
| 7 |
-
"attention_dropout": 0.0,
|
| 8 |
"bos_token_id": 100264,
|
| 9 |
-
"
|
| 10 |
-
"
|
| 11 |
-
"hidden_act": "silu",
|
| 12 |
-
"hidden_size": 576,
|
| 13 |
"image_token_id": 100270,
|
| 14 |
-
"initializer_range": 0.02,
|
| 15 |
-
"intermediate_size": 1536,
|
| 16 |
-
"max_position_embeddings": 8192,
|
| 17 |
-
"mlp_bias": false,
|
| 18 |
"model_type": "idefics3",
|
| 19 |
-
"
|
| 20 |
-
"num_attention_heads": 9,
|
| 21 |
-
"num_hidden_layers": 30,
|
| 22 |
-
"num_key_value_heads": 3,
|
| 23 |
-
"pad_token_id": 128002,
|
| 24 |
-
"perceiver_config": {
|
| 25 |
-
"attention_dropout": 0.0,
|
| 26 |
-
"hidden_act": "silu",
|
| 27 |
-
"model_type": "vllama3",
|
| 28 |
-
"num_key_value_heads": 1,
|
| 29 |
-
"qk_layer_norms_perceiver": false,
|
| 30 |
-
"resampler_depth": 6,
|
| 31 |
-
"resampler_head_dim": 96,
|
| 32 |
-
"resampler_n_heads": 16,
|
| 33 |
-
"resampler_n_latents": 64
|
| 34 |
-
},
|
| 35 |
-
"pixel_shuffle_factor": 4,
|
| 36 |
-
"pretraining_tp": 1,
|
| 37 |
-
"qk_layer_norms": false,
|
| 38 |
-
"rms_norm_eps": 1e-05,
|
| 39 |
-
"rope_scaling": null,
|
| 40 |
-
"rope_theta": 100000.0,
|
| 41 |
"scale_factor": 4,
|
| 42 |
"text_config": {
|
|
|
|
| 43 |
"architectures": [
|
| 44 |
-
"
|
| 45 |
],
|
| 46 |
"attention_bias": false,
|
| 47 |
"attention_dropout": 0.0,
|
| 48 |
"bos_token_id": 100264,
|
| 49 |
-
"
|
|
|
|
| 50 |
"head_dim": 64,
|
| 51 |
"hidden_act": "silu",
|
| 52 |
"hidden_size": 576,
|
|
@@ -58,20 +30,18 @@
|
|
| 58 |
"num_attention_heads": 9,
|
| 59 |
"num_hidden_layers": 30,
|
| 60 |
"num_key_value_heads": 3,
|
|
|
|
| 61 |
"pretraining_tp": 1,
|
| 62 |
"rms_norm_eps": 1e-05,
|
| 63 |
"rope_scaling": null,
|
| 64 |
-
"rope_theta":
|
| 65 |
"tie_word_embeddings": true,
|
| 66 |
-
"
|
| 67 |
-
"
|
| 68 |
-
"vocab_size": 100480
|
| 69 |
},
|
| 70 |
"tie_word_embeddings": true,
|
| 71 |
-
"
|
| 72 |
-
"transformers_version": "4.53.0.dev0",
|
| 73 |
"use_cache": true,
|
| 74 |
-
"use_resampler": false,
|
| 75 |
"vision_config": {
|
| 76 |
"attention_dropout": 0.0,
|
| 77 |
"hidden_act": "gelu_pytorch_tanh",
|
|
@@ -89,11 +59,8 @@
|
|
| 89 |
"num_hidden_layers": 12,
|
| 90 |
"patch_size": 16,
|
| 91 |
"size": {
|
| 92 |
-
"longest_edge":
|
| 93 |
-
}
|
| 94 |
-
"tie_word_embeddings": false,
|
| 95 |
-
"torch_dtype": "bfloat16",
|
| 96 |
-
"use_base_siglip": true
|
| 97 |
},
|
| 98 |
-
"vocab_size":
|
| 99 |
}
|
|
|
|
| 1 |
{
|
|
|
|
| 2 |
"architectures": [
|
| 3 |
"Idefics3ForConditionalGeneration"
|
| 4 |
],
|
|
|
|
|
|
|
| 5 |
"bos_token_id": 100264,
|
| 6 |
+
"dtype": "bfloat16",
|
| 7 |
+
"eos_token_id": 100257,
|
|
|
|
|
|
|
| 8 |
"image_token_id": 100270,
|
|
|
|
|
|
|
|
|
|
|
|
|
| 9 |
"model_type": "idefics3",
|
| 10 |
+
"pad_token_id": 100257,
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 11 |
"scale_factor": 4,
|
| 12 |
"text_config": {
|
| 13 |
+
"_name_or_path": "models/granitev06_hf_ai4k_sft_data_v4",
|
| 14 |
"architectures": [
|
| 15 |
+
"LlamaForCausalLM"
|
| 16 |
],
|
| 17 |
"attention_bias": false,
|
| 18 |
"attention_dropout": 0.0,
|
| 19 |
"bos_token_id": 100264,
|
| 20 |
+
"dtype": "bfloat16",
|
| 21 |
+
"eos_token_id": 100257,
|
| 22 |
"head_dim": 64,
|
| 23 |
"hidden_act": "silu",
|
| 24 |
"hidden_size": 576,
|
|
|
|
| 30 |
"num_attention_heads": 9,
|
| 31 |
"num_hidden_layers": 30,
|
| 32 |
"num_key_value_heads": 3,
|
| 33 |
+
"pad_token_id": 100257,
|
| 34 |
"pretraining_tp": 1,
|
| 35 |
"rms_norm_eps": 1e-05,
|
| 36 |
"rope_scaling": null,
|
| 37 |
+
"rope_theta": 100000.0,
|
| 38 |
"tie_word_embeddings": true,
|
| 39 |
+
"use_cache": false,
|
| 40 |
+
"vocab_size": 100352
|
|
|
|
| 41 |
},
|
| 42 |
"tie_word_embeddings": true,
|
| 43 |
+
"transformers_version": "4.56.1",
|
|
|
|
| 44 |
"use_cache": true,
|
|
|
|
| 45 |
"vision_config": {
|
| 46 |
"attention_dropout": 0.0,
|
| 47 |
"hidden_act": "gelu_pytorch_tanh",
|
|
|
|
| 59 |
"num_hidden_layers": 12,
|
| 60 |
"patch_size": 16,
|
| 61 |
"size": {
|
| 62 |
+
"longest_edge": 512
|
| 63 |
+
}
|
|
|
|
|
|
|
|
|
|
| 64 |
},
|
| 65 |
+
"vocab_size": 100352
|
| 66 |
}
|
generation_config.json
CHANGED
|
@@ -1,7 +1,8 @@
|
|
| 1 |
{
|
| 2 |
"_from_model_config": true,
|
| 3 |
"bos_token_id": 100264,
|
| 4 |
-
"eos_token_id":
|
| 5 |
-
"pad_token_id":
|
| 6 |
-
"transformers_version": "4.
|
|
|
|
| 7 |
}
|
|
|
|
| 1 |
{
|
| 2 |
"_from_model_config": true,
|
| 3 |
"bos_token_id": 100264,
|
| 4 |
+
"eos_token_id": 100257,
|
| 5 |
+
"pad_token_id": 100257,
|
| 6 |
+
"transformers_version": "4.56.1",
|
| 7 |
+
"use_cache": false
|
| 8 |
}
|
model.safetensors
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:1cdad234deb1cde18ee6a586f849057f19851daf1fedce2e40aff791dbe46f61
|
| 3 |
+
size 515093104
|
special_tokens_map.json
CHANGED
|
@@ -23,7 +23,7 @@
|
|
| 23 |
}
|
| 24 |
],
|
| 25 |
"bos_token": {
|
| 26 |
-
"content": "<|
|
| 27 |
"lstrip": false,
|
| 28 |
"normalized": false,
|
| 29 |
"rstrip": false,
|
|
@@ -36,13 +36,7 @@
|
|
| 36 |
"rstrip": false,
|
| 37 |
"single_word": false
|
| 38 |
},
|
| 39 |
-
"pad_token":
|
| 40 |
-
"content": "<|pad|>",
|
| 41 |
-
"lstrip": false,
|
| 42 |
-
"normalized": false,
|
| 43 |
-
"rstrip": false,
|
| 44 |
-
"single_word": false
|
| 45 |
-
},
|
| 46 |
"unk_token": {
|
| 47 |
"content": "<|unk|>",
|
| 48 |
"lstrip": false,
|
|
|
|
| 23 |
}
|
| 24 |
],
|
| 25 |
"bos_token": {
|
| 26 |
+
"content": "<|start_of_role|>",
|
| 27 |
"lstrip": false,
|
| 28 |
"normalized": false,
|
| 29 |
"rstrip": false,
|
|
|
|
| 36 |
"rstrip": false,
|
| 37 |
"single_word": false
|
| 38 |
},
|
| 39 |
+
"pad_token": "<|end_of_text|>",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 40 |
"unk_token": {
|
| 41 |
"content": "<|unk|>",
|
| 42 |
"lstrip": false,
|
tokenizer.json
CHANGED
|
@@ -23,7 +23,7 @@
|
|
| 23 |
},
|
| 24 |
{
|
| 25 |
"id": 100258,
|
| 26 |
-
"content": "
|
| 27 |
"single_word": false,
|
| 28 |
"lstrip": false,
|
| 29 |
"rstrip": false,
|
|
@@ -32,7 +32,7 @@
|
|
| 32 |
},
|
| 33 |
{
|
| 34 |
"id": 100259,
|
| 35 |
-
"content": "
|
| 36 |
"single_word": false,
|
| 37 |
"lstrip": false,
|
| 38 |
"rstrip": false,
|
|
@@ -41,7 +41,7 @@
|
|
| 41 |
},
|
| 42 |
{
|
| 43 |
"id": 100260,
|
| 44 |
-
"content": "
|
| 45 |
"single_word": false,
|
| 46 |
"lstrip": false,
|
| 47 |
"rstrip": false,
|
|
@@ -50,7 +50,7 @@
|
|
| 50 |
},
|
| 51 |
{
|
| 52 |
"id": 100261,
|
| 53 |
-
"content": "
|
| 54 |
"single_word": false,
|
| 55 |
"lstrip": false,
|
| 56 |
"rstrip": false,
|
|
@@ -59,7 +59,7 @@
|
|
| 59 |
},
|
| 60 |
{
|
| 61 |
"id": 100262,
|
| 62 |
-
"content": "
|
| 63 |
"single_word": false,
|
| 64 |
"lstrip": false,
|
| 65 |
"rstrip": false,
|
|
@@ -68,7 +68,7 @@
|
|
| 68 |
},
|
| 69 |
{
|
| 70 |
"id": 100263,
|
| 71 |
-
"content": "
|
| 72 |
"single_word": false,
|
| 73 |
"lstrip": false,
|
| 74 |
"rstrip": false,
|
|
@@ -95,7 +95,7 @@
|
|
| 95 |
},
|
| 96 |
{
|
| 97 |
"id": 100266,
|
| 98 |
-
"content": "
|
| 99 |
"single_word": false,
|
| 100 |
"lstrip": false,
|
| 101 |
"rstrip": false,
|
|
@@ -104,7 +104,7 @@
|
|
| 104 |
},
|
| 105 |
{
|
| 106 |
"id": 100267,
|
| 107 |
-
"content": "
|
| 108 |
"single_word": false,
|
| 109 |
"lstrip": false,
|
| 110 |
"rstrip": false,
|
|
@@ -113,7 +113,7 @@
|
|
| 113 |
},
|
| 114 |
{
|
| 115 |
"id": 100268,
|
| 116 |
-
"content": "
|
| 117 |
"single_word": false,
|
| 118 |
"lstrip": false,
|
| 119 |
"rstrip": false,
|
|
@@ -122,7 +122,7 @@
|
|
| 122 |
},
|
| 123 |
{
|
| 124 |
"id": 100269,
|
| 125 |
-
"content": "
|
| 126 |
"single_word": false,
|
| 127 |
"lstrip": false,
|
| 128 |
"rstrip": false,
|
|
@@ -554,7 +554,7 @@
|
|
| 554 |
},
|
| 555 |
{
|
| 556 |
"id": 100317,
|
| 557 |
-
"content": "
|
| 558 |
"single_word": false,
|
| 559 |
"lstrip": false,
|
| 560 |
"rstrip": false,
|
|
@@ -563,7 +563,7 @@
|
|
| 563 |
},
|
| 564 |
{
|
| 565 |
"id": 100318,
|
| 566 |
-
"content": "<paragraph",
|
| 567 |
"single_word": false,
|
| 568 |
"lstrip": false,
|
| 569 |
"rstrip": false,
|
|
@@ -662,7 +662,7 @@
|
|
| 662 |
},
|
| 663 |
{
|
| 664 |
"id": 100329,
|
| 665 |
-
"content": "<
|
| 666 |
"single_word": false,
|
| 667 |
"lstrip": false,
|
| 668 |
"rstrip": false,
|
|
@@ -743,7 +743,7 @@
|
|
| 743 |
},
|
| 744 |
{
|
| 745 |
"id": 100338,
|
| 746 |
-
"content": "
|
| 747 |
"single_word": false,
|
| 748 |
"lstrip": false,
|
| 749 |
"rstrip": false,
|
|
@@ -770,7 +770,7 @@
|
|
| 770 |
},
|
| 771 |
{
|
| 772 |
"id": 100341,
|
| 773 |
-
"content": "
|
| 774 |
"single_word": false,
|
| 775 |
"lstrip": false,
|
| 776 |
"rstrip": false,
|
|
@@ -779,7 +779,7 @@
|
|
| 779 |
},
|
| 780 |
{
|
| 781 |
"id": 100342,
|
| 782 |
-
"content": "
|
| 783 |
"single_word": false,
|
| 784 |
"lstrip": false,
|
| 785 |
"rstrip": false,
|
|
@@ -788,7 +788,7 @@
|
|
| 788 |
},
|
| 789 |
{
|
| 790 |
"id": 100343,
|
| 791 |
-
"content": "
|
| 792 |
"single_word": false,
|
| 793 |
"lstrip": false,
|
| 794 |
"rstrip": false,
|
|
@@ -797,7 +797,7 @@
|
|
| 797 |
},
|
| 798 |
{
|
| 799 |
"id": 100344,
|
| 800 |
-
"content": "
|
| 801 |
"single_word": false,
|
| 802 |
"lstrip": false,
|
| 803 |
"rstrip": false,
|
|
@@ -806,7 +806,7 @@
|
|
| 806 |
},
|
| 807 |
{
|
| 808 |
"id": 100345,
|
| 809 |
-
"content": "
|
| 810 |
"single_word": false,
|
| 811 |
"lstrip": false,
|
| 812 |
"rstrip": false,
|
|
@@ -815,7 +815,7 @@
|
|
| 815 |
},
|
| 816 |
{
|
| 817 |
"id": 100346,
|
| 818 |
-
"content": "
|
| 819 |
"single_word": false,
|
| 820 |
"lstrip": false,
|
| 821 |
"rstrip": false,
|
|
@@ -824,7 +824,7 @@
|
|
| 824 |
},
|
| 825 |
{
|
| 826 |
"id": 100347,
|
| 827 |
-
"content": "
|
| 828 |
"single_word": false,
|
| 829 |
"lstrip": false,
|
| 830 |
"rstrip": false,
|
|
@@ -833,7 +833,7 @@
|
|
| 833 |
},
|
| 834 |
{
|
| 835 |
"id": 100348,
|
| 836 |
-
"content": "
|
| 837 |
"single_word": false,
|
| 838 |
"lstrip": false,
|
| 839 |
"rstrip": false,
|
|
@@ -842,7 +842,7 @@
|
|
| 842 |
},
|
| 843 |
{
|
| 844 |
"id": 100349,
|
| 845 |
-
"content": "
|
| 846 |
"single_word": false,
|
| 847 |
"lstrip": false,
|
| 848 |
"rstrip": false,
|
|
@@ -851,7 +851,7 @@
|
|
| 851 |
},
|
| 852 |
{
|
| 853 |
"id": 100350,
|
| 854 |
-
"content": "
|
| 855 |
"single_word": false,
|
| 856 |
"lstrip": false,
|
| 857 |
"rstrip": false,
|
|
@@ -860,7 +860,7 @@
|
|
| 860 |
},
|
| 861 |
{
|
| 862 |
"id": 100351,
|
| 863 |
-
"content": "
|
| 864 |
"single_word": false,
|
| 865 |
"lstrip": false,
|
| 866 |
"rstrip": false,
|
|
@@ -869,322 +869,7 @@
|
|
| 869 |
},
|
| 870 |
{
|
| 871 |
"id": 100352,
|
| 872 |
-
"content": "<
|
| 873 |
-
"single_word": false,
|
| 874 |
-
"lstrip": false,
|
| 875 |
-
"rstrip": false,
|
| 876 |
-
"normalized": false,
|
| 877 |
-
"special": true
|
| 878 |
-
},
|
| 879 |
-
{
|
| 880 |
-
"id": 100353,
|
| 881 |
-
"content": "<row_1_col_2>",
|
| 882 |
-
"single_word": false,
|
| 883 |
-
"lstrip": false,
|
| 884 |
-
"rstrip": false,
|
| 885 |
-
"normalized": false,
|
| 886 |
-
"special": true
|
| 887 |
-
},
|
| 888 |
-
{
|
| 889 |
-
"id": 100354,
|
| 890 |
-
"content": "<row_1_col_3>",
|
| 891 |
-
"single_word": false,
|
| 892 |
-
"lstrip": false,
|
| 893 |
-
"rstrip": false,
|
| 894 |
-
"normalized": false,
|
| 895 |
-
"special": true
|
| 896 |
-
},
|
| 897 |
-
{
|
| 898 |
-
"id": 100355,
|
| 899 |
-
"content": "<row_1_col_4>",
|
| 900 |
-
"single_word": false,
|
| 901 |
-
"lstrip": false,
|
| 902 |
-
"rstrip": false,
|
| 903 |
-
"normalized": false,
|
| 904 |
-
"special": true
|
| 905 |
-
},
|
| 906 |
-
{
|
| 907 |
-
"id": 100356,
|
| 908 |
-
"content": "<row_1_col_5>",
|
| 909 |
-
"single_word": false,
|
| 910 |
-
"lstrip": false,
|
| 911 |
-
"rstrip": false,
|
| 912 |
-
"normalized": false,
|
| 913 |
-
"special": true
|
| 914 |
-
},
|
| 915 |
-
{
|
| 916 |
-
"id": 100357,
|
| 917 |
-
"content": "<row_1_col_6>",
|
| 918 |
-
"single_word": false,
|
| 919 |
-
"lstrip": false,
|
| 920 |
-
"rstrip": false,
|
| 921 |
-
"normalized": false,
|
| 922 |
-
"special": true
|
| 923 |
-
},
|
| 924 |
-
{
|
| 925 |
-
"id": 100358,
|
| 926 |
-
"content": "<row_2_col_1>",
|
| 927 |
-
"single_word": false,
|
| 928 |
-
"lstrip": false,
|
| 929 |
-
"rstrip": false,
|
| 930 |
-
"normalized": false,
|
| 931 |
-
"special": true
|
| 932 |
-
},
|
| 933 |
-
{
|
| 934 |
-
"id": 100359,
|
| 935 |
-
"content": "<row_2_col_2>",
|
| 936 |
-
"single_word": false,
|
| 937 |
-
"lstrip": false,
|
| 938 |
-
"rstrip": false,
|
| 939 |
-
"normalized": false,
|
| 940 |
-
"special": true
|
| 941 |
-
},
|
| 942 |
-
{
|
| 943 |
-
"id": 100360,
|
| 944 |
-
"content": "<row_2_col_3>",
|
| 945 |
-
"single_word": false,
|
| 946 |
-
"lstrip": false,
|
| 947 |
-
"rstrip": false,
|
| 948 |
-
"normalized": false,
|
| 949 |
-
"special": true
|
| 950 |
-
},
|
| 951 |
-
{
|
| 952 |
-
"id": 100361,
|
| 953 |
-
"content": "<row_2_col_4>",
|
| 954 |
-
"single_word": false,
|
| 955 |
-
"lstrip": false,
|
| 956 |
-
"rstrip": false,
|
| 957 |
-
"normalized": false,
|
| 958 |
-
"special": true
|
| 959 |
-
},
|
| 960 |
-
{
|
| 961 |
-
"id": 100362,
|
| 962 |
-
"content": "<row_2_col_5>",
|
| 963 |
-
"single_word": false,
|
| 964 |
-
"lstrip": false,
|
| 965 |
-
"rstrip": false,
|
| 966 |
-
"normalized": false,
|
| 967 |
-
"special": true
|
| 968 |
-
},
|
| 969 |
-
{
|
| 970 |
-
"id": 100363,
|
| 971 |
-
"content": "<row_2_col_6>",
|
| 972 |
-
"single_word": false,
|
| 973 |
-
"lstrip": false,
|
| 974 |
-
"rstrip": false,
|
| 975 |
-
"normalized": false,
|
| 976 |
-
"special": true
|
| 977 |
-
},
|
| 978 |
-
{
|
| 979 |
-
"id": 100364,
|
| 980 |
-
"content": "<row_3_col_1>",
|
| 981 |
-
"single_word": false,
|
| 982 |
-
"lstrip": false,
|
| 983 |
-
"rstrip": false,
|
| 984 |
-
"normalized": false,
|
| 985 |
-
"special": true
|
| 986 |
-
},
|
| 987 |
-
{
|
| 988 |
-
"id": 100365,
|
| 989 |
-
"content": "<row_3_col_2>",
|
| 990 |
-
"single_word": false,
|
| 991 |
-
"lstrip": false,
|
| 992 |
-
"rstrip": false,
|
| 993 |
-
"normalized": false,
|
| 994 |
-
"special": true
|
| 995 |
-
},
|
| 996 |
-
{
|
| 997 |
-
"id": 100366,
|
| 998 |
-
"content": "<row_3_col_3>",
|
| 999 |
-
"single_word": false,
|
| 1000 |
-
"lstrip": false,
|
| 1001 |
-
"rstrip": false,
|
| 1002 |
-
"normalized": false,
|
| 1003 |
-
"special": true
|
| 1004 |
-
},
|
| 1005 |
-
{
|
| 1006 |
-
"id": 100367,
|
| 1007 |
-
"content": "<row_3_col_4>",
|
| 1008 |
-
"single_word": false,
|
| 1009 |
-
"lstrip": false,
|
| 1010 |
-
"rstrip": false,
|
| 1011 |
-
"normalized": false,
|
| 1012 |
-
"special": true
|
| 1013 |
-
},
|
| 1014 |
-
{
|
| 1015 |
-
"id": 100368,
|
| 1016 |
-
"content": "<row_3_col_5>",
|
| 1017 |
-
"single_word": false,
|
| 1018 |
-
"lstrip": false,
|
| 1019 |
-
"rstrip": false,
|
| 1020 |
-
"normalized": false,
|
| 1021 |
-
"special": true
|
| 1022 |
-
},
|
| 1023 |
-
{
|
| 1024 |
-
"id": 100369,
|
| 1025 |
-
"content": "<row_3_col_6>",
|
| 1026 |
-
"single_word": false,
|
| 1027 |
-
"lstrip": false,
|
| 1028 |
-
"rstrip": false,
|
| 1029 |
-
"normalized": false,
|
| 1030 |
-
"special": true
|
| 1031 |
-
},
|
| 1032 |
-
{
|
| 1033 |
-
"id": 100370,
|
| 1034 |
-
"content": "<row_4_col_1>",
|
| 1035 |
-
"single_word": false,
|
| 1036 |
-
"lstrip": false,
|
| 1037 |
-
"rstrip": false,
|
| 1038 |
-
"normalized": false,
|
| 1039 |
-
"special": true
|
| 1040 |
-
},
|
| 1041 |
-
{
|
| 1042 |
-
"id": 100371,
|
| 1043 |
-
"content": "<row_4_col_2>",
|
| 1044 |
-
"single_word": false,
|
| 1045 |
-
"lstrip": false,
|
| 1046 |
-
"rstrip": false,
|
| 1047 |
-
"normalized": false,
|
| 1048 |
-
"special": true
|
| 1049 |
-
},
|
| 1050 |
-
{
|
| 1051 |
-
"id": 100372,
|
| 1052 |
-
"content": "<row_4_col_3>",
|
| 1053 |
-
"single_word": false,
|
| 1054 |
-
"lstrip": false,
|
| 1055 |
-
"rstrip": false,
|
| 1056 |
-
"normalized": false,
|
| 1057 |
-
"special": true
|
| 1058 |
-
},
|
| 1059 |
-
{
|
| 1060 |
-
"id": 100373,
|
| 1061 |
-
"content": "<row_4_col_4>",
|
| 1062 |
-
"single_word": false,
|
| 1063 |
-
"lstrip": false,
|
| 1064 |
-
"rstrip": false,
|
| 1065 |
-
"normalized": false,
|
| 1066 |
-
"special": true
|
| 1067 |
-
},
|
| 1068 |
-
{
|
| 1069 |
-
"id": 100374,
|
| 1070 |
-
"content": "<row_4_col_5>",
|
| 1071 |
-
"single_word": false,
|
| 1072 |
-
"lstrip": false,
|
| 1073 |
-
"rstrip": false,
|
| 1074 |
-
"normalized": false,
|
| 1075 |
-
"special": true
|
| 1076 |
-
},
|
| 1077 |
-
{
|
| 1078 |
-
"id": 100375,
|
| 1079 |
-
"content": "<row_4_col_6>",
|
| 1080 |
-
"single_word": false,
|
| 1081 |
-
"lstrip": false,
|
| 1082 |
-
"rstrip": false,
|
| 1083 |
-
"normalized": false,
|
| 1084 |
-
"special": true
|
| 1085 |
-
},
|
| 1086 |
-
{
|
| 1087 |
-
"id": 100376,
|
| 1088 |
-
"content": "<row_5_col_1>",
|
| 1089 |
-
"single_word": false,
|
| 1090 |
-
"lstrip": false,
|
| 1091 |
-
"rstrip": false,
|
| 1092 |
-
"normalized": false,
|
| 1093 |
-
"special": true
|
| 1094 |
-
},
|
| 1095 |
-
{
|
| 1096 |
-
"id": 100377,
|
| 1097 |
-
"content": "<row_5_col_2>",
|
| 1098 |
-
"single_word": false,
|
| 1099 |
-
"lstrip": false,
|
| 1100 |
-
"rstrip": false,
|
| 1101 |
-
"normalized": false,
|
| 1102 |
-
"special": true
|
| 1103 |
-
},
|
| 1104 |
-
{
|
| 1105 |
-
"id": 100378,
|
| 1106 |
-
"content": "<row_5_col_3>",
|
| 1107 |
-
"single_word": false,
|
| 1108 |
-
"lstrip": false,
|
| 1109 |
-
"rstrip": false,
|
| 1110 |
-
"normalized": false,
|
| 1111 |
-
"special": true
|
| 1112 |
-
},
|
| 1113 |
-
{
|
| 1114 |
-
"id": 100379,
|
| 1115 |
-
"content": "<row_5_col_4>",
|
| 1116 |
-
"single_word": false,
|
| 1117 |
-
"lstrip": false,
|
| 1118 |
-
"rstrip": false,
|
| 1119 |
-
"normalized": false,
|
| 1120 |
-
"special": true
|
| 1121 |
-
},
|
| 1122 |
-
{
|
| 1123 |
-
"id": 100380,
|
| 1124 |
-
"content": "<row_5_col_5>",
|
| 1125 |
-
"single_word": false,
|
| 1126 |
-
"lstrip": false,
|
| 1127 |
-
"rstrip": false,
|
| 1128 |
-
"normalized": false,
|
| 1129 |
-
"special": true
|
| 1130 |
-
},
|
| 1131 |
-
{
|
| 1132 |
-
"id": 100381,
|
| 1133 |
-
"content": "<row_5_col_6>",
|
| 1134 |
-
"single_word": false,
|
| 1135 |
-
"lstrip": false,
|
| 1136 |
-
"rstrip": false,
|
| 1137 |
-
"normalized": false,
|
| 1138 |
-
"special": true
|
| 1139 |
-
},
|
| 1140 |
-
{
|
| 1141 |
-
"id": 100382,
|
| 1142 |
-
"content": "<row_6_col_1>",
|
| 1143 |
-
"single_word": false,
|
| 1144 |
-
"lstrip": false,
|
| 1145 |
-
"rstrip": false,
|
| 1146 |
-
"normalized": false,
|
| 1147 |
-
"special": true
|
| 1148 |
-
},
|
| 1149 |
-
{
|
| 1150 |
-
"id": 100383,
|
| 1151 |
-
"content": "<row_6_col_2>",
|
| 1152 |
-
"single_word": false,
|
| 1153 |
-
"lstrip": false,
|
| 1154 |
-
"rstrip": false,
|
| 1155 |
-
"normalized": false,
|
| 1156 |
-
"special": true
|
| 1157 |
-
},
|
| 1158 |
-
{
|
| 1159 |
-
"id": 100384,
|
| 1160 |
-
"content": "<row_6_col_3>",
|
| 1161 |
-
"single_word": false,
|
| 1162 |
-
"lstrip": false,
|
| 1163 |
-
"rstrip": false,
|
| 1164 |
-
"normalized": false,
|
| 1165 |
-
"special": true
|
| 1166 |
-
},
|
| 1167 |
-
{
|
| 1168 |
-
"id": 100385,
|
| 1169 |
-
"content": "<row_6_col_4>",
|
| 1170 |
-
"single_word": false,
|
| 1171 |
-
"lstrip": false,
|
| 1172 |
-
"rstrip": false,
|
| 1173 |
-
"normalized": false,
|
| 1174 |
-
"special": true
|
| 1175 |
-
},
|
| 1176 |
-
{
|
| 1177 |
-
"id": 100386,
|
| 1178 |
-
"content": "<row_6_col_5>",
|
| 1179 |
-
"single_word": false,
|
| 1180 |
-
"lstrip": false,
|
| 1181 |
-
"rstrip": false,
|
| 1182 |
-
"normalized": false,
|
| 1183 |
-
"special": true
|
| 1184 |
-
},
|
| 1185 |
-
{
|
| 1186 |
-
"id": 100387,
|
| 1187 |
-
"content": "<row_6_col_6>",
|
| 1188 |
"single_word": false,
|
| 1189 |
"lstrip": false,
|
| 1190 |
"rstrip": false,
|
|
@@ -101479,18 +101164,18 @@
|
|
| 101479 |
"ĠConveyor": 100255,
|
| 101480 |
"<|pad|>": 100256,
|
| 101481 |
"<|end_of_text|>": 100257,
|
| 101482 |
-
"
|
| 101483 |
-
"
|
| 101484 |
-
"
|
| 101485 |
-
"
|
| 101486 |
-
"
|
| 101487 |
-
"
|
| 101488 |
"<|start_of_role|>": 100264,
|
| 101489 |
"<|end_of_role|>": 100265,
|
| 101490 |
-
"
|
| 101491 |
-
"
|
| 101492 |
-
"
|
| 101493 |
-
"
|
| 101494 |
"<image>": 100270,
|
| 101495 |
"<caption>": 100271,
|
| 101496 |
"</caption>": 100272,
|
|
@@ -101538,8 +101223,8 @@
|
|
| 101538 |
"<page_break>": 100314,
|
| 101539 |
"<smiles>": 100315,
|
| 101540 |
"</smiles>": 100316,
|
| 101541 |
-
"
|
| 101542 |
-
"<paragraph": 100318,
|
| 101543 |
"</paragraph>": 100319,
|
| 101544 |
"<references>": 100320,
|
| 101545 |
"</references>": 100321,
|
|
@@ -101550,7 +101235,7 @@
|
|
| 101550 |
"<group>": 100326,
|
| 101551 |
"<doctag>": 100327,
|
| 101552 |
"</doctag>": 100328,
|
| 101553 |
-
"<
|
| 101554 |
"<fcel>": 100330,
|
| 101555 |
"<ecel>": 100331,
|
| 101556 |
"<lcel>": 100332,
|
|
@@ -101559,20 +101244,20 @@
|
|
| 101559 |
"<nl>": 100335,
|
| 101560 |
"<ched>": 100336,
|
| 101561 |
"<rhed>": 100337,
|
| 101562 |
-
"
|
| 101563 |
"<fake_token_around_image>": 100339,
|
| 101564 |
"<global-img>": 100340,
|
| 101565 |
-
"
|
| 101566 |
-
"
|
| 101567 |
-
"
|
| 101568 |
-
"
|
| 101569 |
-
"
|
| 101570 |
-
"
|
| 101571 |
-
"
|
| 101572 |
-
"
|
| 101573 |
-
"
|
| 101574 |
-
"
|
| 101575 |
-
"
|
| 101576 |
},
|
| 101577 |
"merges": [
|
| 101578 |
[
|
|
|
|
| 23 |
},
|
| 24 |
{
|
| 25 |
"id": 100258,
|
| 26 |
+
"content": "<row_1_col_1>",
|
| 27 |
"single_word": false,
|
| 28 |
"lstrip": false,
|
| 29 |
"rstrip": false,
|
|
|
|
| 32 |
},
|
| 33 |
{
|
| 34 |
"id": 100259,
|
| 35 |
+
"content": "<row_1_col_2>",
|
| 36 |
"single_word": false,
|
| 37 |
"lstrip": false,
|
| 38 |
"rstrip": false,
|
|
|
|
| 41 |
},
|
| 42 |
{
|
| 43 |
"id": 100260,
|
| 44 |
+
"content": "<text>",
|
| 45 |
"single_word": false,
|
| 46 |
"lstrip": false,
|
| 47 |
"rstrip": false,
|
|
|
|
| 50 |
},
|
| 51 |
{
|
| 52 |
"id": 100261,
|
| 53 |
+
"content": "<row_1_col_3>",
|
| 54 |
"single_word": false,
|
| 55 |
"lstrip": false,
|
| 56 |
"rstrip": false,
|
|
|
|
| 59 |
},
|
| 60 |
{
|
| 61 |
"id": 100262,
|
| 62 |
+
"content": "<row_1_col_4>",
|
| 63 |
"single_word": false,
|
| 64 |
"lstrip": false,
|
| 65 |
"rstrip": false,
|
|
|
|
| 68 |
},
|
| 69 |
{
|
| 70 |
"id": 100263,
|
| 71 |
+
"content": "<row_2_col_1>",
|
| 72 |
"single_word": false,
|
| 73 |
"lstrip": false,
|
| 74 |
"rstrip": false,
|
|
|
|
| 95 |
},
|
| 96 |
{
|
| 97 |
"id": 100266,
|
| 98 |
+
"content": "</title>",
|
| 99 |
"single_word": false,
|
| 100 |
"lstrip": false,
|
| 101 |
"rstrip": false,
|
|
|
|
| 104 |
},
|
| 105 |
{
|
| 106 |
"id": 100267,
|
| 107 |
+
"content": "<row_2_col_2>",
|
| 108 |
"single_word": false,
|
| 109 |
"lstrip": false,
|
| 110 |
"rstrip": false,
|
|
|
|
| 113 |
},
|
| 114 |
{
|
| 115 |
"id": 100268,
|
| 116 |
+
"content": "<row_2_col_3>",
|
| 117 |
"single_word": false,
|
| 118 |
"lstrip": false,
|
| 119 |
"rstrip": false,
|
|
|
|
| 122 |
},
|
| 123 |
{
|
| 124 |
"id": 100269,
|
| 125 |
+
"content": "<title>",
|
| 126 |
"single_word": false,
|
| 127 |
"lstrip": false,
|
| 128 |
"rstrip": false,
|
|
|
|
| 554 |
},
|
| 555 |
{
|
| 556 |
"id": 100317,
|
| 557 |
+
"content": "</text>",
|
| 558 |
"single_word": false,
|
| 559 |
"lstrip": false,
|
| 560 |
"rstrip": false,
|
|
|
|
| 563 |
},
|
| 564 |
{
|
| 565 |
"id": 100318,
|
| 566 |
+
"content": "<paragraph>",
|
| 567 |
"single_word": false,
|
| 568 |
"lstrip": false,
|
| 569 |
"rstrip": false,
|
|
|
|
| 662 |
},
|
| 663 |
{
|
| 664 |
"id": 100329,
|
| 665 |
+
"content": "<rec_",
|
| 666 |
"single_word": false,
|
| 667 |
"lstrip": false,
|
| 668 |
"rstrip": false,
|
|
|
|
| 743 |
},
|
| 744 |
{
|
| 745 |
"id": 100338,
|
| 746 |
+
"content": "<|unk|>",
|
| 747 |
"single_word": false,
|
| 748 |
"lstrip": false,
|
| 749 |
"rstrip": false,
|
|
|
|
| 770 |
},
|
| 771 |
{
|
| 772 |
"id": 100341,
|
| 773 |
+
"content": "<row_2_col_4>",
|
| 774 |
"single_word": false,
|
| 775 |
"lstrip": false,
|
| 776 |
"rstrip": false,
|
|
|
|
| 779 |
},
|
| 780 |
{
|
| 781 |
"id": 100342,
|
| 782 |
+
"content": "<row_3_col_1>",
|
| 783 |
"single_word": false,
|
| 784 |
"lstrip": false,
|
| 785 |
"rstrip": false,
|
|
|
|
| 788 |
},
|
| 789 |
{
|
| 790 |
"id": 100343,
|
| 791 |
+
"content": "<row_3_col_2>",
|
| 792 |
"single_word": false,
|
| 793 |
"lstrip": false,
|
| 794 |
"rstrip": false,
|
|
|
|
| 797 |
},
|
| 798 |
{
|
| 799 |
"id": 100344,
|
| 800 |
+
"content": "<row_3_col_3>",
|
| 801 |
"single_word": false,
|
| 802 |
"lstrip": false,
|
| 803 |
"rstrip": false,
|
|
|
|
| 806 |
},
|
| 807 |
{
|
| 808 |
"id": 100345,
|
| 809 |
+
"content": "<row_3_col_4>",
|
| 810 |
"single_word": false,
|
| 811 |
"lstrip": false,
|
| 812 |
"rstrip": false,
|
|
|
|
| 815 |
},
|
| 816 |
{
|
| 817 |
"id": 100346,
|
| 818 |
+
"content": "<row_4_col_1>",
|
| 819 |
"single_word": false,
|
| 820 |
"lstrip": false,
|
| 821 |
"rstrip": false,
|
|
|
|
| 824 |
},
|
| 825 |
{
|
| 826 |
"id": 100347,
|
| 827 |
+
"content": "<row_4_col_2>",
|
| 828 |
"single_word": false,
|
| 829 |
"lstrip": false,
|
| 830 |
"rstrip": false,
|
|
|
|
| 833 |
},
|
| 834 |
{
|
| 835 |
"id": 100348,
|
| 836 |
+
"content": "<row_4_col_3>",
|
| 837 |
"single_word": false,
|
| 838 |
"lstrip": false,
|
| 839 |
"rstrip": false,
|
|
|
|
| 842 |
},
|
| 843 |
{
|
| 844 |
"id": 100349,
|
| 845 |
+
"content": "<row_4_col_4>",
|
| 846 |
"single_word": false,
|
| 847 |
"lstrip": false,
|
| 848 |
"rstrip": false,
|
|
|
|
| 851 |
},
|
| 852 |
{
|
| 853 |
"id": 100350,
|
| 854 |
+
"content": "<code>",
|
| 855 |
"single_word": false,
|
| 856 |
"lstrip": false,
|
| 857 |
"rstrip": false,
|
|
|
|
| 860 |
},
|
| 861 |
{
|
| 862 |
"id": 100351,
|
| 863 |
+
"content": "</code>",
|
| 864 |
"single_word": false,
|
| 865 |
"lstrip": false,
|
| 866 |
"rstrip": false,
|
|
|
|
| 869 |
},
|
| 870 |
{
|
| 871 |
"id": 100352,
|
| 872 |
+
"content": "<end_of_utterance>",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 873 |
"single_word": false,
|
| 874 |
"lstrip": false,
|
| 875 |
"rstrip": false,
|
|
|
|
| 101164 |
"ĠConveyor": 100255,
|
| 101165 |
"<|pad|>": 100256,
|
| 101166 |
"<|end_of_text|>": 100257,
|
| 101167 |
+
"<row_1_col_1>": 100258,
|
| 101168 |
+
"<row_1_col_2>": 100259,
|
| 101169 |
+
"<text>": 100260,
|
| 101170 |
+
"<row_1_col_3>": 100261,
|
| 101171 |
+
"<row_1_col_4>": 100262,
|
| 101172 |
+
"<row_2_col_1>": 100263,
|
| 101173 |
"<|start_of_role|>": 100264,
|
| 101174 |
"<|end_of_role|>": 100265,
|
| 101175 |
+
"</title>": 100266,
|
| 101176 |
+
"<row_2_col_2>": 100267,
|
| 101177 |
+
"<row_2_col_3>": 100268,
|
| 101178 |
+
"<title>": 100269,
|
| 101179 |
"<image>": 100270,
|
| 101180 |
"<caption>": 100271,
|
| 101181 |
"</caption>": 100272,
|
|
|
|
| 101223 |
"<page_break>": 100314,
|
| 101224 |
"<smiles>": 100315,
|
| 101225 |
"</smiles>": 100316,
|
| 101226 |
+
"</text>": 100317,
|
| 101227 |
+
"<paragraph>": 100318,
|
| 101228 |
"</paragraph>": 100319,
|
| 101229 |
"<references>": 100320,
|
| 101230 |
"</references>": 100321,
|
|
|
|
| 101235 |
"<group>": 100326,
|
| 101236 |
"<doctag>": 100327,
|
| 101237 |
"</doctag>": 100328,
|
| 101238 |
+
"<rec_": 100329,
|
| 101239 |
"<fcel>": 100330,
|
| 101240 |
"<ecel>": 100331,
|
| 101241 |
"<lcel>": 100332,
|
|
|
|
| 101244 |
"<nl>": 100335,
|
| 101245 |
"<ched>": 100336,
|
| 101246 |
"<rhed>": 100337,
|
| 101247 |
+
"<|unk|>": 100338,
|
| 101248 |
"<fake_token_around_image>": 100339,
|
| 101249 |
"<global-img>": 100340,
|
| 101250 |
+
"<row_2_col_4>": 100341,
|
| 101251 |
+
"<row_3_col_1>": 100342,
|
| 101252 |
+
"<row_3_col_2>": 100343,
|
| 101253 |
+
"<row_3_col_3>": 100344,
|
| 101254 |
+
"<row_3_col_4>": 100345,
|
| 101255 |
+
"<row_4_col_1>": 100346,
|
| 101256 |
+
"<row_4_col_2>": 100347,
|
| 101257 |
+
"<row_4_col_3>": 100348,
|
| 101258 |
+
"<row_4_col_4>": 100349,
|
| 101259 |
+
"<code>": 100350,
|
| 101260 |
+
"</code>": 100351
|
| 101261 |
},
|
| 101262 |
"merges": [
|
| 101263 |
[
|
tokenizer_config.json
CHANGED
|
@@ -19,7 +19,7 @@
|
|
| 19 |
"special": true
|
| 20 |
},
|
| 21 |
"100258": {
|
| 22 |
-
"content": "
|
| 23 |
"lstrip": false,
|
| 24 |
"normalized": false,
|
| 25 |
"rstrip": false,
|
|
@@ -27,7 +27,7 @@
|
|
| 27 |
"special": true
|
| 28 |
},
|
| 29 |
"100259": {
|
| 30 |
-
"content": "
|
| 31 |
"lstrip": false,
|
| 32 |
"normalized": false,
|
| 33 |
"rstrip": false,
|
|
@@ -35,7 +35,7 @@
|
|
| 35 |
"special": true
|
| 36 |
},
|
| 37 |
"100260": {
|
| 38 |
-
"content": "
|
| 39 |
"lstrip": false,
|
| 40 |
"normalized": false,
|
| 41 |
"rstrip": false,
|
|
@@ -43,7 +43,7 @@
|
|
| 43 |
"special": true
|
| 44 |
},
|
| 45 |
"100261": {
|
| 46 |
-
"content": "
|
| 47 |
"lstrip": false,
|
| 48 |
"normalized": false,
|
| 49 |
"rstrip": false,
|
|
@@ -51,7 +51,7 @@
|
|
| 51 |
"special": true
|
| 52 |
},
|
| 53 |
"100262": {
|
| 54 |
-
"content": "
|
| 55 |
"lstrip": false,
|
| 56 |
"normalized": false,
|
| 57 |
"rstrip": false,
|
|
@@ -59,7 +59,7 @@
|
|
| 59 |
"special": true
|
| 60 |
},
|
| 61 |
"100263": {
|
| 62 |
-
"content": "
|
| 63 |
"lstrip": false,
|
| 64 |
"normalized": false,
|
| 65 |
"rstrip": false,
|
|
@@ -83,7 +83,7 @@
|
|
| 83 |
"special": true
|
| 84 |
},
|
| 85 |
"100266": {
|
| 86 |
-
"content": "
|
| 87 |
"lstrip": false,
|
| 88 |
"normalized": false,
|
| 89 |
"rstrip": false,
|
|
@@ -91,7 +91,7 @@
|
|
| 91 |
"special": true
|
| 92 |
},
|
| 93 |
"100267": {
|
| 94 |
-
"content": "
|
| 95 |
"lstrip": false,
|
| 96 |
"normalized": false,
|
| 97 |
"rstrip": false,
|
|
@@ -99,7 +99,7 @@
|
|
| 99 |
"special": true
|
| 100 |
},
|
| 101 |
"100268": {
|
| 102 |
-
"content": "
|
| 103 |
"lstrip": false,
|
| 104 |
"normalized": false,
|
| 105 |
"rstrip": false,
|
|
@@ -107,7 +107,7 @@
|
|
| 107 |
"special": true
|
| 108 |
},
|
| 109 |
"100269": {
|
| 110 |
-
"content": "
|
| 111 |
"lstrip": false,
|
| 112 |
"normalized": false,
|
| 113 |
"rstrip": false,
|
|
@@ -491,7 +491,7 @@
|
|
| 491 |
"special": true
|
| 492 |
},
|
| 493 |
"100317": {
|
| 494 |
-
"content": "
|
| 495 |
"lstrip": false,
|
| 496 |
"normalized": false,
|
| 497 |
"rstrip": false,
|
|
@@ -499,7 +499,7 @@
|
|
| 499 |
"special": true
|
| 500 |
},
|
| 501 |
"100318": {
|
| 502 |
-
"content": "<paragraph",
|
| 503 |
"lstrip": false,
|
| 504 |
"normalized": false,
|
| 505 |
"rstrip": false,
|
|
@@ -587,7 +587,7 @@
|
|
| 587 |
"special": true
|
| 588 |
},
|
| 589 |
"100329": {
|
| 590 |
-
"content": "<
|
| 591 |
"lstrip": false,
|
| 592 |
"normalized": false,
|
| 593 |
"rstrip": false,
|
|
@@ -659,7 +659,7 @@
|
|
| 659 |
"special": true
|
| 660 |
},
|
| 661 |
"100338": {
|
| 662 |
-
"content": "
|
| 663 |
"lstrip": false,
|
| 664 |
"normalized": false,
|
| 665 |
"rstrip": false,
|
|
@@ -683,7 +683,7 @@
|
|
| 683 |
"special": true
|
| 684 |
},
|
| 685 |
"100341": {
|
| 686 |
-
"content": "
|
| 687 |
"lstrip": false,
|
| 688 |
"normalized": false,
|
| 689 |
"rstrip": false,
|
|
@@ -691,7 +691,7 @@
|
|
| 691 |
"special": true
|
| 692 |
},
|
| 693 |
"100342": {
|
| 694 |
-
"content": "
|
| 695 |
"lstrip": false,
|
| 696 |
"normalized": false,
|
| 697 |
"rstrip": false,
|
|
@@ -699,7 +699,7 @@
|
|
| 699 |
"special": true
|
| 700 |
},
|
| 701 |
"100343": {
|
| 702 |
-
"content": "
|
| 703 |
"lstrip": false,
|
| 704 |
"normalized": false,
|
| 705 |
"rstrip": false,
|
|
@@ -707,7 +707,7 @@
|
|
| 707 |
"special": true
|
| 708 |
},
|
| 709 |
"100344": {
|
| 710 |
-
"content": "
|
| 711 |
"lstrip": false,
|
| 712 |
"normalized": false,
|
| 713 |
"rstrip": false,
|
|
@@ -715,7 +715,7 @@
|
|
| 715 |
"special": true
|
| 716 |
},
|
| 717 |
"100345": {
|
| 718 |
-
"content": "
|
| 719 |
"lstrip": false,
|
| 720 |
"normalized": false,
|
| 721 |
"rstrip": false,
|
|
@@ -723,7 +723,7 @@
|
|
| 723 |
"special": true
|
| 724 |
},
|
| 725 |
"100346": {
|
| 726 |
-
"content": "
|
| 727 |
"lstrip": false,
|
| 728 |
"normalized": false,
|
| 729 |
"rstrip": false,
|
|
@@ -731,7 +731,7 @@
|
|
| 731 |
"special": true
|
| 732 |
},
|
| 733 |
"100347": {
|
| 734 |
-
"content": "
|
| 735 |
"lstrip": false,
|
| 736 |
"normalized": false,
|
| 737 |
"rstrip": false,
|
|
@@ -739,7 +739,7 @@
|
|
| 739 |
"special": true
|
| 740 |
},
|
| 741 |
"100348": {
|
| 742 |
-
"content": "
|
| 743 |
"lstrip": false,
|
| 744 |
"normalized": false,
|
| 745 |
"rstrip": false,
|
|
@@ -747,7 +747,7 @@
|
|
| 747 |
"special": true
|
| 748 |
},
|
| 749 |
"100349": {
|
| 750 |
-
"content": "
|
| 751 |
"lstrip": false,
|
| 752 |
"normalized": false,
|
| 753 |
"rstrip": false,
|
|
@@ -755,7 +755,7 @@
|
|
| 755 |
"special": true
|
| 756 |
},
|
| 757 |
"100350": {
|
| 758 |
-
"content": "
|
| 759 |
"lstrip": false,
|
| 760 |
"normalized": false,
|
| 761 |
"rstrip": false,
|
|
@@ -763,7 +763,7 @@
|
|
| 763 |
"special": true
|
| 764 |
},
|
| 765 |
"100351": {
|
| 766 |
-
"content": "
|
| 767 |
"lstrip": false,
|
| 768 |
"normalized": false,
|
| 769 |
"rstrip": false,
|
|
@@ -771,287 +771,7 @@
|
|
| 771 |
"special": true
|
| 772 |
},
|
| 773 |
"100352": {
|
| 774 |
-
"content": "<
|
| 775 |
-
"lstrip": false,
|
| 776 |
-
"normalized": false,
|
| 777 |
-
"rstrip": false,
|
| 778 |
-
"single_word": false,
|
| 779 |
-
"special": true
|
| 780 |
-
},
|
| 781 |
-
"100353": {
|
| 782 |
-
"content": "<row_1_col_2>",
|
| 783 |
-
"lstrip": false,
|
| 784 |
-
"normalized": false,
|
| 785 |
-
"rstrip": false,
|
| 786 |
-
"single_word": false,
|
| 787 |
-
"special": true
|
| 788 |
-
},
|
| 789 |
-
"100354": {
|
| 790 |
-
"content": "<row_1_col_3>",
|
| 791 |
-
"lstrip": false,
|
| 792 |
-
"normalized": false,
|
| 793 |
-
"rstrip": false,
|
| 794 |
-
"single_word": false,
|
| 795 |
-
"special": true
|
| 796 |
-
},
|
| 797 |
-
"100355": {
|
| 798 |
-
"content": "<row_1_col_4>",
|
| 799 |
-
"lstrip": false,
|
| 800 |
-
"normalized": false,
|
| 801 |
-
"rstrip": false,
|
| 802 |
-
"single_word": false,
|
| 803 |
-
"special": true
|
| 804 |
-
},
|
| 805 |
-
"100356": {
|
| 806 |
-
"content": "<row_1_col_5>",
|
| 807 |
-
"lstrip": false,
|
| 808 |
-
"normalized": false,
|
| 809 |
-
"rstrip": false,
|
| 810 |
-
"single_word": false,
|
| 811 |
-
"special": true
|
| 812 |
-
},
|
| 813 |
-
"100357": {
|
| 814 |
-
"content": "<row_1_col_6>",
|
| 815 |
-
"lstrip": false,
|
| 816 |
-
"normalized": false,
|
| 817 |
-
"rstrip": false,
|
| 818 |
-
"single_word": false,
|
| 819 |
-
"special": true
|
| 820 |
-
},
|
| 821 |
-
"100358": {
|
| 822 |
-
"content": "<row_2_col_1>",
|
| 823 |
-
"lstrip": false,
|
| 824 |
-
"normalized": false,
|
| 825 |
-
"rstrip": false,
|
| 826 |
-
"single_word": false,
|
| 827 |
-
"special": true
|
| 828 |
-
},
|
| 829 |
-
"100359": {
|
| 830 |
-
"content": "<row_2_col_2>",
|
| 831 |
-
"lstrip": false,
|
| 832 |
-
"normalized": false,
|
| 833 |
-
"rstrip": false,
|
| 834 |
-
"single_word": false,
|
| 835 |
-
"special": true
|
| 836 |
-
},
|
| 837 |
-
"100360": {
|
| 838 |
-
"content": "<row_2_col_3>",
|
| 839 |
-
"lstrip": false,
|
| 840 |
-
"normalized": false,
|
| 841 |
-
"rstrip": false,
|
| 842 |
-
"single_word": false,
|
| 843 |
-
"special": true
|
| 844 |
-
},
|
| 845 |
-
"100361": {
|
| 846 |
-
"content": "<row_2_col_4>",
|
| 847 |
-
"lstrip": false,
|
| 848 |
-
"normalized": false,
|
| 849 |
-
"rstrip": false,
|
| 850 |
-
"single_word": false,
|
| 851 |
-
"special": true
|
| 852 |
-
},
|
| 853 |
-
"100362": {
|
| 854 |
-
"content": "<row_2_col_5>",
|
| 855 |
-
"lstrip": false,
|
| 856 |
-
"normalized": false,
|
| 857 |
-
"rstrip": false,
|
| 858 |
-
"single_word": false,
|
| 859 |
-
"special": true
|
| 860 |
-
},
|
| 861 |
-
"100363": {
|
| 862 |
-
"content": "<row_2_col_6>",
|
| 863 |
-
"lstrip": false,
|
| 864 |
-
"normalized": false,
|
| 865 |
-
"rstrip": false,
|
| 866 |
-
"single_word": false,
|
| 867 |
-
"special": true
|
| 868 |
-
},
|
| 869 |
-
"100364": {
|
| 870 |
-
"content": "<row_3_col_1>",
|
| 871 |
-
"lstrip": false,
|
| 872 |
-
"normalized": false,
|
| 873 |
-
"rstrip": false,
|
| 874 |
-
"single_word": false,
|
| 875 |
-
"special": true
|
| 876 |
-
},
|
| 877 |
-
"100365": {
|
| 878 |
-
"content": "<row_3_col_2>",
|
| 879 |
-
"lstrip": false,
|
| 880 |
-
"normalized": false,
|
| 881 |
-
"rstrip": false,
|
| 882 |
-
"single_word": false,
|
| 883 |
-
"special": true
|
| 884 |
-
},
|
| 885 |
-
"100366": {
|
| 886 |
-
"content": "<row_3_col_3>",
|
| 887 |
-
"lstrip": false,
|
| 888 |
-
"normalized": false,
|
| 889 |
-
"rstrip": false,
|
| 890 |
-
"single_word": false,
|
| 891 |
-
"special": true
|
| 892 |
-
},
|
| 893 |
-
"100367": {
|
| 894 |
-
"content": "<row_3_col_4>",
|
| 895 |
-
"lstrip": false,
|
| 896 |
-
"normalized": false,
|
| 897 |
-
"rstrip": false,
|
| 898 |
-
"single_word": false,
|
| 899 |
-
"special": true
|
| 900 |
-
},
|
| 901 |
-
"100368": {
|
| 902 |
-
"content": "<row_3_col_5>",
|
| 903 |
-
"lstrip": false,
|
| 904 |
-
"normalized": false,
|
| 905 |
-
"rstrip": false,
|
| 906 |
-
"single_word": false,
|
| 907 |
-
"special": true
|
| 908 |
-
},
|
| 909 |
-
"100369": {
|
| 910 |
-
"content": "<row_3_col_6>",
|
| 911 |
-
"lstrip": false,
|
| 912 |
-
"normalized": false,
|
| 913 |
-
"rstrip": false,
|
| 914 |
-
"single_word": false,
|
| 915 |
-
"special": true
|
| 916 |
-
},
|
| 917 |
-
"100370": {
|
| 918 |
-
"content": "<row_4_col_1>",
|
| 919 |
-
"lstrip": false,
|
| 920 |
-
"normalized": false,
|
| 921 |
-
"rstrip": false,
|
| 922 |
-
"single_word": false,
|
| 923 |
-
"special": true
|
| 924 |
-
},
|
| 925 |
-
"100371": {
|
| 926 |
-
"content": "<row_4_col_2>",
|
| 927 |
-
"lstrip": false,
|
| 928 |
-
"normalized": false,
|
| 929 |
-
"rstrip": false,
|
| 930 |
-
"single_word": false,
|
| 931 |
-
"special": true
|
| 932 |
-
},
|
| 933 |
-
"100372": {
|
| 934 |
-
"content": "<row_4_col_3>",
|
| 935 |
-
"lstrip": false,
|
| 936 |
-
"normalized": false,
|
| 937 |
-
"rstrip": false,
|
| 938 |
-
"single_word": false,
|
| 939 |
-
"special": true
|
| 940 |
-
},
|
| 941 |
-
"100373": {
|
| 942 |
-
"content": "<row_4_col_4>",
|
| 943 |
-
"lstrip": false,
|
| 944 |
-
"normalized": false,
|
| 945 |
-
"rstrip": false,
|
| 946 |
-
"single_word": false,
|
| 947 |
-
"special": true
|
| 948 |
-
},
|
| 949 |
-
"100374": {
|
| 950 |
-
"content": "<row_4_col_5>",
|
| 951 |
-
"lstrip": false,
|
| 952 |
-
"normalized": false,
|
| 953 |
-
"rstrip": false,
|
| 954 |
-
"single_word": false,
|
| 955 |
-
"special": true
|
| 956 |
-
},
|
| 957 |
-
"100375": {
|
| 958 |
-
"content": "<row_4_col_6>",
|
| 959 |
-
"lstrip": false,
|
| 960 |
-
"normalized": false,
|
| 961 |
-
"rstrip": false,
|
| 962 |
-
"single_word": false,
|
| 963 |
-
"special": true
|
| 964 |
-
},
|
| 965 |
-
"100376": {
|
| 966 |
-
"content": "<row_5_col_1>",
|
| 967 |
-
"lstrip": false,
|
| 968 |
-
"normalized": false,
|
| 969 |
-
"rstrip": false,
|
| 970 |
-
"single_word": false,
|
| 971 |
-
"special": true
|
| 972 |
-
},
|
| 973 |
-
"100377": {
|
| 974 |
-
"content": "<row_5_col_2>",
|
| 975 |
-
"lstrip": false,
|
| 976 |
-
"normalized": false,
|
| 977 |
-
"rstrip": false,
|
| 978 |
-
"single_word": false,
|
| 979 |
-
"special": true
|
| 980 |
-
},
|
| 981 |
-
"100378": {
|
| 982 |
-
"content": "<row_5_col_3>",
|
| 983 |
-
"lstrip": false,
|
| 984 |
-
"normalized": false,
|
| 985 |
-
"rstrip": false,
|
| 986 |
-
"single_word": false,
|
| 987 |
-
"special": true
|
| 988 |
-
},
|
| 989 |
-
"100379": {
|
| 990 |
-
"content": "<row_5_col_4>",
|
| 991 |
-
"lstrip": false,
|
| 992 |
-
"normalized": false,
|
| 993 |
-
"rstrip": false,
|
| 994 |
-
"single_word": false,
|
| 995 |
-
"special": true
|
| 996 |
-
},
|
| 997 |
-
"100380": {
|
| 998 |
-
"content": "<row_5_col_5>",
|
| 999 |
-
"lstrip": false,
|
| 1000 |
-
"normalized": false,
|
| 1001 |
-
"rstrip": false,
|
| 1002 |
-
"single_word": false,
|
| 1003 |
-
"special": true
|
| 1004 |
-
},
|
| 1005 |
-
"100381": {
|
| 1006 |
-
"content": "<row_5_col_6>",
|
| 1007 |
-
"lstrip": false,
|
| 1008 |
-
"normalized": false,
|
| 1009 |
-
"rstrip": false,
|
| 1010 |
-
"single_word": false,
|
| 1011 |
-
"special": true
|
| 1012 |
-
},
|
| 1013 |
-
"100382": {
|
| 1014 |
-
"content": "<row_6_col_1>",
|
| 1015 |
-
"lstrip": false,
|
| 1016 |
-
"normalized": false,
|
| 1017 |
-
"rstrip": false,
|
| 1018 |
-
"single_word": false,
|
| 1019 |
-
"special": true
|
| 1020 |
-
},
|
| 1021 |
-
"100383": {
|
| 1022 |
-
"content": "<row_6_col_2>",
|
| 1023 |
-
"lstrip": false,
|
| 1024 |
-
"normalized": false,
|
| 1025 |
-
"rstrip": false,
|
| 1026 |
-
"single_word": false,
|
| 1027 |
-
"special": true
|
| 1028 |
-
},
|
| 1029 |
-
"100384": {
|
| 1030 |
-
"content": "<row_6_col_3>",
|
| 1031 |
-
"lstrip": false,
|
| 1032 |
-
"normalized": false,
|
| 1033 |
-
"rstrip": false,
|
| 1034 |
-
"single_word": false,
|
| 1035 |
-
"special": true
|
| 1036 |
-
},
|
| 1037 |
-
"100385": {
|
| 1038 |
-
"content": "<row_6_col_4>",
|
| 1039 |
-
"lstrip": false,
|
| 1040 |
-
"normalized": false,
|
| 1041 |
-
"rstrip": false,
|
| 1042 |
-
"single_word": false,
|
| 1043 |
-
"special": true
|
| 1044 |
-
},
|
| 1045 |
-
"100386": {
|
| 1046 |
-
"content": "<row_6_col_5>",
|
| 1047 |
-
"lstrip": false,
|
| 1048 |
-
"normalized": false,
|
| 1049 |
-
"rstrip": false,
|
| 1050 |
-
"single_word": false,
|
| 1051 |
-
"special": true
|
| 1052 |
-
},
|
| 1053 |
-
"100387": {
|
| 1054 |
-
"content": "<row_6_col_6>",
|
| 1055 |
"lstrip": false,
|
| 1056 |
"normalized": false,
|
| 1057 |
"rstrip": false,
|
|
@@ -1066,11 +786,11 @@
|
|
| 1066 |
],
|
| 1067 |
"bos_token": "<|start_of_role|>",
|
| 1068 |
"clean_up_tokenization_spaces": false,
|
| 1069 |
-
"eos_token": "<|
|
| 1070 |
"errors": "replace",
|
| 1071 |
"extra_special_tokens": {},
|
| 1072 |
"model_max_length": 8192,
|
| 1073 |
-
"pad_token": "<|
|
| 1074 |
"padding_side": "left",
|
| 1075 |
"processor_class": "Idefics3Processor",
|
| 1076 |
"tokenizer_class": "GPT2Tokenizer",
|
|
|
|
| 19 |
"special": true
|
| 20 |
},
|
| 21 |
"100258": {
|
| 22 |
+
"content": "<row_1_col_1>",
|
| 23 |
"lstrip": false,
|
| 24 |
"normalized": false,
|
| 25 |
"rstrip": false,
|
|
|
|
| 27 |
"special": true
|
| 28 |
},
|
| 29 |
"100259": {
|
| 30 |
+
"content": "<row_1_col_2>",
|
| 31 |
"lstrip": false,
|
| 32 |
"normalized": false,
|
| 33 |
"rstrip": false,
|
|
|
|
| 35 |
"special": true
|
| 36 |
},
|
| 37 |
"100260": {
|
| 38 |
+
"content": "<text>",
|
| 39 |
"lstrip": false,
|
| 40 |
"normalized": false,
|
| 41 |
"rstrip": false,
|
|
|
|
| 43 |
"special": true
|
| 44 |
},
|
| 45 |
"100261": {
|
| 46 |
+
"content": "<row_1_col_3>",
|
| 47 |
"lstrip": false,
|
| 48 |
"normalized": false,
|
| 49 |
"rstrip": false,
|
|
|
|
| 51 |
"special": true
|
| 52 |
},
|
| 53 |
"100262": {
|
| 54 |
+
"content": "<row_1_col_4>",
|
| 55 |
"lstrip": false,
|
| 56 |
"normalized": false,
|
| 57 |
"rstrip": false,
|
|
|
|
| 59 |
"special": true
|
| 60 |
},
|
| 61 |
"100263": {
|
| 62 |
+
"content": "<row_2_col_1>",
|
| 63 |
"lstrip": false,
|
| 64 |
"normalized": false,
|
| 65 |
"rstrip": false,
|
|
|
|
| 83 |
"special": true
|
| 84 |
},
|
| 85 |
"100266": {
|
| 86 |
+
"content": "</title>",
|
| 87 |
"lstrip": false,
|
| 88 |
"normalized": false,
|
| 89 |
"rstrip": false,
|
|
|
|
| 91 |
"special": true
|
| 92 |
},
|
| 93 |
"100267": {
|
| 94 |
+
"content": "<row_2_col_2>",
|
| 95 |
"lstrip": false,
|
| 96 |
"normalized": false,
|
| 97 |
"rstrip": false,
|
|
|
|
| 99 |
"special": true
|
| 100 |
},
|
| 101 |
"100268": {
|
| 102 |
+
"content": "<row_2_col_3>",
|
| 103 |
"lstrip": false,
|
| 104 |
"normalized": false,
|
| 105 |
"rstrip": false,
|
|
|
|
| 107 |
"special": true
|
| 108 |
},
|
| 109 |
"100269": {
|
| 110 |
+
"content": "<title>",
|
| 111 |
"lstrip": false,
|
| 112 |
"normalized": false,
|
| 113 |
"rstrip": false,
|
|
|
|
| 491 |
"special": true
|
| 492 |
},
|
| 493 |
"100317": {
|
| 494 |
+
"content": "</text>",
|
| 495 |
"lstrip": false,
|
| 496 |
"normalized": false,
|
| 497 |
"rstrip": false,
|
|
|
|
| 499 |
"special": true
|
| 500 |
},
|
| 501 |
"100318": {
|
| 502 |
+
"content": "<paragraph>",
|
| 503 |
"lstrip": false,
|
| 504 |
"normalized": false,
|
| 505 |
"rstrip": false,
|
|
|
|
| 587 |
"special": true
|
| 588 |
},
|
| 589 |
"100329": {
|
| 590 |
+
"content": "<rec_",
|
| 591 |
"lstrip": false,
|
| 592 |
"normalized": false,
|
| 593 |
"rstrip": false,
|
|
|
|
| 659 |
"special": true
|
| 660 |
},
|
| 661 |
"100338": {
|
| 662 |
+
"content": "<|unk|>",
|
| 663 |
"lstrip": false,
|
| 664 |
"normalized": false,
|
| 665 |
"rstrip": false,
|
|
|
|
| 683 |
"special": true
|
| 684 |
},
|
| 685 |
"100341": {
|
| 686 |
+
"content": "<row_2_col_4>",
|
| 687 |
"lstrip": false,
|
| 688 |
"normalized": false,
|
| 689 |
"rstrip": false,
|
|
|
|
| 691 |
"special": true
|
| 692 |
},
|
| 693 |
"100342": {
|
| 694 |
+
"content": "<row_3_col_1>",
|
| 695 |
"lstrip": false,
|
| 696 |
"normalized": false,
|
| 697 |
"rstrip": false,
|
|
|
|
| 699 |
"special": true
|
| 700 |
},
|
| 701 |
"100343": {
|
| 702 |
+
"content": "<row_3_col_2>",
|
| 703 |
"lstrip": false,
|
| 704 |
"normalized": false,
|
| 705 |
"rstrip": false,
|
|
|
|
| 707 |
"special": true
|
| 708 |
},
|
| 709 |
"100344": {
|
| 710 |
+
"content": "<row_3_col_3>",
|
| 711 |
"lstrip": false,
|
| 712 |
"normalized": false,
|
| 713 |
"rstrip": false,
|
|
|
|
| 715 |
"special": true
|
| 716 |
},
|
| 717 |
"100345": {
|
| 718 |
+
"content": "<row_3_col_4>",
|
| 719 |
"lstrip": false,
|
| 720 |
"normalized": false,
|
| 721 |
"rstrip": false,
|
|
|
|
| 723 |
"special": true
|
| 724 |
},
|
| 725 |
"100346": {
|
| 726 |
+
"content": "<row_4_col_1>",
|
| 727 |
"lstrip": false,
|
| 728 |
"normalized": false,
|
| 729 |
"rstrip": false,
|
|
|
|
| 731 |
"special": true
|
| 732 |
},
|
| 733 |
"100347": {
|
| 734 |
+
"content": "<row_4_col_2>",
|
| 735 |
"lstrip": false,
|
| 736 |
"normalized": false,
|
| 737 |
"rstrip": false,
|
|
|
|
| 739 |
"special": true
|
| 740 |
},
|
| 741 |
"100348": {
|
| 742 |
+
"content": "<row_4_col_3>",
|
| 743 |
"lstrip": false,
|
| 744 |
"normalized": false,
|
| 745 |
"rstrip": false,
|
|
|
|
| 747 |
"special": true
|
| 748 |
},
|
| 749 |
"100349": {
|
| 750 |
+
"content": "<row_4_col_4>",
|
| 751 |
"lstrip": false,
|
| 752 |
"normalized": false,
|
| 753 |
"rstrip": false,
|
|
|
|
| 755 |
"special": true
|
| 756 |
},
|
| 757 |
"100350": {
|
| 758 |
+
"content": "<code>",
|
| 759 |
"lstrip": false,
|
| 760 |
"normalized": false,
|
| 761 |
"rstrip": false,
|
|
|
|
| 763 |
"special": true
|
| 764 |
},
|
| 765 |
"100351": {
|
| 766 |
+
"content": "</code>",
|
| 767 |
"lstrip": false,
|
| 768 |
"normalized": false,
|
| 769 |
"rstrip": false,
|
|
|
|
| 771 |
"special": true
|
| 772 |
},
|
| 773 |
"100352": {
|
| 774 |
+
"content": "<end_of_utterance>",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 775 |
"lstrip": false,
|
| 776 |
"normalized": false,
|
| 777 |
"rstrip": false,
|
|
|
|
| 786 |
],
|
| 787 |
"bos_token": "<|start_of_role|>",
|
| 788 |
"clean_up_tokenization_spaces": false,
|
| 789 |
+
"eos_token": "<|end_of_text|>",
|
| 790 |
"errors": "replace",
|
| 791 |
"extra_special_tokens": {},
|
| 792 |
"model_max_length": 8192,
|
| 793 |
+
"pad_token": "<|end_of_text|>",
|
| 794 |
"padding_side": "left",
|
| 795 |
"processor_class": "Idefics3Processor",
|
| 796 |
"tokenizer_class": "GPT2Tokenizer",
|
vocab.json
CHANGED
|
The diff for this file is too large to render.
See raw diff
|
|
|