Add Troubleshooting section to README (#23)
Commit: 10ae2e8a5590ace87c70eadead48c12532ce6cad

README.md CHANGED
@@ -35,7 +35,7 @@ library_name: transformers

**Model Summary**:

-Granite Docling 258M builds upon the
+Granite Docling 258M builds upon the Idefics3 architecture, but introduces two key modifications: it replaces the vision encoder with siglip2-base-patch16-512 and substitutes the language model with a Granite 165M LLM. Try out our [Granite-Docling-258M](https://huggingface.co/spaces/ibm-granite/granite-docling-258m-demo) demo today.

- **Developed by**: IBM Research
- **Model type**: Multi-modal model (image+text-to-text)
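If you want to confirm the component swap locally, here is a minimal sketch using transformers' `AutoConfig`; the `vision_config`/`text_config` field names follow the Idefics3-style config layout named above and are an assumption, not something this commit specifies:

```python
# Minimal sketch: inspect the two swapped components via the checkpoint config.
# Assumes an Idefics3-style config with vision_config/text_config sub-configs.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("ibm-granite/granite-docling-258M")
print(cfg.model_type)                # top-level multimodal architecture
print(cfg.vision_config.model_type)  # the SigLIP2-based vision encoder
print(cfg.text_config.model_type)    # the Granite 165M language model
```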
@@ -292,6 +292,8 @@ print(f"Total time: {time.time() - start_time:.2f} sec")

💻 Local inference on Apple Silicon with MLX: [see here](https://huggingface.co/ibm-granite/granite-docling-258M-mlx)

+ℹ️ If you run into trouble running granite-docling with the code above, check the Troubleshooting section at the bottom ⬇️.

## Intended Use
Granite-Docling is designed to complement the Docling library, not replace it. It integrates as a component within the larger Docling library, consolidating the functions of multiple single-purpose models into a single, compact VLM.
However, Granite-Docling is **not** intended for general image understanding. For tasks focused solely on image-text input, we recommend using [Granite Vision models](https://huggingface.co/collections/ibm-granite/granite-vision-models-67b3bd4ff90c915ba4cd2800), which are purpose-built and optimized for image-text processing.
@@ -533,4 +535,49 @@ Its training, which includes both human-annotated and synthetic data informed by

- ⭐️ Learn about the latest updates with Docling: https://docling-project.github.io/docling/#features
- 🚀 Get started with Docling concepts, integrations and tutorials: https://docling-project.github.io/docling/getting_started/
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
+- 🖥️ Learn more about how to use Granite-Docling, explore the Docling library, and see what’s coming next for Docling in the release blog: https://ibm.com/new/announcements/granite-docling-end-to-end-document-conversion
## Troubleshooting

**Running with VLLM**

1. You receive `AttributeError: 'LlamaModel' object has no attribute 'wte'` when launching the model through VLLM.

   With current versions of VLLM (including 0.10.2), support for tied weights as used in granite-docling is limited and breaks. We provide a version with untied weights on the `untied` branch of this model repo.
   To use the untied version, please pass the `revision` argument to VLLM:

   ```sh
   # Serve the model through VLLM
   $> vllm serve ibm-granite/granite-docling-258M --revision untied
   ```
   ```python
   # If using the VLLM Python SDK:
   from vllm import LLM
   ...

   llm = LLM(model=MODEL_PATH, revision="untied", limit_mm_per_prompt={"image": 1})
   ```
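   For context, "tied weights" means the input embedding and the LM head share a single parameter tensor, which is the pattern the affected VLLM code paths do not expect. A minimal illustrative sketch of the pattern (the shapes are made up, not the model's real dimensions):

   ```python
   import torch.nn as nn

   emb = nn.Embedding(100, 64)               # token embedding (illustrative sizes)
   lm_head = nn.Linear(64, 100, bias=False)  # output projection
   lm_head.weight = emb.weight               # "tying": both modules share one tensor
   assert lm_head.weight is emb.weight       # the untied branch stores two separate copies instead
   ```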
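   Once the server is up, a quick sanity check that the untied revision serves requests, via VLLM's OpenAI-compatible endpoint (port 8000 is the VLLM default; the image URL and prompt below are placeholders, not the model's required prompt format):

   ```python
   from openai import OpenAI

   # VLLM exposes an OpenAI-compatible API; no real key is needed locally.
   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
   resp = client.chat.completions.create(
       model="ibm-granite/granite-docling-258M",
       messages=[{"role": "user", "content": [
           {"type": "image_url", "image_url": {"url": "https://example.com/page.png"}},
           {"type": "text", "text": "Convert this page to docling."},
       ]}],
   )
   print(resp.choices[0].message.content)
   ```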
2. The model outputs only exclamation marks (i.e. "!!!!!!!!!!!!!!!").

   This is seen on older NVIDIA GPUs, such as the T4 available in Google Colab, because they lack support for the `bfloat16` format.
   You can work around it by setting the `dtype` to `float32`.

   ```sh
   # Serve the model through VLLM
   $> vllm serve ibm-granite/granite-docling-258M --revision untied --dtype float32
   ```

   ```python
   # If using the VLLM Python SDK:
   from vllm import LLM
   ...

   llm = LLM(model=MODEL_PATH, revision="untied", limit_mm_per_prompt={"image": 1}, dtype="float32")
   ```
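   A quick way to check whether your GPU needs this workaround, assuming a standard PyTorch install:

   ```python
   import torch

   # Native bfloat16 needs Ampere (compute capability 8.0) or newer; the T4 is Turing (7.5).
   use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
   print("bfloat16 supported:", use_bf16)
   dtype = "bfloat16" if use_bf16 else "float32"  # pass this as VLLM's dtype
   ```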