Add Troubleshooting section to README (#23)
Commit: 10ae2e8a5590ace87c70eadead48c12532ce6cad

README.md CHANGED
@@ -35,7 +35,7 @@ library_name: transformers

**Model Summary**:

-Granite Docling 258M builds upon the
+Granite Docling 258M builds upon the Idefics3 architecture, but introduces two key modifications: it replaces the vision encoder with siglip2-base-patch16-512 and substitutes the language model with a Granite 165M LLM. Try out our [Granite-Docling-258M](https://huggingface.co/spaces/ibm-granite/granite-docling-258m-demo) demo today.

- **Developed by**: IBM Research
- **Model type**: Multi-modal model (image+text-to-text)
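If you want to confirm the component swap locally, here is a minimal sketch using transformers' `AutoConfig`; the `vision_config`/`text_config` field names follow the Idefics3-style config layout named above and are an assumption, not something this commit specifies:

```python
# Minimal sketch: inspect the two swapped components via the checkpoint config.
# Assumes an Idefics3-style config with vision_config/text_config sub-configs.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("ibm-granite/granite-docling-258M")
print(cfg.model_type)                # top-level multimodal architecture
print(cfg.vision_config.model_type)  # the SigLIP2-based vision encoder
print(cfg.text_config.model_type)    # the Granite 165M language model
```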
@@ -292,6 +292,8 @@ print(f"Total time: {time.time() - start_time:.2f} sec")

💻 Local inference on Apple Silicon with MLX: [see here](https://huggingface.co/ibm-granite/granite-docling-258M-mlx)

+ℹ️ If you run into trouble running granite-docling with the code above, check the Troubleshooting section at the bottom ⬇️.

## Intended Use
Granite-Docling is designed to complement the Docling library, not replace it. It integrates as a component within the larger Docling library, consolidating the functions of multiple single-purpose models into a single, compact VLM.
However, Granite-Docling is **not** intended for general image understanding. For tasks focused solely on image-text input, we recommend using [Granite Vision models](https://huggingface.co/collections/ibm-granite/granite-vision-models-67b3bd4ff90c915ba4cd2800), which are purpose-built and optimized for image-text processing.
@@ -533,4 +535,49 @@ Its training, which includes both human-annotated and synthetic data informed by

- ⭐️ Learn about the latest updates with Docling: https://docling-project.github.io/docling/#features
- 🚀 Get started with Docling concepts, integrations and tutorials: https://docling-project.github.io/docling/getting_started/
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
+- 🖥️ Learn more about how to use Granite-Docling, explore the Docling library, and see what’s coming next for Docling in the release blog: https://ibm.com/new/announcements/granite-docling-end-to-end-document-conversion
## Troubleshooting

**Running with VLLM**

1. You receive `AttributeError: 'LlamaModel' object has no attribute 'wte'` when launching the model through VLLM.

   With current versions of VLLM (including 0.10.2), support for tied weights as used in granite-docling is limited and breaks. We provide a version with untied weights on the `untied` branch of this model repo.
   To use the untied version, please pass the `revision` argument to VLLM:

   ```sh
   # Serve the model through VLLM
   $> vllm serve ibm-granite/granite-docling-258M --revision untied
   ```
   ```python
   # If using the VLLM Python SDK:
   from vllm import LLM
   ...

   llm = LLM(model=MODEL_PATH, revision="untied", limit_mm_per_prompt={"image": 1})
   ```
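   For context, "tied weights" means the input embedding and the LM head share a single parameter tensor, which is the pattern the affected VLLM code paths do not expect. A minimal illustrative sketch of the pattern (the shapes are made up, not the model's real dimensions):

   ```python
   import torch.nn as nn

   emb = nn.Embedding(100, 64)               # token embedding (illustrative sizes)
   lm_head = nn.Linear(64, 100, bias=False)  # output projection
   lm_head.weight = emb.weight               # "tying": both modules share one tensor
   assert lm_head.weight is emb.weight       # the untied branch stores two separate copies instead
   ```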
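   Once the server is up, a quick sanity check that the untied revision serves requests, via VLLM's OpenAI-compatible endpoint (port 8000 is the VLLM default; the image URL and prompt below are placeholders, not the model's required prompt format):

   ```python
   from openai import OpenAI

   # VLLM exposes an OpenAI-compatible API; no real key is needed locally.
   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
   resp = client.chat.completions.create(
       model="ibm-granite/granite-docling-258M",
       messages=[{"role": "user", "content": [
           {"type": "image_url", "image_url": {"url": "https://example.com/page.png"}},
           {"type": "text", "text": "Convert this page to docling."},
       ]}],
   )
   print(resp.choices[0].message.content)
   ```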
2. The model outputs only exclamation marks (i.e. "!!!!!!!!!!!!!!!").

   This is seen on older NVIDIA GPUs, such as the T4 available in Google Colab, because they lack support for the `bfloat16` format.
   You can work around it by setting the `dtype` to `float32`.

   ```sh
   # Serve the model through VLLM
   $> vllm serve ibm-granite/granite-docling-258M --revision untied --dtype float32
   ```

   ```python
   # If using the VLLM Python SDK:
   from vllm import LLM
   ...

   llm = LLM(model=MODEL_PATH, revision="untied", limit_mm_per_prompt={"image": 1}, dtype="float32")
   ```
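   A quick way to check whether your GPU needs this workaround, assuming a standard PyTorch install:

   ```python
   import torch

   # Native bfloat16 needs Ampere (compute capability 8.0) or newer; the T4 is Turing (7.5).
   use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
   print("bfloat16 supported:", use_bf16)
   dtype = "bfloat16" if use_bf16 else "float32"  # pass this as VLLM's dtype
   ```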