auerchristoph committed
Commit 982fe3b · verified
1 Parent(s): 377322a

Add Troubleshooting section to README (#23)


- Add Troubleshooting section to README (10ae2e8a5590ace87c70eadead48c12532ce6cad)

Files changed (1)
  1. README.md +49 -2
README.md CHANGED
@@ -35,7 +35,7 @@ library_name: transformers
 
 **Model Summary**:
 
- Granite Docling 258M builds upon the IDEFICS3 architecture, but introduces two key modifications: it replaces the vision encoder with siglip2-base-patch16-512 and substitutes the language model with a Granite 165M LLM. Try out our [Granite-Docling-258M](https://huggingface.co/spaces/ibm-granite/granite-docling-258m-demo) demo today.
+ Granite Docling 258M builds upon the Idefics3 architecture, but introduces two key modifications: it replaces the vision encoder with siglip2-base-patch16-512 and substitutes the language model with a Granite 165M LLM. Try out our [Granite-Docling-258M](https://huggingface.co/spaces/ibm-granite/granite-docling-258m-demo) demo today.
 
 - **Developed by**: IBM Research
 - **Model type**: Multi-modal model (image+text-to-text)
@@ -292,6 +292,8 @@ print(f"Total time: {time.time() - start_time:.2f} sec")
 
 💻 Local inference on Apple Silicon with MLX: [see here](https://huggingface.co/ibm-granite/granite-docling-258M-mlx)
 
+ ℹ️ If you run into trouble running granite-docling with the code above, check the Troubleshooting section at the bottom ⬇️.
+
 ## Intended Use
 Granite-Docling is designed to complement the Docling library, not replace it. It integrates as a component within the larger Docling library, consolidating the functions of multiple single-purpose models into a single, compact VLM.
 However, Granite-Docling is **not** intended for general image understanding. For tasks focused solely on image-text input, we recommend using [Granite Vision models](https://huggingface.co/collections/ibm-granite/granite-vision-models-67b3bd4ff90c915ba4cd2800), which are purpose-built and optimized for image-text processing.
@@ -533,4 +535,49 @@ Its training, which includes both human-annotated and synthetic data informed by
 - ⭐️ Learn about the latest updates with Docling: https://docling-project.github.io/docling/#features
 - 🚀 Get started with Docling concepts, integrations and tutorials: https://docling-project.github.io/docling/getting_started/
 - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
- - 🖥️ Learn more about how to use Granite-Docling, explore the Docling library, and see what’s coming next for Docling in the release blog: https://ibm.com/new/announcements/granite-docling-end-to-end-document-conversion
+ - 🖥️ Learn more about how to use Granite-Docling, explore the Docling library, and see what’s coming next for Docling in the release blog: https://ibm.com/new/announcements/granite-docling-end-to-end-document-conversion
+
+ ## Troubleshooting
+
+ **Running with vLLM**
+
+ 1. You receive `AttributeError: 'LlamaModel' object has no attribute 'wte'` when launching the model through vLLM.
+
+ With current versions of vLLM (including 0.10.2), support for tied weights as used in granite-docling is limited and breaks. We provide a version with untied weights on the `untied` branch of this model repo.
+ To use the untied version, please pass the `revision` argument to vLLM:
+
+ ```sh
+ # Serve the model through vLLM
+ $> vllm serve ibm-granite/granite-docling-258M --revision untied
+ ```
+
+ ```python
+ # If using the vLLM Python SDK:
+ from vllm import LLM
+ ...
+
+ llm = LLM(model=MODEL_PATH, revision="untied", limit_mm_per_prompt={"image": 1})
+ ```
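
A minimal sketch of sending a request to the server started above, assuming vLLM's default OpenAI-compatible endpoint at `http://localhost:8000/v1` and the `openai` Python client; the image URL and prompt shown are illustrative placeholders:

```python
# Minimal sketch: query the vLLM server via its OpenAI-compatible API.
# Assumptions: default endpoint at localhost:8000, `openai` client installed,
# placeholder image URL and prompt.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="ibm-granite/granite-docling-258M",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/page.png"}},
                {"type": "text", "text": "Convert this page to docling."},
            ],
        }
    ],
    temperature=0.0,
    max_tokens=4096,
)
print(response.choices[0].message.content)
```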
+
+ 2. The model outputs only exclamation marks (i.e. "!!!!!!!!!!!!!!!").
+
+ This is seen on older NVIDIA GPUs, such as the T4 GPU available in Google Colab, because they lack support for the `bfloat16` format.
+ You can work around it by setting the `dtype` to `float32`.
+
+ ```sh
+ # Serve the model through vLLM
+ $> vllm serve ibm-granite/granite-docling-258M --revision untied --dtype float32
+ ```
+
+ ```python
+ # If using the vLLM Python SDK:
+ from vllm import LLM
+ ...
+
+ llm = LLM(model=MODEL_PATH, revision="untied", limit_mm_per_prompt={"image": 1}, dtype="float32")
+ ```
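
A minimal sketch of picking the dtype at runtime, so the same script runs on both older and newer GPUs; it assumes `torch` is importable (vLLM already depends on it) and mirrors the arguments from the snippet above:

```python
# Minimal sketch: use bfloat16 where the GPU supports it, otherwise fall back to float32.
# Assumes torch (already required by vLLM); other arguments mirror the snippet above.
import torch
from vllm import LLM

dtype = "bfloat16" if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else "float32"

llm = LLM(
    model="ibm-granite/granite-docling-258M",
    revision="untied",
    limit_mm_per_prompt={"image": 1},
    dtype=dtype,
)
```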