slow by design?

by willfalco - opened 10 days ago

Discussion

willfalco

10 days ago

At 50k context on an RTX PRO 6000: fp8 runs at ~30 tps, full fp16 at ~40 tps (1 GPU), and nvfp4 at ~17 tps (1 GPU).

jenny-miromind

MiroMind AI org 9 days ago

Hi, thanks for your interest! MiroThinker-v1.0-72B uses the same architecture as Qwen2.5-72B. Since there are no structural changes, the inference speed should be essentially the same.
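
If it helps to check the "same architecture" point locally, here is a minimal sketch that compares the shape-defining fields of two model configs. The helper name `architecture_diff`, the choice of keys, and the two example dicts are assumptions for illustration only; the dict values are placeholders, not copied from either model's real `config.json`, so load the actual files (for example with `json.load`) before drawing conclusions.

```python
# Fields that determine the shape of the network (an assumed, non-exhaustive list).
ARCH_KEYS = (
    "model_type",
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
    "intermediate_size",
    "vocab_size",
)


def architecture_diff(cfg_a, cfg_b, keys=ARCH_KEYS):
    """Return {key: (value_a, value_b)} for every architecture key that differs."""
    return {
        key: (cfg_a.get(key), cfg_b.get(key))
        for key in keys
        if cfg_a.get(key) != cfg_b.get(key)
    }


# Placeholder configs: illustrative values only, not the real model configs.
base_cfg = {
    "model_type": "qwen2",
    "hidden_size": 1024,
    "num_hidden_layers": 8,
    "num_attention_heads": 16,
    "num_key_value_heads": 4,
    "intermediate_size": 4096,
    "vocab_size": 32000,
    "torch_dtype": "bfloat16",
}
finetuned_cfg = dict(base_cfg, torch_dtype="float16")  # only a non-architecture field changes

diff = architecture_diff(base_cfg, finetuned_cfg)
print(diff if diff else "no architecture differences")
```

An empty result only says the two checkpoints have the same layer shapes. Throughput differences between fp8, fp16, and nvfp4 runs would then come from the serving stack and quantization kernels rather than from the model itself.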
