slow by design?
#1 opened by willfalco
At 50k context: FP8 ~30 tok/s, full FP16 ~40 tok/s (1 GPU), NVFP4 ~17 tok/s (1 GPU).
GPU: RTX PRO 6000
Hi, thanks for your interest! MiroThinker-v1.0-72B uses the same architecture as Qwen2.5-72B. Since there are no structural changes, its inference speed should be essentially the same as Qwen2.5-72B's.
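To check the "same architecture" point yourself, you can diff the shape-defining fields of the two checkpoints' `config.json` files. This is a minimal sketch, not an official tool: the list of keys in `STRUCTURAL_KEYS` and the helper names `structural_diff` / `load_config` are assumptions chosen for illustration, and the local directory paths in the comments are hypothetical.

```python
import json
from pathlib import Path

# Config fields that fix the transformer's shape, and hence its per-token cost.
# Assumption: this subset is chosen for illustration; extend it as needed.
STRUCTURAL_KEYS = (
    "architectures",
    "hidden_size",
    "intermediate_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
    "vocab_size",
)


def structural_diff(cfg_a: dict, cfg_b: dict, keys=STRUCTURAL_KEYS) -> dict:
    """Return {key: (value_a, value_b)} for each key whose values differ.

    A key missing from both configs compares as None == None and is skipped.
    """
    return {
        k: (cfg_a.get(k), cfg_b.get(k))
        for k in keys
        if cfg_a.get(k) != cfg_b.get(k)
    }


def load_config(model_dir: str) -> dict:
    """Read config.json from a locally downloaded model directory."""
    return json.loads((Path(model_dir) / "config.json").read_text())


# Usage (hypothetical local paths):
#   a = load_config("./MiroThinker-v1.0-72B")
#   b = load_config("./Qwen2.5-72B")
#   print(structural_diff(a, b) or "no structural differences")
```

An empty result means the two models match on every listed field, so any remaining throughput gap is more likely to come from the serving setup (precision, kernels, context length) than from the model itself.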