slow by design?

#1
opened by willfalco

At 50k context on an RTX PRO 6000: fp8 ran at ~30 tps; full fp16 at ~40 tps (1 GPU); nvfp4 at ~17 tps (1 GPU).
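
For reference, here is a minimal sketch of how a throughput number like this can be measured with vLLM's offline API. The repo ID, the `fp8` quantization setting, and the filler prompt are assumptions for illustration, not details taken from this thread.

```python
# Rough end-to-end tokens/sec measurement for a long-context prompt.
# Assumptions: vLLM is installed, the HF repo ID below is correct,
# and "fp8" is the quantization being tested.
import time
from vllm import LLM, SamplingParams

MODEL_ID = "miromind-ai/MiroThinker-v1.0-72B"  # assumed repo path

llm = LLM(model=MODEL_ID, quantization="fp8", max_model_len=60000)
params = SamplingParams(max_tokens=512, temperature=0.0)

# Crude ~50k-token filler prompt just to exercise long-context decoding.
prompt = "The quick brown fox jumps over the lazy dog. " * 5000

start = time.time()
out = llm.generate([prompt], params)[0]
elapsed = time.time() - start

# Note: the timing includes prefill of the long prompt, so this is an
# end-to-end figure rather than a pure decode tps number.
new_tokens = len(out.outputs[0].token_ids)
print(f"decoded {new_tokens} tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.1f} tok/s")
```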

MiroMind AI org

Hi, thanks for your interest! MiroThinker-v1.0-72B uses the same architecture as Qwen2.5-72B. Since there are no structural changes, the inference speed should be essentially the same as Qwen2.5-72B's.
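
A quick way to sanity-check the shared-architecture point is to inspect the published config; the repo ID below is an assumption, not stated in this thread.

```python
# Print the architecture recorded in the model's config.json.
# Assumption: the HF repo ID below. If the model reuses the Qwen2.5-72B
# architecture, this should report the same Qwen2 model type.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("miromind-ai/MiroThinker-v1.0-72B")
print(cfg.model_type)      # expected "qwen2" if the architecture is unchanged
print(cfg.architectures)   # e.g. ["Qwen2ForCausalLM"]
```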
