(Dataset based on Pinkstack/syngen-reasoning-0.6b-dataset)

This is a slightly refined version of qingy2024/SynGen-14B, further trained with DPO on qingy2024/SynGen-Antiloop-DPO. This should reduce repetition and improve the quality of generated reasoning traces. See the original model card for a description of what the model can do.

Notes:

  • Everything (training configs, datasets, model weights) is open source.
  • This model is specifically optimized for R1's reasoning style, but GPT-OSS may still work fine (I haven't tested it yet).
  • The model isn't guaranteed to generate a perfect CoT every time, but it shouldn't be too hard a task given that it already knows the final answer.
  • Sampler settings: temperature = 0.7, top_p = 0.95; essentially the defaults work (see the generation sketch below).

Prompt Format

System Message

<reasoning_style>deepseek_r1</reasoning_style> # Can replace deepseek_r1 with gpt_oss
<system_prompt>Original System Prompt</system_prompt>

Prompt Message

<user>User Message Here</user>
<assistant>Assistant Final Response Here (without reasoning)</assistant>

Output Format

<think>Generated Reasoning</think>
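
Putting the pieces together, here is a minimal generation sketch with transformers. The prompt assembly follows the format documented above, but feeding it as a raw string (rather than through a chat template) and the example messages are assumptions; check the tokenizer config before relying on this.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qingy2024/SynGen-Max-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Prompt assembled from the tags documented above (raw-string assembly
# is an assumption; verify against the tokenizer's chat template).
prompt = (
    "<reasoning_style>deepseek_r1</reasoning_style>\n"  # or gpt_oss
    "<system_prompt>You are a helpful assistant.</system_prompt>\n"
    "<user>What is 17 * 23?</user>\n"
    "<assistant>17 * 23 = 391.</assistant>\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,  # recommended sampler settings from the notes above
    top_p=0.95,
)
# The completion should be the reasoning trace wrapped in <think>...</think>.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))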

Training Details

  • Base Model: qingy2024/SynGen-14B
  • Training Epochs: 1
  • Learning Rate: 2e-6
  • Batch Size: 64
  • Training Method: 16-bit LoRA (rank 64, alpha 128)
  • Training Hardware: H200
  • Training Platform: Modal
  • Total Cost: $20.43 USD
  • Seed: 42
  • As of January 1, 2026, this is (to my knowledge) the largest model ever trained for reasoning-trace generation!
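
For reference, a rough sketch of the configuration above expressed with peft and trl. The hyperparameters simply restate the list; the choice of libraries, the target modules, and the per-device/accumulation split of the batch size are all assumptions.

from peft import LoraConfig
from trl import DPOConfig

lora_config = LoraConfig(
    r=64,            # LoRA rank
    lora_alpha=128,  # LoRA alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    num_train_epochs=1,
    learning_rate=2e-6,
    per_device_train_batch_size=8,    # assumption: 8 x 8 accumulation = 64 effective
    gradient_accumulation_steps=8,
    bf16=True,                        # "16-bit" LoRA training
    seed=42,
    output_dir="syngen-max-14b-dpo",  # hypothetical path
)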
