Update README.md
README.md
CHANGED
@@ -17,7 +17,7 @@ tags:

 MiniGuard-v0.1 is a compact content safety classifier fine-tuned from [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B). It classifies content in both, User inputs (prompt classification) and LLM responses (response classification), outputting whether a given prompt or response is safe or unsafe, along with the violated safety categories.

-MiniGuard-v0.1 achieves **~99% of Nemotron-Guard-8B's benchmark accuracy** with **13x fewer parameters
+MiniGuard-v0.1 achieves **~99% of Nemotron-Guard-8B's benchmark accuracy** with **13x fewer parameters** and **outperforms Qwen3Guard-8B** (a specialized 8B safety model) **by 9 percentage points** on production data.


 ## Compatibility
@@ -56,14 +56,16 @@ Dataset - English subset test split of [nvidia/Nemotron-Safety-Guard-Dataset-v3]

 Evaluated on out-of-distribution production data containing real user queries. Cost estimated based on H200 GPU pricing ($7.91/hour) at maximum concurrency with P95 latency SLA of <500ms.

-
-
-
-
-
+| Model | Parameters | Rel. Macro F1 | Cost per 1M requests | Cost Savings |
+|--------|------------|---------------|----------------------|--------------|
+| **MiniGuard-v0.1** | **0.6B** | **91.1%** | **$15.54** | **67%** |
+| Qwen3Guard-Gen-0.6B | 0.6B | 72.1% | - | - |
+| Qwen3Guard-Gen-4B | 4B | 78.0% | - | - |
+| Qwen3Guard-Gen-8B | 8B | 82.1% | - | - |
+| Nemotron-Guard-8B-v3 | 8B | 100% | $46.93 | baseline |


-MiniGuard-v0.1 achieves 91.1% relative performance on out-of-distribution data while costing **67% less** to serve.
+MiniGuard-v0.1 achieves 91.1% relative performance on out-of-distribution data while costing **67% less** to serve. **Notably, our 0.6B fine-tuned model outperforms all Qwen3Guard models by significant margins**, including the 8B version (82.1%), demonstrating that targeted fine-tuning is more effective than simply using larger pretrained safety models.

 ### Ablation Study

@@ -77,6 +79,10 @@ Impact of techniques on out-of-distribution production data (Relative Macro F1 c
 | + Targeted Synthetic Data | 0.6B | 87.2% | +1.6% |
 | + Soup (top-3) [MiniGuard-v0.1] | 0.6B | 92.3% | +5.1% |
 | + FP8 | 0.6B | 91.1% | -1.2% |
+| **Comparison Baselines:** |
+| Qwen3Guard-Gen-0.6B | 0.6B | 72.1% | - |
+| Qwen3Guard-Gen-4B | 4B | 78.0% | - |
+| Qwen3Guard-Gen-8B | 8B | 82.1% | - |
 | Nemotron-Guard-8B-v3 | 8B | 100% | reference |

 #### In-Distribution
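
Review note: the headline figures in the added lines can be cross-checked against the table values in this diff. The sketch below is only a sanity check, not part of the README or of any MiniGuard tooling. The dictionary and function names are hypothetical, and it assumes that "Cost Savings" means one minus the ratio of per-request costs, that "13x fewer parameters" compares the nominal 0.6B and 8B sizes, and that each ablation delta is the difference from the previous row.

```python
# Values copied from the tables in the added lines of this diff.
REL_MACRO_F1 = {
    "MiniGuard-v0.1": 91.1,
    "Qwen3Guard-Gen-0.6B": 72.1,
    "Qwen3Guard-Gen-4B": 78.0,
    "Qwen3Guard-Gen-8B": 82.1,
    "Nemotron-Guard-8B-v3": 100.0,
}
COST_PER_1M_USD = {"MiniGuard-v0.1": 15.54, "Nemotron-Guard-8B-v3": 46.93}
PARAMS_BILLIONS = {"MiniGuard-v0.1": 0.6, "Nemotron-Guard-8B-v3": 8.0}


def cost_savings_pct(model: str, baseline: str) -> float:
    """Savings relative to the baseline (assumed formula: 1 - cost ratio)."""
    return 100.0 * (1.0 - COST_PER_1M_USD[model] / COST_PER_1M_USD[baseline])


def f1_gap_points(model: str, other: str) -> float:
    """Difference in relative Macro F1, in percentage points."""
    return REL_MACRO_F1[model] - REL_MACRO_F1[other]


def param_ratio(small: str, large: str) -> float:
    """How many times more parameters the larger model has (nominal sizes)."""
    return PARAMS_BILLIONS[large] / PARAMS_BILLIONS[small]


def ablation_is_consistent(rows, tol=0.05):
    """Check that each (score, delta) row equals the previous score plus delta."""
    return all(
        abs((prev_score + delta) - score) <= tol
        for (prev_score, _), (score, delta) in zip(rows, rows[1:])
    )


# Ablation rows visible in this hunk: (relative Macro F1, stated delta).
ABLATION_ROWS = [(87.2, 1.6), (92.3, 5.1), (91.1, -1.2)]

if __name__ == "__main__":
    print(f"{cost_savings_pct('MiniGuard-v0.1', 'Nemotron-Guard-8B-v3'):.1f}")  # 66.9
    print(f"{f1_gap_points('MiniGuard-v0.1', 'Qwen3Guard-Gen-8B'):.1f}")  # 9.0
    print(f"{param_ratio('MiniGuard-v0.1', 'Nemotron-Guard-8B-v3'):.1f}")  # 13.3
    print(ablation_is_consistent(ABLATION_ROWS))  # True
```

Under these assumptions the stated 67% savings, 9-point gap, and 13x parameter ratio all agree with the table after rounding, and the visible ablation deltas add up. The first ablation row's own delta cannot be checked here because its predecessor lies outside the hunk.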