Spaces:

iteratehack
/

MentorFlow

Paused

App Files Files Community

MentorFlow / teacher_agent_dev /FIXES_SUMMARY.md

Cornelius

Deploy MentorFlow with GPU support

a52f96d 12 days ago

preview code

raw

history blame contribute delete

3.13 kB

A newer version of the Gradio SDK is available: 6.1.0

Upgrade

Summary of Fixes for Accuracy Drop Issues

Issues Identified

1. Accuracy Drops at End ❌

Root Causes:

Evaluation uses NEW tasks each iteration → Variance and inconsistency
- Line 171-175: Generates new tasks on-the-fly for general_accuracy
- Different tasks each time = different difficulty/variance
Forgetting rate too aggressive for 500 iterations
- Forgetting rate = 0.05
- After 500 time units: retention = exp(-0.05 * 500) ≈ 0.0
- All skills completely forgotten by iteration 500!
Evaluation timing: Evaluation happens after time advance, but we log before - this is actually OK

Fix:

✅ Use FIXED eval sets generated once at start
✅ Reduce forgetting rate from 0.05 to 0.01 (5x slower forgetting)
✅ Evaluation happens BEFORE time advance (accurate snapshot)

2. Accuracy Calculation Method

Current Method:

Uses student.evaluate(eval_tasks) which samples answers stochastically
Accounts for forgetting correctly
BUT: Uses different tasks each time

Problems:

Stochastic variance (random sampling)
Inconsistent eval sets (regenerated each time)
Small eval sets (10-15 tasks) = high variance

Better Method:

✅ FIXED eval sets generated once
✅ Same tasks used throughout = consistent measurement
✅ Larger eval sets (15+ tasks) for stability

Alternative (for future):

Use expected accuracy = mean(prob_correct) instead of sampling
Removes stochastic variance

3. Mock vs Real Components

Current Mock Components:

✅ Mock Student: Captures learning + forgetting well
✅ Mock Task Generator: Simple but functional
❌ Simplified learning model
❌ Limited task diversity

Real Components (MentorFlow):

Real Student: Full PPO with neural network
Real Task Generator: Procedural generation, 15 families

Will Real Components Be Better? YES:

Real PPO Student:
- Can learn complex patterns
- Better generalization
- More realistic learning curves
- But: Slower to train
Real Task Generator:
- More diverse tasks
- Procedural generation = infinite variety
- Better tests generalization
Teacher Agent Algorithm:
- UCB algorithm will work the same
- Should perform even better with real components
- More realistic reward signals

Expected Improvement:

Teacher should learn better curriculum
Student should achieve higher accuracy
More realistic forgetting patterns (if implemented)

Applied Fixes

✅ Fixed evaluation to use FIXED eval sets ✅ Reduced forgetting rate from 0.05 → 0.01 ✅ Evaluation happens BEFORE time advance ✅ All strategies use consistent eval sets

Remaining Considerations

Forgetting Model: Could use more sophisticated model (spaced repetition optimization)
Evaluation Method: Could use expected accuracy instead of sampling
Eval Set Size: Could increase for more stability (currently 15 tasks, could be 50-100)
Time Reset: Could periodically reset time to prevent complete forgetting in long training