Strategy Comparison: Teacher vs Baselines
Overview
This module compares three training strategies for the student agent:
- Random Strategy: the student receives randomly selected questions from the task generator, with no curriculum structure
- Progressive Strategy: the student receives questions in increasing difficulty order (Easy → Medium → Hard), working through each topic family sequentially
- Teacher Strategy: an RL teacher agent learns an optimal curriculum using a UCB bandit algorithm
Goal
Demonstrate that the Teacher-trained student performs best, achieving the highest accuracy on difficult questions.
Running the Comparison
```bash
cd teacher_agent_dev
python compare_strategies.py
```
This will:
- Train all three strategies for 500 iterations
- Track accuracy on general questions and difficult questions
- Generate comparison plots showing all three strategies
- Print summary statistics
Output
Plot: comparison_all_strategies.png
The plot contains three subplots:
- General Accuracy Over Time: Shows how student accuracy improves on medium-difficulty questions
- Difficult Question Accuracy (key metric): shows accuracy on hard questions, the most important plot for demonstrating teacher superiority
- Learning Efficiency: Bar chart showing iterations to reach 75% target vs final performance
Key Metrics Tracked
- General Accuracy: Student performance on medium-difficulty questions from all topics
- Difficult Accuracy: Student performance on hard-difficulty questions (target metric)
- Iterations to Target: How many iterations until student reaches 75% accuracy on difficult questions
- Final Accuracy: Final performance after 500 iterations
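The iterations-to-target metric above can be computed directly from a per-iteration accuracy history. A minimal sketch (the helper name is an illustrative assumption, not taken from compare_strategies.py):

```python
def iterations_to_target(history, target=0.75):
    """Return the first 1-based iteration at which accuracy on difficult
    questions reaches the target, or None if it is never reached.

    history: list of per-iteration accuracies on the difficult test set.
    """
    for i, acc in enumerate(history, start=1):
        if acc >= target:
            return i
    return None
```

The same history list feeds the Final Accuracy metric (its last element) and the accuracy-over-time plots.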
Expected Results
The Teacher strategy should show:
- ✓ Highest final accuracy on difficult questions
- ✓ Efficient learning (good balance of speed and performance)
- ✓ Better curriculum (smarter topic/difficulty selection)
Example Output
STRATEGY COMPARISON SUMMARY
======================================================================
Random      | ✓ Reached | Iterations: 51  | Final Acc: 0.760
Progressive | ✓ Reached | Iterations: 310 | Final Acc: 0.520
Teacher     | ✓ Reached | Iterations: 55  | Final Acc: 0.880
======================================================================
Teacher wins with highest final accuracy!
Strategy Details
Random Strategy
- Completely random selection of topics and difficulties
- No curriculum structure
- Baseline for comparison
- May reach target quickly due to luck, but doesn't optimize learning
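The random baseline amounts to a uniform draw over topics and difficulties. A hypothetical sketch (not the actual compare_strategies.py code):

```python
import random

def random_task(topics, difficulties, rng=random):
    """Uniformly random (topic, difficulty) pick: the no-curriculum baseline.

    rng is injectable so a seeded random.Random instance can make
    runs reproducible, as the seed parameter in the comparison script does.
    """
    return rng.choice(topics), rng.choice(difficulties)
```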
Progressive Strategy
- Rigid curriculum: Easy → Medium → Hard for each topic sequentially
- No adaptation to student needs
- Slow to reach difficult questions
- Doesn't account for forgetting or optimal pacing
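The rigid ordering described above can be sketched as a generator that exhausts every difficulty level of one topic before moving to the next. The function name is an assumption for illustration:

```python
def progressive_schedule(topics, difficulties=("easy", "medium", "hard")):
    """Yield (topic, difficulty) pairs topic by topic, easiest first.

    Because hard questions for later topics only appear near the end,
    this schedule is slow to reach difficult material, which is the
    weakness noted above.
    """
    for topic in topics:
        for difficulty in difficulties:
            yield (topic, difficulty)
```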
Teacher Strategy
- RL-based curriculum learning
- Uses UCB bandit to balance exploration/exploitation
- Adapts based on student improvement (reward signal)
- Optimizes for efficient learning
- Can strategically review topics to prevent forgetting
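The exploration/exploitation balance above is the standard UCB1 pattern: each (topic, difficulty) pair is a bandit arm, and the reward is some measure of student improvement. A minimal sketch under those assumptions (class and method names are illustrative, not from train_teacher.py):

```python
import math

class UCBTeacher:
    """UCB1 bandit over (topic, difficulty) arms.

    Arms with high mean reward (student improvement) are exploited;
    rarely-pulled arms get an exploration bonus, which is also what
    lets the teacher revisit old topics and counter forgetting.
    """

    def __init__(self, arms, c=1.4):
        self.arms = list(arms)
        self.c = c                              # exploration weight
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}  # running mean reward
        self.total = 0

    def select(self):
        # Play every arm once before applying the UCB formula.
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        # UCB1 score: mean reward + c * sqrt(ln(total) / pulls).
        return max(
            self.arms,
            key=lambda a: self.values[a]
            + self.c * math.sqrt(math.log(self.total) / self.counts[a]),
        )

    def update(self, arm, reward):
        # reward could be the student's accuracy gain on this arm.
        self.counts[arm] += 1
        self.total += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Each training iteration would call `select()` to pick the next task, train the student on it, and feed the resulting improvement back through `update()`.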
Visualization Features
- Color coding: Teacher in green (highlighted as best), Random in red, Progressive in teal
- Line styles: Teacher with solid thick line, baselines with dashed/dotted
- Annotations: Final accuracy values labeled on plots
- Target line: 75% accuracy threshold marked on difficult question plot
- Summary statistics: Table showing which strategies reached target and when
Customization
You can modify parameters in compare_strategies.py:
```python
num_iterations = 500    # Number of training iterations
target_accuracy = 0.75  # Target accuracy on difficult questions
seed = 42               # Random seed for reproducibility
```
Files
- compare_strategies.py - Main comparison script
- comparison_all_strategies.png - Generated comparison plot
- train_teacher.py - Teacher training logic
- mock_student.py - Student agent implementation
- mock_task_generator.py - Task generator
Notes
- All strategies use the same student parameters for fair comparison
- Evaluation uses held-out test sets
- Teacher strategy learns from rewards based on student improvement
- Results may vary slightly due to randomness, but teacher should consistently outperform baselines