
Strategy Comparison: Teacher vs Baselines

Overview

This module compares three training strategies for the student agent:

  1. Random Strategy: the student receives uniformly random questions (random topic and difficulty) from the task generator
  2. Progressive Strategy: the student receives questions in fixed order of increasing difficulty (Easy → Medium → Hard), working through each topic family sequentially (both baselines are sketched below)
  3. Teacher Strategy: an RL teacher agent learns an adaptive curriculum using a UCB bandit algorithm
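
As a rough sketch of how the two baseline selectors can be implemented (TOPICS, DIFFICULTIES, and the function names here are illustrative, not the actual identifiers in compare_strategies.py):

import random

TOPICS = ["algebra", "geometry", "logic"]        # illustrative topic families
DIFFICULTIES = ["easy", "medium", "hard"]

def random_strategy(step, rng=random.Random(42)):
    # Random baseline: uniform over all (topic, difficulty) pairs
    return rng.choice(TOPICS), rng.choice(DIFFICULTIES)

def progressive_strategy(step, steps_per_level=50):
    # Progressive baseline: Easy -> Medium -> Hard within a topic family,
    # then move on to the next family
    level = step // steps_per_level
    topic = TOPICS[(level // len(DIFFICULTIES)) % len(TOPICS)]
    difficulty = DIFFICULTIES[level % len(DIFFICULTIES)]
    return topic, difficulty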

Goal

Demonstrate that the Teacher-trained student performs best, achieving the highest accuracy on difficult questions.

Running the Comparison

cd teacher_agent_dev
python compare_strategies.py

This will:

  • Train all three strategies for 500 iterations
  • Track accuracy on general questions and difficult questions
  • Generate comparison plots showing all three strategies
  • Print summary statistics
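
Conceptually, the per-strategy loop looks like the sketch below; StubStudent is a toy stand-in for mock_student.py, and train_on/accuracy are assumed interfaces rather than the real ones:

class StubStudent:
    # Toy learner: per-difficulty skill rises slightly with each practice question
    def __init__(self):
        self.skill = {"easy": 0.3, "medium": 0.1, "hard": 0.05}

    def train_on(self, topic, difficulty):
        self.skill[difficulty] = min(1.0, self.skill[difficulty] + 0.005)

    def accuracy(self, difficulty):
        return self.skill[difficulty]

def run_strategy(pick_question, num_iterations=500):
    # Track (general, difficult) accuracy after every training question
    student = StubStudent()
    history = []
    for step in range(num_iterations):
        topic, difficulty = pick_question(step)
        student.train_on(topic, difficulty)
        history.append((student.accuracy("medium"), student.accuracy("hard")))
    return history

With the selectors sketched earlier, run_strategy(progressive_strategy) returns the accuracy curves that feed the comparison plots.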

Output

Plot: comparison_all_strategies.png

The plot contains three subplots:

  1. General Accuracy Over Time: Shows how student accuracy improves on medium-difficulty questions
  2. Difficult Question Accuracy: KEY METRIC showing accuracy on hard questions (most important for demonstrating the teacher's superiority)
  3. Learning Efficiency: Bar chart showing iterations to reach 75% target vs final performance

Key Metrics Tracked

  • General Accuracy: Student performance on medium-difficulty questions from all topics
  • Difficult Accuracy: Student performance on hard-difficulty questions (target metric)
  • Iterations to Target: How many iterations until the student reaches 75% accuracy on difficult questions (computed as sketched below)
  • Final Accuracy: Final performance after 500 iterations
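
Iterations to Target can be read directly off the tracked history; a minimal sketch, assuming difficult_accs is the per-iteration list of difficult-question accuracies:

def iterations_to_target(difficult_accs, target=0.75):
    # First iteration whose difficult-question accuracy meets the target;
    # None if the run never reaches it
    for i, acc in enumerate(difficult_accs):
        if acc >= target:
            return i
    return None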

Expected Results

The Teacher strategy should show:

  • ✅ Highest final accuracy on difficult questions
  • ✅ Efficient learning (good balance of speed and performance)
  • ✅ Better curriculum (smarter topic/difficulty selection)

Example Output

STRATEGY COMPARISON SUMMARY
======================================================================
Random          | ✅ Reached       | Iterations:   51 | Final Acc: 0.760
Progressive     | ✅ Reached       | Iterations:  310 | Final Acc: 0.520
Teacher         | ✅ Reached       | Iterations:   55 | Final Acc: 0.880
======================================================================

Teacher wins with the highest final accuracy!

Strategy Details

Random Strategy

  • Completely random selection of topics and difficulties
  • No curriculum structure
  • Baseline for comparison
  • May reach target quickly due to luck, but doesn't optimize learning

Progressive Strategy

  • Rigid curriculum: Easy → Medium → Hard for each topic sequentially
  • No adaptation to student needs
  • Slow to reach difficult questions
  • Doesn't account for forgetting or optimal pacing

Teacher Strategy

  • RL-based curriculum learning
  • Uses a UCB bandit to balance exploration and exploitation (see the sketch after this list)
  • Adapts based on student improvement (reward signal)
  • Optimizes for efficient learning
  • Can strategically review topics to prevent forgetting
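
The selection rule can be sketched as standard UCB1; the exploration constant and the exact reward bookkeeping in train_teacher.py may differ:

import math

def ucb_select(counts, total_rewards, t, c=1.4):
    # counts[arm]: times each (topic, difficulty) arm was chosen
    # total_rewards[arm]: summed reward (student improvement) for that arm
    # t: total picks so far; c: exploration weight
    for arm, n in counts.items():
        if n == 0:
            return arm                         # try every arm once first
    def score(arm):
        mean = total_rewards[arm] / counts[arm]
        return mean + c * math.sqrt(math.log(t) / counts[arm])
    return max(counts, key=score)

Arms with few pulls get a large exploration bonus, which is what lets the teacher revisit neglected topic/difficulty pairs and counteract forgetting.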

Visualization Features

  • Color coding: Teacher in green (highlighted as best), Random in red, Progressive in teal
  • Line styles: Teacher with solid thick line, baselines with dashed/dotted
  • Annotations: Final accuracy values labeled on plots
  • Target line: 75% accuracy threshold marked on difficult question plot
  • Summary statistics: Table showing which strategies reached target and when
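
A minimal sketch of the styling described above, for the key difficult-question panel only (the real script draws three subplots; results here is assumed to map strategy names to per-iteration accuracies):

import matplotlib.pyplot as plt

def plot_difficult_accuracy(results, target=0.75):
    styles = {
        "Teacher":     dict(color="green", linestyle="-",  linewidth=2.5),
        "Random":      dict(color="red",   linestyle="--", linewidth=1.5),
        "Progressive": dict(color="teal",  linestyle=":",  linewidth=1.5),
    }
    fig, ax = plt.subplots()
    for name, accs in results.items():
        ax.plot(accs, label=name, **styles[name])
        ax.annotate(f"{accs[-1]:.3f}", xy=(len(accs) - 1, accs[-1]))  # final accuracy label
    ax.axhline(target, color="gray", linestyle="--", label="75% target")
    ax.set_xlabel("Iteration")
    ax.set_ylabel("Difficult-question accuracy")
    ax.legend()
    fig.savefig("comparison_all_strategies.png")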

Customization

You can modify parameters in compare_strategies.py:

num_iterations = 500  # Number of training iterations
target_accuracy = 0.75  # Target accuracy on difficult questions
seed = 42  # Random seed for reproducibility

Files

  • compare_strategies.py - Main comparison script
  • comparison_all_strategies.png - Generated comparison plot
  • train_teacher.py - Teacher training logic
  • mock_student.py - Student agent implementation
  • mock_task_generator.py - Task generator

Notes

  • All strategies use the same student parameters for a fair comparison
  • Evaluation uses held-out test sets (sketched below)
  • The Teacher strategy learns from rewards based on student improvement
  • Results may vary slightly due to randomness, but the teacher should consistently outperform the baselines
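
As an illustration of the held-out evaluation (student.answer and question.solution are assumed interfaces, not necessarily those of mock_student.py):

def evaluate(student, held_out):
    # Accuracy on a fixed question set that is never used for training
    correct = sum(1 for q in held_out if student.answer(q) == q.solution)
    return correct / len(held_out)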