CUGA Profiling
This directory contains tools for profiling CUGA digital sales tasks with different configurations and models, extracting performance metrics and LLM call information from Langfuse.
Directory Structure
system_tests/profiling/
├── README.md                      # This file
├── run_experiment.sh              # Main entry point for running experiments
├── serve.sh                       # HTTP server for viewing results
├── bin/                           # Internal scripts
│   ├── profile_digital_sales_tasks.py
│   ├── run_profiling.sh
│   └── run_experiment.sh
├── config/                        # Configuration files
│   ├── default_experiment.yaml    # Default experiment configuration
│   ├── fast_vs_accurate.yaml      # Example: Fast vs Accurate comparison
│   └── .secrets.yaml              # Secrets file (git-ignored)
├── experiments/                   # Experiment results and comparison HTML
│   └── comparison.html
└── reports/                       # Individual profiling reports
Quick Start
1. Set Up Environment Variables
Create a .env file in the project root or export these variables:
export LANGFUSE_PUBLIC_KEY="pk-your-public-key"
export LANGFUSE_SECRET_KEY="sk-your-secret-key"
export LANGFUSE_HOST="https://cloud.langfuse.com" # Optional
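If you go the .env route instead, the file in the project root can hold the same variables (a minimal sketch mirroring the exports above; most dotenv loaders accept plain KEY=value lines):
LANGFUSE_PUBLIC_KEY="pk-your-public-key"
LANGFUSE_SECRET_KEY="sk-your-secret-key"
LANGFUSE_HOST="https://cloud.langfuse.com"  # Optional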
2. Run an Experiment
The simplest way to run experiments is to use the configuration files:
# Run default experiment (fast vs balanced)
./system_tests/profiling/run_experiment.sh
# Run a specific experiment configuration
./system_tests/profiling/run_experiment.sh --config fast_vs_accurate.yaml
# Run and automatically open results in browser
./system_tests/profiling/run_experiment.sh --config default_experiment.yaml --open
3. View Results
Results are automatically saved to system_tests/profiling/experiments/ and can be viewed in the HTML dashboard:
# Start the server (serves experiments directory)
./system_tests/profiling/serve.sh
# Or start and open browser automatically
./system_tests/profiling/serve.sh --open
# Use a different port
./system_tests/profiling/serve.sh --port 3000
Configuration Files
Configuration files use YAML format with Dynaconf. They define experiments with multiple runs and comparison settings.
Example Configuration
profiling:
  configs:
    - "settings.openai.toml"
  modes:
    - "fast"
    - "balanced"
  tasks:
    - "test_get_top_account_by_revenue_stream"
  runs: 3

experiment:
  name: "fast_vs_balanced"
  description: "Compare fast and balanced modes"
  runs:
    - name: "fast_mode"
      test_id: "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/fast_{{timestamp}}.json"
      env:
        MODEL_NAME: "Azure/gpt-4o"  # Set environment variable
    - name: "balanced_mode"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/balanced_{{timestamp}}.json"
      env:
        MODEL_NAME: null  # Unset environment variable

comparison:
  generate_html: true
  html_output: "experiments/comparison.html"
  auto_open: false
Configuration Options
Profiling Section
- configs: List of configuration files to test (e.g., settings.openai.toml)
- modes: List of CUGA modes (fast, balanced, accurate)
- tasks: List of test tasks to run
- runs: Number of iterations per configuration
- output: Output directory and filename settings
- langfuse: Langfuse connection settings (credentials from env vars)
Experiment Section
- name: Name of the experiment
- description: Description of what's being tested
- runs: List of experiment runs to execute
  - name: Display name for the run
  - test_id: Specific test to run (format: config:mode:task)
  - iterations: Number of times to run this test
  - output: Output file path (use {{timestamp}} for dynamic naming)
  - env: (Optional) Environment variables to set/unset for this run
    - Set a variable: VAR_NAME: "value"
    - Unset a variable: VAR_NAME: null
- comparison: Settings for generating comparison HTML
Available Test IDs
Test IDs follow the format: config:mode:task
Configurations:
- settings.openai.toml
- settings.azure.toml
- settings.watsonx.toml
Modes:
- fast
- balanced
- accurate
Tasks:
- test_get_top_account_by_revenue_stream
- test_list_my_accounts
- test_find_vp_sales_active_high_value_accounts
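Combining one entry from each list gives a full test ID, for example (illustrative combinations; any config, mode, and task from the lists above can be composed the same way):
settings.openai.toml:fast:test_get_top_account_by_revenue_stream
settings.azure.toml:balanced:test_list_my_accounts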
To list all available test IDs:
./system_tests/profiling/bin/run_profiling.sh --list-tests
Advanced Usage
Command Line Interface
You can also use CLI arguments directly:
# Run specific configuration with CLI args
./system_tests/profiling/bin/run_profiling.sh \
--configs settings.openai.toml \
--modes fast,balanced \
--runs 3
# Run a single test by ID
./system_tests/profiling/bin/run_profiling.sh \
--test-id settings.openai.toml:fast:test_get_top_account_by_revenue_stream \
--runs 5
# Use config file but override runs
./system_tests/profiling/bin/run_profiling.sh \
--config-file default_experiment.yaml \
--runs 5
Direct Python Usage
# Run with config file
cd /path/to/project
uv run python system_tests/profiling/bin/profile_digital_sales_tasks.py \
--config-file default_experiment.yaml
# Run with CLI arguments
uv run python system_tests/profiling/bin/profile_digital_sales_tasks.py \
--configs settings.openai.toml \
--modes fast \
--tasks test_get_top_account_by_revenue_stream \
--runs 3 \
--output system_tests/profiling/reports/my_report.json
Output
Profiling Reports
Individual profiling runs generate JSON reports with:
- Summary Statistics: Total tests, success rate, timing
- Configuration Stats: Performance per config/mode
- Langfuse Metrics: LLM calls, tokens, costs, node timings
- Detailed Results: Complete test execution details
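For a quick look at what a report contains, you can inspect its top-level structure from the shell (a minimal sketch; assumes jq is installed and that a report has already been written to the path below):
# Show the top-level keys of a profiling report
jq 'keys' system_tests/profiling/reports/my_report.json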
Comparison HTML
The comparison HTML (experiments/comparison.html) provides:
Interactive Visualizations:
- Execution time comparison charts
- Cost analysis across modes
- Token usage visualization
- LLM calls breakdown
- Execution time variability (Min/Avg/Max with range and std dev)
- Time breakdown (generation vs processing)
- Performance radar chart (normalized comparison)
Detailed Tables:
- Summary view of all experiments
- Configuration statistics table
- Per-run Langfuse metrics
- Aggregated metrics across runs
Features:
- Tab navigation between charts and tables
- Color-coded modes (Fast=green, Balanced=blue, Accurate=orange)
- Interactive tooltips on hover
- Automatic loading of all JSON files in the directory
- Modern, responsive design
Creating Custom Experiments
1. Create a new YAML file in system_tests/profiling/config/:
cp system_tests/profiling/config/default_experiment.yaml system_tests/profiling/config/my_experiment.yaml
2. Edit the configuration to match your experiment needs.
3. Run your experiment:
./system_tests/profiling/run_experiment.sh --config my_experiment.yaml
Tips
- Use {{timestamp}} in output paths for unique filenames
- CLI arguments override config file settings
- The HTML comparison automatically picks up new JSON files
- Set credentials in .env or config/.secrets.yaml
- Use the --open flag to automatically open results in the browser
- Use env in experiment runs to set/unset environment variables per run
- Set env.VAR: null to explicitly unset an environment variable
Troubleshooting
Port Conflicts
The scripts automatically clean up processes on ports 8000, 8001, 7860.
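If a port stays occupied (for example, by a process the scripts don't own), you can free it manually; a sketch for macOS/Linux, assuming lsof is available:
# Find the process listening on port 8000 and stop it
lsof -ti :8000 | xargs kill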
Missing Credentials
Ensure LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are set:
# Check if set
echo $LANGFUSE_PUBLIC_KEY
echo $LANGFUSE_SECRET_KEY
# Set temporarily
export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."
# Or add to .env file
Configuration Not Found
If a config file isn't found, check:
- The file exists in system_tests/profiling/config/
- The filename is correct (case-sensitive)
- You're running from the correct directory
Examples
Compare Fast vs Balanced (3 runs each)
./system_tests/profiling/run_experiment.sh --config default_experiment.yaml
Compare Providers (OpenAI vs Azure vs WatsonX)
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml
This compares different LLM providers using the same mode (balanced).
Compare All Modes with OpenAI
Create system_tests/profiling/config/all_modes.yaml:
experiment:
  name: "all_modes_comparison"
  runs:
    - name: "fast"
      test_id: "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/fast_{{timestamp}}.json"
    - name: "balanced"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/balanced_{{timestamp}}.json"
    - name: "accurate"
      test_id: "settings.openai.toml:accurate:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/accurate_{{timestamp}}.json"
Then run:
./system_tests/profiling/run_experiment.sh --config all_modes.yaml --open
Full Matrix Comparison (Providers × Modes)
./system_tests/profiling/run_experiment.sh --config full_matrix_comparison.yaml
This creates a comprehensive comparison across multiple providers and modes.