# 🏗️ Architecture Overview

## System Architecture

This Hugging Face Space implements a comparative agent system with three reasoning modes. Here's how everything works together:
```
┌─────────────────────────────────────────────────────────────┐
│                       Gradio UI Layer                       │
│  - Question Input                                           │
│  - Mode Selection (Think/Act/ReAct/All)                     │
│  - Three Output Panels (side-by-side comparison)            │
└───────────────────┬─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────┐
│                      Agent Controller                       │
│  run_comparison() - Routes to appropriate mode handler      │
└───────────────────┬─────────────────────────────────────────┘
                    │
        ┌───────────┴─────────┬─────────────────────┐
        ▼                     ▼                     ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  Think-Only   │     │   Act-Only    │     │     ReAct     │
│     Mode      │     │     Mode      │     │     Mode      │
└───────┬───────┘     └───────┬───────┘     └───────┬───────┘
        │                     │                     │
        ▼                     ▼                     ▼
┌─────────────────────────────────────────────────────────────┐
│                        LLM Interface                        │
│  call_llm() - Communicates with openai/gpt-oss-20b          │
└───────────────────┬─────────────────────────────────────────┘
                    │
                    ▼  (Act-Only & ReAct modes only)
┌─────────────────────────────────────────────────────────────┐
│                        Tool Executor                        │
│  - parse_action()                                           │
│  - call_tool()                                              │
└───────────────────┬─────────────────────────────────────────┘
                    │
      ┌─────────────┼──────────────┬───────────┬───────────┐
      ▼             ▼              ▼           ▼           ▼
┌────────────┐ ┌────────────┐ ┌─────────┐ ┌────────┐ ┌──────────┐
│ DuckDuckGo │ │ Wikipedia  │ │ Weather │ │  Calc  │ │  Python  │
│   Search   │ │   Search   │ │   API   │ │        │ │   REPL   │
└────────────┘ └────────────┘ └─────────┘ └────────┘ └──────────┘
```
## Component Details

### 1. **Tool Layer**

Each tool is wrapped in a `Tool` class with:

- **name**: Identifier for the LLM to reference
- **description**: Instructions for when/how to use the tool
- **func**: The actual implementation
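A minimal sketch of such a wrapper (the field names match the bullets above; the `run` helper is illustrative and not necessarily part of the Space's exact class):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str                   # identifier the LLM uses in "Action:" lines
    description: str            # when/how to use the tool, injected into prompts
    func: Callable[[str], str]  # takes the action input, returns an observation

    def run(self, action_input: str) -> str:
        return self.func(action_input)
```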
**Tool Implementations:**

- `duckduckgo_search()`: Uses DuckDuckGo's JSON API
- `wikipedia_search()`: Uses the Wikipedia Python library
- `get_weather()`: Queries the wttr.in API for weather data
- `calculate()`: Safe AST-based math expression evaluator
- `python_repl()`: Sandboxed Python execution with whitelisted builtins
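As an example of the AST-based approach, a calculator like this can walk the parsed expression and evaluate only numeric constants and arithmetic operators, so arbitrary code never executes (an illustrative sketch, not the Space's exact implementation):

```python
import ast
import operator

# Supported operators; anything else (names, calls, attributes) is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
    ast.USub: operator.neg,
}

def calculate(expression: str) -> str:
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    try:
        return str(_eval(ast.parse(expression, mode="eval")))
    except Exception as exc:  # return an error observation, never raise to the agent
        return f"Error: {exc}"
```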
### 2. **Agent Modes**

#### Think-Only Mode (`think_only_mode`)

```
User Question → System Prompt → LLM → Thoughts → Answer
```

- Single LLM call with a chain-of-thought (CoT) prompt
- No tool access
- Shows reasoning steps
- Best for knowledge-based questions
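This mode can be sketched in a few lines, with the LLM injected as a plain callable so the single-call flow is visible (the Space's version uses `call_llm` and yields richer formatting):

```python
def think_only_mode(question, llm):
    """Single chain-of-thought call, no tools.

    llm: callable taking a messages list and returning a string
    (a stand-in for the Space's call_llm)."""
    messages = [
        {"role": "system",
         "content": "Think step by step, then finish with an 'Answer:' line."},
        {"role": "user", "content": question},
    ]
    yield "Thinking..."       # streamed to the UI immediately
    yield llm(messages)       # full reasoning + answer
```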
#### Act-Only Mode (`act_only_mode`)

```
User Question → System Prompt → LLM → Action
                                        ↓
                            Execute Tool → Observation
                                        ↓
                                LLM → Action/Answer
                                        ↓
                                       ...
```

- Iterative loop: Action → Observation
- No explicit "Thought" step
- Maximum of 5 iterations
- Best for information gathering
#### ReAct Mode (`react_mode`)

```
User Question → System Prompt → LLM → Thought → Action
                                                  ↓
                                      Execute Tool → Observation
                                                  ↓
                                  LLM → Thought → Action/Answer
                                                  ↓
                                                 ...
```

- Full Thought-Action-Observation cycle
- Most comprehensive reasoning
- Maximum of 5 iterations
- Best for complex multi-step problems
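The cycle above can be sketched as a bounded loop; `llm` and `tools` are injected stubs here and the inline regex is illustrative, so this shows the control flow rather than the Space's exact code (which also streams intermediate output):

```python
import re

MAX_ITERATIONS = 5  # hard cap to prevent infinite loops

def react_loop(question, llm, tools):
    """Minimal Thought-Action-Observation loop.

    llm: callable taking a messages list and returning a string;
    tools: dict mapping tool name -> callable(str) -> str."""
    messages = [
        {"role": "system",
         "content": "Use Thought/Action/Action Input lines; finish with 'Answer:'."},
        {"role": "user", "content": question},
    ]
    for _ in range(MAX_ITERATIONS):
        response = llm(messages)
        if "Answer:" in response:  # early termination once the model answers
            return response.split("Answer:", 1)[1].strip()
        match = re.search(r"Action:\s*(\S+)\s*Action Input:\s*(.+)", response)
        if not match:
            return "Error: could not parse an action from the response."
        name, arg = match.group(1), match.group(2).strip()
        observation = tools.get(name, lambda _: f"Unknown tool: {name}")(arg)
        messages.append({"role": "assistant", "content": response})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "Stopped after reaching the iteration limit."
```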
### 3. **LLM Interface**

**`call_llm()` Function:**

- Uses the Hugging Face Inference API
- Model: openai/gpt-oss-20b
- Supports chat format (messages list)
- Configurable temperature and max_tokens

**Authentication:**

- Requires the `HF_TOKEN` environment variable
- Set it in the Space secrets (kept secure, never in code)
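A sketch of such a function using `huggingface_hub`'s `InferenceClient`; the `client` parameter is added here for testability and the defaults are assumptions, so treat this as one plausible shape rather than the Space's actual signature:

```python
import os

def call_llm(messages, client=None, model="openai/gpt-oss-20b",
             temperature=0.7, max_tokens=1024):
    """Send a chat-format request to the HF Inference API and return the text.

    client is injectable for testing; by default an InferenceClient
    authenticated with HF_TOKEN is created."""
    if client is None:
        from huggingface_hub import InferenceClient
        client = InferenceClient(token=os.environ["HF_TOKEN"])
    response = client.chat_completion(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content
```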
### 4. **Parsing & Control Flow**

**`parse_action()` Function:**

- Extracts `Action:` and `Action Input:` from the LLM response
- Uses regex to handle various formats
- Returns an (action_name, action_input) tuple
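A sketch of such a parser; the regex here is illustrative, and the real version may tolerate more output formats:

```python
import re

def parse_action(response: str):
    """Extract (action_name, action_input) from an LLM response.

    Returns (None, None) when no action is present, which callers can
    treat as a malformed response or a final answer."""
    match = re.search(
        r"Action:\s*([\w-]+)\s*[\r\n]+\s*Action Input:\s*(.+)",
        response,
    )
    if not match:
        return None, None
    return match.group(1), match.group(2).strip()
```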
**Iteration Control:**

- Maximum of 5 iterations per mode to prevent infinite loops
- Early termination when "Answer:" is detected
- Error handling for malformed responses

### 5. **UI Layer (Gradio)**

**Components:**

- **Input Section**: Question textbox + mode dropdown
- **Example Buttons**: Pre-filled question templates
- **Output Panels**: Three side-by-side Markdown displays
- **Streaming**: Generator functions for real-time updates

**User Flow:**

1. User enters a question or clicks an example
2. Selects a mode (or "All" for comparison)
3. Clicks "Run"
4. Sees real-time updates in the output panel(s)
5. Views the final answer and complete reasoning trace
## Data Flow Example

### Example: "What's the weather in Paris?"

**Mode: ReAct**

1. User submits the question
2. `react_mode()` is called with the question
3. The prompt is formatted with the question + tool descriptions
4. First LLM call:
   ```
   Thought: I need to check the current weather in Paris
   Action: get_weather
   Action Input: Paris
   ```
5. `parse_action()` extracts the tool call
6. `call_tool("get_weather", "Paris")` executes
7. Observation: "Weather in Paris: Cloudy, 15°C..."
8. Second LLM call with the observation appended
9. The LLM responds:
   ```
   Thought: I have the weather information
   Answer: The current weather in Paris is...
   ```
10. The generator yields formatted output to the UI
11. User sees the complete trace in the ReAct panel
## Key Design Patterns

### 1. **Generator Pattern for Streaming**

```python
def mode(question: str) -> Generator[str, None, None]:
    yield "Step 1..."
    # process
    yield "Step 2..."
    # etc.
```

Enables real-time UI updates without blocking.

### 2. **Tool Registry Pattern**

```python
TOOLS = [Tool(name, description, func), ...]
```

Easy to add new tools: just append to the list.

### 3. **Prompt Templates**

```python
PROMPT = """...""".format(question=q, tools=t)
```

Modular prompts for each mode.
### 4. **Safe Execution**

- AST parsing for the calculator (no `eval()`)
- Whitelisted builtins for the Python REPL
- Timeout limits on API calls
- Error handling with fallback messages
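The REPL restriction can be sketched as `exec()` with a reduced `__builtins__` mapping and captured stdout. The whitelist below is illustrative, and this pattern only blocks casual misuse; a true sandbox additionally needs process isolation and timeouts:

```python
import io
import contextlib

# Builtins available to generated code; everything else (open,
# __import__, exec, ...) raises NameError. Names here are illustrative.
SAFE_BUILTINS = {
    "print": print, "len": len, "range": range, "sum": sum,
    "min": min, "max": max, "abs": abs, "sorted": sorted,
    "int": int, "float": float, "str": str, "list": list, "dict": dict,
}

def python_repl(code: str) -> str:
    """Execute code with whitelisted builtins and return captured stdout."""
    buffer = io.StringIO()
    env = {"__builtins__": SAFE_BUILTINS}
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, env)
    except Exception as exc:
        return f"Error: {exc}"
    return buffer.getvalue().strip() or "(no output)"
```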
## Extensibility

### Adding a New Tool

```python
def my_tool(query: str) -> str:
    # Implementation goes here; return the observation as a string
    return f"Result for {query}"

TOOLS.append(Tool(
    name="my_tool",
    description="When to use this tool...",
    func=my_tool
))
```

### Adding a New Mode

```python
def hybrid_mode(question: str) -> Generator[str, None, None]:
    # Custom logic mixing elements of the other modes
    yield "Starting hybrid mode..."
    # ...

# Register it in run_comparison() and the UI dropdown
```

### Customizing Prompts

Edit the `*_PROMPT` constants to change agent behavior:

- Add constraints
- Change the output format
- Provide few-shot examples
- Adjust tone
## Performance Considerations

1. **API Latency**: Model calls take 2-5 seconds
2. **Tool Latency**: External APIs add 1-2 seconds per call
3. **Iteration Count**: 5 iterations max ≈ 30 seconds worst case
4. **Parallel Modes**: "All" mode runs the three modes sequentially (not in parallel)

## Security Notes

1. **API Keys**: Never commit `HF_TOKEN` to the repo
2. **Python REPL**: Sandboxed with limited builtins
3. **User Input**: Sanitized before tool execution
4. **Rate Limits**: Consider adding rate limiting for production

## Testing Strategy

1. **Unit Tests**: Test individual tool functions
2. **Integration Tests**: Test mode handlers end-to-end
3. **Prompt Tests**: Verify that LLM responses parse correctly
4. **UI Tests**: Test Gradio interface components
## Future Enhancements

- [ ] Add memory/conversation history
- [ ] Implement parallel tool calling
- [ ] Add a caching layer for repeated queries
- [ ] Support custom user tools
- [ ] Add performance metrics/timing
- [ ] Implement token counting/cost tracking
- [ ] Add export functionality for reasoning traces