# πŸ—οΈ Architecture Overview
## System Architecture
This Hugging Face Space implements a comparative agent system with three reasoning modes. Here's how everything works together:
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Gradio UI Layer                       β”‚
β”‚  - Question Input                                          β”‚
β”‚  - Mode Selection (Think/Act/ReAct/All)                    β”‚
β”‚  - Three Output Panels (side-by-side comparison)           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
                             β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Agent Controller                      β”‚
β”‚  run_comparison() - Routes to appropriate mode handler     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β–Ό                   β–Ό                   β–Ό
 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 β”‚   Think-Only   β”‚  β”‚    Act-Only    β”‚  β”‚     ReAct      β”‚
 β”‚      Mode      β”‚  β”‚      Mode      β”‚  β”‚      Mode      β”‚
 β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                   β”‚                   β”‚
         β–Ό                   β–Ό                   β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       LLM Interface                        β”‚
β”‚  call_llm() - Communicates with openai/gpt-oss-20b         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
                             β–Ό  (Act-Only & ReAct modes only)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       Tool Executor                        β”‚
β”‚  - parse_action()                                          β”‚
β”‚  - call_tool()                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β–Ό            β–Ό            β–Ό            β–Ό            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚DuckDuckGoβ”‚ β”‚Wikipedia β”‚ β”‚ Weather  β”‚ β”‚   Calc   β”‚ β”‚  Python  β”‚
β”‚  Search  β”‚ β”‚  Search  β”‚ β”‚   API    β”‚ β”‚          β”‚ β”‚   REPL   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
## Component Details
### 1. **Tool Layer**
Each tool is wrapped in a `Tool` class with:
- **name**: Identifier for the LLM to reference
- **description**: Instructions for when/how to use the tool
- **func**: The actual implementation
**Tool Implementations:**
- `duckduckgo_search()`: Uses DuckDuckGo's JSON API
- `wikipedia_search()`: Uses the Wikipedia Python library
- `get_weather()`: Queries wttr.in API for weather data
- `calculate()`: Safe AST-based math expression evaluator
- `python_repl()`: Sandboxed Python execution with whitelisted builtins
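The exact class definition isn't reproduced here, but a minimal sketch consistent with this description (with a hypothetical `echo` tool standing in for the real implementations) could look like:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str                   # identifier the LLM references in "Action:" lines
    description: str            # tells the LLM when/how to use the tool
    func: Callable[[str], str]  # implementation: Action Input string -> observation string

def echo_tool(text: str) -> str:
    # Placeholder standing in for a real implementation like get_weather()
    return f"You said: {text}"

TOOLS = [Tool(name="echo", description="Repeats its input. For testing.", func=echo_tool)]

def call_tool(name: str, action_input: str) -> str:
    """Dispatch an LLM-requested action to the matching tool."""
    for tool in TOOLS:
        if tool.name == name:
            try:
                return tool.func(action_input)
            except Exception as e:  # surface tool failures as observations, not crashes
                return f"Error: {e}"
    return f"Unknown tool: {name}"
```

Returning errors as observation strings (rather than raising) lets the agent loop feed failures back to the LLM as just another observation.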
### 2. **Agent Modes**
#### Think-Only Mode (`think_only_mode`)
```
User Question β†’ System Prompt β†’ LLM β†’ Thoughts β†’ Answer
```
- Single LLM call with a chain-of-thought (CoT) prompt
- No tool access
- Shows reasoning steps
- Best for knowledge-based questions
#### Act-Only Mode (`act_only_mode`)
```
User Question β†’ System Prompt β†’ LLM β†’ Action
↓
Execute Tool β†’ Observation
↓
LLM β†’ Action/Answer
↓
...
```
- Iterative loop: Action β†’ Observation
- No explicit "Thought" step
- Maximum 5 iterations
- Best for information gathering
#### ReAct Mode (`react_mode`)
```
User Question β†’ System Prompt β†’ LLM β†’ Thought β†’ Action
↓
Execute Tool β†’ Observation
↓
LLM β†’ Thought β†’ Action/Answer
↓
...
```
- Full Thought-Action-Observation cycle
- Most comprehensive reasoning
- Maximum 5 iterations
- Best for complex multi-step problems
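The cycle above can be sketched with a scripted stand-in for the LLM and a fake weather tool (both hypothetical; the real `react_mode()` streams each step to the UI instead of returning a string):

```python
# Scripted stand-in for the LLM: first turn requests a tool, second turn answers.
SCRIPT = [
    "Thought: I should check the weather.\nAction: get_weather\nAction Input: Paris",
    "Thought: I have the weather information.\nAnswer: It is cloudy in Paris.",
]

def fake_llm(history, _turn=[0]):
    reply = SCRIPT[min(_turn[0], len(SCRIPT) - 1)]
    _turn[0] += 1
    return reply

def fake_weather(city):
    return f"Weather in {city}: Cloudy, 15Β°C"

def react_loop(question, llm, tool, max_iterations=5):
    """Run the Thought -> Action -> Observation cycle, exiting early on 'Answer:'."""
    history = [f"Question: {question}"]
    for _ in range(max_iterations):
        reply = llm(history)
        history.append(reply)
        if "Answer:" in reply:                # early termination when the model answers
            return reply.split("Answer:", 1)[1].strip()
        if "Action Input:" in reply:          # crude inline parse for the sketch
            arg = reply.split("Action Input:", 1)[1].strip()
            history.append(f"Observation: {tool(arg)}")
    return "No answer within the iteration limit."
```

Act-Only mode is the same loop minus the "Thought:" lines in the prompt and output.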
### 3. **LLM Interface**
**`call_llm()` Function:**
- Uses Hugging Face Inference API
- Model: openai/gpt-oss-20b
- Supports chat format (messages list)
- Configurable temperature and max_tokens
**Authentication:**
- Requires `HF_TOKEN` environment variable
- Set in Space secrets (secure)
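A hedged sketch of what this might look like with `huggingface_hub`'s `InferenceClient` (the real signature and defaults may differ; `build_messages()` is a hypothetical helper, not part of the Space's code):

```python
import os

def build_messages(system_prompt: str, question: str) -> list:
    """Assemble the chat-format payload sent to the model."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]

def call_llm(messages, temperature=0.7, max_tokens=512):
    # Assumed shape of the Inference API call; the Space's call_llm() may differ.
    from huggingface_hub import InferenceClient
    client = InferenceClient(model="openai/gpt-oss-20b", token=os.environ["HF_TOKEN"])
    response = client.chat_completion(
        messages=messages, temperature=temperature, max_tokens=max_tokens
    )
    return response.choices[0].message.content
```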
### 4. **Parsing & Control Flow**
**`parse_action()` Function:**
- Extracts `Action:` and `Action Input:` from LLM response
- Uses regex to handle various formats
- Returns (action_name, action_input) tuple
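A plausible reconstruction (the actual regexes in the Space may differ):

```python
import re

def parse_action(response: str):
    """Extract (action_name, action_input) from an LLM reply, or (None, None)."""
    match = re.search(
        r"Action:\s*([\w-]+)\s*[\r\n]+\s*Action\s*Input:\s*(.+)",
        response,
        re.IGNORECASE,  # tolerate formatting drift in model output
    )
    if not match:
        return None, None
    return match.group(1).strip(), match.group(2).strip()
```

Returning `(None, None)` on malformed output lets the caller decide whether to re-prompt or bail out.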
**Iteration Control:**
- Max 5 iterations per mode to prevent infinite loops
- Early termination when "Answer:" detected
- Error handling for malformed responses
### 5. **UI Layer (Gradio)**
**Components:**
- **Input Section**: Question textbox + mode dropdown
- **Example Buttons**: Pre-filled question templates
- **Output Panels**: Three side-by-side Markdown displays
- **Streaming**: Generator functions for real-time updates
**User Flow:**
1. User enters question or clicks example
2. Selects mode (or "All" for comparison)
3. Clicks "Run"
4. Sees real-time updates in output panel(s)
5. Views final answer and complete reasoning trace
## Data Flow Example
### Example: "What's the weather in Paris?"
**Mode: ReAct**
1. User submits question
2. `react_mode()` called with question
3. Prompt formatted with question + tool descriptions
4. First LLM call:
   ```
   Thought: I need to check the current weather in Paris
   Action: get_weather
   Action Input: Paris
   ```
5. `parse_action()` extracts tool call
6. `call_tool("get_weather", "Paris")` executes
7. Observation: "Weather in Paris: Cloudy, 15Β°C..."
8. Second LLM call with observation
9. LLM responds:
   ```
   Thought: I have the weather information
   Answer: The current weather in Paris is...
   ```
10. Generator yields formatted output to UI
11. User sees complete trace in ReAct panel
## Key Design Patterns
### 1. **Generator Pattern for Streaming**
```python
from typing import Generator

def mode(question: str) -> Generator[str, None, None]:
    yield "Step 1..."
    # process
    yield "Step 2..."
    # etc.
```
Enables real-time UI updates without blocking
### 2. **Tool Registry Pattern**
```python
TOOLS = [Tool(name, description, func), ...]
```
Easy to add new tools - just append to list
### 3. **Prompt Templates**
```python
PROMPT = """...""".format(question=q, tools=t)
```
Modular prompts for each mode
### 4. **Safe Execution**
- AST parsing for calculator (no `eval()`)
- Whitelisted builtins for Python REPL
- Timeout limits on API calls
- Error handling with fallback messages
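The AST-based calculator approach can be sketched as follows (the operator whitelist and error format are assumptions, not the Space's exact code):

```python
import ast
import operator

# Whitelist of operators the calculator will evaluate; anything else is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
    ast.USub: operator.neg,
}

def calculate(expression: str) -> str:
    """Evaluate arithmetic by walking the AST -- no eval(), no names, no calls."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Disallowed expression")
    try:
        return str(_eval(ast.parse(expression, mode="eval").body))
    except Exception as e:
        return f"Error: {e}"
```

Because only numeric constants and whitelisted operators are accepted, function calls, attribute access, and name lookups all raise and come back as an `Error:` observation.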
## Extensibility
### Adding a New Tool
```python
def my_tool(input: str) -> str:
    # Implementation
    return result

TOOLS.append(Tool(
    name="my_tool",
    description="When to use this tool...",
    func=my_tool
))
```
### Adding a New Mode
```python
def hybrid_mode(question: str) -> Generator[str, None, None]:
    # Custom logic mixing elements
    yield "Starting hybrid mode..."
    # ...

# Add to run_comparison() and the UI dropdown
```
### Customizing Prompts
Edit the `*_PROMPT` constants to change agent behavior:
- Add constraints
- Change format
- Provide examples
- Adjust tone
## Performance Considerations
1. **API Latency**: Model calls take 2-5 seconds
2. **Tool Latency**: External APIs add 1-2 seconds per call
3. **Iteration Count**: 5 iterations max = ~30 seconds worst case
4. **Parallel Modes**: "All" mode runs sequentially (not parallel)
## Security Notes
1. **API Keys**: Never commit `HF_TOKEN` to repo
2. **Python REPL**: Sandboxed with limited builtins
3. **User Input**: Sanitized before tool execution
4. **Rate Limits**: Consider adding rate limiting for production
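The whitelisted-builtins idea behind `python_repl()` can be sketched like this (the actual whitelist is an assumption; note that restricting builtins deters accidents, not determined attackers):

```python
import io
import contextlib

# Hypothetical whitelist; the Space's actual allowed builtins may differ.
SAFE_BUILTINS = {
    "print": print, "len": len, "range": range, "sum": sum,
    "min": min, "max": max, "abs": abs, "sorted": sorted,
}

def python_repl(code: str) -> str:
    """Run code with only whitelisted builtins and capture stdout as the observation."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # Replacing __builtins__ removes open(), __import__(), etc.
            exec(code, {"__builtins__": SAFE_BUILTINS})
    except Exception as e:
        return f"Error: {e}"
    return buffer.getvalue().strip() or "(no output)"
```

With `__import__` absent from the globals, even a plain `import os` fails and is reported back as an error observation.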
## Testing Strategy
1. **Unit Tests**: Test individual tool functions
2. **Integration Tests**: Test mode handlers end-to-end
3. **Prompt Tests**: Verify LLM responses parse correctly
4. **UI Tests**: Test Gradio interface components
## Future Enhancements
- [ ] Add memory/conversation history
- [ ] Implement parallel tool calling
- [ ] Add caching layer for repeated queries
- [ ] Support custom user tools
- [ ] Add performance metrics/timing
- [ ] Implement token counting/cost tracking
- [ ] Add export functionality for reasoning traces