# 🏗️ Architecture Overview

## System Architecture

This Hugging Face Space implements a comparative agent system with three reasoning modes. Here's how everything works together:
```
┌─────────────────────────────────────────────────────────────┐
│                       Gradio UI Layer                       │
│  - Question Input                                           │
│  - Mode Selection (Think/Act/ReAct/All)                     │
│  - Three Output Panels (side-by-side comparison)            │
└───────────────────┬─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────┐
│                      Agent Controller                       │
│  run_comparison() - Routes to appropriate mode handler      │
└───────────────────┬─────────────────────────────────────────┘
                    │
        ┌───────────┴─────────┬─────────────────────┐
        ▼                     ▼                     ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  Think-Only   │     │   Act-Only    │     │     ReAct     │
│     Mode      │     │     Mode      │     │     Mode      │
└───────┬───────┘     └───────┬───────┘     └───────┬───────┘
        │                     │                     │
        ▼                     ▼                     ▼
┌─────────────────────────────────────────────────────────────┐
│                        LLM Interface                        │
│  call_llm() - Communicates with openai/gpt-oss-20b          │
└───────────────────┬─────────────────────────────────────────┘
                    │
                    ▼  (Act-Only & ReAct modes only)
┌─────────────────────────────────────────────────────────────┐
│                        Tool Executor                        │
│  - parse_action()                                           │
│  - call_tool()                                              │
└───────────────────┬─────────────────────────────────────────┘
                    │
      ┌─────────────┼──────────────┬───────────┬───────────┐
      ▼             ▼              ▼           ▼           ▼
┌────────────┐ ┌────────────┐ ┌─────────┐ ┌────────┐ ┌──────────┐
│ DuckDuckGo │ │ Wikipedia  │ │ Weather │ │  Calc  │ │  Python  │
│   Search   │ │   Search   │ │   API   │ │        │ │   REPL   │
└────────────┘ └────────────┘ └─────────┘ └────────┘ └──────────┘
```
## Component Details

### 1. **Tool Layer**

Each tool is wrapped in a `Tool` class with:

- **name**: Identifier for the LLM to reference
- **description**: Instructions for when/how to use the tool
- **func**: The actual implementation
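A minimal sketch of such a wrapper (the field names match the bullets above; the `run` helper is illustrative and not necessarily part of the Space's exact class):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str                   # identifier the LLM uses in "Action:" lines
    description: str            # when/how to use the tool, injected into prompts
    func: Callable[[str], str]  # takes the action input, returns an observation

    def run(self, action_input: str) -> str:
        return self.func(action_input)
```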
**Tool Implementations:**

- `duckduckgo_search()`: Uses DuckDuckGo's JSON API
- `wikipedia_search()`: Uses the Wikipedia Python library
- `get_weather()`: Queries the wttr.in API for weather data
- `calculate()`: Safe AST-based math expression evaluator
- `python_repl()`: Sandboxed Python execution with whitelisted builtins
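As an example of the AST-based approach, a calculator like this can walk the parsed expression and evaluate only numeric constants and arithmetic operators, so arbitrary code never executes (an illustrative sketch, not the Space's exact implementation):

```python
import ast
import operator

# Supported operators; anything else (names, calls, attributes) is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
    ast.USub: operator.neg,
}

def calculate(expression: str) -> str:
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    try:
        return str(_eval(ast.parse(expression, mode="eval")))
    except Exception as exc:  # return an error observation, never raise to the agent
        return f"Error: {exc}"
```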
### 2. **Agent Modes**

#### Think-Only Mode (`think_only_mode`)

```
User Question → System Prompt → LLM → Thoughts → Answer
```

- Single LLM call with a chain-of-thought (CoT) prompt
- No tool access
- Shows reasoning steps
- Best for knowledge-based questions
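This mode can be sketched in a few lines, with the LLM injected as a plain callable so the single-call flow is visible (the Space's version uses `call_llm` and yields richer formatting):

```python
def think_only_mode(question, llm):
    """Single chain-of-thought call, no tools.

    llm: callable taking a messages list and returning a string
    (a stand-in for the Space's call_llm)."""
    messages = [
        {"role": "system",
         "content": "Think step by step, then finish with an 'Answer:' line."},
        {"role": "user", "content": question},
    ]
    yield "Thinking..."       # streamed to the UI immediately
    yield llm(messages)       # full reasoning + answer
```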
#### Act-Only Mode (`act_only_mode`)

```
User Question → System Prompt → LLM → Action
                                        ↓
                            Execute Tool → Observation
                                        ↓
                                LLM → Action/Answer
                                        ↓
                                       ...
```

- Iterative loop: Action → Observation
- No explicit "Thought" step
- Maximum of 5 iterations
- Best for information gathering
#### ReAct Mode (`react_mode`)

```
User Question → System Prompt → LLM → Thought → Action
                                                  ↓
                                      Execute Tool → Observation
                                                  ↓
                                  LLM → Thought → Action/Answer
                                                  ↓
                                                 ...
```

- Full Thought-Action-Observation cycle
- Most comprehensive reasoning
- Maximum of 5 iterations
- Best for complex multi-step problems
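The cycle above can be sketched as a bounded loop; `llm` and `tools` are injected stubs here and the inline regex is illustrative, so this shows the control flow rather than the Space's exact code (which also streams intermediate output):

```python
import re

MAX_ITERATIONS = 5  # hard cap to prevent infinite loops

def react_loop(question, llm, tools):
    """Minimal Thought-Action-Observation loop.

    llm: callable taking a messages list and returning a string;
    tools: dict mapping tool name -> callable(str) -> str."""
    messages = [
        {"role": "system",
         "content": "Use Thought/Action/Action Input lines; finish with 'Answer:'."},
        {"role": "user", "content": question},
    ]
    for _ in range(MAX_ITERATIONS):
        response = llm(messages)
        if "Answer:" in response:  # early termination once the model answers
            return response.split("Answer:", 1)[1].strip()
        match = re.search(r"Action:\s*(\S+)\s*Action Input:\s*(.+)", response)
        if not match:
            return "Error: could not parse an action from the response."
        name, arg = match.group(1), match.group(2).strip()
        observation = tools.get(name, lambda _: f"Unknown tool: {name}")(arg)
        messages.append({"role": "assistant", "content": response})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "Stopped after reaching the iteration limit."
```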
### 3. **LLM Interface**

**`call_llm()` Function:**

- Uses the Hugging Face Inference API
- Model: openai/gpt-oss-20b
- Supports chat format (messages list)
- Configurable temperature and max_tokens

**Authentication:**

- Requires the `HF_TOKEN` environment variable
- Set it in the Space secrets (kept secure, never in code)
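A sketch of such a function using `huggingface_hub`'s `InferenceClient`; the `client` parameter is added here for testability and the defaults are assumptions, so treat this as one plausible shape rather than the Space's actual signature:

```python
import os

def call_llm(messages, client=None, model="openai/gpt-oss-20b",
             temperature=0.7, max_tokens=1024):
    """Send a chat-format request to the HF Inference API and return the text.

    client is injectable for testing; by default an InferenceClient
    authenticated with HF_TOKEN is created."""
    if client is None:
        from huggingface_hub import InferenceClient
        client = InferenceClient(token=os.environ["HF_TOKEN"])
    response = client.chat_completion(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content
```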
### 4. **Parsing & Control Flow**

**`parse_action()` Function:**

- Extracts `Action:` and `Action Input:` from the LLM response
- Uses regex to handle various formats
- Returns an (action_name, action_input) tuple
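A sketch of such a parser; the regex here is illustrative, and the real version may tolerate more output formats:

```python
import re

def parse_action(response: str):
    """Extract (action_name, action_input) from an LLM response.

    Returns (None, None) when no action is present, which callers can
    treat as a malformed response or a final answer."""
    match = re.search(
        r"Action:\s*([\w-]+)\s*[\r\n]+\s*Action Input:\s*(.+)",
        response,
    )
    if not match:
        return None, None
    return match.group(1), match.group(2).strip()
```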
**Iteration Control:**

- Maximum of 5 iterations per mode to prevent infinite loops
- Early termination when "Answer:" is detected
- Error handling for malformed responses

### 5. **UI Layer (Gradio)**

**Components:**

- **Input Section**: Question textbox + mode dropdown
- **Example Buttons**: Pre-filled question templates
- **Output Panels**: Three side-by-side Markdown displays
- **Streaming**: Generator functions for real-time updates

**User Flow:**

1. User enters a question or clicks an example
2. Selects a mode (or "All" for comparison)
3. Clicks "Run"
4. Sees real-time updates in the output panel(s)
5. Views the final answer and complete reasoning trace
## Data Flow Example

### Example: "What's the weather in Paris?"

**Mode: ReAct**

1. User submits the question
2. `react_mode()` is called with the question
3. The prompt is formatted with the question + tool descriptions
4. First LLM call:
   ```
   Thought: I need to check the current weather in Paris
   Action: get_weather
   Action Input: Paris
   ```
5. `parse_action()` extracts the tool call
6. `call_tool("get_weather", "Paris")` executes
7. Observation: "Weather in Paris: Cloudy, 15°C..."
8. Second LLM call with the observation appended
9. The LLM responds:
   ```
   Thought: I have the weather information
   Answer: The current weather in Paris is...
   ```
10. The generator yields formatted output to the UI
11. User sees the complete trace in the ReAct panel
## Key Design Patterns

### 1. **Generator Pattern for Streaming**

```python
def mode(question: str) -> Generator[str, None, None]:
    yield "Step 1..."
    # process
    yield "Step 2..."
    # etc.
```

Enables real-time UI updates without blocking.

### 2. **Tool Registry Pattern**

```python
TOOLS = [Tool(name, description, func), ...]
```

Easy to add new tools: just append to the list.

### 3. **Prompt Templates**

```python
PROMPT = """...""".format(question=q, tools=t)
```

Modular prompts for each mode.
### 4. **Safe Execution**

- AST parsing for the calculator (no `eval()`)
- Whitelisted builtins for the Python REPL
- Timeout limits on API calls
- Error handling with fallback messages
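The REPL restriction can be sketched as `exec()` with a reduced `__builtins__` mapping and captured stdout. The whitelist below is illustrative, and this pattern only blocks casual misuse; a true sandbox additionally needs process isolation and timeouts:

```python
import io
import contextlib

# Builtins available to generated code; everything else (open,
# __import__, exec, ...) raises NameError. Names here are illustrative.
SAFE_BUILTINS = {
    "print": print, "len": len, "range": range, "sum": sum,
    "min": min, "max": max, "abs": abs, "sorted": sorted,
    "int": int, "float": float, "str": str, "list": list, "dict": dict,
}

def python_repl(code: str) -> str:
    """Execute code with whitelisted builtins and return captured stdout."""
    buffer = io.StringIO()
    env = {"__builtins__": SAFE_BUILTINS}
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, env)
    except Exception as exc:
        return f"Error: {exc}"
    return buffer.getvalue().strip() or "(no output)"
```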
## Extensibility

### Adding a New Tool

```python
def my_tool(query: str) -> str:
    # Implementation goes here; return the observation as a string
    return f"Result for {query}"

TOOLS.append(Tool(
    name="my_tool",
    description="When to use this tool...",
    func=my_tool
))
```

### Adding a New Mode

```python
def hybrid_mode(question: str) -> Generator[str, None, None]:
    # Custom logic mixing elements of the other modes
    yield "Starting hybrid mode..."
    # ...

# Register it in run_comparison() and the UI dropdown
```

### Customizing Prompts

Edit the `*_PROMPT` constants to change agent behavior:

- Add constraints
- Change the output format
- Provide few-shot examples
- Adjust tone
## Performance Considerations

1. **API Latency**: Model calls take 2-5 seconds
2. **Tool Latency**: External APIs add 1-2 seconds per call
3. **Iteration Count**: 5 iterations max ≈ 30 seconds worst case
4. **Parallel Modes**: "All" mode runs the three modes sequentially (not in parallel)

## Security Notes

1. **API Keys**: Never commit `HF_TOKEN` to the repo
2. **Python REPL**: Sandboxed with limited builtins
3. **User Input**: Sanitized before tool execution
4. **Rate Limits**: Consider adding rate limiting for production

## Testing Strategy

1. **Unit Tests**: Test individual tool functions
2. **Integration Tests**: Test mode handlers end-to-end
3. **Prompt Tests**: Verify that LLM responses parse correctly
4. **UI Tests**: Test Gradio interface components
## Future Enhancements

- [ ] Add memory/conversation history
- [ ] Implement parallel tool calling
- [ ] Add a caching layer for repeated queries
- [ ] Support custom user tools
- [ ] Add performance metrics/timing
- [ ] Implement token counting/cost tracking
- [ ] Add export functionality for reasoning traces