94 changes: 94 additions & 0 deletions .github/CODING_AGENT_ISSUE.md
@@ -0,0 +1,94 @@
# GitHub Issue: CodingAgent Feature Request

Use this content to create an issue at:
https://github.com/google/adk-python/issues/new?template=feature_request.md

---

## Title

feat(agents): Add CodingAgent (agents that think in code)

---

## Is your feature request related to a problem? Please describe.

ADK’s current default agent interaction pattern is “tool selection from a fixed action set”. This is powerful, but it breaks down for three increasingly common workloads:

1) Long-context work beyond model context windows
- Many real tasks require operating over very large corpora: codebases, logs, datasets, multi-file configs, or long documents.
- If the agent must keep the relevant source text and intermediate results inside the LLM context, it becomes context-window bound and expensive.
- Recent work such as “Recursive Language Models” (arXiv:2512.24601) proposes treating long prompts as an external environment and letting the model programmatically examine/decompose/recursively process snippets. This suggests a practical direction for agents: move heavy inspection, decomposition, and intermediate state out of the prompt and into an execution environment.
- https://arxiv.org/abs/2512.24601

2) Expressiveness and composability limits of pure tool-calling
- Tool-calling assumes we can enumerate actions up-front. In open-ended tasks, the agent needs to compose multiple operations, iterate, cache intermediate artifacts, and implement “one-off” transformations without requiring new bespoke tools each time.
- A code-based action space lets the agent compose operations naturally (loops, conditionals, helper functions), which reduces the need for an explosion of tools.

3) Developer experience gap for building “coding agents” and sub-agent architectures
- Users increasingly want agent systems like Claude Code / OpenCode: multi-step coding workflows with sub-agents (planner, tester, refactorer, etc.) and strong “think in code” execution.
- ADK has strong orchestration primitives; adding a first-class code-executing agent unlocks building these systems within ADK while keeping sandboxing and tool integration.

Related inspiration: HuggingFace “smolagents” positions CodeAgent as a first-class concept (“agents that think in code”) and supports sandbox backends (Docker, etc.).
- https://github.com/huggingface/smolagents

---

## Describe the solution you’d like

Add a new experimental agent type: CodingAgent.

CodingAgent should:
- Generate Python code as the primary action representation (in `tool_code` blocks).
- Execute that code in a sandboxed environment (Docker-based initially).
- Allow generated code to call ADK tools safely via an IPC bridge (e.g., HTTP) rather than exposing the host runtime directly.
- Support iterative execution (ReAct-style loop): generate → run → observe stdout/tool results → refine → final answer.
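
A single turn of that loop might look like the following. This is purely illustrative: the `fetch_url` tool stub and the dataset URL are hypothetical, and the exact prompt/format details are up to the implementation.

```python
# Model-emitted tool_code, executed inside the sandbox (never on the host).
csv_text = fetch_url("https://example.com/data.csv")  # ADK tool call, bridged over IPC
rows = csv_text.strip().splitlines()
print(f"{len(rows) - 1} data rows; header: {rows[0]}")
```

Whatever the code prints becomes the observation for the next turn, and the model either emits another `tool_code` block or returns its final answer.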

Why this solves the problem
- Long-context: aligns with the “external environment” framing in arXiv:2512.24601 by enabling the agent to iteratively inspect, decompose, and process large inputs using code and persisted artifacts, instead of forcing all content into the model context.
- Composability: code enables arbitrary composition (loops, conditionals, helper functions) without requiring every combination to be implemented as a first-class tool.
- Coding-agent architectures: makes it straightforward to build higher-level workflows and multi-agent hierarchies where sub-agents can generate/run code for specialized tasks.

High-level architecture

User → CodingAgent (LLM) → sandbox executor (Docker Python)
↘ tool IPC server on host ↙

Proposed execution environments (progressive)
- v1: Docker Python sandbox (existing ContainerCodeExecutor integration)
- future: REPL / Jupyter-kernel style execution modes for interactive, stateful sessions (still sandboxed)
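
As a rough usage sketch of the v1 wiring: the `CodingAgent` constructor below is hypothetical (assumed to mirror `LlmAgent`), while `ContainerCodeExecutor` and its `image` argument are assumed from the existing ADK code executors.

```python
from google.adk.agents import CodingAgent                     # proposed; does not exist yet
from google.adk.code_executors import ContainerCodeExecutor   # existing Docker-based executor


def fetch_url(url: str) -> str:
    """Example ADK tool: fetch text content from a URL."""
    import urllib.request
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()


agent = CodingAgent(
    name="data_analyst",
    model="gemini-2.0-flash",
    instruction="Solve tasks by writing and running Python code.",
    tools=[fetch_url],
    code_executor=ContainerCodeExecutor(image="python:3.11-slim"),
)
```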

---

## Describe alternatives you’ve considered

1) “Just add a code-execution tool” to existing agents
- Pros: minimal surface-area change.
- Cons: code execution becomes an occasional tool call rather than the agent’s primary action space; harder to support tight generate→execute→iterate loops and long-context strategies that rely on an external environment.

2) Require users to write bespoke tools for every operation
- Pros: explicit and controlled.
- Cons: does not scale; real workflows need ad-hoc transformations and composition that explode the tool surface area.

3) Run code on the host interpreter
- Pros: simplest.
- Cons: unacceptable security risk; sandboxing is required for a general-purpose code agent.

---

## Additional context

Future directions enabled by CodingAgent
- Long-context scaffolds inspired by arXiv:2512.24601: treat large inputs (files, repo trees, logs) as an “environment” the agent queries/decomposes recursively using code, storing intermediate state outside the LLM context.
- Sub-agent coding workflows (Claude Code / OpenCode style): planner/tester/refactor sub-agents coordinated by ADK, each using code execution.
- Multiple sandbox backends (like smolagents): Docker initially, with optional future support for other sandboxes and interactive execution modes.

Links
- smolagents (inspiration): https://github.com/huggingface/smolagents
- Recursive Language Models (long-context framing): https://arxiv.org/abs/2512.24601

Labels to add
- enhancement
- agents
- new-feature
- experimental
148 changes: 148 additions & 0 deletions .github/CODING_AGENT_PLAN.md
@@ -0,0 +1,148 @@
# CodingAgent - Implementation Plan & Status

This document tracks the implementation of CodingAgent, an experimental agent type that generates and executes Python code in sandboxed containers.

## Overview

CodingAgent is a ReAct-style agent that:
- Uses an LLM (Gemini) to generate Python code that solves the task
- Executes code in sandboxed Docker containers
- Calls ADK tools from generated code via HTTP IPC
- Iterates until a final answer is produced
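
A condensed sketch of that loop is below. It is simplified pseudocode, not the actual `coding_agent.py` implementation; the model call and sandbox execution are injected as callables with hypothetical signatures.

```python
from typing import Callable


def react_loop(
    task: str,
    llm_generate: Callable[[list[str]], str],   # model call (hypothetical signature)
    execute_in_sandbox: Callable[[str], str],   # Docker execution (hypothetical)
    max_turns: int = 10,
) -> str:
    """Simplified generate -> execute -> observe loop."""
    history = [task]
    for _ in range(max_turns):
        reply = llm_generate(history)
        if "```tool_code" not in reply:                          # no code block => final answer
            return reply
        code = reply.split("```tool_code")[1].split("```")[0]    # naive block extraction
        observation = execute_in_sandbox(code)
        history += [reply, f"Execution result:\n{observation}"]
    return "Stopped: reached max_turns without a final answer."
```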

## Implementation Status

### Core Components ✅ Complete

| Component | File | Status | Lines |
|-----------|------|--------|-------|
| CodingAgent | `src/google/adk/agents/coding_agent.py` | ✅ Complete | ~610 |
| CodingAgentConfig | `src/google/adk/agents/coding_agent_config.py` | ✅ Complete | ~225 |
| CodingAgentCodeExecutor | `src/google/adk/code_executors/coding_agent_code_executor.py` | ✅ Complete | ~505 |
| ToolCodeGenerator | `src/google/adk/code_executors/tool_code_generator.py` | ✅ Complete | ~475 |
| ToolExecutionServer | `src/google/adk/code_executors/tool_execution_server.py` | ✅ Complete | ~365 |
| AllowlistValidator | `src/google/adk/code_executors/allowlist_validator.py` | ✅ Complete | ~355 |

### Sample Agent ✅ Complete

| File | Status | Description |
|------|--------|-------------|
| `contributing/samples/coding_agent/agent.py` | ✅ Complete | Data Analysis Agent (~360 lines) |
| `contributing/samples/coding_agent/README.md` | ✅ Complete | Documentation (~290 lines) |
| `contributing/samples/coding_agent/__init__.py` | ✅ Complete | Module init |

### Unit Tests ✅ Complete

| Test File | Status | Lines |
|-----------|--------|-------|
| `tests/unittests/agents/test_coding_agent.py` | ✅ Complete | ~310 |
| `tests/unittests/code_executors/test_allowlist_validator.py` | ✅ Complete | ~320 |
| `tests/unittests/code_executors/test_tool_code_generator.py` | ✅ Complete | ~320 |

### Manual E2E Tests ✅ Passed

| Test Scenario | Status | Notes |
|--------------|--------|-------|
| Basic math query ("What is 25 * 17?") | ✅ Passed | Returns 425 |
| Data analysis (Titanic survival rate) | ✅ Passed | Returns 38.38% |
| Visualization (bar chart by class) | ✅ Passed | Chart saved to host |
| Multi-step analysis | ✅ Passed | Stats + visualization + insights |
| Tool calling via HTTP IPC | ✅ Passed | fetch_url, save_chart work |
| Error handling (pip warnings) | ✅ Passed | Ignores non-fatal stderr |
| Chart saving to host system | ✅ Passed | Saved to /tmp/adk_charts/ |

## Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   User Query    │────▶│   CodingAgent    │────▶│ Docker Container│
│                 │     │  (Gemini LLM)    │     │  (Python 3.11)  │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 │                        │
                                 │                        │  Executes
                                 ▼                        │  generated code
                          ┌──────────────┐                │
                          │ Tool Server  │◀───────────────┘
                          │  (HTTP IPC)  │   Tool calls via HTTP
                          └──────────────┘
```

### How Tool IPC Works

1. CodingAgent starts ToolExecutionServer on host (port 8765)
2. Code is generated with tool stubs that make HTTP POST requests
3. Container reaches host via `host.docker.internal` (macOS/Windows) or bridge gateway (Linux)
4. Tool server executes actual tool functions with proper context
5. Results returned to container via HTTP response
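
For example, the stub generated in step 2 for `fetch_url` might look roughly like this inside the container. The `/run_tool` endpoint name, JSON shape, and environment variable are illustrative; the actual stubs come from `tool_code_generator.py`.

```python
import json
import os
import urllib.request

# Host address differs by platform: host.docker.internal on macOS/Windows,
# the Docker bridge gateway IP on Linux.
TOOL_SERVER = os.environ.get("ADK_TOOL_SERVER", "http://host.docker.internal:8765")


def fetch_url(url: str) -> str:
    """Injected tool stub: proxies the call to the host-side ToolExecutionServer."""
    payload = json.dumps({"tool": "fetch_url", "args": {"url": url}}).encode()
    req = urllib.request.Request(
        f"{TOOL_SERVER}/run_tool",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]
```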

## Key Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Container image | `python:3.11-slim` + runtime pip | Simpler for users, no custom Dockerfile |
| Tool communication | HTTP IPC | Works across the container boundary without exposing the host runtime |
| Import validation | Allowlist-based | Security without blocking legitimate use |
| Chart saving | `save_chart` tool | Transfers data to host filesystem |
| Error handling | Distinguish warnings from errors | pip warnings shouldn't fail execution |

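A minimal sketch of the allowlist-based import validation mentioned above (the actual rules live in `allowlist_validator.py`; the allowlist contents here are illustrative):

```python
import ast

ALLOWED_MODULES = {"json", "math", "statistics", "pandas", "matplotlib"}  # illustrative


def validate_imports(code: str) -> list[str]:
    """Return the imported top-level modules in `code` that are not allowlisted."""
    violations = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] not in ALLOWED_MODULES:
                violations.append(name)
    return violations
```
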
## Sample Agent: Data Analyst

### Tools Available

| Tool | Description |
|------|-------------|
| `fetch_url(url)` | Fetch CSV/JSON/text from URLs |
| `get_sample_datasets()` | List available datasets (Titanic, Iris, Tips) |
| `get_current_time()` | Get current timestamp |
| `save_chart(image_data, filename)` | Save base64 chart to host |
| `list_saved_charts()` | List saved charts |

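On the host side, `save_chart` can be as simple as the sketch below (illustrative; the actual implementation is in `contributing/samples/coding_agent/agent.py`, and the chart directory matches the manual E2E tests above):

```python
import base64
import os

CHART_DIR = "/tmp/adk_charts"  # host directory used in the manual E2E tests


def save_chart(image_data: str, filename: str) -> str:
    """Decode a base64-encoded chart produced in the container and save it on the host."""
    os.makedirs(CHART_DIR, exist_ok=True)
    path = os.path.join(CHART_DIR, os.path.basename(filename))  # avoid path traversal
    with open(path, "wb") as f:
        f.write(base64.b64decode(image_data))
    return path
```
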
### Example Queries

1. "What is the survival rate on the Titanic?"
2. "Create a bar chart showing survival rate by passenger class"
3. "Analyze the iris dataset and create a scatter plot colored by species"
4. "Perform comprehensive analysis: stats, survival rates, visualization, insights"

## Files Changed Summary

```
.github/CODING_AGENT_PLAN.md                                | Plan document
contributing/samples/coding_agent/README.md                 | 290 lines
contributing/samples/coding_agent/__init__.py               | 17 lines
contributing/samples/coding_agent/agent.py                  | 360 lines
src/google/adk/agents/__init__.py                           | +2 exports
src/google/adk/agents/coding_agent.py                       | 610 lines
src/google/adk/agents/coding_agent_config.py                | 225 lines
src/google/adk/code_executors/__init__.py                   | +6 exports
src/google/adk/code_executors/allowlist_validator.py        | 355 lines
src/google/adk/code_executors/coding_agent_code_executor.py | 505 lines
src/google/adk/code_executors/tool_code_generator.py        | 475 lines
src/google/adk/code_executors/tool_execution_server.py      | 365 lines
tests/unittests/agents/test_coding_agent.py                 | 310 lines
tests/unittests/code_executors/test_allowlist_validator.py  | 320 lines
tests/unittests/code_executors/test_tool_code_generator.py  | 320 lines
```

**Total: ~4,200 lines of new code**

## PR Checklist

- [x] Implementation complete
- [x] Unit tests written and passing
- [x] Manual E2E tests passing
- [x] Sample agent created with README
- [x] Code follows ADK style guide (relative imports, `from __future__ import annotations`)
- [x] Marked as `@experimental`
- [ ] Run `./autoformat.sh` before PR
- [ ] Run full test suite: `pytest tests/unittests`
- [ ] Create GitHub issue (see `.github/CODING_AGENT_ISSUE.md`)
- [ ] Submit PR with testing plan

## Future Enhancements (Out of Scope)

- Stateful execution (persist variables across turns)
- Custom container images with pre-installed packages
- VertexAI code execution integration
- Support for JavaScript/TypeScript
- Streaming output during execution