
Conversation

@Sudhendra

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

2. Or, if no issue exists, describe the change:

This PR implements the feature request described in issue #4198.

Description

Problem:
ADK's current default agent interaction pattern is "tool selection from a fixed action set." This breaks down for several increasingly common workloads:

  1. Long-context work beyond model context windows: Many real tasks require operating over very large corpora (codebases, logs, datasets, multi-file configs). If the agent must keep all relevant content in the LLM context, it becomes context-window bound and expensive.

  2. Expressiveness and composability limits of pure tool-calling: Tool-calling assumes enumerable actions. Open-ended tasks require composing operations, iterating, caching intermediate artifacts, and implementing one-off transformations without requiring new bespoke tools each time.

  3. Developer experience gap for building "coding agents": Users want agent systems like Claude Code/OpenCode with multi-step coding workflows, sub-agents, and strong "think in code" execution capabilities.

Solution:
Introduce a new experimental agent type: CodingAgent - an agent that generates and executes Python code as its primary action representation.

Key features:

  • ReAct-style execution loop: Generate code → Execute → Observe results → Refine → Final answer (sketched after this list)
  • Sandboxed execution: Code runs in Docker containers via ContainerCodeExecutor for security
  • Tool integration via HTTP IPC: Generated code can call ADK tools through a ToolExecutionServer running on the host
  • Import validation: AllowlistValidator blocks any import that is not on an authorized allowlist
  • Stateful execution: Optional state persistence across iterations
  • Full telemetry: OpenTelemetry spans for code generation, execution, and LLM calls
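
The loop in the first bullet, as a minimal self-contained sketch (the helpers below are stand-ins for the PR's internals, not its real API):

# Hedged sketch of the ReAct-style loop; generate_code and execute_in_sandbox
# are stand-ins for the LLM call and ContainerCodeExecutor, not the real API.
from dataclasses import dataclass

@dataclass
class ExecResult:
    output: str
    is_final_answer: bool

def generate_code(task: str, history: list) -> str:
    # Stand-in for the LLM call that produces a Python code block.
    return "print('answer')"

def execute_in_sandbox(code: str) -> ExecResult:
    # Stand-in for sandboxed Docker execution.
    return ExecResult(output="answer", is_final_answer=True)

def run_coding_agent(task: str, max_iterations: int = 5) -> str:
    history: list[tuple[str, ExecResult]] = []
    for _ in range(max_iterations):
        code = generate_code(task, history)    # Generate
        result = execute_in_sandbox(code)      # Execute
        history.append((code, result))         # Observe
        if result.is_final_answer:
            return result.output               # Final answer
        # Otherwise the output/error feeds the next generation step (Refine)
    return history[-1][1].output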

Architecture:

User → CodingAgent (LLM) → Docker Container (Python sandbox)
                               ↓
                         Tool Server (HTTP IPC on host)
                               ↓
                         ADK Tools with ToolContext
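
To make the IPC hop concrete, here is a hedged sketch of a tool stub the generated code could use inside the container; the host address, the /invoke path, and the JSON payload shape are assumptions, not the PR's actual wire format:

# Hedged sketch of a container-side tool stub talking to the host's
# ToolExecutionServer. Endpoint path and payload shape are assumed.
import json
import urllib.request

TOOL_SERVER = "http://host.docker.internal:8765"  # host/port are illustrative

def call_tool(name: str, **kwargs):
    # POST a tool invocation to the host-side tool server; return its JSON result.
    payload = json.dumps({"tool": name, "args": kwargs}).encode("utf-8")
    req = urllib.request.Request(
        f"{TOOL_SERVER}/invoke",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage from generated code, e.g.:
# call_tool("save_chart", path="/tmp/adk_charts/survival.png")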

New files introduced:

  • src/google/adk/agents/coding_agent.py - Main CodingAgent class
  • src/google/adk/agents/coding_agent_config.py - Pydantic configuration
  • src/google/adk/code_executors/coding_agent_code_executor.py - Executor wrapper with tool injection
  • src/google/adk/code_executors/tool_execution_server.py - FastAPI server for tool IPC
  • src/google/adk/code_executors/tool_code_generator.py - System prompt and stub generation
  • src/google/adk/code_executors/allowlist_validator.py - Import validation
  • contributing/samples/coding_agent/ - Sample data analysis agent with documentation
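
Based on the module paths above, wiring an agent up might look like this; every constructor parameter name shown is an assumption about the new API, not a confirmed signature:

# Hedged construction sketch. Module paths come from the file list above;
# parameter names (name, model, tools, config) are assumed.
from google.adk.agents.coding_agent import CodingAgent
from google.adk.agents.coding_agent_config import CodingAgentConfig

def fetch_csv(url: str) -> str:
    # Example tool; the PR notes plain functions resolve to tools.
    import urllib.request
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

agent = CodingAgent(
    name="data_analyst",
    model="gemini-2.0-flash",                    # illustrative model id
    tools=[fetch_csv],
    config=CodingAgentConfig(max_iterations=5),  # field named in the tests below
)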

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Test files added:

  • tests/unittests/agents/test_coding_agent.py - Tests for CodingAgent, CodingAgentConfig, CodingAgentState
  • tests/unittests/code_executors/test_tool_code_generator.py - Tests for prompt generation, tool stubs, runtime header
  • tests/unittests/code_executors/test_allowlist_validator.py - Tests for import validation

Test coverage includes:

  • CodingAgentConfig default values and validation (max_iterations bounds, port bounds)
  • CodingAgentState serialization and history tracking
  • CodingAgent creation with default and custom configurations
  • Code block extraction (tool_code and python blocks, preference order)
  • Error feedback formatting
  • Tool resolution from functions and BaseTool instances
  • Tool stub generation with type hints and docstrings
  • Runtime header generation with trace collection
  • System prompt generation with tool documentation
  • Import allowlist validation (sketched below)
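
As an illustration of the last item, a test along these lines; AllowlistValidator comes from this PR, but the constructor keyword and validate() method shown are guesses:

# Hedged sketch of an allowlist unit test; ctor/method names are assumed.
from google.adk.code_executors.allowlist_validator import AllowlistValidator

def test_rejects_unlisted_import():
    validator = AllowlistValidator(allowed_imports=["math", "json"])
    assert validator.validate("import math")        # listed module passes
    assert not validator.validate("import socket")  # unlisted module is rejected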

pytest commands:

# Run all CodingAgent tests
pytest tests/unittests/agents/test_coding_agent.py -v

# Run tool code generator tests
pytest tests/unittests/code_executors/test_tool_code_generator.py -v

# Run allowlist validator tests
pytest tests/unittests/code_executors/test_allowlist_validator.py -v

# Run all related tests
pytest tests/unittests/agents/test_coding_agent.py tests/unittests/code_executors/test_tool_code_generator.py tests/unittests/code_executors/test_allowlist_validator.py -v

Manual End-to-End (E2E) Tests:

Prerequisites:

  • Docker installed and running
  • GOOGLE_API_KEY environment variable set

Test the sample agent:

# Interactive CLI mode
adk run contributing/samples/coding_agent

# Web UI mode
adk web contributing/samples
# Navigate to http://localhost:8000 and select coding_agent

Example test interactions:

  1. Basic data analysis:

    User: What is the survival rate on the Titanic dataset?
    Expected: Agent fetches Titanic CSV, analyzes it with pandas, returns ~38.4% survival rate (see the sketch after this list)
    
  2. Visualization with chart saving:

    User: Create a bar chart showing survival rate by passenger class on the Titanic
    Expected: Agent creates matplotlib chart, saves via save_chart tool to /tmp/adk_charts/
    
  3. Multi-step analysis:

    User: Analyze the iris dataset and give me key insights
    Expected: Agent iteratively fetches data, runs statistical analysis, potentially creates visualizations
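
For interaction 1, the code the agent generates might look roughly like the following; the CSV URL is a commonly used public mirror chosen for illustration, not pinned by this PR:

# Hedged sketch of agent-generated code for the Titanic question.
import pandas as pd

URL = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"

df = pd.read_csv(URL)                          # fetch the dataset
survival_rate = df["Survived"].mean()          # fraction of passengers who survived
print(f"Survival rate: {survival_rate:.1%}")   # ~38.4% on this dataset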
    

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

Design decisions:

  1. Experimental decorator: CodingAgent is marked with @experimental to indicate this is a new feature that may evolve.

  2. Default to ContainerCodeExecutor: Security-first approach - code executes in isolated Docker containers by default. Users can supply custom executors if needed.

  3. HTTP IPC over direct execution: Tools run on the host, not in containers. This maintains security isolation while allowing full ToolContext capabilities.

  4. Import allowlist: DEFAULT_SAFE_IMPORTS provides a conservative set of safe modules. Users can extend this for specific use cases (e.g., adding pandas.* for data analysis; see the sketch after this list).

  5. ReAct-style iteration: The agent can observe execution results and iteratively refine its approach, similar to how human developers debug code.
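
Picking up point 4, extending the defaults might look like this; DEFAULT_SAFE_IMPORTS and its module path come from this PR, while the authorized_imports field name is an assumption:

# Hedged sketch: extending the import allowlist for a data-analysis agent.
# DEFAULT_SAFE_IMPORTS and its module path are from this PR; the
# authorized_imports field name is assumed, not confirmed.
from google.adk.agents.coding_agent_config import CodingAgentConfig
from google.adk.code_executors.allowlist_validator import DEFAULT_SAFE_IMPORTS

config = CodingAgentConfig(
    authorized_imports=[*DEFAULT_SAFE_IMPORTS, "pandas", "pandas.*", "matplotlib.*"],
)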

Future enhancements enabled by this architecture:

  • REPL/Jupyter-kernel style execution modes for interactive sessions
  • Long-context scaffolds inspired by arXiv:2512.24601
  • Sub-agent coding workflows (planner/tester/refactor sub-agents)
  • Additional sandbox backends beyond Docker

Related inspiration:

  • HuggingFace smolagents, the primary inspiration for code-thinking agents
  • CodeAct (ICML 2024)
  • DynaSaur (COLM 2025)
  • arXiv:2512.24601 for the long-context external-environment framing

Documentation:

  • Sample agent README: contributing/samples/coding_agent/README.md
  • Technical documentation: contributing/samples/coding_agent/CODING_AGENT.md


…ngAgent

- Add save_chart tool to save visualizations to host filesystem
- Add list_saved_charts tool to list saved charts
- Add _is_real_error method to distinguish between warnings and errors
- Fix pip warnings being treated as execution errors
- Update system prompt with package installation instructions
- Add base64 to authorized imports for chart encoding
- Update README with new tool documentation
- Create GitHub issue template for CodingAgent feature
…epth

- Add research foundation from CodeAct (ICML 2024) and DynaSaur (COLM 2025)
- Reference HuggingFace smolagents as inspiration (25k+ GitHub stars)
- Expand problem statement with context window bottleneck analysis
- Add detailed alternatives considered section with rationale
- Include future roadmap for stateful execution and alternative sandboxes
- Add concrete user pain points and how CodingAgent solves them
- Refocus motivation on arXiv:2512.24601 long-context external environment framing
- Keep smolagents as primary inspiration for code-thinking agents
- Tighten solution, alternatives, and future directions per ADK template
@gemini-code-assist
Contributor

Summary of Changes

Hello @Sudhendra, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ADK framework by introducing the CodingAgent, an experimental agent type designed to overcome the limitations of traditional tool-calling agents. It empowers agents to dynamically generate and execute Python code in a secure, sandboxed environment, facilitating multi-step reasoning, long-context workloads, and complex data manipulations. The architecture ensures robust security through Docker containerization and import validation, while enabling seamless interaction with existing ADK tools via an HTTP IPC server. This feature paves the way for more sophisticated and adaptable agent behaviors within the ADK ecosystem.

Highlights

  • New Agent Type: CodingAgent: Introduces a novel CodingAgent that generates and executes Python code to solve tasks, enabling more flexible and complex workflows beyond fixed tool sets.
  • Sandboxed Code Execution: Implements secure code execution within isolated Docker containers via ContainerCodeExecutor, mitigating security risks associated with arbitrary code execution.
  • HTTP IPC for Tool Integration: Establishes a ToolExecutionServer (FastAPI) on the host machine, allowing generated code in the container to safely call ADK tools through HTTP Inter-Process Communication (IPC).
  • Import Allowlisting for Security: Incorporates an AllowlistValidator to restrict Python imports within the generated code to a predefined set of safe modules, enhancing the security posture.
  • ReAct-style Iterative Loop: The CodingAgent operates using a ReAct-style loop, allowing it to generate code, execute it, observe results, and iteratively refine its approach until a final answer is reached.
  • Comprehensive Telemetry: Adds detailed OpenTelemetry tracing for code generation, execution, import validation, and tool IPC, providing deep observability into the agent's operations.
  • Data Analysis Sample Agent: Includes a fully functional sample CodingAgent configured for data analysis, demonstrating capabilities like fetching data, using pandas, creating visualizations with matplotlib, and saving charts.


@adk-bot added the core label ([Component] This issue is related to the core interface and implementation) on Jan 25, 2026

@gemini-code-assist (bot) left a comment


Code Review

This is an impressive and comprehensive pull request that introduces the CodingAgent. The architecture is well-thought-out, with a secure-by-default approach using sandboxed execution and an import allowlist. The code is clean, well-documented, and includes a great sample agent that showcases the new capabilities. I have a couple of suggestions for improvement regarding code duplication and error handling to further enhance the robustness of this new feature.

Sudhendra added a commit to Sudhendra/adk-python-codeagent that referenced this pull request Jan 25, 2026
…y with upstream

Addresses Gemini code review suggestion (PR google#4259):
- Remove duplicate DEFAULT_SAFE_IMPORTS from coding_agent_config.py
- Import DEFAULT_SAFE_IMPORTS from allowlist_validator.py (canonical source)
- Create _DATA_SCIENCE_IMPORTS for numpy/pandas/scipy/matplotlib packages
- Create _EXTENDED_SAFE_IMPORTS combining both for CodingAgentConfig default

Resolves merge conflict in telemetry/tracing.py:
- Sync with upstream main (new OTEL improvements, proper semconv imports)
- Add CodingAgent-specific tracing functions: trace_code_generation,
  trace_code_execution, trace_import_validation, trace_tool_ipc

Updates test to use _EXTENDED_SAFE_IMPORTS from coding_agent_config.py
@Sudhendra
Author

Closing in favor of a new PR from Sudhendra:pr-4259, which contains the Gemini review fix and the tracing.py conflict resolution.

The new PR can be found here: #4262.

@Sudhendra closed this on Jan 26, 2026