
Conversation

@Sudhendra

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

2. Or, if no issue exists, describe the change:

This PR implements the feature request described in issue #4198.

Description

Problem:
ADK's current default agent interaction pattern is "tool selection from a fixed action set." This breaks down for several increasingly common workloads:

  1. Long-context work beyond model context windows: Many real tasks require operating over very large corpora (codebases, logs, datasets, multi-file configs). If the agent must keep all relevant content in the LLM context, it becomes context-window bound and expensive.

  2. Expressiveness and composability limits of pure tool-calling: Tool-calling assumes enumerable actions. Open-ended tasks require composing operations, iterating, caching intermediate artifacts, and implementing one-off transformations without requiring new bespoke tools each time.

  3. Developer experience gap for building "coding agents": Users want agent systems like Claude Code/OpenCode with multi-step coding workflows, sub-agents, and strong "think in code" execution capabilities.

Solution:
Introduce a new experimental agent type: CodingAgent - an agent that generates and executes Python code as its primary action representation.

Key features:

  • ReAct-style execution loop: Generate code → Execute → Observe results → Refine → Final answer (sketched after this list)
  • Sandboxed execution: Code runs in Docker containers via ContainerCodeExecutor for security
  • Tool integration via HTTP IPC: Generated code can call ADK tools through a ToolExecutionServer running on the host
  • Import validation: AllowlistValidator blocks any import that is not on an authorized allowlist
  • Stateful execution: Optional state persistence across iterations
  • Full telemetry: OpenTelemetry spans for code generation, execution, and LLM calls
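
The loop in the first bullet, as a minimal self-contained sketch (the helpers below are stand-ins for the PR's internals, not its real API):

# Hedged sketch of the ReAct-style loop; generate_code and execute_in_sandbox
# are stand-ins for the LLM call and ContainerCodeExecutor, not the real API.
from dataclasses import dataclass

@dataclass
class ExecResult:
    output: str
    is_final_answer: bool

def generate_code(task: str, history: list) -> str:
    # Stand-in for the LLM call that produces a Python code block.
    return "print('answer')"

def execute_in_sandbox(code: str) -> ExecResult:
    # Stand-in for sandboxed Docker execution.
    return ExecResult(output="answer", is_final_answer=True)

def run_coding_agent(task: str, max_iterations: int = 5) -> str:
    history: list[tuple[str, ExecResult]] = []
    for _ in range(max_iterations):
        code = generate_code(task, history)    # Generate
        result = execute_in_sandbox(code)      # Execute
        history.append((code, result))         # Observe
        if result.is_final_answer:
            return result.output               # Final answer
        # Otherwise the output/error feeds the next generation step (Refine)
    return history[-1][1].output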

Architecture:

User → CodingAgent (LLM) → Docker Container (Python sandbox)
                               ↓
                         Tool Server (HTTP IPC on host)
                               ↓
                         ADK Tools with ToolContext
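
To make the IPC hop concrete, here is a hedged sketch of a tool stub the generated code could use inside the container; the host address, the /invoke path, and the JSON payload shape are assumptions, not the PR's actual wire format:

# Hedged sketch of a container-side tool stub talking to the host's
# ToolExecutionServer. Endpoint path and payload shape are assumed.
import json
import urllib.request

TOOL_SERVER = "http://host.docker.internal:8765"  # host/port are illustrative

def call_tool(name: str, **kwargs):
    # POST a tool invocation to the host-side tool server; return its JSON result.
    payload = json.dumps({"tool": name, "args": kwargs}).encode("utf-8")
    req = urllib.request.Request(
        f"{TOOL_SERVER}/invoke",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage from generated code, e.g.:
# call_tool("save_chart", path="/tmp/adk_charts/survival.png")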

New files introduced:

  • src/google/adk/agents/coding_agent.py - Main CodingAgent class
  • src/google/adk/agents/coding_agent_config.py - Pydantic configuration
  • src/google/adk/code_executors/coding_agent_code_executor.py - Executor wrapper with tool injection
  • src/google/adk/code_executors/tool_execution_server.py - FastAPI server for tool IPC
  • src/google/adk/code_executors/tool_code_generator.py - System prompt and stub generation
  • src/google/adk/code_executors/allowlist_validator.py - Import validation
  • contributing/samples/coding_agent/ - Sample data analysis agent with documentation
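
Based on the module paths above, wiring an agent up might look like this; every constructor parameter name shown is an assumption about the new API, not a confirmed signature:

# Hedged construction sketch. Module paths come from the file list above;
# parameter names (name, model, tools, config) are assumed.
from google.adk.agents.coding_agent import CodingAgent
from google.adk.agents.coding_agent_config import CodingAgentConfig

def fetch_csv(url: str) -> str:
    # Example tool; the PR notes plain functions resolve to tools.
    import urllib.request
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

agent = CodingAgent(
    name="data_analyst",
    model="gemini-2.0-flash",                    # illustrative model id
    tools=[fetch_csv],
    config=CodingAgentConfig(max_iterations=5),  # field named in the tests below
)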

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Test files added:

  • tests/unittests/agents/test_coding_agent.py - Tests for CodingAgent, CodingAgentConfig, CodingAgentState
  • tests/unittests/code_executors/test_tool_code_generator.py - Tests for prompt generation, tool stubs, runtime header
  • tests/unittests/code_executors/test_allowlist_validator.py - Tests for import validation

Test coverage includes:

  • CodingAgentConfig default values and validation (max_iterations bounds, port bounds)
  • CodingAgentState serialization and history tracking
  • CodingAgent creation with default and custom configurations
  • Code block extraction (tool_code and python blocks, preference order)
  • Error feedback formatting
  • Tool resolution from functions and BaseTool instances
  • Tool stub generation with type hints and docstrings
  • Runtime header generation with trace collection
  • System prompt generation with tool documentation
  • Import allowlist validation (sketched below)
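
As an illustration of the last item, a test along these lines; AllowlistValidator comes from this PR, but the constructor keyword and validate() method shown are guesses:

# Hedged sketch of an allowlist unit test; ctor/method names are assumed.
from google.adk.code_executors.allowlist_validator import AllowlistValidator

def test_rejects_unlisted_import():
    validator = AllowlistValidator(allowed_imports=["math", "json"])
    assert validator.validate("import math")        # listed module passes
    assert not validator.validate("import socket")  # unlisted module is rejected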

pytest commands:

# Run all CodingAgent tests
pytest tests/unittests/agents/test_coding_agent.py -v

# Run tool code generator tests
pytest tests/unittests/code_executors/test_tool_code_generator.py -v

# Run allowlist validator tests
pytest tests/unittests/code_executors/test_allowlist_validator.py -v

# Run all related tests
pytest tests/unittests/agents/test_coding_agent.py tests/unittests/code_executors/test_tool_code_generator.py tests/unittests/code_executors/test_allowlist_validator.py -v

Manual End-to-End (E2E) Tests:

Prerequisites:

  • Docker installed and running
  • GOOGLE_API_KEY environment variable set

Test the sample agent:

# Interactive CLI mode
adk run contributing/samples/coding_agent

# Web UI mode
adk web contributing/samples
# Navigate to http://localhost:8000 and select coding_agent

Example test interactions:

  1. Basic data analysis:

    User: What is the survival rate on the Titanic dataset?
    Expected: Agent fetches Titanic CSV, analyzes it with pandas, returns ~38.4% survival rate (see the sketch after this list)
    
  2. Visualization with chart saving:

    User: Create a bar chart showing survival rate by passenger class on the Titanic
    Expected: Agent creates matplotlib chart, saves via save_chart tool to /tmp/adk_charts/
    
  3. Multi-step analysis:

    User: Analyze the iris dataset and give me key insights
    Expected: Agent iteratively fetches data, runs statistical analysis, potentially creates visualizations
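
For interaction 1, the code the agent generates might look roughly like the following; the CSV URL is a commonly used public mirror chosen for illustration, not pinned by this PR:

# Hedged sketch of agent-generated code for the Titanic question.
import pandas as pd

URL = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"

df = pd.read_csv(URL)                          # fetch the dataset
survival_rate = df["Survived"].mean()          # fraction of passengers who survived
print(f"Survival rate: {survival_rate:.1%}")   # ~38.4% on this dataset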
    

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

Design decisions:

  1. Experimental decorator: CodingAgent is marked with @experimental to indicate this is a new feature that may evolve.

  2. Default to ContainerCodeExecutor: Security-first approach - code executes in isolated Docker containers by default. Users can supply custom executors if needed.

  3. HTTP IPC over direct execution: Tools run on the host, not in containers. This maintains security isolation while allowing full ToolContext capabilities.

  4. Import allowlist: DEFAULT_SAFE_IMPORTS provides a conservative set of safe modules. Users can extend this for specific use cases (e.g., adding pandas.* for data analysis; see the sketch after this list).

  5. ReAct-style iteration: The agent can observe execution results and iteratively refine its approach, similar to how human developers debug code.
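
Picking up point 4, extending the defaults might look like this; DEFAULT_SAFE_IMPORTS and its module path come from this PR, while the authorized_imports field name is an assumption:

# Hedged sketch: extending the import allowlist for a data-analysis agent.
# DEFAULT_SAFE_IMPORTS and its module path are from this PR; the
# authorized_imports field name is assumed, not confirmed.
from google.adk.agents.coding_agent_config import CodingAgentConfig
from google.adk.code_executors.allowlist_validator import DEFAULT_SAFE_IMPORTS

config = CodingAgentConfig(
    authorized_imports=[*DEFAULT_SAFE_IMPORTS, "pandas", "pandas.*", "matplotlib.*"],
)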

Future enhancements enabled by this architecture:

  • REPL/Jupyter-kernel style execution modes for interactive sessions
  • Long-context scaffolds inspired by arXiv:2512.24601
  • Sub-agent coding workflows (planner/tester/refactor sub-agents)
  • Additional sandbox backends beyond Docker

Related inspiration:

  • HuggingFace smolagents, the primary inspiration for code-thinking agents
  • CodeAct (ICML 2024)
  • DynaSaur (COLM 2025)
  • arXiv:2512.24601 for the long-context external-environment framing

Documentation:

  • Sample agent README: contributing/samples/coding_agent/README.md
  • Technical documentation: contributing/samples/coding_agent/CODING_AGENT.md


…ngAgent

- Add save_chart tool to save visualizations to host filesystem
- Add list_saved_charts tool to list saved charts
- Add _is_real_error method to distinguish between warnings and errors
- Fix pip warnings being treated as execution errors
- Update system prompt with package installation instructions
- Add base64 to authorized imports for chart encoding
- Update README with new tool documentation
- Create GitHub issue template for CodingAgent feature
…epth

- Add research foundation from CodeAct (ICML 2024) and DynaSaur (COLM 2025)
- Reference HuggingFace smolagents as inspiration (25k+ GitHub stars)
- Expand problem statement with context window bottleneck analysis
- Add detailed alternatives considered section with rationale
- Include future roadmap for stateful execution and alternative sandboxes
- Add concrete user pain points and how CodingAgent solves them
- Refocus motivation on arXiv:2512.24601 long-context external environment framing
- Keep smolagents as primary inspiration for code-thinking agents
- Tighten solution, alternatives, and future directions per ADK template
@gemini-code-assist
Contributor

Summary of Changes

Hello @Sudhendra, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ADK framework by introducing the CodingAgent, an experimental agent type designed to overcome the limitations of traditional tool-calling agents. It empowers agents to dynamically generate and execute Python code in a secure, sandboxed environment, facilitating multi-step reasoning, long-context workloads, and complex data manipulations. The architecture ensures robust security through Docker containerization and import validation, while enabling seamless interaction with existing ADK tools via an HTTP IPC server. This feature paves the way for more sophisticated and adaptable agent behaviors within the ADK ecosystem.

Highlights

  • New Agent Type: CodingAgent: Introduces a novel CodingAgent that generates and executes Python code to solve tasks, enabling more flexible and complex workflows beyond fixed tool sets.
  • Sandboxed Code Execution: Implements secure code execution within isolated Docker containers via ContainerCodeExecutor, mitigating security risks associated with arbitrary code execution.
  • HTTP IPC for Tool Integration: Establishes a ToolExecutionServer (FastAPI) on the host machine, allowing generated code in the container to safely call ADK tools through HTTP Inter-Process Communication (IPC).
  • Import Allowlisting for Security: Incorporates an AllowlistValidator to restrict Python imports within the generated code to a predefined set of safe modules, enhancing the security posture.
  • ReAct-style Iterative Loop: The CodingAgent operates using a ReAct-style loop, allowing it to generate code, execute it, observe results, and iteratively refine its approach until a final answer is reached.
  • Comprehensive Telemetry: Adds detailed OpenTelemetry tracing for code generation, execution, import validation, and tool IPC, providing deep observability into the agent's operations.
  • Data Analysis Sample Agent: Includes a fully functional sample CodingAgent configured for data analysis, demonstrating capabilities like fetching data, using pandas, creating visualizations with matplotlib, and saving charts.


@adk-bot added the core label ([Component] This issue is related to the core interface and implementation) on Jan 25, 2026

@gemini-code-assist (bot) left a comment


Code Review

This is an impressive and comprehensive pull request that introduces the CodingAgent. The architecture is well-thought-out, with a secure-by-default approach using sandboxed execution and an import allowlist. The code is clean, well-documented, and includes a great sample agent that showcases the new capabilities. I have a couple of suggestions for improvement regarding code duplication and error handling to further enhance the robustness of this new feature.

Sudhendra added a commit to Sudhendra/adk-python-codeagent that referenced this pull request Jan 25, 2026
…y with upstream

Addresses Gemini code review suggestion (PR google#4259):
- Remove duplicate DEFAULT_SAFE_IMPORTS from coding_agent_config.py
- Import DEFAULT_SAFE_IMPORTS from allowlist_validator.py (canonical source)
- Create _DATA_SCIENCE_IMPORTS for numpy/pandas/scipy/matplotlib packages
- Create _EXTENDED_SAFE_IMPORTS combining both for CodingAgentConfig default

Resolves merge conflict in telemetry/tracing.py:
- Sync with upstream main (new OTEL improvements, proper semconv imports)
- Add CodingAgent-specific tracing functions: trace_code_generation,
  trace_code_execution, trace_import_validation, trace_tool_ipc

Updates test to use _EXTENDED_SAFE_IMPORTS from coding_agent_config.py
@Sudhendra
Author

Closing in favor of a new PR from Sudhendra:pr-4259, which contains the Gemini review fix and the tracing.py conflict resolution.

The new PR can be found here: #4262.

@Sudhendra closed this on Jan 26, 2026