@Sudhendra

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable): #4198

2. Or, if no issue exists, describe the change:

This PR implements the feature request described in issue #4198.

Description

Problem:
ADK's current default agent interaction pattern is "tool selection from a fixed action set." This breaks down for several increasingly common workloads:

  1. Long-context work beyond model context windows: Many real tasks require operating over very large corpora (codebases, logs, datasets, multi-file configs). If the agent must keep all relevant content in the LLM context, it becomes context-window bound and expensive.

  2. Expressiveness and composability limits of pure tool-calling: Tool-calling assumes enumerable actions. Open-ended tasks require composing operations, iterating, caching intermediate artifacts, and implementing one-off transformations without requiring new bespoke tools each time.

  3. Developer experience gap for building "coding agents": Users want agent systems like Claude Code/OpenCode with multi-step coding workflows, sub-agents, and strong "think in code" execution capabilities.

Solution:
Introduce a new experimental agent type: CodingAgent - an agent that generates and executes Python code as its primary action representation.

Key features:

  • ReAct-style execution loop: Generate code → Execute → Observe results → Refine → Final answer (sketched below)
  • Sandboxed execution: Code runs in Docker containers via ContainerCodeExecutor for security
  • Tool integration via HTTP IPC: Generated code can call ADK tools through a ToolExecutionServer running on the host
  • Import validation: AllowlistValidator rejects any import that is not on a configurable allowlist
  • Stateful execution: Optional state persistence across iterations
  • Full telemetry: OpenTelemetry spans for code generation, execution, and LLM calls
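
A minimal sketch of the loop, assuming hypothetical llm and executor interfaces (the actual implementation lives in coding_agent.py):

# Pseudocode sketch of the ReAct-style loop; `llm` and `executor` are
# hypothetical interfaces, not the real CodingAgent internals.
def run_react_loop(llm, executor, task: str, max_iterations: int) -> str:
    observations: list[str] = []
    for _ in range(max_iterations):
        code = llm.generate_code(task, observations)   # Generate code
        result = executor.execute(code)                # Execute in sandbox
        observations.append(result.output)             # Observe results
        if result.is_final_answer:                     # Finish, else refine
            return result.output
    return observations[-1] if observations else ""   # best-effort fallback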

Architecture:

User → CodingAgent (LLM) → Docker Container (Python sandbox)
                               ↓
                         Tool Server (HTTP IPC on host)
                               ↓
                         ADK Tools with ToolContext
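
To make the Tool Server hop concrete, the sketch below shows the kind of in-container stub the generator could emit; the /execute endpoint, payload shape, and host address are assumptions, not the actual wire protocol:

# Illustrative only: a generated stub that proxies a tool call from the
# sandbox to the host-side ToolExecutionServer over HTTP. Endpoint path,
# payload shape, and host/port are assumptions.
import json
import urllib.request

TOOL_SERVER_URL = "http://host.docker.internal:8011"  # assumed host/port

def save_chart(filename: str, image_base64: str) -> dict:
    """Forward the call to the host, where the real tool runs with ToolContext."""
    payload = json.dumps({
        "tool": "save_chart",
        "args": {"filename": filename, "image_base64": image_base64},
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{TOOL_SERVER_URL}/execute",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))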

New files introduced:

  • src/google/adk/agents/coding_agent.py - Main CodingAgent class (usage sketch after this list)
  • src/google/adk/agents/coding_agent_config.py - Pydantic configuration
  • src/google/adk/code_executors/coding_agent_code_executor.py - Executor wrapper with tool injection
  • src/google/adk/code_executors/tool_execution_server.py - FastAPI server for tool IPC
  • src/google/adk/code_executors/tool_code_generator.py - System prompt and stub generation
  • src/google/adk/code_executors/allowlist_validator.py - Import validation
  • contributing/samples/coding_agent/ - Sample data analysis agent with documentation
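
To show how these pieces might be wired together, here is a hedged usage sketch; the constructor and config parameters are assumptions inferred from the file names above, not a confirmed API:

# Hypothetical usage sketch; parameter names are assumptions, not the
# confirmed CodingAgent API.
from google.adk.agents.coding_agent import CodingAgent
from google.adk.agents.coding_agent_config import CodingAgentConfig

def fetch_csv(url: str) -> str:
    """Example plain-function tool: download a CSV and return it as text."""
    import urllib.request
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

agent = CodingAgent(
    name="data_analyst",
    model="gemini-2.0-flash",                    # any ADK-supported model
    tools=[fetch_csv],                           # functions or BaseTool instances
    config=CodingAgentConfig(max_iterations=5),  # assumed field name
)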

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Test files added:

  • tests/unittests/agents/test_coding_agent.py - Tests for CodingAgent, CodingAgentConfig, CodingAgentState
  • tests/unittests/code_executors/test_tool_code_generator.py - Tests for prompt generation, tool stubs, runtime header
  • tests/unittests/code_executors/test_allowlist_validator.py - Tests for import validation

Test coverage includes:

  • CodingAgentConfig default values and validation (max_iterations bounds, port bounds)
  • CodingAgentState serialization and history tracking
  • CodingAgent creation with default and custom configurations
  • Code block extraction (tool_code and python blocks, preference order; example test sketched after this list)
  • Error feedback formatting
  • Tool resolution from functions and BaseTool instances
  • Tool stub generation with type hints and docstrings
  • Runtime header generation with trace collection
  • System prompt generation with tool documentation
  • Import allowlist validation
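
For instance, a test of the code block extraction preference order might look like the following sketch (the helper name extract_code_block is an assumption):

# Hypothetical test sketch for the "preference order" coverage item;
# the extraction helper's name and signature are assumptions.
def test_prefers_tool_code_block_over_python_block():
    response = (
        "```python\nprint('fallback')\n```\n"
        "```tool_code\nprint('preferred')\n```\n"
    )
    assert extract_code_block(response) == "print('preferred')"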

pytest commands:

# Run all CodingAgent tests
pytest tests/unittests/agents/test_coding_agent.py -v

# Run tool code generator tests
pytest tests/unittests/code_executors/test_tool_code_generator.py -v

# Run allowlist validator tests
pytest tests/unittests/code_executors/test_allowlist_validator.py -v

# Run all related tests
pytest tests/unittests/agents/test_coding_agent.py tests/unittests/code_executors/test_tool_code_generator.py tests/unittests/code_executors/test_allowlist_validator.py -v

Manual End-to-End (E2E) Tests:

Prerequisites:

  • Docker installed and running
  • GOOGLE_API_KEY environment variable set

Test the sample agent:

# Interactive CLI mode
adk run contributing/samples/coding_agent

# Web UI mode
adk web contributing/samples
# Navigate to http://localhost:8000 and select coding_agent

Example test interactions:

  1. Basic data analysis:

    User: What is the survival rate on the Titanic dataset?
    Expected: Agent fetches the Titanic CSV, analyzes it with pandas, and returns a ~38.4% survival rate (illustrative code after this list)
    
  2. Visualization with chart saving:

    User: Create a bar chart showing survival rate by passenger class on the Titanic
    Expected: Agent creates matplotlib chart, saves via save_chart tool to /tmp/adk_charts/
    
  3. Multi-step analysis:

    User: Analyze the iris dataset and give me key insights
    Expected: Agent iteratively fetches data, runs statistical analysis, potentially creates visualizations
    
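For interaction 1, the agent is expected to generate code along these lines (illustrative only; the dataset URL is an assumption, using the common seaborn-data mirror):

# Illustrative: the kind of code the agent might generate for interaction 1.
# The dataset URL is an assumption.
import pandas as pd

df = pd.read_csv(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv"
)
survival_rate = df["survived"].mean() * 100
print(f"Overall survival rate: {survival_rate:.1f}%")  # ~38.4%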

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

Design decisions:

  1. Experimental decorator: CodingAgent is marked with @experimental to indicate this is a new feature that may evolve.

  2. Default to ContainerCodeExecutor: Security-first approach - code executes in isolated Docker containers by default. Users can supply custom executors if needed.

  3. HTTP IPC over direct execution: Tools run on the host, not in containers. This maintains security isolation while allowing full ToolContext capabilities.

  4. Import allowlist: DEFAULT_SAFE_IMPORTS provides a conservative set of safe modules. Users can extend it for specific use cases, e.g. adding pandas.* for data analysis (see the sketch after this list).

  5. ReAct-style iteration: The agent can observe execution results and iteratively refine its approach, similar to how human developers debug code.
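
A minimal sketch of extending the allowlist for a data-analysis agent, assuming the config field is named authorized_imports (the exact field name is an assumption):

# Sketch of extending the import allowlist; `authorized_imports` is an
# assumed field name on CodingAgentConfig.
from google.adk.code_executors.allowlist_validator import DEFAULT_SAFE_IMPORTS

config = CodingAgentConfig(
    authorized_imports=[*DEFAULT_SAFE_IMPORTS, "pandas", "pandas.*", "numpy"],
)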

Future enhancements enabled by this architecture:

  • REPL/Jupyter-kernel style execution modes for interactive sessions
  • Long-context scaffolds inspired by arXiv:2512.24601
  • Sub-agent coding workflows (planner/tester/refactor sub-agents)
  • Additional sandbox backends beyond Docker

Related inspiration:

  • CodeAct (ICML 2024) and DynaSaur (COLM 2025) - research foundation for code-as-action agents
  • HuggingFace smolagents - primary inspiration for code-thinking agents

Documentation:

  • Sample agent README: contributing/samples/coding_agent/README.md
  • Technical documentation: contributing/samples/coding_agent/CODING_AGENT.md

Footnotes

This new PR replaces the old #4259 and implements the following in addition to CodingAgent.

Summary of new changes

  • Deduplicates DEFAULT_SAFE_IMPORTS by importing the canonical value from
    allowlist_validator.py and extending it via _EXTENDED_SAFE_IMPORTS for
    CodingAgent defaults (pattern sketched below).
  • Resolves the merge conflict in src/google/adk/telemetry/tracing.py by
    syncing with upstream and re-adding CodingAgent tracing helpers
    (trace_code_generation, trace_code_execution, trace_import_validation,
    trace_tool_ipc).
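
The deduplication pattern in the first bullet roughly amounts to the following sketch (names taken from the summary above; the concrete import lists are placeholders):

# Sketch of the dedup pattern in coding_agent_config.py: import the
# canonical allowlist and extend it. Values are placeholders.
from google.adk.code_executors.allowlist_validator import DEFAULT_SAFE_IMPORTS

_DATA_SCIENCE_IMPORTS = ["numpy", "pandas", "scipy", "matplotlib"]
_EXTENDED_SAFE_IMPORTS = [*DEFAULT_SAFE_IMPORTS, *_DATA_SCIENCE_IMPORTS]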

Testing

  • pytest tests/unittests/agents/test_coding_agent.py
  • pytest tests/unittests/code_executors/test_allowlist_validator.py

Sudhendra and others added 8 commits January 25, 2026 17:58
…ngAgent

- Add save_chart tool to save visualizations to host filesystem
- Add list_saved_charts tool to list saved charts
- Add _is_real_error method to distinguish between warnings and errors
- Fix pip warnings being treated as execution errors
- Update system prompt with package installation instructions
- Add base64 to authorized imports for chart encoding
- Update README with new tool documentation
- Create GitHub issue template for CodingAgent feature
…epth

- Add research foundation from CodeAct (ICML 2024) and DynaSaur (COLM 2025)
- Reference HuggingFace smolagents as inspiration (25k+ GitHub stars)
- Expand problem statement with context window bottleneck analysis
- Add detailed alternatives considered section with rationale
- Include future roadmap for stateful execution and alternative sandboxes
- Add concrete user pain points and how CodingAgent solves them
- Refocus motivation on arXiv:2512.24601 long-context external environment framing
- Keep smolagents as primary inspiration for code-thinking agents
- Tighten solution, alternatives, and future directions per ADK template
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…y with upstream

Addresses Gemini code review suggestion (PR google#4259):
- Remove duplicate DEFAULT_SAFE_IMPORTS from coding_agent_config.py
- Import DEFAULT_SAFE_IMPORTS from allowlist_validator.py (canonical source)
- Create _DATA_SCIENCE_IMPORTS for numpy/pandas/scipy/matplotlib packages
- Create _EXTENDED_SAFE_IMPORTS combining both for CodingAgentConfig default

Resolves merge conflict in telemetry/tracing.py:
- Sync with upstream main (new OTEL improvements, proper semconv imports)
- Add CodingAgent-specific tracing functions: trace_code_generation,
  trace_code_execution, trace_import_validation, trace_tool_ipc

Updates test to use _EXTENDED_SAFE_IMPORTS from coding_agent_config.py
@gemini-code-assist
Contributor

Summary of Changes

Hello @Sudhendra, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new agent type, the CodingAgent, designed to overcome the limitations of traditional tool-calling agents when dealing with long-context workloads, complex task expressiveness, and the need for multi-step coding workflows. By enabling agents to 'think in code' and execute Python in a secure, sandboxed environment, it expands the capabilities of ADK to handle more open-ended and iterative problem-solving scenarios, bridging a gap in developer experience for building advanced AI systems.

Highlights

  • New CodingAgent Type: Introduced a new experimental CodingAgent that generates and executes Python code in a sandboxed environment, addressing limitations of fixed action sets for complex tasks.
  • Sandboxed Code Execution: Code generated by the agent runs securely within Docker containers via ContainerCodeExecutor, ensuring isolation from the host system.
  • Tool Integration via HTTP IPC: Generated Python code can safely call existing ADK tools through a ToolExecutionServer running on the host, using HTTP for inter-process communication.
  • Import Validation and Security: Implemented an AllowlistValidator to ensure that only authorized Python imports are permitted within the generated code, enhancing security.
  • ReAct-style Iteration and Stateful Execution: The agent employs a ReAct-style execution loop (Generate Code → Execute → Observe → Refine) and supports optional state persistence across iterations for multi-step tasks.
  • Comprehensive Telemetry: Integrated OpenTelemetry spans for key operations including code generation, execution, import validation, and tool IPC, providing detailed observability.
  • Data Analysis Sample Agent: A fully functional sample Data Analysis Agent is included, demonstrating the CodingAgent's capabilities in fetching, analyzing, and visualizing data.


@adk-bot added the core [Component] This issue is related to the core interface and implementation label Jan 26, 2026
@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a significant and well-designed new feature: the CodingAgent. The architecture, which uses sandboxed Docker containers for code execution and an HTTP IPC mechanism for tool calls, provides a robust and secure way for agents to "think in code". The implementation is comprehensive, covering configuration, state management, security via import allowlisting, and detailed telemetry. The included sample data analysis agent is an excellent demonstration of the new capabilities.

My review focuses on a few areas for improvement, primarily concerning resource management reliability (the use of __del__) and making some of the generated code and error handling even more robust. Overall, this is a high-quality contribution.

Sudhendra and others added 2 commits January 25, 2026 18:18
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

feat(agents): Add CodingAgent (agents that think in code)
