Conversation

@LiamConnell

Summary

This PR adds a new sample implementing Recursive Language Models (RLM) using ADK and Gemini models.

RLM enables LLMs to handle near-infinite length contexts by programmatically examining, decomposing, and recursively calling themselves through a Python REPL environment. This is an implementation of the concepts from the paper "Enabling Near-Infinite Length Context with Recursive Language Models" adapted to use Google's ADK.

Key Features

  • Recursive LLM Calls: LLMs can spawn sub-LLMs to analyze context chunks, with configurable max depth
  • Sandboxed Python REPL: Safe code execution environment with restricted builtins (a minimal sketch follows this list)
  • Streaming Events: Real-time event streaming for UI integration (rlm.iteration.start, rlm.code.found, rlm.final.answer, etc.)
  • Multi-Turn Persistence: Maintain conversation state across turns using ADK sessions
  • JSONL Logging: Structured logs for debugging and visualization, compatible with the RLM visualizer
  • File Loading: Lazy loading from local filesystem and Google Cloud Storage
  • Web UI: FastAPI-based interface with WebSocket streaming and Tokyo Night theme
  • CLI: Interactive REPL with Rich console output
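
To make the sandboxing idea concrete, here is a minimal sketch of a restricted-builtins REPL. It is illustrative only: the names (SandboxedRepl, SAFE_BUILTINS) and the allowlist are assumptions, not the sample's actual implementation.

import builtins
import contextlib
import io

# Hypothetical allowlist; the sample's real set of permitted builtins may differ.
_ALLOWED = ("abs", "enumerate", "len", "max", "min", "print", "range", "sum")
SAFE_BUILTINS = {name: getattr(builtins, name) for name in _ALLOWED}


class SandboxedRepl:
    """Executes code with restricted builtins; locals persist across turns."""

    def __init__(self):
        self.locals = {}

    def run(self, code: str) -> str:
        env = {"__builtins__": SAFE_BUILTINS, **self.locals}
        stdout = io.StringIO()
        with contextlib.redirect_stdout(stdout):
            exec(code, env)  # only SAFE_BUILTINS are reachable from user code
        self.locals = {k: v for k, v in env.items() if k != "__builtins__"}
        return stdout.getvalue()


repl = SandboxedRepl()
print(repl.run("x = sum(range(5))\nprint(x)"))  # -> 10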

Architecture

The sample demonstrates several ADK patterns:

  • Custom BaseAgent implementation (RLMAgent)
  • Custom BaseCodeExecutor for sandboxed REPL
  • Streaming events via AsyncGenerator[Event, None] (sketched below)
  • Session persistence with DatabaseSessionService
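
A minimal sketch of a custom streaming agent follows, assuming ADK's public Python API (google.adk); EchoAgent is a toy stand-in for RLMAgent's actual iteration loop, not the sample's code.

from typing import AsyncGenerator

from google.adk.agents import BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event
from google.genai import types


class EchoAgent(BaseAgent):
    """Toy custom agent: emits one streamed Event per invocation."""

    async def _run_async_impl(
        self, ctx: InvocationContext
    ) -> AsyncGenerator[Event, None]:
        # A real RLMAgent would loop here: call the model, execute REPL
        # code, and yield intermediate events (rlm.iteration.start, ...).
        yield Event(
            author=self.name,
            content=types.Content(
                role="model", parts=[types.Part(text="hello from the agent")]
            ),
        )

Session persistence then comes from running such an agent with DatabaseSessionService (e.g. DatabaseSessionService(db_url="sqlite:///sessions.db")) instead of the in-memory service.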

Test Plan

Unit Tests

The sample includes comprehensive unit tests that can be run without API access:

cd contributing/samples/rlm
uv pip install -e ".[dev]"
python -m pytest tests/ --ignore=tests/test_e2e.py --ignore=tests/test_gcs_integration.py -v

Tests cover:

  • REPL execution and sandboxing (test_repl.py)
  • Code block parsing (test_parsing.py)
  • File loading and lazy evaluation (test_files.py)
  • Multi-turn conversation handling (test_multi_turn.py)
  • Event streaming (test_simple_llm_events.py)
  • Usage tracking (test_usage.py)

E2E Tests (with LLM)

RLM_E2E_TESTS=true python -m pytest tests/test_e2e.py -v

Web UI Tests (with Playwright)

python -m pytest tests/ui/ -v  # Mocked WebSocket
RLM_E2E_TESTS=true python -m pytest tests/e2e/ -v  # Real server

Manual Testing

# Interactive CLI
python -m adk_rlm.cli

# Web interface
python -m adk_rlm.web

# ADK built-in web interface
adk web adk_rlm.agent

@gemini-code-assist

Summary of Changes

Hello @LiamConnell, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new sample that implements Recursive Language Models (RLM) using Google's Agent Development Kit (ADK) and Gemini models. The RLM approach allows large language models to process extensive contexts by breaking down problems, executing code in a controlled environment, and recursively calling themselves. This enables near-infinite context length handling and provides detailed insights into the model's reasoning process through comprehensive event streaming and logging.

Highlights

  • Recursive LLM Calls: LLMs can spawn sub-LLMs to analyze context chunks, with configurable max depth, enabling hierarchical decomposition of complex problems.
  • Sandboxed Python REPL: A safe code execution environment with restricted builtins is provided, allowing LLMs to programmatically interact with data.
  • Streaming Events: Real-time event streaming (e.g., 'rlm.iteration.start', 'rlm.code.found', 'rlm.final.answer') is implemented for UI integration and granular visibility into execution.
  • Multi-Turn Persistence: Conversation state and REPL variables are maintained across turns using ADK sessions, supporting continuous interaction.
  • File System Integration: Lazy loading from local filesystem and Google Cloud Storage is supported, allowing efficient handling of large contexts.
  • Web UI and CLI: A FastAPI-based web interface with WebSocket streaming and a Tokyo Night theme, and an interactive CLI with Rich console output, are provided for user interaction.
  • ADK Pattern Implementation: The sample demonstrates custom BaseAgent (RLMAgent), custom BaseCodeExecutor for the sandboxed REPL, streaming events via AsyncGenerator, and session persistence with DatabaseSessionService.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new sample for Recursive Language Models (RLM) using ADK and Gemini models. The implementation includes a robust agent, a sandboxed Python REPL, streaming events, multi-turn persistence, and file handling with lazy loading. The accompanying CLI and Web UI provide interactive ways to engage with the RLM. Comprehensive unit and E2E tests are also included, along with deployment scripts for GCP. Overall, the changes are well-structured and demonstrate a strong understanding of the RLM pattern and ADK framework. I've identified a few areas for improvement related to Dockerfile best practices, logging configuration, and encapsulation, which are detailed in the specific comments.

RUN mkdir -p adk_rlm && echo '__version__ = "0.1.0"' > adk_rlm/__init__.py

# Install dependencies (this layer is cached unless pyproject.toml changes)
RUN uv pip install --system -e ".[all]" --index-url https://pypi.org/simple/

medium

Using --system with uv pip install inside a Dockerfile is generally not recommended. In a Docker image, the environment is already isolated, so installing into the system Python (even a slim one) bypasses uv's virtual environment management. It's often better to let uv manage a virtual environment within the container or use pip install directly if uv's virtual environment features aren't strictly needed in the final image layer.

EXPOSE 8080

# Run the web server
CMD ["sh", "-c", "python -m adk_rlm.web --host 0.0.0.0 --port $PORT"]

medium

It's generally better to use the exec form of CMD in Dockerfiles (e.g., CMD ["python", "-m", "adk_rlm.web", "--host", "0.0.0.0", "--port", "$PORT"]). The sh -c form runs the command as a child process of sh, which can lead to issues with signal handling (e.g., SIGTERM not being passed to the Python process) and process IDs. Using the exec form ensures that signals are correctly propagated to your application.

Comment on lines +20 to +27
_logger = _logging.getLogger(__name__)
if not _logger.handlers:
    _handler = _logging.StreamHandler()
    _handler.setFormatter(
        _logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    _logger.addHandler(_handler)
    _logger.setLevel(_logging.WARNING)

medium

Setting a default StreamHandler and logging level (WARNING) directly in __init__.py can interfere with an application's logging configuration. It's generally recommended for libraries to add a NullHandler to prevent "No handlers could be found for logger" messages if the application doesn't configure logging, and then provide a separate configure_logging function for explicit setup. This allows applications to have full control over logging output.
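
One common way to apply this suggestion, as a sketch of the standard library pattern rather than the sample's code:

import logging

# In the library's __init__.py: stay silent unless the app opts in.
logging.getLogger(__name__).addHandler(logging.NullHandler())


def configure_logging(level: int = logging.INFO) -> None:
    """Explicit opt-in helper for applications that want default output."""
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    logger = logging.getLogger(__name__)
    logger.addHandler(handler)
    logger.setLevel(level)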

Comment on lines +58 to +60
for handler in logger.handlers:
    handler.setLevel(level)
    handler.setFormatter(_logging.Formatter(format))

medium

The configure_logging function modifies existing handlers. If this function is called multiple times, it might lead to unintended modifications of handlers that were not meant to be reconfigured. For a complete reconfiguration, it's often safer to remove existing handlers before adding new ones, or ensure that the function is idempotent.
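
A sketch of an idempotent variant (the logger name "adk_rlm" is an assumption based on the package name):

import logging


def configure_logging(level: int, fmt: str) -> None:
    """Reconfigure from scratch: drop old handlers before adding a new one."""
    logger = logging.getLogger("adk_rlm")
    for handler in list(logger.handlers):  # copy the list; we mutate it below
        logger.removeHandler(handler)
        handler.close()
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(fmt))
    logger.addHandler(handler)
    logger.setLevel(level)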

)

# Initialize private attributes
self._client = genai.Client(vertexai=True, location="global")

medium

Initializing genai.Client directly in the __init__ method can make testing more difficult and tightly couples the agent to a specific client configuration (e.g., vertexai=True, location="global"). Consider making the genai.Client configurable via dependency injection (passed as an argument) or initializing it lazily with a factory function. This improves testability and flexibility for different deployment environments.
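
A sketch of the dependency-injection idea (simplified: the actual RLMAgent is a pydantic BaseAgent subclass, so attribute handling may differ):

from typing import Optional

from google import genai


class RLMAgent:  # simplified stand-in for the sample's agent
    def __init__(self, client: Optional[genai.Client] = None):
        self._client = client  # injected in tests, built lazily otherwise

    @property
    def client(self) -> genai.Client:
        if self._client is None:
            self._client = genai.Client(vertexai=True, location="global")
        return self._client

Tests can then inject a stub or mock client without touching Vertex AI.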

# Execute code asynchronously while streaming child events in real-time
from google.adk.code_executors.code_execution_utils import CodeExecutionInput

# Reset queue state BEFORE starting the task to avoid race conditions

medium

This import statement is inside a loop. It should be moved to the top of the file to avoid repeated imports, which can have a minor performance impact and is generally considered bad practice for readability and maintainability.


# Also check REPL locals for FINAL_VAR pattern
if final_answer is None:
    final_answer = find_final_answer(response_text, executor._repl)

medium

Accessing executor._repl directly from RLMAgent violates encapsulation, as _repl is a private attribute of RLMCodeExecutor. It would be better if RLMCodeExecutor exposed a public method (e.g., get_final_answer_from_repl_state()) to retrieve this information, or if find_final_answer was designed to work with the executor object's public interface.
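
For example (a sketch; the method name follows the review's suggestion, and FINAL_VAR is the convention mentioned in the diff, not verified against the sample's code):

class RLMCodeExecutor:  # simplified stand-in for the sample's executor
    def __init__(self):
        self._repl_locals = {}  # stands in for the private _repl state

    def get_final_answer_from_repl_state(self):
        """Public accessor so RLMAgent never reaches into private state."""
        return self._repl_locals.get("FINAL_VAR")

The call site in RLMAgent then becomes final_answer = executor.get_final_answer_from_repl_state().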

Comment on lines +339 to +341
if lazy:
    file_collection = self.create_lazy_files(files)
    return {

medium

The if context is None: raise ValueError(...) check and the subsequent ctx = context assignment seem to belong to the completion function's logic rather than FileLoader's build_context method. This indicates a slight mixing of concerns. FileLoader's primary responsibility should be to load and process files, not to validate the overall context input for the RLM system. Consider moving this validation and context merging logic to the completion function or a dedicated context preparation utility.
