The AI Telco Engineer
This framework deploys a swarm of parallel agents to autonomously design and optimize wireless communication algorithms for user-defined tasks, such as channel estimation, link adaptation, or LDPC decoding. Each agent is powered by a large language model (LLM) and operates within an isolated, containerized environment. Agents have access to a toolkit that includes file editing capabilities, Sionna documentation, and a task-specific evaluation tool that provides feedback on algorithmic performance.
The framework implements an idea-driven optimization loop. An orchestrator LLM proposes N distinct algorithmic approaches (ideas) for the task. A population of M agents is distributed across those ideas, and each agent implements and improves one assigned approach in its own isolated workspace. When an agent finishes, the orchestrator LLM summarizes the algorithm it produced. At the end of each generation, the orchestrator reviews all summaries and metrics and proposes N new ideas for the next generation, optionally referencing the best algorithms found so far as a starting point. The leaderboard groups candidates by their assigned idea.
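To make the loop concrete, here is a minimal sketch in plain Python. The orchestrator and the agents are stand-in callables (propose_ideas and run_agent are hypothetical names, not the framework's API), the agents run one after another instead of in parallel, and the round-robin split of the population across ideas is an assumption rather than the framework's documented policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    idea: str
    metric: Optional[float]  # None when the evaluation failed
    summary: str  # orchestrator's summary of the algorithm


def assign_agents(ideas: List[str], population_size: int) -> List[str]:
    """Round-robin split (an assumption): agent i works on ideas[i % len(ideas)]."""
    return [ideas[i % len(ideas)] for i in range(population_size)]


def best_candidate(cands: List[Candidate], higher_is_better: bool) -> Optional[Candidate]:
    """Best successful candidate so far, or None if nothing has succeeded yet."""
    ok = [c for c in cands if c.metric is not None]
    if not ok:
        return None
    pick = max if higher_is_better else min
    return pick(ok, key=lambda c: c.metric)


def run_search(
    propose_ideas: Callable[[List[Candidate]], List[str]],
    run_agent: Callable[[str, Optional[Candidate]], Candidate],
    population_size: int,
    num_generations: int,
    higher_is_better: bool,
) -> List[Candidate]:
    history: List[Candidate] = []
    for _ in range(num_generations):
        # The orchestrator reviews every summary and metric produced so far.
        ideas = propose_ideas(history)
        # Agents may start from the best algorithm found so far (None at first).
        seed = best_candidate(history, higher_is_better)
        for idea in assign_agents(ideas, population_size):
            history.append(run_agent(idea, seed))
    return history
```

In the real framework, the inner loop is where the M agents would be dispatched in parallel, bounded by num_workers.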
The system runs multiple LLM agents in parallel to explore and optimize algorithmic approaches. Each agent:
- Is assigned a distinct algorithmic approach by an orchestrator LLM
- Has access to tools for file operations, code execution, and Sionna documentation
- Runs in an isolated Docker container workspace
- Is evaluated with a task-specific evaluation tool
- Contributes to a leaderboard
Setup
1. Install Python dependencies
pip install -r requirements.txt
2. Set API key
Set your LLM API key as an environment variable:
export MODEL_API_KEY=<your-api-key>
Tasks
A task is a folder that provides everything needed to run the agentic framework for a given problem. Tasks live under tasks/.
Launching a task
Run:
python launch.py <task_folder>
Examples:
# Channel estimation
python launch.py tasks/channel_estimation
# Link adaptation
python launch.py tasks/link_adaptation
The framework is bundled with example tasks in tasks/. Each includes a visualize_results.ipynb notebook that visualizes the algorithms found by the framework and compares them to baselines. The notebooks can be run without running the framework, because they use pre-copied algorithms that the agents produced earlier.
Bundled tasks
| Task | Metric | Direction | Description |
|---|---|---|---|
| channel_estimation | Normalized Validation Error (NVE) | Lower is better | MIMO channel estimation using Sionna |
| channel_estimation_cov | Normalized Validation Error (NVE) | Lower is better | MIMO channel estimation with covariance information |
| link_adaptation | Spectral efficiency (bits/s/Hz) | Higher is better | MCS selection controller for link adaptation |
Leaderboard
The leaderboard is the live record of the search: it lists all algorithms produced by the agents, their evaluation outcome (success or failure), and the task metric (e.g. BLER or spectral efficiency).
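Because the leaderboard mixes successes and failures, and the metric direction depends on higher_is_better, ranking it takes a little care. The following sketch assumes a hypothetical entry layout (a dict with idea, status, and metric keys) that is not taken from the framework. It sorts successful entries best-first, keeps failures at the end, and reports the best metric per idea.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def is_success(entry: dict) -> bool:
    return entry["status"] == "SUCCESS" and entry["metric"] is not None


def rank_entries(entries: List[dict], higher_is_better: bool) -> List[dict]:
    """Successful entries best-first; failures keep their original order at the end."""
    ok = sorted(
        (e for e in entries if is_success(e)),
        key=lambda e: e["metric"],
        reverse=higher_is_better,
    )
    failed = [e for e in entries if not is_success(e)]
    return ok + failed


def best_per_idea(entries: List[dict], higher_is_better: bool) -> Dict[str, Optional[float]]:
    """Best metric for each idea, or None if every attempt at that idea failed."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for e in entries:
        groups[e["idea"]].append(e)
    best = {}
    for idea, group in groups.items():
        top = rank_entries(group, higher_is_better)[0]
        best[idea] = top["metric"] if is_success(top) else None
    return best
```

For a lower-is-better metric such as NVE, pass higher_is_better=False so the smallest error ranks first.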
While a task is running (or after it has run), you can view the leaderboard in a web UI. From the repository root, run:
./view_leaderboard.py
Then open http://localhost:8000 in your browser.
To point the viewer at a specific workspace folder (e.g. for a task that uses a custom workspace path):
./view_leaderboard.py --workspace path/to/workspaces
Use ./view_leaderboard.py --help for more options (e.g. --port).
Creating a New Task
Create a new subfolder under tasks/ with the following:
- Required: config.json, prompt.md, eval_tool.py, and a docker/ folder (Dockerfile and build script).
- Optional: tool_factory.py for extra tools (e.g. Sionna documentation search).
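Before launching, it can help to check that a task folder contains the pieces listed above. This is a hypothetical helper (check_task_folder is not part of the framework). It only tests whether the files exist, and, as an assumption, it treats any *.sh file in docker/ as the build script.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED = ["config.json", "prompt.md", "eval_tool.py", "docker"]
OPTIONAL = ["tool_factory.py"]


def check_task_folder(task_dir) -> Dict[str, List[str]]:
    """Report missing required items and which optional files are present."""
    task_dir = Path(task_dir)
    missing = [name for name in REQUIRED if not (task_dir / name).exists()]
    docker_dir = task_dir / "docker"
    # Assumption: the build script is any shell script inside docker/.
    if docker_dir.is_dir() and not any(docker_dir.glob("*.sh")):
        missing.append("docker/<build script>.sh")
    present = [name for name in OPTIONAL if (task_dir / name).exists()]
    return {"missing": missing, "optional_present": present}
```

For example, check_task_folder("tasks/my_task")["missing"] should be an empty list once steps 1 to 3 are done.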
1. Create the task folder
mkdir -p tasks/my_task/eval
mkdir -p tasks/my_task/docker
2. Create the Docker container
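Add a Dockerfile to tasks/my_task/docker/ that installs the dependencies of your task, plus a build script. The build script below expects the Dockerfile to be named dockerfile_agent_container and must be run from inside tasks/my_task/docker/: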
# tasks/my_task/docker/build_agent_container.sh
docker build -t agent_my_task -f dockerfile_agent_container .
This image is used to run agents in isolated workspaces. Agents can install additional packages via PyPI inside the container.
The image name must match workspace.docker_image in config.json.
3. Create required files
config.json — Task configuration. Copy from an existing task and adapt. Example:
{
  "agent_llm": {
    "model": "<model-name>",
    "base_url": "<api-base-url>",
    "temperature": 0.7,
    "top_p": 0.95
  },
  "manager_llm": {
    "model": "<model-name>",
    "base_url": "<api-base-url>",
    "temperature": 0.0,
    "top_p": 0.95
  },
  "workspace": {
    "path": "workspaces",
    "docker_image": "agent_my_task",
    "memory_limit": "16g",
    "pids_limit": 2048,
    "use_gpu": true
  },
  "tools_config": {
    "eval_timeout": 120
  },
  "num_workers": 10,
  "higher_is_better": false,
  "population_size": 20,
  "num_ideas": 5,
  "num_generations": 5,
  "timeout": 900,
  "task_submit_delay": 30.0,
  "prompt_path": "prompt.md"
}
| Parameter | Description |
|---|---|
| agent_llm.model | LLM model used by agents (workers) |
| agent_llm.base_url | API base URL for the agent LLM |
| agent_llm.temperature | Sampling temperature for agents |
| agent_llm.top_p | Top-p (nucleus) sampling for agents (default: 0.95) |
| agent_llm.model_kwargs | Optional extra model kwargs (e.g. {"reasoning_effort": "high"}) |
| manager_llm.model | LLM model used by the orchestrator (ideas and summaries) |
| manager_llm.base_url | API base URL for the orchestrator LLM |
| manager_llm.temperature | Sampling temperature for the orchestrator (typically 0.0) |
| manager_llm.top_p | Top-p (nucleus) sampling for the orchestrator (default: 0.95) |
| manager_llm.model_kwargs | Optional extra model kwargs for the orchestrator |
| workspace.path | Directory for agent workspaces (relative to task folder) |
| workspace.docker_image | Docker image for agent containers |
| workspace.memory_limit | Memory limit per container (default: "16g") |
| workspace.pids_limit | Max processes per container (default: 2048) |
| workspace.use_gpu | Enable GPU access in containers (default: true); falls back to CPU if the NVIDIA runtime is unavailable |
| tools_config | Configuration passed to ToolFactory and EvalTool |
| tools_config.eval_timeout | Timeout in seconds for each evaluation run (default: 120) |
| num_workers | Number of parallel agent workers |
| higher_is_better | If true, higher metric values are better |
| population_size | Total number of candidates per generation |
| num_ideas | Number of distinct algorithmic approaches per generation |
| num_generations | Number of optimization generations |
| timeout | Timeout in seconds per agent |
| task_submit_delay | Delay between task submissions (rate limiting) |
| prompt_path | Path to the prompt file, relative to the task folder |
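The defaults in the table can be applied by overlaying config.json on top of them. The sketch below illustrates the idea and is not the framework's loader. The default values come from the table. The set of required keys and the two sanity checks (workspace.docker_image is set, and each idea gets at least one agent) are assumptions.

```python
import json
from copy import deepcopy

# Defaults documented in the parameter table.
DEFAULTS = {
    "agent_llm": {"top_p": 0.95},
    "manager_llm": {"top_p": 0.95},
    "workspace": {"memory_limit": "16g", "pids_limit": 2048, "use_gpu": True},
    "tools_config": {"eval_timeout": 120},
}

# Keys this sketch treats as mandatory (an assumption).
REQUIRED_KEYS = ["num_workers", "higher_is_better", "population_size",
                 "num_ideas", "num_generations", "timeout", "prompt_path"]


def merge_defaults(defaults: dict, user: dict) -> dict:
    """Recursively overlay user values on top of the defaults."""
    merged = deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(text: str) -> dict:
    cfg = merge_defaults(DEFAULTS, json.loads(text))
    missing = [k for k in REQUIRED_KEYS if k not in cfg]
    missing += [f"{s}.model" for s in ("agent_llm", "manager_llm") if "model" not in cfg[s]]
    if "docker_image" not in cfg["workspace"]:
        missing.append("workspace.docker_image")
    if missing:
        raise ValueError("config.json is missing: " + ", ".join(missing))
    # Assumed constraint: with fewer agents than ideas, some ideas would go unexplored.
    if cfg["num_ideas"] > cfg["population_size"]:
        raise ValueError("num_ideas exceeds population_size")
    return cfg
```

With the example config above (population_size 20, num_ideas 5), each idea would receive four agents under an even split.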
prompt.md — Task description for the agents. Describe, in natural language, the problem the agents should solve.
eval_tool.py — Must define an EvalTool class with:
- run_evaluation(filename: str) -> str (required). The framework calls this after each agent run to score the algorithm. It must evaluate the workspace file filename and return a string in the format described below. If the file is missing or invalid, return "FAILURE," optionally followed by message lines.
Output format. Both the agent-facing evaluation tool and run_evaluation must return a string in this format:
- First line: "SUCCESS, <metric>", "FAILURE, <metric>", or "FAILURE,", where <metric> is a numeric value (e.g. 3.3687, 12.5). Use "FAILURE," (nothing after the comma) when there is no meaningful metric (e.g. a crash before any run).
- Remaining lines (optional): details for the agent and logs (e.g. error messages, statistics). The framework uses only the first line when recording the result.
Example first lines:
SUCCESS, 3.3687
FAILURE, 1.25
FAILURE,
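On the consuming side, the first line can be parsed in a few lines of code. This parser sketches how a caller might read the format. The EvalResult type and the choice to reject a SUCCESS line without a metric are assumptions, not the framework's code.

```python
from typing import NamedTuple, Optional


class EvalResult(NamedTuple):
    success: bool
    metric: Optional[float]  # None for a bare "FAILURE,"
    details: str  # everything after the first line


def parse_eval_output(text: str) -> EvalResult:
    """Parse 'SUCCESS, <metric>', 'FAILURE, <metric>' or 'FAILURE,' from the first line."""
    first, _, details = text.partition("\n")
    status, sep, value = first.partition(",")
    status = status.strip()
    if not sep or status not in ("SUCCESS", "FAILURE"):
        raise ValueError(f"malformed first line: {first!r}")
    value = value.strip()
    metric = float(value) if value else None  # float() raises ValueError on junk
    if status == "SUCCESS" and metric is None:
        raise ValueError("SUCCESS requires a numeric metric")
    return EvalResult(status == "SUCCESS", metric, details)
```

For example, parse_eval_output("FAILURE,\nTraceback ...") returns a failed result with no metric and the traceback as details.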
from tool_lib.base import ToolProvider
from langchain_core.tools import tool, BaseTool
class EvalTool(ToolProvider):
    def __init__(self, eval_timeout: int = 120):
        self._eval_timeout = eval_timeout
        self._workspace = None
        self.evaluate = tool(self._evaluate)
    def run_evaluation(self, filename: str) -> str:
        """Evaluate the given file and return 'SUCCESS, <metric>' or 'FAILURE,'."""
        # Run your evaluation logic on the workspace file `filename`
        # Return first line "SUCCESS, <metric>" or "FAILURE," + optional lines
        return "SUCCESS, 3.14\nOptional details..."
    def _evaluate(self) -> str:
        """Evaluate the algorithm. Docstring becomes tool description for the agent."""
        return self.run_evaluation("draft.py")
    # --- ToolProvider interface ---
    def get_tools(self) -> list[BaseTool]:
        return [self.evaluate]
    def set_workspace(self, workspace):
        self._workspace = workspace
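As one possible body for run_evaluation, the sketch below runs the candidate in a subprocess with a timeout and converts the outcome into the required format. It assumes, hypothetically, that the candidate script prints its metric on the last line of stdout. It also runs the file with the local Python interpreter instead of inside the agent container. evaluate_file is an illustrative name. How the workspace object exposes file paths is not specified here.

```python
import subprocess
import sys
from pathlib import Path


def evaluate_file(path: Path, timeout: int = 120) -> str:
    """Run a candidate script and report the result in the framework's output format.

    Hypothetical convention: the script prints its metric as the last line of stdout.
    """
    if not path.is_file():
        return f"FAILURE,\nFile not found: {path}"
    try:
        proc = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"FAILURE,\nEvaluation timed out after {timeout} s"
    if proc.returncode != 0:
        # Keep the tail of stderr so the agent can debug the crash.
        return f"FAILURE,\nExit code {proc.returncode}\n" + proc.stderr[-2000:]
    lines = proc.stdout.strip().splitlines()
    try:
        metric = float(lines[-1])
    except (IndexError, ValueError):
        return "FAILURE,\nNo numeric metric on the last line of stdout"
    # Earlier stdout lines become the optional details for the agent.
    return f"SUCCESS, {metric}\n" + "\n".join(lines[:-1])
```

Inside EvalTool, run_evaluation could call evaluate_file(path, timeout=self._eval_timeout) after resolving filename to a path in the workspace.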
tool_factory.py (optional) — Provides additional tools (e.g. Sionna documentation search).
The class must define a TOOL_TYPES class attribute listing the ToolProvider types it uses. The framework calls build() on each type before spawning workers, allowing expensive one-time setup (e.g. building a vector-store index) to run once in the orchestrator process.
from tool_lib.base import ToolProvider
from config import ToolsConfig
class ToolFactory(ToolProvider):
    TOOL_TYPES = [...]  # List of ToolProvider types used by this factory
    def __init__(self, tools_config: ToolsConfig):
        # Initialize tools using tools_config
        pass
    # --- ToolProvider interface ---
    def get_tools(self):
        return [...]
    def set_workspace(self, workspace):
        pass
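The build-once pattern behind TOOL_TYPES can be illustrated without the framework. In this sketch, DocIndexTool, its build() signature, and prepare_tools are hypothetical, since the real ToolProvider build() interface is not documented here. build() creates a toy inverted index once and caches it on disk, and each worker only loads the cached file. A ToolFactory would then list such a type in TOOL_TYPES.

```python
import json
from pathlib import Path
from typing import Dict, List


class DocIndexTool:
    """Hypothetical tool whose expensive setup is built once and cached on disk."""

    CACHE_FILE = "doc_index.json"

    @classmethod
    def build(cls, cache_dir: Path, docs: Dict[str, str]) -> Path:
        """One-time setup; later calls reuse the cached file instead of rebuilding."""
        cache = cache_dir / cls.CACHE_FILE
        if not cache.exists():
            cache_dir.mkdir(parents=True, exist_ok=True)
            # Toy "index": word -> names of the documents containing it.
            index: Dict[str, List[str]] = {}
            for name, text in docs.items():
                for word in set(text.lower().split()):
                    index.setdefault(word, []).append(name)
            cache.write_text(json.dumps(index))
        return cache

    def __init__(self, cache_dir: Path):
        # Workers only load the prebuilt index; they never rebuild it.
        self._index = json.loads((cache_dir / self.CACHE_FILE).read_text())

    def search(self, word: str) -> List[str]:
        return sorted(self._index.get(word.lower(), []))


def prepare_tools(tool_types, cache_dir: Path, docs: Dict[str, str]) -> None:
    """Run build() once per tool type in the orchestrator, before any worker starts."""
    for tool_type in tool_types:
        tool_type.build(cache_dir, docs)
```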
4. Build the container and run
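Build the Docker image by running the build script from tasks/my_task/docker/, then launch the task from the repository root:
(cd tasks/my_task/docker && sh build_agent_container.sh)
python launch.py tasks/my_task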
Stopping
Press Ctrl+C to stop the agents gracefully. The leaderboard is saved after each candidate completes, so progress is preserved.
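Saving after every candidate only preserves progress if an interrupt cannot leave a half-written file. A common way to get that guarantee is to write to a temporary file and rename it over the old one. The JSON layout and the function names below are assumptions, because the README does not specify how the leaderboard is stored.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List


def save_leaderboard(entries: List[dict], path: Path) -> None:
    """Rewrite the leaderboard so that an interrupt never leaves it half-written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory, so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(entries, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, path)  # atomic rename on POSIX
    except BaseException:  # includes KeyboardInterrupt from Ctrl+C
        os.unlink(tmp_name)
        raise


def load_leaderboard(path: Path) -> List[dict]:
    return json.loads(path.read_text()) if path.exists() else []
```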
Tool-Specific Configuration
Configure these only if your task uses the corresponding tools via tool_factory.py.
Sionna Documentation (RAG-based documentation search)
The SionnaDoc tool indexes Sionna documentation for semantic search. It requires an embedding model and, optionally, a cross-encoder reranker. Indexing is performed once and cached to disk.
Configure the tool through tools_config.sionna_doc_config in config.json:
{
  "tools_config": {
    "sionna_doc_config": {
      "cache_dir_path": "api_doc_cache",
      "embedding_model": "<embedding-model-name>",
      "embedding_base_url": "<embedding-server-url>",
      "reranker_model": "<reranker-model-name>",
      "reranker_base_url": "<reranker-server-url>",
      "retrieve_k": 12,
      "rerank_top_n": 4,
      "summarize_llm": {
        "model": "<summarization-model-name>",
        "base_url": "<summarization-api-url>",
        "temperature": 0.0
      }
    }
  }
}
| Parameter | Description |
|---|---|
| cache_dir_path | Directory for the FAISS index cache |
| embedding_model | Embedding model name (served via any OpenAI-compatible endpoint) |
| embedding_base_url | Base URL of the embedding server (e.g. TEI, Ollama /v1, vLLM) |
| reranker_model | Cross-encoder model for reranking (optional; leave empty to skip) |
| reranker_base_url | Base URL of the reranker server |
| retrieve_k | Number of documents to retrieve before reranking |
| rerank_top_n | Number of documents to return after reranking |
| summarize_llm | Optional LLM config for summarizing tutorials before indexing (omit or set to {} to skip) |
| summarize_llm.model | LLM model name for summarization |
| summarize_llm.base_url | API base URL for the summarization LLM |
| summarize_llm.temperature | Sampling temperature for summarization (default: 0.0) |
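retrieve_k and rerank_top_n describe a two-stage search. A cheap embedding similarity first narrows the corpus, and a cross-encoder then reorders the short list. The sketch below uses toy vectors and a stand-in scoring function in place of the real embedding and reranker servers. Returning the top rerank_top_n retrieval hits when no reranker is configured is an assumption.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_and_rerank(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    retrieve_k: int,
    rerank_top_n: int,
    rerank: Optional[Callable[[str], float]] = None,
) -> List[str]:
    # Stage 1: vector similarity keeps the retrieve_k closest documents.
    candidates = sorted(
        doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True
    )[:retrieve_k]
    # Stage 2 (optional): a more expensive relevance score reorders the short list.
    # A real cross-encoder scores (query, document) pairs; here `rerank` closes over the query.
    if rerank is not None:
        candidates = sorted(candidates, key=rerank, reverse=True)
    return candidates[:rerank_top_n]
```

A document that the reranker would favor can still be dropped if stage 1 does not retrieve it. This is why retrieve_k (12 in the example) is larger than rerank_top_n (4).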
The embedding and reranker endpoints must speak the OpenAI-compatible protocol (/v1/embeddings and /v1/rerank). You can serve them with TEI, Ollama, vLLM, or any compatible server.
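For reference, an OpenAI-style embeddings call is a POST of {"model": ..., "input": [...]} to the /embeddings route. The server answers with a data list of objects, each carrying an index and an embedding. The sketch below builds and parses such a request with the standard library only. It assumes embedding_base_url already ends in /v1, sends no authentication header, and does not cover the /v1/rerank route, whose request schema is not described here.

```python
import json
import urllib.request
from typing import List


def build_embeddings_request(base_url: str, model: str, texts: List[str]) -> urllib.request.Request:
    """OpenAI-style embeddings request; assumes base_url already ends in /v1."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings_response(payload: bytes) -> List[List[float]]:
    """Return the vectors ordered by the 'index' field of each item in 'data'."""
    items = json.loads(payload)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda it: it["index"])]


def embed(base_url: str, model: str, texts: List[str], timeout: float = 30.0) -> List[List[float]]:
    """Send the request and return one vector per input text."""
    req = build_embeddings_request(base_url, model, texts)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_embeddings_response(resp.read())
```

Calling embed("http://localhost:8080/v1", "<embedding-model-name>", ["resource grid"]) against a running server is a quick way to confirm that the endpoint speaks the expected protocol before indexing.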
How to Cite
If you use this software, please cite it as:
@software{the-ai-telco-engineer,
  title = {The AI Telco Engineer},
  author = {{Aït Aoudia}, Fayçal and Hoydis, Jakob and Cammerer, Sebastian and Maggi, Lorenzo and Marti, Gian and Keller, Alexander},
  note = {https://github.com/NVlabs/the-ai-telco-engineer},
  year = {2026}
}