NodeRun
Execution Infrastructure for AI Agents
Reproducible, auditable, cost-efficient distributed execution layer—taking AI Agents from demos to production scale.
Abstract
As LLM-powered AI Agents transition from theory to practice, a structural bottleneck has emerged: execution. This paper examines the inherent tensions in current Agent execution approaches—cost, security, scalability, and reproducibility—and presents NodeRun's architecture and technical implementation in detail.
NodeRun is an execution layer purpose-built for AI Agents. By decoupling compute from the LLM and placing it in a stateless, isolated, auditable, and cost-efficient distributed runtime, NodeRun provides the critical infrastructure for scaling AI Agents from demos to production. For deterministic workloads, reproducible execution is available as an opt-in feature.
The NodeX Labs Ecosystem: NodeRun & NodeHub Synergy
NodeRun is the AI Agent execution layer product from NodeX Labs, designed to work in tight synergy with NodeHub, the company's core infrastructure. Understanding this ecosystem positioning is key to grasping NodeRun's strategic value and technical advantages.
NodeX Labs Product Portfolio
NodeX Labs is building next-generation distributed compute infrastructure. The core products include:
- •NodeHub: The flagship product—a decentralized compute network with 11,000+ globally distributed nodes. NodeHub provides elastic, low-cost compute capacity for diverse workloads and serves as the infrastructure layer for the entire NodeX Labs ecosystem.
- •NodeRun: The AI Agent execution layer built on NodeHub. NodeRun abstracts NodeHub's distributed compute into a standardized, Agent-friendly execution service—NodeX Labs' strategic product for the AI Agent market.
Technical Synergy Between NodeRun & NodeHub
| Layer | NodeHub Provides | NodeRun Abstracts |
|---|---|---|
| Compute Supply | 11,000+ globally distributed nodes with elastic, redundant capacity | Intelligent scheduling that routes tasks to optimal nodes for low latency and high availability |
| Cost Structure | Structural cost advantages from distributed supply—far below centralized cloud providers | Pay-per-run, sub-second billing that passes cost savings directly to Agent developers |
| Trust Mechanisms | Node registration, reputation systems, and economic incentives as foundational trust framework | Proof-Lite / Proof-Strong execution attestation system providing auditability for AI Agents |
| Network Coverage | Global node distribution across multiple geographic regions | Proximity-based routing and IP diversity for compliance and performance in public web interactions |
Ecosystem Role: The "Execution Gateway" for AI Agents
Within the NodeX Labs ecosystem, NodeRun serves as the "execution gateway for AI Agents":
- •Upstream: Provides plug-and-play execution capabilities for AI Agent frameworks (LangChain, AutoGPT, Claude Computer Use, etc.) through standard interfaces like MCP.
- •Downstream: Translates Agent execution demands into NodeHub distributed compute calls, fully leveraging NodeHub's scale and cost advantages.
- •Lateral: Integrates with other NodeX Labs ecosystem products (storage, data services, etc.) to provide comprehensive infrastructure support for AI Agents.
NodeX Labs Ecosystem
The Structural Tension: LLM Probabilism vs. Agent Determinism
The AI Agent paradigm centers on the "perceive-reason-act" loop. While "reasoning" is powered by LLM capabilities, "acting" requires a reliable execution environment. The problem: LLM characteristics fundamentally conflict with traditional compute execution requirements, creating architectural friction in current Agent systems.
| Dimension | LLM Characteristics | Execution Requirements |
|---|---|---|
| Output | Probabilistic: Same input doesn't guarantee identical output; prone to hallucination. | Deterministic: Requires isolated, stateless environments for predictable results; reproducible execution available for closed-form tasks. |
| State | Stateful: Relies on large context windows to maintain conversation and reasoning coherence. | Stateless: Each execution should run in a clean, isolated environment to prevent cross-task state pollution. |
| Cost Model | Token-based: Cost correlates with input/output text length, not computational complexity. | Resource-based: Cost should correlate with actual compute resources consumed (CPU, memory, time). |
| Trust | Implicit: Users trust the model provider's brand and opaque internal processes. | Explicit Audit: Execution process and results must be auditable, ideally with cryptographic proofs for integrity and reproducibility. |
Force-coupling these fundamentally incompatible modules leads to predictable problems:
- •Cost Explosion: Having LLMs "simulate" execution or pass large intermediate data through context consumes massive tokens, making high-frequency tasks economically unviable.
- •Scalability Ceiling: Relying on LLM context for execution state management creates fragile, complex systems that resist large-scale, high-concurrency Agent deployment.
- •Audit Impossibility: In finance, automated testing, and other high-stakes domains, probabilistic outputs and opaque processes are deal-breakers.
Use Cases: NodeRun in Real-World Applications
NodeRun's design philosophy stems from real business needs. The following are typical application scenarios demonstrating how NodeRun solves practical challenges faced by AI Agents in production environments.
Multi-Agent Collaboration Systems
In complex business workflows, multiple AI Agents need to work together. For example, an automated research system might include: Data Collection Agent, Analysis Agent, Report Generation Agent, and Quality Review Agent.
Challenges
- •How to avoid resource contention and state pollution when multiple Agents execute simultaneously?
- •How to prevent fault propagation when one Agent crashes?
- •How to precisely track each Agent's resource consumption and costs?
NodeRun Solution
- •Independent Sandboxes: Each Agent executes in a completely isolated environment
- •Fault Isolation: Single Agent failures don't affect other Agents
- •Cost Attribution: Each execution has independent billing records for precise cost allocation
Financial Data Analysis & Compliance
The financial industry has strict compliance requirements for data processing. AI Agents performing financial analysis must ensure data security and operational auditability.
Challenges
- •How to prove Agent analysis results are computed from original data, not fabricated?
- •How to meet regulatory audit requirements for data processing?
- •How to ensure sensitive financial data doesn't leak outside the execution environment?
NodeRun Solution
- •Proof-Lite Attestation: Generate cryptographic proofs for each execution, ensuring verifiable results
- •Complete Audit Logs: Record all inputs, outputs, and intermediate states for compliance
- •Data Isolation: Sandbox destroyed immediately after execution, no data residue
Security Penetration Testing
AI Agents are increasingly used for automated security testing. These tasks require executing potentially dangerous code and need diverse IP addresses to simulate realistic attack traffic.
Challenges
- •How to safely execute test scripts that may contain malicious code?
- •How to obtain diverse IP addresses to test target system defenses?
- •How to ensure the test environment doesn't affect production systems?
NodeRun Solution
- •Complete Isolation: Test code runs in fully isolated sandboxes, unable to affect host systems
- •11K+ Residential IPs: Globally distributed residential IP network simulating real user access
- •Ephemeral Environments: Environment automatically destroyed after each test, leaving no traces
Content Collection & Operations
Content operations teams use AI Agents to automate content collection, processing, and publishing. These tasks typically involve large volumes of network requests and data processing.
Challenges
- •How to avoid IP bans from target websites due to frequent access?
- •How to handle large concurrent collection tasks?
- •How to control collection task costs?
NodeRun Solution
- •IP Rotation: Automatic rotation through different residential IPs to avoid bans
- •Elastic Scaling: Auto-scale based on task volume for high concurrency
- •Pay-per-Run: Only pay for actually executed tasks, predictable costs
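The IP rotation behavior described above can be sketched in a few lines. This is an illustrative toy, assuming a static pool; in practice, exit-IP assignment would be handled by the NodeRun scheduler, not by client code.

```python
from itertools import cycle

# Hypothetical pool of residential exit IPs (documentation-range addresses);
# real assignments would come from the scheduler, not a static list.
IP_POOL = ["203.0.113.10", "198.51.100.24", "192.0.2.77"]

def rotating_sessions(task_urls):
    """Pair each outbound request with the next IP in the rotation."""
    ips = cycle(IP_POOL)
    return [(url, next(ips)) for url in task_urls]

assignments = rotating_sessions([
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
    "https://example.com/d",
])
# The fourth request wraps around to the first IP again.
```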
Trusted AI Inference
In high-stakes decision scenarios (healthcare, legal, finance), AI inference processes must be verifiable. Users need confidence that AI conclusions are based on correct data and logic.
Challenges
- •How to prove AI computation processes haven't been tampered with?
- •How to enable third parties to independently verify AI inference results?
- •How to prove computation correctness without exposing raw data?
NodeRun Solution
- •Proof-Strong Attestation: TEE-based trusted execution environment providing hardware-level execution proofs
- •Reproducible Execution: For deterministic tasks, anyone can re-execute and verify results
- •Zero-Knowledge Proofs (Roadmap): Prove computation correctness without exposing raw data
Data Processing Pipelines
Large-scale data processing tasks typically need to be split into multiple subtasks for parallel execution. AI Agents can intelligently orchestrate these tasks but need a reliable execution environment.
Challenges
- •How to efficiently execute thousands of data processing tasks in parallel?
- •How to handle partial task failures?
- •How to control costs for large-scale parallel tasks?
NodeRun Solution
- •Auto-Parallel: Intelligently distribute tasks to global nodes, maximizing parallelism
- •Fault Tolerance: Single task failures don't affect others, with automatic retry support
- •Cost Optimization: Select optimal nodes based on task characteristics, balancing cost and performance
| Metric | Traditional Approach | NodeRun |
|---|---|---|
| Parallelism | Limited by single-machine resources | Global 11K+ nodes |
| Fault Impact | May cause entire pipeline failure | Only affects single task |
| Cost Model | Pay for server time | Pay for actual execution |
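The fan-out-with-retry pattern behind the fault-tolerance row above can be sketched as follows. The `run_task` body is a stand-in for a real NodeRun execution call; the retry loop shows how a single failed chunk is retried without affecting the rest of the pipeline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_task(chunk):
    # Stand-in for a NodeRun execution call; here we just transform data.
    if chunk is None:
        raise ValueError("bad chunk")
    return sum(chunk)

def fan_out(chunks, max_retries=2, workers=8):
    """Dispatch chunks in parallel; retry failed chunks individually so one
    failure never takes down the whole pipeline."""
    results, pending = {}, {i: c for i, c in enumerate(chunks)}
    for _attempt in range(max_retries + 1):
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {pool.submit(run_task, c): i for i, c in pending.items()}
            for fut in as_completed(futures):
                i = futures[fut]
                try:
                    results[i] = fut.result()
                    del pending[i]
                except Exception:
                    pass  # stays in `pending` for the next attempt
    return results, sorted(pending)

results, failed = fan_out([[1, 2], [3, 4], None])
# → results == {0: 3, 1: 7}; chunk 2 is reported failed, the rest succeed.
```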
Current Execution Approaches: Technical Trade-offs Analyzed
To build a better execution layer, we must first understand the specific trade-offs in current mainstream approaches. These aren't poorly designed—they're rational choices under specific constraints and goals. But these choices also determine why they can't serve as general-purpose, scalable Agent execution infrastructure.
Model-Embedded Execution: Google Gemini Code Interpreter
- •Implementation: Bundling a code interpreter as a built-in tool delivers "works out of the box" UX, but at the cost of ecosystem lock-in and opaque pricing.
- •Trade-offs: The execution environment is a "black box" tightly coupled to the model. Costs are bundled with expensive token usage—for high-frequency tasks, this hidden cost adds up fast.
Platform-Outsourced Cloud Sandboxes: Manus + E2B
- •Implementation: General-purpose Agent platforms integrate third-party cloud sandboxes for execution capabilities.
- •Trade-offs: Double cost structure. Users pay platform subscription fees while indirectly bearing the platform's high cloud sandbox costs—poor unit economics for end users.
Interaction Layer Optimization: Anthropic's "Code-as-Tool" Pattern
- •Implementation: Having Agents "write code" instead of "call tools directly" cleverly reduces token consumption.
- •Trade-offs: This solves "how to invoke" but sidesteps the core question of "where to execute"—leaving execution complexity to developers.
NodeRun Architecture: Purpose-Built for AI Agent Execution
NodeRun's design philosophy is "back to fundamentals." We don't build a sprawling general-purpose platform—we focus on doing one thing exceptionally well: providing AI Agents with an ultra-efficient, low-cost, trustworthy execution layer. The entire architecture is built around this singular goal.
NodeRun Architecture Overview
MCP-First Entry Point: The NodeRun Gateway
NodeRun's entry point is an ultra-lightweight gateway bridging Agents to the backend execution network. We've chosen MCP (Model Context Protocol) as the primary, native integration protocol—any MCP-compliant Agent can seamlessly add NodeRun as a standard "execution tool" with zero integration cost.
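To make the MCP integration concrete, the sketch below builds a JSON-RPC 2.0 `tools/call` request of the shape MCP defines. The tool name `noderun_execute` and its argument schema are assumptions for illustration; the actual NodeRun tool contract may differ.

```python
import json

def mcp_tool_call(code, runtime="python3.11", timeout_s=30, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request as defined by MCP.
    Tool name and argument fields below are hypothetical."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "noderun_execute",      # hypothetical tool name
            "arguments": {
                "code": code,
                "runtime": runtime,
                "timeout_s": timeout_s,
                "network_enabled": False,   # sandboxes are offline by default
            },
        },
    })

payload = json.loads(mcp_tool_call("print(2 + 2)"))
```

An MCP-compliant Agent would send this payload to the NodeRun gateway exactly as it would to any other MCP tool server.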
Distributed Scheduling System
This is NodeRun's "brain." The control plane (gateway, scheduling, policies) is centralized to ensure SLA guarantees and rapid iteration; the execution plane is distributed to capture structural cost advantages and global coverage.
Execution Sandbox: Docker-Based Cross-Platform Isolation
Sandbox Technology Choice: NodeRun's cross-platform execution foundation is Docker. Standardized container technology ensures high consistency between development and production environments. On this foundation, we've built a multi-layer security model:
- •Default Secure Mode (GA): All tasks run in rootless Docker containers with strict seccomp, cgroups, and namespace configurations—limiting process privileges, resource usage, and system visibility for solid baseline isolation.
- •Optional Hardened Mode (Roadmap): For scenarios requiring stronger isolation guarantees, NodeRun plans to offer pluggable hardening engines. On Linux nodes, gVisor can provide stronger kernel-level isolation; for tasks requiring full virtualization, MicroVM (e.g., Firecracker) may be supported as a higher-cost, higher-isolation option.
Extreme Lightweight Design: NodeRun uses lightweight container sandboxes with layered image caching and Warm Pool mechanisms. For common runtimes (Python/Node) in warm-start scenarios, typical sandbox startup is in the hundreds of milliseconds (target: 100-500ms); for cold starts, pre-fetching and local image caching keep overhead to sub-second or low single-digit seconds.
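The Warm Pool mechanism can be sketched as a queue of pre-created sandboxes with a cold-start fallback. This is a minimal illustration of the scheduling idea, not the production implementation; sandbox IDs and refill policy are invented for the example.

```python
import collections
import itertools

class WarmPool:
    """Toy warm pool: hand out pre-created sandbox IDs when available,
    fall back to a (slower) cold start otherwise."""
    def __init__(self, prewarm=2):
        self._ids = itertools.count()
        self._warm = collections.deque(self._create() for _ in range(prewarm))

    def _create(self):
        return f"sbx-{next(self._ids)}"

    def acquire(self):
        if self._warm:
            return self._warm.popleft(), "warm"   # target: 100-500 ms path
        return self._create(), "cold"             # sub-second to a few seconds

    def refill(self):
        # Sandboxes are never reused; the pool is topped up with fresh ones.
        self._warm.append(self._create())

pool = WarmPool(prewarm=2)
a = pool.acquire()   # served from the warm pool
b = pool.acquire()   # served from the warm pool
c = pool.acquire()   # pool empty: cold start
```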
Stateless Execution Model: Isolation & Determinism
NodeRun strictly adheres to stateless and run-to-completion execution models. Every execution runs in complete isolation, ensuring deterministic and auditable results. For closed-form compute tasks (fixed inputs, no external state dependencies), reproducible execution is available as an opt-in feature supporting third-party re-computation and verification.
Each task runs in a freshly created, just-in-time sandbox instance. Upon completion, the sandbox and all its state are immediately and completely destroyed. This provides a solid foundation for code behavior reproducibility and simplifies system design.
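The create-run-destroy lifecycle described above can be sketched with a `try/finally` so destruction is unconditional. The in-memory registry stands in for real container management and is an assumption of this sketch.

```python
import uuid

SANDBOXES = {}  # stand-in for the node's live-sandbox registry

def create_sandbox():
    sid = uuid.uuid4().hex
    SANDBOXES[sid] = {"state": "clean"}   # every run starts from a clean slate
    return sid

def destroy_sandbox(sid):
    SANDBOXES.pop(sid, None)              # all state dies with the sandbox

def run_to_completion(task):
    """Each task gets a freshly created sandbox; destruction happens in
    `finally`, so no state can leak into the next run even on failure."""
    sid = create_sandbox()
    try:
        return task(SANDBOXES[sid])
    finally:
        destroy_sandbox(sid)

out = run_to_completion(lambda sbx: sbx["state"])
# After the call returns, no sandbox survives in the registry.
```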
Execution Proofs: Building Auditability & Trust
This is what fundamentally differentiates NodeRun from traditional cloud execution services. NodeRun doesn't claim that any single execution is "mathematically provable"—instead, we build auditability through layered engineering and economic mechanisms, progressively increasing overall execution trustworthiness.
NodeRun's proof system operates at two levels:
Execution Proof System
Proof-Lite: Integrity & Reproducibility Foundation
This lightweight proof is generated by default for all NodeRun executions. Its core goal is ensuring execution integrity and auditability, while providing the foundation for third-party re-computation. It contains these key fields:
- •`input_hash`: Combined hash of code, parameters, and resource limits.
- •`output_hash`: Combined hash of `stdout`/`stderr` and all output artifacts.
- •`runtime_image_hash`: Container image digest used for execution—ensuring absolute environment consistency.
- •`dependency_lock_hash`: Hash of dependency lock files (`poetry.lock`, `package-lock.json`)—ensuring precise third-party library version reproducibility.
- •`sandbox_policy_hash`: Hash of the sandbox policies applied (network, file permissions, syscall restrictions).
- •`node_id` and `timestamp`: The executing node's identity and the time of execution.
- •`signature`: The executing node's digital signature over all the above fields.
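A minimal sketch of assembling such a record is shown below. It uses SHA-256 for the hash fields and an HMAC as a stand-in for the node's signature; real nodes would sign with asymmetric keys, and the exact field serialization here is an assumption, not the shipped schema.

```python
import hashlib
import hmac
import json
import time

NODE_KEY = b"demo-node-secret"  # stand-in; real nodes use asymmetric key pairs

def h(*parts: bytes) -> str:
    """SHA-256 over the concatenation of the given byte strings."""
    digest = hashlib.sha256()
    for p in parts:
        digest.update(p)
    return digest.hexdigest()

def proof_lite(code, params, stdout, image_digest, lockfile, policy, node_id):
    record = {
        "input_hash": h(code.encode(), json.dumps(params, sort_keys=True).encode()),
        "output_hash": h(stdout.encode()),
        "runtime_image_hash": image_digest,
        "dependency_lock_hash": h(lockfile.encode()),
        "sandbox_policy_hash": h(json.dumps(policy, sort_keys=True).encode()),
        "node_id": node_id,
        "timestamp": int(time.time()),
    }
    # Sign a canonical serialization of all fields above.
    canonical = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(NODE_KEY, canonical, hashlib.sha256).hexdigest()
    return record

p = proof_lite("print(2+2)", {}, "4\n", "sha256:abc123",
               "lockfile-contents", {"network": False}, "node-42")
```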
Replay Bundle: Optional Reproducible Execution
For closed-form compute tasks (fixed inputs, no external state dependencies—math calculations, code execution, data transformations, etc.), NodeRun offers an optional Replay Bundle feature. This self-contained package includes everything needed to reproduce execution:
- •Original code and input parameters.
- •Exact container image reference (`runtime_image_hash`).
- •Complete dependency lock files.
- •Sandbox policy configuration.
- •The original execution's `Proof-Lite` record.
Third-party auditors can download this Bundle, re-execute the task in their own environment using standard Docker commands, and compare the newly generated `output_hash` against the original record to independently verify reproducibility.
Note: For open-form tasks (tasks depending on external state—web scraping, API calls, real-time data queries, etc.), inputs change over time, making reproducible Replay Bundles impossible. However, Proof-Lite attestations and complete audit logs are still provided.
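The verification step itself is a straightforward hash comparison, sketched below. The auditor re-runs the bundle locally (e.g. `docker run` against the pinned image), then checks the recomputed output hash against the `output_hash` field of the original Proof-Lite record; the record shape here is an assumption for illustration.

```python
import hashlib

def verify_replay(original_proof: dict, replay_stdout: str) -> bool:
    """Compare the hash of locally reproduced output against the
    `output_hash` recorded in the original Proof-Lite record."""
    recomputed = hashlib.sha256(replay_stdout.encode()).hexdigest()
    return recomputed == original_proof["output_hash"]

# Illustrative record containing only the field this check needs.
proof = {"output_hash": hashlib.sha256(b"4\n").hexdigest()}

ok = verify_replay(proof, "4\n")    # faithful reproduction
bad = verify_replay(proof, "5\n")   # divergence detected
```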
Node Signing Key Management
The trustworthiness of Proof-Lite signatures depends on robust node key management. NodeRun implements strict lifecycle management:
- •Registration: Nodes generate key pairs when joining the network and register public keys with the centralized control plane.
- •Rotation: Keys rotate on policy-defined schedules; old keys expire after transition periods.
- •Revocation (CRL): Nodes detected cheating or going offline have their keys added to the Certificate Revocation List (CRL)—their signatures are no longer trusted.
- •Reputation Isolation: Node reputation scores are bound to their keys. Low-reputation nodes are restricted to low-value tasks, and their signatures carry reduced "trust weight."
Proof-Strong: Economic Mechanisms for Enhanced Trust
We recognize that Proof-Lite alone can't fully prevent malicious nodes. For high-value or high-risk tasks, NodeRun's roadmap includes the Proof-Strong mechanism.
It achieves probabilistic trustworthiness through sampled redundant execution (N-of-M) and arbitration: a configurable fraction of regular tasks is randomly sampled and dispatched to multiple nodes for redundant execution. When the nodes' `output_hash` values disagree, broader re-computation and arbitration are automatically triggered, with economic penalties (e.g., stake slashing) for malicious nodes.
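The N-of-M agreement check can be sketched as a majority vote over the redundant nodes' output hashes. This is a minimal illustration of the mechanism, assuming a simple quorum rule; the shipped protocol, quorum sizes, and penalty handling may differ.

```python
from collections import Counter

def arbitrate(output_hashes, quorum=2):
    """Accept if at least `quorum` redundant executions agree on one hash;
    otherwise escalate to broader re-computation and arbitration.
    Returns (status, accepted_hash_or_None, dissenting_node_indices)."""
    tally = Counter(output_hashes)
    winner, votes = tally.most_common(1)[0]
    dissenters = [i for i, out in enumerate(output_hashes) if out != winner]
    if votes >= quorum:
        # Majority agrees; dissenting nodes face re-computation and
        # potential economic penalties (e.g., stake slashing).
        return "accept", winner, dissenters
    return "arbitrate", None, dissenters

status, accepted, flagged = arbitrate(["abc", "abc", "xyz"])
# → ("accept", "abc", [2]): node 2 is flagged for penalty review.
```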
Public Web Interaction Layer: Hard Boundaries
NodeRun's distributed architecture provides unique advantages as an efficient, robust public web interaction layer. However, distributed node networks also face potential abuse risks: malicious users might attempt DDoS attacks, pollute node IP reputations, or use nodes as proxies to access illegal sites.
We enforce these hard boundary rules for compliance and security:
- •Default Network Isolation with Explicit Opt-in: All sandboxes are offline by default. Network access is only enabled through a transparent proxy when explicitly declared in the task definition (`network_enabled: true`).
- •Dynamic Reputation Firewall: NodeRun includes a lightweight firewall that validates only the destination domains and IP addresses of outbound requests. Business content is not collected by default—only the minimal metadata needed for billing and abuse prevention.
- •Multi-Source Threat Intelligence & Blocklists: The system uses a pluggable, multi-source threat intelligence architecture with blocklist mechanisms to intercept requests to known botnets, phishing sites, or illegal content.
- •No Persistent Identity Credentials (GA): Sandbox environments are designed to be "memoryless." All cookies and tokens are physically wiped on sandbox destruction. GA explicitly does not support account hosting or persistent login state.
- •Distributed Anti-Abuse Rate Limiting: The scheduling system maintains global-view access counters. Abnormally high-frequency requests to a single target automatically trigger circuit breakers.
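The egress gate implied by these rules can be sketched as a short decision chain: opt-in networking, blocklist lookup, then a global per-target counter acting as a circuit breaker. Hostnames, limits, and the verdict strings are invented for illustration.

```python
from collections import Counter

BLOCKLIST = {"malware.example", "phish.example"}  # fed by threat-intel feeds
RATE_LIMIT = 3                                    # illustrative per-target cap
_counters = Counter()                             # global-view access counters

def check_outbound(task: dict, host: str) -> str:
    """Decide whether an outbound request from a sandbox may proceed."""
    if not task.get("network_enabled", False):
        return "deny: network not enabled for this task"
    if host in BLOCKLIST:
        return "deny: destination on blocklist"
    _counters[host] += 1
    if _counters[host] > RATE_LIMIT:
        return "deny: circuit breaker tripped"
    return "allow"

task = {"network_enabled": True}
verdicts = [check_outbound(task, "api.example.com") for _ in range(5)]
# The first three requests pass, then the circuit breaker trips.
```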
Unit Economics: Technical Analysis of Order-of-Magnitude Cost Advantages
NodeRun's unit economics must consider both compute costs and non-compute costs.
Order-of-Magnitude Compute Cost Advantages
Through distributed supply, Warm Pools with intelligent scheduling, layered image and dependency caching, and a stateless short-lifecycle design, NodeRun fundamentally reshapes the compute cost curve: per-execution compute cost drops from the "minute-level VM rental" of traditional cloud sandboxes to "second-level container runtime plus distributed network scheduling costs"—an order-of-magnitude advantage.
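The arithmetic behind sub-second, resource-based billing is simple to illustrate. The rates below are made-up numbers for the example, not NodeRun's actual pricing; the point is that a burst task is billed for its actual 0.8 s of runtime rather than a 60 s VM minimum.

```python
def per_run_cost(duration_s, cpu_cores, mem_gb,
                 core_s_rate=0.00002, gb_s_rate=0.000004):
    """Resource-based billing sketch: cost scales with CPU-seconds and
    GB-seconds actually consumed. Rates are illustrative only."""
    return duration_s * (cpu_cores * core_s_rate + mem_gb * gb_s_rate)

# A 0.8 s run on 1 core with 0.5 GB, vs. a 60 s minimum rental of the same
# resources: the stateless model bills only what was actually consumed.
burst = per_run_cost(0.8, cpu_cores=1, mem_gb=0.5)
vm_minimum = per_run_cost(60.0, cpu_cores=1, mem_gb=0.5)
```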
Non-Compute Cost Considerations
We also recognize that a commercially viable service must account for non-compute costs. These are offered as premium, billable capabilities:
- •Network & IP Costs: For tasks requiring public web interaction, network egress and usage of specific geographic locations or high-quality IP profiles are significant cost components.
- •Abuse & Risk Control Costs: Maintaining a healthy, compliant network requires ongoing investment in abuse monitoring and risk control.
- •High-Trust Level Costs: Tasks requiring `Proof-Strong` involve redundant execution, re-computation, and arbitration—additional compute and scheduling overhead.
- •Failure Retry Costs: In distributed networks, automatic retries of failed tasks (to maintain SLA) also incur additional cost.
Execution-as-a-Service: Product Contract & Roadmap
NodeRun delivers not just technology, but a clear, trustworthy "Execution-as-a-Service" product contract. We define our service through standard APIs and quantifiable metrics.
Service Contract
| Service Dimension | Contract Details |
|---|---|
| Quotas & Limits | Clear QPS, concurrency limits, and domain-specific rate limits based on subscription tier. |
| Service Level Objectives (SLO) | We're committed to high-availability, low-latency execution. Specific SLAs will be progressively defined based on product maturity and user feedback, published at commercial launch. |
| Audit & Trust | All executions include Proof-Lite by default for integrity and reproducibility auditing. Proof-Strong available as an optional capability for enterprise or high-risk tasks. |
Evolution Roadmap
NodeRun's development path strictly follows its core value proposition, avoiding the heavy-asset "cloud computer" narrative.
- •Phase A (Current): Stateless Execution: Focus on high-frequency, one-shot, stateless execution capabilities—NodeRun's foundational value.
- •Phase B: Public Web Interaction: Building on Phase A, develop compliant, robust public web interaction capabilities with clear cost models.
- •Phase C: Enhanced Trust (Proof-Strong): Progressively launch and refine the economically backed `Proof-Strong` mechanism for high-stakes scenarios.
- •Phase D (Optional Branch): Long-Running Tasks / VM Mode: We recognize market demand for long-running, stateful tasks, but this is a high-cost branch product line that won't enter NodeRun's mainline narrative or pricing structure.
Product Roadmap
Conclusion: Building Critical Infrastructure for Next-Gen AI Applications
NodeRun isn't an incremental improvement on existing cloud compute models—it's a paradigm shift targeting AI Agent execution requirements. By combining distributed compute, lightweight sandbox technology, and cryptographic proofs, we've built an execution layer that architecturally satisfies all five requirements: reproducibility, statelessness, auditability, low cost, and easy integration.