Building a Multi-Tenant AI Agent Platform: Scaling Independent Reasoning Engines
This article details the design and implementation of Bring Your Own Planner (BYOP), a multi-tenant AI agent platform developed at Salesforce. BYOP enables independent teams to build, deploy, and scale custom reasoning engines on shared infrastructure, supporting over 7,000 active agent sessions and 14,000–15,000 daily requests with as little as 5 milliseconds of platform overhead per request.
Addressing Monolithic Planner Limitations
Prior to BYOP, a monolithic planner architecture led to significant challenges:
- Code Coupling: Regressions in one agent could impact others, necessitating system-wide validation for minor changes.
- Infrastructure Contention: Shared compute resources caused the "noisy neighbor" effect, leading to unpredictable resource allocation and scaling issues.
- Deployment Bottlenecks: Centralized CI/CD pipelines for over 100 engineers became a significant bottleneck, slowing down development velocity.
BYOP resolves these issues by isolating each reasoning engine, allowing teams to own their code, scale independently, and deploy through self-service pipelines, thus eliminating cross-team interference.
BYOP Platform Mission and Architecture
The core mission of BYOP is to provide a multi-tenant platform that empowers teams to develop and operate custom reasoning engines without adhering to a restrictive, centralized planner. This autonomy is achieved while reusing common infrastructure components.
The platform offers:
- Session management
- State persistence
- Streaming capabilities
- Tool invocation
- Integration with LLMs and enterprise data sources
This allows development teams to focus solely on domain-specific reasoning logic rather than platform complexities. The architecture emphasizes observability, distributed tracing, and horizontal scaling from its inception to support decentralized execution and independent development.
Use Cases and Validation
BYOP has been validated by two production reasoning engines addressing diverse problems:
SearchAgent
- Functionality: Synthesizes answers from fragmented enterprise data sources (CRM records, reports, Chatter, knowledge articles, external sources) for read-heavy search queries.
- Example Query: "Show me all high-priority enterprise customer cases from Q4."
- Technical Implementation: Leverages pre-built API clients and Redis-backed session management. It orchestrates queries across SOQL, Reports API, Enterprise Search, and Agentforce.
- Performance: Handles 4,000+ daily requests with an average latency of 2.73 seconds. Approximately 70% of requests involve multi-tool workflows.
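A multi-tool workflow of the kind SearchAgent runs can be sketched as a parallel fan-out across pre-built API clients, with results merged for answer synthesis. This is a minimal illustration only: the client names mirror the data sources listed above, but the `query` call signature and the merging step are assumptions, not SearchAgent's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out_search(query: str, clients: dict) -> dict:
    """Run a read-only query against every tool client in parallel and
    collect results by source name for downstream synthesis."""
    with ThreadPoolExecutor(max_workers=max(len(clients), 1)) as pool:
        futures = {name: pool.submit(client.query, query)
                   for name, client in clients.items()}
        # Gather all results; a production engine would add timeouts and
        # per-source error handling rather than failing the whole request.
        return {name: future.result() for name, future in futures.items()}
```

In practice each entry in `clients` would wrap one backend (SOQL, Reports API, Enterprise Search, Agentforce), so adding a source is a registration step rather than a control-flow change.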
WaiiPlanner
- Functionality: Enables non-technical users to configure Data 360 elements (segments, data transforms, ML models, data streams) by executing write operations that require explicit confirmation.
- Technical Implementation: Implements an 8-state ReAct state machine with state persistence across user interactions. It uses async job polling and streamed progress updates for multi-minute operations. BYOP's session management persists the state machine, action queue, and pending confirmations via the SDK.
- Development: Production-ready in under 4 months.
- Scope: Orchestrates 24 distinct Data 360 actions across live customer environments.
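The confirmation-gated state machine described above can be sketched as follows. The state names and transition logic here are assumptions for illustration (the article does not enumerate WaiiPlanner's eight states); the in-memory fields stand in for the state machine, action queue, and pending confirmations that BYOP persists via its SDK.

```python
from enum import Enum, auto


class State(Enum):
    """Illustrative 8-state ReAct loop; names are hypothetical."""
    IDLE = auto()
    REASON = auto()
    SELECT_ACTION = auto()
    AWAIT_CONFIRMATION = auto()
    EXECUTE = auto()
    POLL_JOB = auto()
    OBSERVE = auto()
    RESPOND = auto()


class PlannerSession:
    """Holds everything that must survive across user turns: current state,
    queued actions, and any write operation awaiting confirmation."""

    def __init__(self):
        self.state = State.IDLE
        self.action_queue = []
        self.pending_confirmation = None

    def propose(self, action: str) -> None:
        # Write operations are never executed directly; they park in
        # AWAIT_CONFIRMATION until the user explicitly approves.
        self.pending_confirmation = action
        self.state = State.AWAIT_CONFIRMATION

    def confirm(self, approved: bool) -> None:
        if self.state is not State.AWAIT_CONFIRMATION:
            raise RuntimeError("no action pending confirmation")
        if approved:
            self.action_queue.append(self.pending_confirmation)
            self.state = State.EXECUTE
        else:
            self.state = State.REASON  # re-plan instead of executing
        self.pending_confirmation = None
```

Because the session object is fully serializable, a multi-minute job can poll asynchronously (POLL_JOB) across turns: the platform rehydrates the session on each interaction rather than keeping a long-lived process per user.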
Restoring Team Autonomy
As the number of engineers contributing to the AI agent system grew, coordination overhead became a significant bottleneck. BYOP restores team autonomy by strictly separating platform responsibilities (infrastructure, session handling, tool integration) from reasoning logic (custom reasoning engines). This clear ownership boundary eliminates unintended coupling.
Self-service CI/CD pipelines enable independent deployments without centralized approval. Because each agent operates in an isolated environment, changes are contained, allowing for rapid iteration without compromising system stability.
Platform Contract Design Challenges
Designing the BYOP platform contract involved balancing a simple interface with the needs of diverse AI reasoning engines. The challenge was to expose sufficient metadata and control for complex engines without making the interface overly complex or tightly coupling teams to platform internals.
This was resolved by standardizing contracts around the agent lifecycle (conversation initiation, multi-turn interaction, session termination) and providing a thin SDK that exposes essential platform capabilities like session persistence, streaming, and tool access. This balance ensures scalability and usability across varied AI workloads.
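One plausible rendering of that lifecycle contract is a small abstract interface: three hooks for the phases the article names, with turn output yielded as chunks so the platform can own the streaming transport. The method names and signatures are assumptions; the source specifies only that the contract covers initiation, multi-turn interaction, and termination.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class ReasoningEngine(ABC):
    """Hypothetical shape of the BYOP engine contract: teams implement only
    these hooks, while session persistence, streaming, and tool access are
    provided by the platform SDK."""

    @abstractmethod
    def start_conversation(self, tenant_id: str, session_id: str) -> None:
        """Called once when the platform initiates a session."""

    @abstractmethod
    def handle_turn(self, session_id: str, utterance: str) -> Iterator[str]:
        """Handle one user turn, yielding response chunks for streaming."""

    @abstractmethod
    def end_conversation(self, session_id: str) -> None:
        """Called when the session terminates, for cleanup."""
```

A narrow contract like this is what keeps teams decoupled from platform internals: any engine that implements the hooks can be hosted, regardless of how its reasoning works inside.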
Managing Multi-Turn Conversations and Isolation
Preserving conversational context while enforcing strict tenant isolation presented security and scalability challenges. Isolation is paramount in multi-tenant systems to prevent data leaks.
- Isolation Enforcement: Tenant and session identifiers are embedded directly into storage keys for all read/write operations, ensuring boundary validation.
- Context Management: A 24-hour TTL policy is applied to conversation history to preserve relevant context while automatically evicting stale data, managing memory pressure.
- Concurrency Control: Thread-safe access patterns prevent race conditions with shared storage.
- Graceful Degradation: If session storage is unavailable, the system continues serving requests without persisted context rather than failing outright.
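The key-embedding and TTL mechanics above can be sketched with a simple session store. This is a minimal in-memory illustration under stated assumptions (the exact key layout is not published, and production storage is Redis-backed); the point is that tenant and session identifiers are part of every key, so one tenant can never address another's data.

```python
import time


class TenantSessionStore:
    """Tenant-scoped conversation storage with a 24-hour TTL."""

    TTL_SECONDS = 24 * 60 * 60

    def __init__(self):
        self._data = {}  # key -> (expires_at, value)

    @staticmethod
    def _key(tenant_id: str, session_id: str) -> str:
        # Tenant and session identifiers are embedded directly in the key,
        # so every read/write is scoped to one tenant by construction.
        return f"{tenant_id}:{session_id}:history"

    def write(self, tenant_id, session_id, value, now=None):
        now = time.time() if now is None else now
        self._data[self._key(tenant_id, session_id)] = (now + self.TTL_SECONDS, value)

    def read(self, tenant_id, session_id, now=None):
        now = time.time() if now is None else now
        entry = self._data.get(self._key(tenant_id, session_id))
        if entry is None or entry[0] <= now:
            return None  # missing or expired: caller degrades gracefully
        return entry[1]
```

Returning `None` on a miss or expiry is what makes graceful degradation possible: the engine treats an unavailable or evicted session the same way as a fresh one.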
Observability, Cost Attribution, and Performance Tuning at Scale
At production scale (over 7,000 active sessions and 14,000–15,000 daily requests, with platform overhead as low as 5 milliseconds per request), observability, cost attribution, and performance tuning become complex.
- Observability: A consistent session identifier is propagated across all services to enable distributed tracing and full lifecycle visibility.
- Cost Attribution: Detailed events are tagged with tenant and organization identifiers for accurate cost tracking across agents and workloads.
- Performance Tuning: Auto-scaling and continuous monitoring dynamically adjust resource allocation to handle the variability in agent workloads, from simple flows to complex multi-step reasoning involving LLM calls.
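A cost-attribution event of the kind described above might look like the following. The field names are illustrative assumptions; what matters is that the same session identifier used for distributed tracing also appears on every cost event, tying spend back to a tenant, organization, and agent.

```python
def make_cost_event(tenant_id: str, org_id: str, session_id: str,
                    agent: str, llm_tokens: int, latency_ms: float) -> dict:
    """Build a tagged event for cost tracking across agents and workloads.

    The session_id is the same identifier propagated across all services
    for distributed tracing, so cost and trace data can be joined.
    """
    return {
        "tenant_id": tenant_id,
        "org_id": org_id,
        "session_id": session_id,
        "agent": agent,
        "llm_tokens": llm_tokens,
        "latency_ms": latency_ms,
    }
```

Emitting one such event per LLM call or tool invocation lets the platform aggregate cost per tenant without the reasoning engines implementing any accounting themselves.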
Key Takeaways
- BYOP is a multi-tenant AI agent platform enabling team autonomy and independent scaling of custom reasoning engines on shared infrastructure.
- It addresses limitations of monolithic architectures, such as code coupling, resource contention, and deployment bottlenecks.
- Strict tenant isolation is enforced through embedded identifiers in storage keys and TTL policies for session context.
- A well-defined platform contract and thin SDK balance interface simplicity with the needs of diverse AI workloads.
- Observability, cost attribution, and performance tuning are managed through consistent session identifiers, detailed event tagging, and dynamic resource allocation.