Building the World's First Reinforcement Learning Cloud Platform: How Jiuzhang Cloud Extreme Achieved It

Jiuzhang Cloud Extreme has launched AgentiCTRL, the first industrial-grade reinforcement learning cloud platform. It supports massive heterogeneous compute resources and opens a new era of AI infrastructure.

From the ChatGPT-driven surge in general chatbots to the rapid development of intelligent agent models, AI is undergoing a profound paradigm shift: from passive language models to autonomous decision-making agents. We are entering what has been called the Era of Experience, or the Software 3.0 era.

In this transformation, Reinforcement Learning (RL) is re-emerging as a key technology driving AI to achieve closed-loop perception-decision-action and even Artificial General Intelligence (AGI).

As Nobel laureate and DeepMind CEO Demis Hassabis said: "Reinforcement learning is enough to achieve intelligence because it is the way all mammals (including humans) learn." Similarly, Richard Sutton, widely regarded as a father of RL and a recipient of the 2024 Turing Award, co-wrote in the essay "Welcome to the Era of Experience": "By building the foundations of RL and adapting its core principles to new challenges, we can fully unleash autonomous learning potential and pave the way for superhuman intelligence."

However, compared with pre-training large models, RL-based post-training faces unique challenges: it requires high-frequency data interaction and environment feedback, as well as stable, elastic, large-scale computing clusters. Traditional cloud platforms mainly handle static inference loads, which makes it difficult for them to support the dynamic, multi-stage, resource-intensive nature of RL training.

In this context, whoever builds a truly scalable RL-oriented AI computing platform will gain a strategic advantage in the next wave of AI infrastructure. In June 2025, Jiuzhang Cloud Extreme officially launched AgentiCTRL, the world’s first industrial-grade reinforcement learning cloud platform to support heterogeneous scheduling across thousands of GPUs.

AgentiCTRL is built on a Mixture of Experts (MoE) architecture. It requires only minimal code to run AI agent training and inference, and it significantly strengthens the reasoning capabilities of large models. Compared with traditional RL solutions, AgentiCTRL improves end-to-end training efficiency roughly fivefold and reduces costs by 60%.

As cloud services transition to AI-native cloud, Jiuzhang Cloud Extreme has pioneered the full-chain path for running large-scale RL in the cloud, establishing a new industry paradigm for the agent-native cloud.

Why Jiuzhang Cloud Extreme? System-level Reengineering for Leadership

Reinforcement learning is a long-term, dynamic, state-intensive process. Training effective decision-making agents in the real world requires more than just raw computing power; it involves complex system design including elastic resources, scheduling, strategy feedback, task orchestration, and fault tolerance.

Rather than following the traditional approach of simply scaling GPUs horizontally, Jiuzhang Cloud Extreme restructured the entire RL training process from the ground up. The launch of AgentiCTRL is the result of this systemic overhaul.

Let’s focus on some core highlights of AgentiCTRL.

First, AgentiCTRL simplifies the RL training process to the extreme.

Previously, deploying RL training often required dozens of scripts, complex resource configurations, and node orchestration. With AgentiCTRL, users only need minimal code to start the full training-inference-feedback loop.

Figure: Code demo

This is achieved through deep encapsulation and abstraction of environment simulation, policy execution, and reward feedback. For algorithm engineers, this means development efficiency several times higher; for enterprise clients, it means RL that is truly usable, controllable, and reproducible.

Second, AgentiCTRL integrates serverless architecture natively into RL training platforms.

RL’s computational demands are highly uneven: GPUs can sit idle for long stretches, and then hundreds of cards are suddenly needed at once. Traditional static resource binding leads to waste and makes scheduling difficult. AgentiCTRL’s elastic compute orchestration allocates resources on demand, maximizing utilization and reducing costs.
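
The gap between static binding and on-demand allocation can be illustrated with a simple calculation. The sketch below assumes a hypothetical hourly GPU demand trace (illustrative numbers, not data from the article) and compares a cluster statically sized for peak demand with an elastic allocator that provisions each interval's demand, never dropping below a small warm pool. The function names and the `min_gpus` parameter are hypothetical.

```python
from typing import Sequence, Tuple


def static_allocation(demand: Sequence[int], interval_hours: float = 1.0) -> Tuple[float, float]:
    """Static binding: reserve the peak demand for every interval.

    Returns (allocated GPU-hours, used GPU-hours).
    """
    allocated = max(demand) * len(demand) * interval_hours
    used = sum(demand) * interval_hours
    return allocated, used


def elastic_allocation(demand: Sequence[int], min_gpus: int = 0,
                       interval_hours: float = 1.0) -> Tuple[float, float]:
    """On-demand allocation: each interval gets exactly its demand,
    but never fewer than a warm pool of min_gpus.

    Returns (allocated GPU-hours, used GPU-hours).
    """
    allocated = sum(max(d, min_gpus) for d in demand) * interval_hours
    used = sum(demand) * interval_hours
    return allocated, used


def utilization(allocated: float, used: float) -> float:
    """Fraction of allocated GPU-hours that did useful work."""
    return used / allocated if allocated else 0.0


if __name__ == "__main__":
    # Hypothetical bursty RL workload: GPUs needed in each one-hour interval.
    trace = [8, 8, 256, 256, 16, 0, 128, 8]
    for name, (alloc, used) in [
        ("static", static_allocation(trace)),
        ("elastic", elastic_allocation(trace, min_gpus=8)),
    ]:
        print(f"{name:8s} allocated={alloc:.0f} GPU-h, "
              f"utilization={utilization(alloc, used):.1%}")
```

With this made-up trace, static binding allocates 2,048 GPU-hours at about 33% utilization, while the elastic allocator allocates 688 GPU-hours at about 99%. A real scheduler must also handle provisioning latency, preemption, and queueing, which this sketch ignores.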

More importantly, Jiuzhang Cloud Extreme’s self-developed heterogeneous compute OS and scheduling platform make AgentiCTRL the world’s first platform of its kind to run stably across tens of thousands of GPUs.

This capability has been validated in practice. For example, Jiuzhang Cloud Extreme used AgentiCTRL to post-train the base model Qwen2.5-VL-7B on challenging Computer Use tasks, producing the Alaya-UI agent.

During this process, AgentiCTRL demonstrated strong performance, reducing training time by 37%, increasing GPU utilization by 25%, and decreasing manual intervention by 90%. Overall, costs dropped by 60%.

The resulting Alaya-UI significantly outperformed the baseline: its task completion rate on the OSWorld benchmark rose from 6.87% to 24.8%.
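
To put the OSWorld result in proportion, the two reported completion rates give both an absolute gain in percentage points and a relative multiple. The calculation below uses only the figures stated above.

```python
# OSWorld task completion rates (%) as reported: baseline vs. Alaya-UI.
baseline, alaya_ui = 6.87, 24.8

absolute_gain = alaya_ui - baseline   # percentage points
relative_gain = alaya_ui / baseline   # fold increase over the baseline

print(f"+{absolute_gain:.2f} points, {relative_gain:.2f}x")
```

The gain is 17.93 percentage points, or about 3.6 times the baseline completion rate.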

Extensive experiments show that AgentiCTRL can boost end-to-end training efficiency more than fivefold and cut overall costs by 60%, making it currently the most cost-effective RL cloud platform.

In essence, Jiuzhang Cloud Extreme is not just adding an RL module to an existing AI cloud but has reconstructed the entire intelligent computing platform architecture and logic with RL as a native capability.

Beyond the Platform: Jiuzhang Cloud Extreme’s Strategic Layout for Intelligent Infrastructure

The RL cloud platform is only the visible surface. Jiuzhang’s real strength lies in its frontier exploration of the core of the next-generation AI cloud.

Traditional cloud providers treat AI as a feature patch, mainly focusing on resource distribution and compute services, akin to bare-metal provisioning. Jiuzhang’s strategy is different: reinforcement learning is not just a cloud service module but a fundamental operating system-level capability, supporting agent operation, scheduling, learning, and evolution.

Jiuzhang aims to build a complete native cloud infrastructure for intelligent agents, including not only compute resources but also three synchronized layers:

  • Software-defined AI infrastructure: unified scheduling and orchestration of heterogeneous compute, high-performance distributed storage, and networking.
  • Jiuzhang NeW OS: an abstraction and scheduling layer, including serverless architecture, AI-oriented data centers, multi-AIDC training, heterogeneous resource scheduling, and distributed compute networks.
  • Jiuzhang NeW Cloud: developer tools, large model inference platforms, the RL cloud, the VKS elastic container service, and the DKS dedicated container service, forming an API and toolchain ecosystem for developers, model vendors, and applications.

Jiuzhang NeW Cloud is not only the compute backbone of the RL platform but also the core of Jiuzhang’s strategic infrastructure. It offers a universal AI compute standard built on a usage-based serverless architecture, lowering barriers to entry and enabling scalable AI deployment in high-value industries such as finance, industry, and energy.

Unlike traditional cloud vendors that rely on GPU sales or pay-per-card models, Jiuzhang provides a truly usage-based serverless architecture, making AI accessible to everyone.

As Turing Award laureate John McCarthy predicted in 1961, computing may someday be organized as a public utility, as accessible as water and electricity. Today, Jiuzhang is realizing this vision.

In scheduling, Jiuzhang’s self-developed heterogeneous compute OS and AI-native resource management system provide elastic scheduling and multi-tenant isolation, can generate millions of tokens within seconds, and keep GPU utilization above 95%. Total cost of ownership (TCO) falls by 60%, creating a clear price-performance advantage.

Moreover, Jiuzhang’s industry deployments span government, finance, telecom, manufacturing, energy, transportation, and biomedicine, supporting online training and inference for multiple RL models and agents. It remains a leading RL cloud platform in China, with a sustained advantage in agent training capacity and scheduling efficiency.

Thanks to these capabilities, Jiuzhang Cloud Extreme has pioneered a full cycle from training engine to industrial deployment, forming a unique AI cloud-native ecosystem.

Looking ahead, Jiuzhang’s strategic vision extends beyond technology to leading future AI infrastructure: in an increasingly homogeneous large model landscape, whoever controls the training-feedback-deployment loop will hold the key to the next AI ecosystem. Jiuzhang’s unique advantages already put it ahead.

To accelerate this strategy, Jiuzhang launched the AI-STAR Enterprise Ecosystem Alliance and co-established the AI-STAR Intelligent Computing Ecosystem Fund with industry partners like Saifu Investment, with an initial investment of 180 million RMB. The goal is to attract algorithm companies, open-source communities, and industry clients to co-develop the RL platform ecosystem, expanding application scenarios in finance, industry, and energy sectors.

Thus, the release of AgentiCTRL is a key step in Jiuzhang Cloud Extreme’s future roadmap. It integrates platform capabilities, development tools, ecosystem partners, and capital, forming a comprehensive strategic layout for the next decade of intelligent computing.

Leading the Reinforcement Learning Cloud Race!

When RL becomes the core engine for AI agent training, the decisive factor will be whether a platform can close the gap between merely being available and being truly scalable.

Jiuzhang Cloud Extreme demonstrates that successful RL cloud deployment is not just about stacking computing power. It requires a systemic overhaul, from architecture to operational logic: heterogeneous scheduling across thousands of GPUs, serverless elasticity, and native abstractions for RL workloads. This is a paradigm shift, not just an optimization.

From a customer perspective, this shift offers tangible benefits:

  • Lower development barriers: No need to build environments, orchestrate nodes, or maintain resources; RL training becomes as simple as calling an API.
  • Significantly improved training efficiency: Training efficiency can increase up to fivefold.
  • Reshaped cost structure: More efficient resource scheduling reduces costs by up to 60%, making RL truly cost-effective.

More importantly, Jiuzhang Cloud Extreme is not just a tool provider but is building a native cloud platform supporting agent operation at the OS level.

In the future, RL will no longer be a niche research privilege but a standard component of AI systems. Those who can bring it from labs into engineering environments will seize the future. Jiuzhang is already leading the way.

When the era of native agents arrives, we may look back at this pivotal shift, and Jiuzhang Cloud Extreme and AgentiCTRL might be recognized as its earliest pioneers.

Subscribe to QQ Insights
