Stanford Graduate Uses RL to Build an Agent; Chinese-Founded Startup Raises $12 Million Seed Round
Pokee.ai, a startup founded by a Chinese Stanford alumnus, has built an RL-based AI agent, closed a $12 million seed round, and launched its public beta, marking a significant milestone.

The Pokee AI public beta is now officially live!
“Hello, can you hear me?” At 10 a.m. Beijing time, Zhu Zheqing, founder of Pokee.ai, connected with us from the US West Coast, where it was 7 p.m. local time.

He described his recent state as “busy”—busy launching the public beta of Pokee AI, handling post-funding matters, expanding his core team from 4 to 7 members, celebrating his 29th birthday on Xiaohongshu, and engaging with online comments.
“Busy” isn’t new for him. Over 200 days ago, he was busy founding Pokee.ai, talking to over 100 investors about building AI agents with reinforcement learning, and preparing for product testing.
Going further back to 2017, he was still "busy": pursuing a PhD in reinforcement learning at Stanford while working at Meta, where he led teams that applied RL to ad bidding and content generation and brought in significant revenue for the company.
Zhu Zheqing has become accustomed to “being busy,” but he says that entrepreneurship, despite the busyness, offers more time for reflection—a completely new experience.
He founded the company in October last year and named it Pokee.ai, meaning "small pocket," to symbolize a lightweight model that makes decisions and solves problems quickly and efficiently. The company focuses on building an interactive, personalized, and highly efficient AI agent.

Unlike mainstream approaches that build AI agents around large language models (LLMs), Pokee.ai centers on reinforcement learning (RL). Zhu explains, “In Pokee’s architecture, LLMs mainly serve as the interface for human-computer interaction, like a ‘UI layer,’ to understand user intent, while decision-making and task execution are driven by RL structures.”
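A minimal sketch of this division of labor (the names `parse_intent`, `RLPlanner`, and the example tools are illustrative, not Pokee.ai's actual API): the language model only converts a user message into a structured task, and an RL-style planner decides which tools to call and in what order.

```python
# Illustrative sketch only; Pokee.ai's real system is not public.
from dataclasses import dataclass, field

@dataclass
class Task:
    goal: str                                     # what the user wants, as parsed by the LLM
    history: list = field(default_factory=list)   # (tool_name, output) pairs so far

def parse_intent(user_message: str) -> Task:
    """LLM 'UI layer': turn free-form text into a structured task (stubbed here)."""
    return Task(goal=user_message.strip())

# Two stand-in tools the planner can choose between.
def search_web(task: Task) -> str:
    return f"search results for: {task.goal}"

def draft_post(task: Task) -> str:
    return f"draft post about: {task.goal}"

TOOLS = {"search_web": search_web, "draft_post": draft_post}

class RLPlanner:
    """RL 'decision layer': a trained policy would pick the next tool from the state."""
    def select_tool(self, task: Task) -> str | None:
        done = {name for name, _ in task.history}
        # Stand-in for a learned policy: search first, then draft, then stop.
        for name in ("search_web", "draft_post"):
            if name not in done:
                return name
        return None

    def run(self, task: Task) -> Task:
        while (tool := self.select_tool(task)) is not None:
            task.history.append((tool, TOOLS[tool](task)))
        return task

if __name__ == "__main__":
    task = parse_intent("Write a LinkedIn post announcing our seed round")
    print(RLPlanner().run(task).history)
```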
Back in October last year, RL hadn't yet taken off and DeepSeek hadn't yet sparked global excitement.
Most investors thought Pokee.ai’s approach was unrealistic, but now it’s gradually gaining recognition, and the company is entering a new phase.
Recently, Pokee.ai completed a $12 million seed round led by Point72 Ventures, and its public beta is now available.
This week, before the official launch, Jiqizhixin interviewed Zhu Zheqing about the company’s journey. Here is the conversation:
Jiqizhixin: Pokee.ai was founded in October last year, and now it’s launching its product and completing seed funding. Is this pace within your expectations?
Zhu Zheqing: The pace is quite fast. From October last year, when we founded Pokee.ai, to product validation, building a general agent framework, and now product testing and funding, it's only been about 7 months. Our goal was to finish product testing and secure seed funding in the first half of this year, and we've achieved that, which is in line with expectations.
Compared to Meta, our pace is about 4-5 times faster. It hasn’t changed my life or work much, and I even have more time to think. When I was working at Meta and pursuing my PhD simultaneously, I was extremely busy—over 100 hours a week. Now, I still work over 100 hours, but I have more time for reflection.
Jiqizhixin: Your work is quite unique. What’s the most common question from investors?
Zhu Zheqing: When I first talked to investors, they didn’t understand why we wanted to do things differently with agents, especially since RL wasn’t popular and DeepSeek hadn’t emerged. When I told them “Our ultimate goal is to turn an RL system into something like a universal operating system,” they thought it was a fantasy.
Jiqizhixin: You mentioned that Pokee.ai’s goal isn’t just to mimic human task completion but to surpass human strategy and planning in some tasks. Is this related to the current buzz around ASI?
Zhu Zheqing: I think the definitions of ASI and AGI are quite fuzzy. In some sense, we might have already achieved ASI—if you give a model a 1 million token article, it takes humans a long time to read it, but the model can finish in seconds or tens of seconds. From this perspective, it’s already superhuman intelligence.
Jiqizhixin: How far are we from the "ChatGPT moment" for agents? What characteristics should a general agent have?
Zhu Zheqing: The core ability of a general agent is that, regardless of the scenario or problem, as long as you give it a prompt, it can complete the task without pre-configuring tools. Our idea is that clients provide a prompt describing their needs, and Pokee can automatically call the appropriate tools to solve the problem, then send the results back to the client or developer, who can then present the results in a better way.
Jiqizhixin: Is this the “ChatGPT moment” for AI agents? What stage are you at?
Zhu Zheqing: Yes, this is the fully autonomous AI agent we envision. Currently, AI agents require extensive tool configuration, which is a big bottleneck. Our goal is to minimize manual setup, making it almost no-code or low-code.
Jiqizhixin: Why did you start the company—what limitations did you see?
Zhu Zheqing: We want to enable third-party developers to build AI agents with minimal or no development effort—either through no-code or low-code approaches. No-code means running a prompt directly to get a workflow, which can be reused across scenarios. Low-code involves passing prompts via our interface to solve problems without telling us what tools to use.
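As a rough illustration of the low-code path (the endpoint, payload, and response fields below are invented for the example; Pokee.ai has not published this interface), the developer would send only a natural-language prompt and get back the finished result plus a handle for reusing the workflow:

```python
# Hypothetical low-code call: only a prompt is sent, no tool configuration.
# The URL and JSON fields are placeholders, not a documented Pokee.ai API.
import requests

resp = requests.post(
    "https://api.example-agent.com/v1/tasks",   # placeholder endpoint
    json={"prompt": "Find the top 5 papers on RL-based agents and summarize them"},
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    timeout=60,
)
result = resp.json()
print(result["output"])        # finished result, presented however the developer likes
print(result["workflow_id"])   # hypothetical handle for reusing the workflow elsewhere
```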
Jiqizhixin: What’s the difference between RL-based and LLM-based AI agents?
Zhu Zheqing: Many current LLMs also use RL, but our RL models differ in their action space. While an LLM's action space is tokens, our RL models' actions are tools, and we leverage that generalization to build AI agents.
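In RL terms, the contrast could be sketched as follows (a toy formulation, not Pokee.ai's training code): a token-level policy chooses among tens of thousands of vocabulary items at every step, while a tool-level policy chooses from a small discrete set of tools, each choice meaning one tool execution.

```python
# Toy contrast between a token-level and a tool-level action space; illustrative only.
import random

VOCAB_SIZE = 50_000                                 # an LLM policy acts over the vocabulary
TOOLS = ["search", "summarize", "send_email", "generate_image", "stop"]

def token_action() -> int:
    """LLM-style action: the next token id."""
    return random.randrange(VOCAB_SIZE)

def tool_action(state: dict) -> str:
    """Tool-style action: which tool to call next. A trained policy would score
    each tool given the task state; random choice stands in here."""
    return random.choice(TOOLS)

state = {"task": "book a flight", "history": []}
while (a := tool_action(state)) != "stop":
    state["history"].append(a)                      # executing the tool would update the state
print(state["history"])
```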
Jiqizhixin: You mentioned that prompt quality matters a lot, but not everyone is good at asking questions. How do you see this?
Zhu Zheqing: That’s true. Users often give prompts that don’t fully reflect their true intent. Understanding that intent—alignment—is very challenging because there’s no ground truth. It requires long-term memory and personalization to truly understand what users mean.
For example, when an investor asked us to draft a LinkedIn post, it’s unclear whether they want a draft to review or to post directly. We need to analyze their past interactions and responses to understand their real intent, which requires personalized memory.
Jiqizhixin: What stage are you at now?
Zhu Zheqing: The first step in the industry isn’t even complete, let alone the second or third.
Jiqizhixin: What should be done next?
Zhu Zheqing: I think it’s a meaningful and forward-looking question, but from a commercial perspective, the priority is solving the core problem first—then explore understanding and alignment.
Jiqizhixin: Your architecture uses small LLMs as the human-computer interface, like a “UI layer,” to understand user intent, while decision-making and task execution are based on RL. Does this mean prompt quality is crucial?
Zhu Zheqing: Yes, it’s much more complex. I always say, “The better the LLM, the better we can do.” Although our core is RL, not LLM, we’re not competing with LLMs. If language understanding doesn’t improve, we’ll hit a bottleneck and won’t fully understand user needs.
Long Journey of Entrepreneurship
Jiqizhixin: Since leaving Meta and starting your own venture, over half a year has passed. What’s the biggest difference between working and entrepreneurship?
Zhu Zheqing: The difference is huge. I’ve faced struggles, not because of time management or fatigue, but because entrepreneurship is a very fuzzy path—there’s no fixed route. You decide what kind of company you want to build, and that’s the path.
In big companies, you can make countless decisions or even switch teams if things don't work out. But in a startup, you don't get countless chances to decide, and you have to take responsibility for the company and the team.
Jiqizhixin: Your team was four people until April or May this year. How many are there now? Are you planning to expand?
Zhu Zheqing: Currently, there are 7 people, and we plan to hire two or three more. But we probably won’t expand beyond 10 people before revenue scales up.
Jiqizhixin: So AI startups are more "lightweight" now?
Zhu Zheqing: Yes, in the AI era, models and product development don’t require many people. More people can slow things down.
Jiqizhixin: Where is your office? What’s your daily work routine?
Zhu Zheqing: We don’t have an office.
Jiqizhixin: Do you work remotely?
Zhu Zheqing: All remote. Some team members are in Seattle, others in the Bay Area and Singapore. We’re all used to remote work, and it’s efficient. We hold daily online meetings to discuss and decide on tasks.