Qwen Updates Again at Midnight: Runs on 3090, 3B Activation Matches GPT-4o

The latest Qwen model, activated on just 3 billion parameters, now runs on a 3090 GPU and rivals top closed-source models like GPT-4o, marking a significant efficiency and performance breakthrough.

Hot on the heels of three recent large-model releases, Qwen has shipped yet another update: Qwen3-30B-A3B-Instruct-2507.

This new model is a non-thinking-mode version. Its highlight is that, with only 3 billion of its 30 billion total parameters activated per token, it demonstrates capabilities comparable to industry-leading closed-source models such as Google's Gemini 2.5 Flash (non-thinking mode) and OpenAI's GPT-4o, marking a major breakthrough in model efficiency and performance optimization.

According to the released benchmark results, the new model improves significantly over previous versions across multiple benchmarks: AIME25 rises from 21.6 to 61.3, and Arena-Hard v2 from 24.8 to 69.0.

A comparison with DeepSeek-V3-0324 shows that on many benchmarks the new model matches or even surpasses it.

This underscores how rapidly the model's computational efficiency has improved.

Specifically, Qwen3-30B-A3B-Instruct-2507 has achieved key enhancements in several areas:

  • Significant improvement in general capabilities including instruction following, logical reasoning, text understanding, mathematics, science, programming, and tool use;
  • Remarkable progress in long-tail multilingual knowledge coverage;
  • Better alignment with user preferences in subjective and open-ended tasks, generating higher quality responses;
  • Extended long-text understanding up to 256K tokens.

The model is now open-sourced on platforms such as ModelScope and Hugging Face, and you can also try it directly in Qwen Chat.

Experience link: http://chat.qwen.ai/

The release quickly gained community support, with additional usage channels and even quantized versions appearing soon after. This exemplifies the power of open source.

This new version allows everyone to run AI models on consumer-grade GPUs, such as the RTX 3090, opening new possibilities for AI deployment.
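To see why a 30B-parameter model can fit on a 24 GB card at all, a back-of-the-envelope estimate helps. The figures below are illustrative, not official specs: they cover only the weights, ignoring KV cache and runtime overhead. The key point is that a mixture-of-experts model must hold all 30B parameters in memory even though only ~3B are active per token, so quantization is what makes the memory side fit.

```python
# Rough VRAM estimate for a 30B-parameter model's weights alone.
# Illustrative sketch, not official requirements: real usage also
# includes KV cache, activations, and framework overhead.

def weight_vram_gb(total_params_b: float, bits_per_param: float) -> float:
    """VRAM needed just for the weights, in GB (1 GB = 1e9 bytes)."""
    return total_params_b * 1e9 * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_vram_gb(30, bits):.1f} GB")
# 16-bit weights alone exceed a 3090's 24 GB, while a 4-bit
# quantization (~15 GB) leaves headroom for the KV cache.
```

This is why the community's quantized builds are what actually run on an RTX 3090, while compute speed still benefits from only ~3B parameters being activated per token.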

Some users have shared their experience running this new version on Mac and PCs equipped with RTX 3090.

Reports from these users cover a range of hardware setups, demonstrating the model's accessibility and efficiency.

The team has also published configuration requirements for running the model yourself.

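For local serving, one common route is vLLM's OpenAI-compatible server. The command below is a minimal sketch, not the team's official instructions: the context-length flag is an assumption, and on a 24 GB card you would typically serve a quantized build rather than full precision.

```shell
# Minimal sketch: serve the open-weights release with vLLM
# (assumes a vLLM version with Qwen3 MoE support).
vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507 \
    --max-model-len 32768
```

Once running, the server accepts standard OpenAI-style chat-completion requests on localhost.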
It's worth noting that this new version is a non-reasoning model. Developer Simon Willison compared it with reasoning models he had previously tested, such as GLM-4.5 Air, and concluded that reasoning ability may be crucial for tasks like generating complex, ready-to-use code.

Once again, the Qwen team shipped its update late at night, catching the attention of peers. Waking up each morning to see AI capabilities advance is genuinely exciting.
