Ruoming Pang Delivers Apple’s 2025 Foundation Model Tech Report and Hands the Reins to Zhifeng Chen as Apple Enters a New AI Era
Apple has released its 2025 foundation model technical report, highlighting multilingual, multimodal models and new architectures, alongside a leadership handover from Ruoming Pang to Zhifeng Chen.

Apple Intelligence enters a new chapter.
Apple recently published its 2025 Apple Intelligence Foundation Language Models technical report. Ruoming Pang, who led Apple’s foundation models team before departing for Meta, shared a series of posts introducing it.

In the report, Apple details the data, architecture, training recipes, optimization and inference techniques, and evaluation results of its next-generation models. It emphasizes how Apple delivers user value while expanding functionality and improving quality, with significant efficiency gains both on device and on its private cloud.

Link to the report: https://machinelearning.apple.com/research/apple-foundation-models-tech-report-2025
Apple introduces two multilingual, multimodal foundation models that power Apple devices and services:
- A roughly 3B-parameter on-device model, optimized for Apple silicon through KV-cache sharing and 2-bit quantization-aware training (see the sketch below).
- A scalable server model that combines a new parallel-track mixture-of-experts (PT-MoE) Transformer with interleaved global and local attention, for efficient inference on Apple’s Private Cloud Compute platform.
Both models are trained on large-scale multilingual, multimodal data obtained through responsible web crawling, licensed corpora, and high-quality synthetic datasets, and are further refined with supervised fine-tuning and reinforcement learning. They support several additional languages, image understanding, and tool calling.
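To make the quantization point more concrete, here is a hypothetical toy sketch of the core operation behind quantization-aware training: weights are “fake quantized” (quantized, then immediately dequantized) in the forward pass so the model learns to tolerate the quantization error. The symmetric per-tensor scheme and level choice below are illustrative assumptions, not Apple’s actual 2-bit recipe.

```swift
import Foundation

// Toy 2-bit fake quantization of a weight vector (assumed scheme, for illustration).
func fakeQuantize2Bit(_ weights: [Float]) -> [Float] {
    // Two signed bits give four integer levels: -2, -1, 0, +1.
    let qMin: Float = -2
    let qMax: Float = 1
    // Per-tensor scale so the largest-magnitude weight lands on the grid.
    let maxAbs = weights.map { abs($0) }.max() ?? 0
    let scale = max(maxAbs, 1e-8) / abs(qMin)
    return weights.map { w in
        let q = (w / scale).rounded()          // quantize
        let clamped = min(max(q, qMin), qMax)  // clamp to the 2-bit range
        return clamped * scale                 // dequantize
    }
}

// The dequantized weights differ slightly from the originals; QAT trains the
// model to stay accurate despite exactly this error.
print(fakeQuantize2Bit([0.31, -0.07, 0.52, -0.45]))
```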

PT-MoE architecture diagram: each track consists of multiple track blocks, and each track block contains a fixed number of Transformer/MoE layers. With L total layers and a track-block depth of D, synchronization overhead drops from 2L synchronizations (tensor parallelism) to L/D (track parallelism); for example, D = 4 reduces the overhead by 87.5%.
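As a quick check of the caption’s arithmetic, using the overhead counts it quotes:

$$
\frac{L/D}{2L} = \frac{1}{2D}, \qquad D = 4 \;\Rightarrow\; 1 - \frac{1}{8} = 87.5\%\ \text{reduction}.
$$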
The parallel-track (PT) Transformer is a novel architecture proposed by Apple’s researchers. Unlike a standard decoder-only Transformer, it splits the model into several smaller Transformers, called tracks, each made up of multiple track blocks. Track blocks process tokens independently and synchronize only at their boundaries, so the tracks can run directly in parallel, cutting synchronization cost and improving training and inference latency without sacrificing model quality.
To scale the server model further, Apple adds a mixture-of-experts (MoE) layer inside each track block, forming the PT-MoE architecture. Because the experts in each MoE layer operate only within their own track, communication can overlap with computation, which improves training efficiency. The combination of MoE sparsity and track-level independence lets the model scale efficiently while keeping latency low.
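The toy sketch below illustrates the track idea in the simplest possible terms: each track runs its own stack of layers on a copy of the hidden state, and the tracks synchronize only at track-block boundaries. Dense toy layers stand in for the real Transformer/MoE layers, and averaging stands in for the real boundary mixing; both are simplifications, not Apple’s implementation.

```swift
import Foundation

struct ToyLayer {
    var scale: Float
    func callAsFunction(_ x: [Float]) -> [Float] { x.map { $0 * scale + 0.01 } }
}

struct TrackBlock {
    var layers: [ToyLayer]   // D layers processed without any cross-track sync
    func forward(_ x: [Float]) -> [Float] {
        layers.reduce(x) { hidden, layer in layer(hidden) }
    }
}

// One boundary step: every track processes the hidden state independently
// (concurrently, in a real system), then the outputs are merged once. With
// D layers per block this needs L/D synchronizations for L total layers,
// versus roughly 2L for tensor parallelism.
func trackBlockStep(tracks: [TrackBlock], hidden: [Float]) -> [Float] {
    let perTrack = tracks.map { $0.forward(hidden) }
    let count = Float(perTrack.count)
    return (0..<hidden.count).map { i in
        perTrack.map { $0[i] }.reduce(0, +) / count
    }
}

// Example: 4 tracks, each with one track block of depth D = 4.
let tracks = (0..<4).map { t in
    TrackBlock(layers: (0..<4).map { _ in ToyLayer(scale: 1.0 + 0.1 * Float(t)) })
}
print(trackBlockStep(tracks: tracks, hidden: [0.5, -0.2, 0.1]))
```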
Additionally, Apple developed a vision encoder, pre-trained on large-scale image data, to extract visual features from input images. The encoder comprises a vision backbone (a standard 1-billion-parameter ViT-g for the server model and a more efficient ViTDet-L for the device model) and a vision-language adapter that aligns visual features with the model’s token representations. The device-side ViTDet additionally uses window attention together with a novel register-window (RW) mechanism to capture both local detail and global context.
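For intuition, here is a hypothetical sketch of the role a vision-language adapter plays: projecting each patch feature produced by the vision backbone into the language model’s embedding space so that an image becomes a sequence of “visual tokens”. The plain linear projection and the toy dimensions are assumptions for illustration only, not the report’s design.

```swift
import Foundation

// Toy adapter: dVision = 4 backbone features in, dModel = 3 token embeddings out.
struct VisionLanguageAdapter {
    var projection: [[Float]]   // dModel x dVision matrix

    func adapt(_ patchFeatures: [[Float]]) -> [[Float]] {
        patchFeatures.map { feature in
            projection.map { row in
                zip(row, feature).map { $0 * $1 }.reduce(0, +)
            }
        }
    }
}

let adapter = VisionLanguageAdapter(projection: [
    [0.10, 0.00, 0.20, 0.00],
    [0.00, 0.30, 0.00, 0.10],
    [0.20, 0.10, 0.00, 0.00],
])
// Two patch features in, two visual-token embeddings out.
print(adapter.adapt([[1.0, 0.0, 0.5, 0.2], [0.3, 0.7, 0.1, 0.9]]))
```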
Apple believes that combining on-device and cloud models can meet diverse performance and deployment needs. The optimized device models enable low-latency inference with minimal resources, while cloud models handle complex tasks with high accuracy and scalability.
In benchmark tests, Apple’s models outperform comparably sized open-source models across languages and across text and vision tasks, demonstrating strong competitiveness.


In the technical report, Apple also introduces its new Swift-based Foundation Models framework, which integrates guided generation, constrained tool calling, and LoRA adapters; developers can adopt these capabilities with just a few lines of code.
The framework gives developers access to the roughly 3-billion-parameter on-device language model for building reliable, production-quality generative AI features. As the model at the core of Apple Intelligence, it excels at summarization, entity extraction, text understanding and refinement, short dialogue, and creative content generation. Apple stresses that, although optimized for on-device use, the model is not designed for general-knowledge question answering, and it encourages developers to use the framework to build features tailored to their apps.
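To make the “few lines of code” claim concrete, here is a hedged sketch of what using the framework looks like, based on Apple’s publicly documented Foundation Models API; type and method names such as `LanguageModelSession`, `@Generable`, and `@Guide` should be checked against the current SDK.

```swift
import FoundationModels

// Guided generation: the @Generable macro asks the on-device model to fill in
// a typed Swift value rather than returning free-form text.
@Generable
struct TripIdea {
    @Guide(description: "A short, catchy title for the trip")
    var title: String

    @Guide(description: "Three suggested activities")
    var activities: [String]
}

func suggestTrip(to city: String) async throws -> TripIdea {
    // A session wraps the ~3B on-device language model.
    let session = LanguageModelSession(
        instructions: "You help users plan short weekend trips."
    )
    // The response content arrives as a TripIdea, with no manual parsing.
    let response = try await session.respond(
        to: "Suggest a weekend trip to \(city).",
        generating: TripIdea.self
    )
    return response.content
}
```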
Apple states that the latest Apple Intelligence models follow its “Responsible AI” principles, with content filtering, regional customization, and privacy protections built on its Private Cloud Compute technology.
After the report release, Ruoming Pang thanked all contributors, including those involved in modeling, post-training, multimodal integration, frameworks/APIs, and project management. He also handed over the baton to Apple’s next AI leaders, Zhifeng Chen and Mengyu Li.

Earlier media reports indicated that, after Pang’s move to Meta, his former team at Apple would be led by Zhifeng Chen, who earned his bachelor’s degree at Fudan University and his master’s and PhD at Princeton and the University of Illinois, and who previously worked at Google on TensorFlow, Gemini, neural machine translation, and PaLM 2. Chen, Pang, and Yonghui Wu were all early members of Google Brain.
