EdgeAIGC Boosts Cache Hit Rate by 41% and Reduces AIGC Latency and Cost: Developed by Tsinghua, Guangzhou University, and Inspur
Research teams from Tsinghua, Guangzhou University, and Inspur developed EdgeAIGC, significantly improving cache hit rates and reducing latency and costs in edge AI content generation.


Since 2024, AI tools such as ChatGPT, Sora, and Stable Diffusion have driven unprecedented computational demand: inference requests are soaring globally, and serving every request from the cloud incurs high communication latency.
Is there a way to optimize AIGC model computation while maintaining service quality? The answer from a joint team of Guangzhou University, Tsinghua, and Inspur: slice large models into "lego-like" modules and let the TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm assemble them dynamically in real time.
This research aims to reduce response time and costs under limited edge storage, bandwidth, and computing resources.
The study, titled EdgeAIGC: Model Caching and Resource Allocation for Edge Artificial Intelligence Generated Content, was published on July 4, 2025, in Digital Communications and Networks.

EdgeAIGC Framework
The framework consists of a cloud service center, an edge service layer, and a user layer.
The cloud center hosts a powerful cloud server (CS) with abundant computing and storage resources. It holds the full set of pre-trained AIGC models (e.g., text-to-speech, text-to-text) and can serve any user inference request, but at higher latency and cost than the edge.
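The paper does not ship reference code, but the "lego" idea is easy to picture: each large model is split into slices, and an edge server caches as many slices as its storage budget allows. Here is a minimal Python sketch (all class and field names are hypothetical, not the authors'):

```python
from dataclasses import dataclass, field

@dataclass
class ModelModule:
    """One 'lego brick' sliced from a pre-trained AIGC model."""
    model_id: str      # e.g. "text-to-text"
    block_index: int   # position of this slice in the full model
    size_gb: float     # storage the slice occupies when cached

@dataclass
class EdgeServer:
    storage_gb: float                         # total storage budget
    cache: list = field(default_factory=list)

    def used_gb(self) -> float:
        return sum(m.size_gb for m in self.cache)

    def try_cache(self, module: ModelModule) -> bool:
        """Cache a slice only if the storage budget still allows it."""
        if self.used_gb() + module.size_gb <= self.storage_gb:
            self.cache.append(module)
            return True
        return False  # a miss: the request falls back to the cloud server

# A caching policy (TD3 in the paper; a fixed loop here) decides which
# slices each edge server holds as demand shifts.
es = EdgeServer(storage_gb=40.0)
for i in range(6):
    es.try_cache(ModelModule("text-to-text", i, size_gb=8.0))
print(f"cached {len(es.cache)} slices, {es.used_gb():.0f} GB used")
```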

The network comprises 1 + E + U nodes: one CS, E edge servers (ES) equipped with A800 GPUs, and U users. The goal is to optimize the average response time and operating cost over all requests.
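The paper's exact notation is not reproduced here, but that goal implies a joint objective of roughly the following shape (all symbols are assumptions: c is the cache placement, b the bandwidth allocation, f the compute allocation, T_u and C_u the response time and cost of request u, and alpha, beta the trade-off weights):

```latex
\min_{\mathbf{c},\,\mathbf{b},\,\mathbf{f}}\;
  \frac{1}{U}\sum_{u=1}^{U}
    \Bigl(\alpha\, T_u(\mathbf{c},\mathbf{b},\mathbf{f})
        + \beta\,  C_u(\mathbf{c},\mathbf{b},\mathbf{f})\Bigr)
\quad \text{s.t. per-ES storage, bandwidth, and compute budgets}
```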
Once the framework is established, TD3 determines what to cache and who to serve.
TD3 Algorithm
In resource allocation, actions such as bandwidth and compute shares are continuous variables. TD3 handles continuous action spaces and high-dimensional states effectively, learning caching and allocation strategies whose updates are kept stable by delayed policy updates.
The optimization problem P is modeled as a Markov Decision Process (MDP), defined by a state space, an action space, and a reward signal.
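Concretely, the state can collect cache placement, demand, and congestion; the action is one continuous vector that the agent slices into caching scores and resource shares; and the reward is the negated objective. A sketch under assumed dimensions (E edge servers, M models, U users; all field names hypothetical):

```python
import numpy as np

E, M, U = 3, 5, 20  # illustrative sizes, not the paper's settings

def make_state(cache_matrix, request_counts, queue_delays):
    """State: flatten cache placement, recent demand, and congestion
    into a single observation vector for the agent."""
    return np.concatenate([
        cache_matrix.ravel(),    # E x M binary: which ES caches which model
        request_counts.ravel(),  # M-dim: recent request count per model
        queue_delays.ravel(),    # E-dim: current queuing delay per ES
    ])

def split_action(a):
    """Action: one continuous vector, sliced into caching scores
    (thresholded to cache/evict) and per-user resource shares."""
    cache_scores = a[: E * M].reshape(E, M)
    bandwidth    = a[E * M : E * M + U]
    compute      = a[E * M + U :]
    return cache_scores, bandwidth, compute

def reward(avg_response_time, cost, alpha=1.0, beta=0.1):
    """Reward: negative weighted delay-plus-cost, so maximizing the
    reward minimizes the objective above."""
    return -(alpha * avg_response_time + beta * cost)

rng = np.random.default_rng(0)
s = make_state(rng.integers(0, 2, (E, M)), rng.poisson(3.0, M), rng.random(E))
print(s.shape)  # (23,) = E*M + M + E with these sizes
```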

Illustration: TD3 architecture
The architecture comprises six neural networks: an actor, two critics, and a target copy of each. The twin critics take the minimum of their two Q-estimates to curb Q-value overestimation, while delayed policy and target updates keep training stable.
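For readers unfamiliar with TD3, here is a condensed PyTorch sketch of that six-network update (layer sizes, hyperparameters, and helper names are illustrative, not the authors' code):

```python
import copy
import torch
import torch.nn as nn

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 256), nn.ReLU(),
                         nn.Linear(256, out))

state_dim, action_dim = 32, 8  # illustrative sizes

# Six networks: actor, two critics, and a target copy of each.
actor   = mlp(state_dim, action_dim)
critic1 = mlp(state_dim + action_dim, 1)
critic2 = mlp(state_dim + action_dim, 1)
actor_t, critic1_t, critic2_t = (copy.deepcopy(n)
                                 for n in (actor, critic1, critic2))

opt_a = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_c = torch.optim.Adam(list(critic1.parameters())
                         + list(critic2.parameters()), lr=3e-4)

def td3_update(batch, step, gamma=0.99, tau=0.005,
               policy_noise=0.2, noise_clip=0.5, policy_delay=2):
    s, a, r, s2, done = batch
    with torch.no_grad():
        # Target policy smoothing: clipped noise on the target action.
        noise = (torch.randn_like(a) * policy_noise).clamp(-noise_clip,
                                                           noise_clip)
        a2 = (torch.tanh(actor_t(s2)) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q: the smaller target curbs overestimation.
        q_t = torch.min(critic1_t(torch.cat([s2, a2], 1)),
                        critic2_t(torch.cat([s2, a2], 1)))
        target = r + gamma * (1 - done) * q_t
    sa = torch.cat([s, a], 1)
    critic_loss = ((critic1(sa) - target) ** 2).mean() \
                + ((critic2(sa) - target) ** 2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Delayed updates: refresh the actor and targets less often.
    if step % policy_delay == 0:
        a_pred = torch.tanh(actor(s))
        actor_loss = -critic1(torch.cat([s, a_pred], 1)).mean()
        opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
        for net, tgt in ((actor, actor_t), (critic1, critic1_t),
                         (critic2, critic2_t)):
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)
```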
During training, Actor outputs can fluctuate wildly, making convergence difficult. TD3's double critics and delayed updates significantly improve learning efficiency and stability, outperforming DDPG by about 1.72% in reward optimization.
As user numbers grow, cache hit rates increase. TD3 consistently outperforms the baselines, with maximum cache-hit-rate improvements of 41.06%, 50.93%, and 57.85% over DDPG, GCRAS, and PCRAS, respectively.
Summary
The paper presents a TD3-based joint optimization framework for edge AI model caching and resource allocation. It builds the EdgeAIGC network, jointly coordinates model caching, bandwidth, and compute resources, and improves the cache hit rate by at least 41.06%, opening a new path for integrating edge computing with AIGC.