By Insights Team in AI — 24 Jul 2025

How to Achieve Verifiable Agentic Workflows? MermaidFlow Opens a New Paradigm for Safe and Robust Intelligent Agent Processes

MermaidFlow introduces a structured, verifiable approach to agentic workflows, overcoming traditional challenges and enabling safe, scalable, and transparent multi-agent system deployment.

As large language models advance rapidly, AI agents are evolving from single-capability tools into collaborating systems. Multi-Agent Systems (MAS) have become a new frontier in academia and industry. In this context, the "Agentic Workflow", in which agents make decisions autonomously and generate their own processes, is gaining attention in MAS research and applications.

Teams from Google, Shanghai AI Lab, and elsewhere have launched Agentic Workflow systems such as MetaGPT, ADAS, and AFlow, which use large models to automate task planning, division of labor, and process optimization.

Despite their flexibility, these systems face significant challenges: generated workflows are not guaranteed to be sound, they are hard to verify, and they are hard to express intuitively. All three hinder reliable deployment and scaling of MAS.

Recently, a collaborative effort by CFAR at A*STAR in Singapore and Nanyang Technological University introduced an innovative workflow framework "MermaidFlow", advancing MAS towards a structured, safe, and verifiable new paradigm.

Paper link: https://arxiv.org/pdf/2505.22967
Open-source code: https://github.com/chengqiArchy/MermaidFlow

Breaking the Bottleneck: Replacing Scripts with Structured Workflow Expressions

Traditional Challenges: Imperative Scripts Cause Workflow Failures

Current MAS often output workflows as Python scripts or JSON trees, which tightly couple planning and implementation, leading to three core issues:

Opaque Structure: Workflow architecture is hidden in complex code, making it hard to understand and control globally.
Hard to Verify Soundness: Logic and implementation are tightly coupled, with no static checks or automatic validation, so flaws can stay hidden.
Difficult Debugging and Optimization: Errors only surface during execution, making troubleshooting and refinement inefficient.

MermaidFlow: Leading Structured and Verifiable Workflow Expression

Based on Mermaid, a declarative graph language, MermaidFlow proposes a new workflow expression mechanism. Instead of generating executable scripts, it models agent behaviors explicitly as structured flowcharts with formal semantics, ensuring clarity, traceability, and verifiability.

Compared to traditional Python/JSON workflows, Mermaid-based expressions feature:

Clear Visual Structure: Each agent, dependency, and data flow is represented as nodes and edges, making the entire workflow transparent and interactive.
Embedded Validation: Semantic constraints (such as the absence of dependency loops, role consistency, and input/output matching) support static structural checks and consistency verification during generation.
Support for Evolution and Debugging: Structured flowcharts facilitate fragment replacement, incremental repair, and version comparison, enabling controlled evolution.
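To make the "nodes and edges" view concrete, here is a minimal sketch of how a workflow written in a small subset of Mermaid could be parsed into a graph and checked for dependency loops. It is an illustration under stated assumptions, not MermaidFlow's actual parser: the node names, the `parse_edges` and `has_cycle` helpers, and the restriction to plain `A --> B` edges are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical workflow in a small Mermaid subset: plain "A --> B" edges only.
WORKFLOW = """
flowchart TD
    problem[Problem] --> solver[Solver agent]
    solver --> reviewer[Reviewer agent]
    reviewer --> answer[Final answer]
"""

# One edge per line; an optional [label] may follow each node identifier.
EDGE = re.compile(r"^\s*(\w+)(?:\[[^\]]*\])?\s*-->\s*(\w+)(?:\[[^\]]*\])?\s*$")


def parse_edges(text):
    """Return the (source, target) pairs found in the diagram text."""
    edges = []
    for line in text.splitlines():
        match = EDGE.match(line)
        if match:
            edges.append((match.group(1), match.group(2)))
    return edges


def has_cycle(edges):
    """Depth-first search with three colours; True if a back edge exists."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every node starts WHITE (0)

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # reached a node still on the stack
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))


edges = parse_edges(WORKFLOW)
print(edges)
print("dependency loop:", has_cycle(edges))
```

Because the structure is plain data rather than code to be run, a check like this can reject a malformed workflow before any agent is invoked.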

Figure 1: MermaidFlow’s end-to-end workflow expression loop from structured diagram to verifiable execution. The left shows declarative Mermaid workflow expression with clear dependencies and human readability, illustrating node and connection clarity.

Using MermaidFlow’s structured diagram approach, MAS planning is no longer a fragile black box but a transparent “white box” with clear structure, visual nodes, and verifiable semantics. This greatly enhances explainability, verifiability, and operability for subsequent evolution, laying a solid foundation for large-scale deployment.

💡 The authors found that large language models are naturally good at generating Mermaid, which makes MermaidFlow’s integration with LLMs particularly smooth. 🧠✨

Safety-Oriented Evolution Strategy in MermaidFlow: Self-Upgrade of Workflows

MermaidFlow models MAS explicitly using Mermaid language, turning each task node, data dependency, and execution order into visual, parseable, and operable semantic units. Compared to traditional imperative scripts, this structured approach is modular, supporting node insertion, deletion, and replacement, naturally fitting graph-level optimization.

Thanks to static verification mechanisms (such as checks for type matching, dependency loops, and role consistency), each workflow candidate generated during evolution undergoes a structural compliance check that filters out semantically incomplete or risky graphs. This “pre-validation + post-optimization” strategy significantly improves search robustness and avoids invalid exploration paths.
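The pre-validation step can be sketched as a filter that runs before any candidate is executed or scored. The example below is a hedged illustration only: the `Node` and `Workflow` classes, the role vocabulary, and the type labels are hypothetical and are not taken from the MermaidFlow codebase.

```python
from dataclasses import dataclass

# Hypothetical role vocabulary; a real system would define its own.
ALLOWED_ROLES = {"generator", "reviewer", "ensemble"}


@dataclass
class Node:
    role: str      # what kind of agent this node is
    consumes: str  # type label of the input it expects
    produces: str  # type label of the output it emits


@dataclass
class Workflow:
    nodes: dict  # node name -> Node
    edges: list  # (source name, target name) pairs


def violations(wf):
    """Collect static problems without running any agent."""
    problems = []
    for name, node in wf.nodes.items():
        if node.role not in ALLOWED_ROLES:
            problems.append(f"{name}: unknown role {node.role!r}")
    for src, dst in wf.edges:
        if src not in wf.nodes or dst not in wf.nodes:
            problems.append(f"{src} --> {dst}: undeclared node")
            continue
        out_type, in_type = wf.nodes[src].produces, wf.nodes[dst].consumes
        if out_type != in_type:
            problems.append(f"{src} --> {dst}: {out_type} does not match {in_type}")
    return problems


def prevalidate(candidates):
    """Keep only the candidates that pass every static check."""
    return [wf for wf in candidates if not violations(wf)]


NODES = {
    "solve": Node("generator", consumes="problem", produces="solution"),
    "check": Node("reviewer", consumes="solution", produces="verdict"),
}
GOOD = Workflow(nodes=NODES, edges=[("solve", "check")])
BAD = Workflow(nodes=NODES, edges=[("check", "solve")])  # verdict fed where a problem is expected

print(violations(BAD))
print(len(prevalidate([GOOD, BAD])), "candidate(s) survive pre-validation")
```

Only the survivors would move on to the more expensive optimization stage, which is the point of checking structure first.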

Figure 2: Overview of MermaidFlow’s safe evolutionary optimization process. The system starts from structured Mermaid diagrams and uses safety-aware evolutionary algorithms to optimize workflows across type, structure, and static verifiability.

Experimental Performance

MermaidFlow does not depend on highly capable LLMs to generate high-quality workflows. It performs well on mainstream datasets such as GSM8K, MATH, HumanEval, and MBPP, which demonstrates its practical value. More importantly, thanks to structured expression and static verification, over 90% of the workflows MermaidFlow generates during evolution are executable and structurally sound. This gives much better control and robustness than traditional script-based methods and supports reliable deployment of agent systems.

Figure 3: Evaluation results of MermaidFlow on mainstream tasks.

The following diagram shows an example of MermaidFlow’s evolutionary process under structured representation. Thanks to explicit semantic boundaries for each node and connection, the system can conveniently and safely perform local fragment replacements, reorganization, and evolution operations (such as crossover, node replacement, edge adjustment). The diagram illustrates how Workflow 5 and Workflow 4 undergo crossover to produce a more robust Workflow 8, incorporating better ensemble and testing modules. This controllable evolution mechanism significantly improves safety, control, and maintainability of workflow generation.

Figure 4: Flexible workflow evolution and synthesis process in MermaidFlow.
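As a rough illustration of graph-level crossover, the sketch below keeps one parent up to a shared node and grafts on everything the other parent does after that node. The two parent workflows, the node names, and the `downstream` and `crossover` helpers are hypothetical; the operators used in the paper may differ.

```python
def downstream(edges, start):
    """Edges reachable from `start`, collected by an iterative traversal."""
    found, frontier, seen = [], [start], {start}
    while frontier:
        node = frontier.pop()
        for src, dst in edges:
            if src == node:
                found.append((src, dst))
                if dst not in seen:
                    seen.add(dst)
                    frontier.append(dst)
    return found


def crossover(parent_a, parent_b, cut):
    """Keep parent_a up to `cut`, then attach parent_b's fragment after `cut`."""
    dropped = set(downstream(parent_a, cut))
    kept = [edge for edge in parent_a if edge not in dropped]
    return kept + downstream(parent_b, cut)


# Hypothetical parents that share the node "solve".
PARENT_A = [("problem", "solve"), ("solve", "answer")]
PARENT_B = [
    ("problem", "solve"),
    ("solve", "ensemble"),
    ("ensemble", "test"),
    ("test", "answer"),
]

child = crossover(PARENT_A, PARENT_B, cut="solve")
for src, dst in child:
    print(f"{src} --> {dst}")
```

In a pipeline like the one described, a child produced this way would still have to pass the static checks before being accepted, since grafting fragments can introduce loops or mismatched inputs.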

Conclusion

As multi-agent systems and large models continue to evolve, building workflows that are structured, verifiable, and able to evolve efficiently has become a key research challenge. MermaidFlow’s structured, verifiable workflow expression provides a foundation for efficient, controllable agent collaboration. Future AI collaboration may well require this kind of “visible, traceable, evolvable” process base. As application fields expand and engineering deployments grow, the framework is expected to offer useful insights for the ongoing development of intelligent agent ecosystems.
