Strata-Sword: A Hierarchical Safety Evaluation towards LLMs based on Reasoning Complexity of Jailbreak Instructions

Strata-Sword is a multi-level safety evaluation benchmark proposed by the Alibaba AAIG team. It aims to assess more comprehensively how models handle jailbreak instructions of varying reasoning complexity, helping model developers better understand each model's safety boundaries.

🤗 Hugging Face | 🤖 ModelScope | 📄 Arxiv

Simplified Chinese (简体中文) | English

🧩 Our Approach — Strata-Sword

Core Contribution

Reasoning complexity as a safety evaluation dimension: We define and quantify "reasoning complexity" as an evaluable safety dimension and categorize harmful jailbreak instructions into three tiers (basic instructions, simple reasoning, and complex reasoning) based on three key elements of reasoning complexity.
Tiered jailbreak evaluation dataset construction: We classify 15 jailbreak attack methods into three levels according to reasoning complexity; the dataset contains 700 jailbreak prompts in total.
Language-specific jailbreak attack methods: Strata-Sword also accounts for language characteristics, customizing attack methods for both Chinese and English, and introduces for the first time three Chinese-specific jailbreak attack methods: the acrostic-poem attack, the lantern-riddle attack, and the Chinese-character decomposition attack.
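The tier structure above can be pictured as a small data model. The sketch below is illustrative only: the `Tier` enum, the `JailbreakPrompt` record, its field names, and the helper functions are hypothetical and are not taken from the released dataset files. It counts prompts per (tier, language) pair and flags prompts whose tier label disagrees with the tier assigned to their attack method, since each method is classified into exactly one level.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """The three reasoning-complexity tiers (hypothetical encoding)."""
    BASIC_INSTRUCTION = 1
    SIMPLE_REASONING = 2
    COMPLEX_REASONING = 3


@dataclass(frozen=True)
class JailbreakPrompt:
    # Hypothetical record layout; the released files may use different fields.
    prompt_id: str
    tier: Tier
    language: str        # e.g. "zh" or "en"
    attack_method: str   # one of the 15 attack methods
    text: str


def tally_by_tier_and_language(prompts):
    """Count prompts per (tier, language) pair."""
    return Counter((p.tier, p.language) for p in prompts)


def check_tier_consistency(prompts, method_to_tier):
    """Return IDs of prompts whose tier differs from their method's tier.

    Each attack method belongs to exactly one tier, so every prompt built
    with that method should carry the same tier label.
    """
    return [
        p.prompt_id
        for p in prompts
        if method_to_tier.get(p.attack_method) != p.tier
    ]
```

A check like `check_tier_consistency` is a cheap way to catch labeling mistakes before computing any per-tier statistics.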

Evaluation Results

We systematically evaluate 23 mainstream open-source and closed-source commercial large language models, characterizing their safety capability boundaries from the perspective of reasoning complexity.

We also report statistics for the 15 jailbreak attack methods used in Strata-Sword to assess each method's overall effectiveness.
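A per-tier and per-method breakdown of this kind can be computed from judged responses. The sketch below assumes performance is summarized as an attack success rate (ASR), the fraction of prompts for which a judge labels the model's response unsafe. ASR is a common jailbreak metric, but the exact metric and record layout used in the Strata-Sword paper may differ; `JudgedResponse` and `attack_success_rate` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgedResponse:
    # Hypothetical layout for one judged model response.
    model: str
    tier: int            # 1 = basic instruction, 2 = simple reasoning, 3 = complex reasoning
    attack_method: str
    jailbroken: bool     # True if the judge labeled the response unsafe


def attack_success_rate(results, key):
    """Return {group: fraction of jailbroken responses}, grouped by key(result)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in results:
        group = key(r)
        totals[group] += 1
        hits[group] += int(r.jailbroken)
    return {group: hits[group] / totals[group] for group in totals}
```

Grouping by `lambda r: (r.model, r.tier)` gives a per-model, per-tier table, while `lambda r: r.attack_method` gives the per-method view described above; both come from the same records.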

🚀 Quick Start

1. Environment installation: install the required dependencies

pip install -r requirements.txt

2. Test: run the Chinese and English jailbreak prompt sets for the three Strata-Sword levels

python strata_sword.py
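Conceptually, a run sends every prompt in each (level, language) set to the model under test and has a judge label each reply. The sketch below is a generic, hypothetical outline of that loop, not a description of what `strata_sword.py` actually does; `run_benchmark`, the `Model` and `Judge` callables, and the dictionary layout are all assumptions.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical types: a model maps a prompt to a response, and a judge maps
# (prompt, response) to True when the response is unsafe.
Model = Callable[[str], str]
Judge = Callable[[str, str], bool]


def run_benchmark(
    prompt_sets: Dict[Tuple[int, str], Iterable[str]],
    model: Model,
    judge: Judge,
) -> Dict[Tuple[int, str], List[bool]]:
    """Query the model on every (level, language) prompt set and judge each reply."""
    verdicts: Dict[Tuple[int, str], List[bool]] = {}
    for (level, language), prompts in prompt_sets.items():
        verdicts[(level, language)] = [judge(p, model(p)) for p in prompts]
    return verdicts


if __name__ == "__main__":
    # Placeholder model and judge for a dry run; a real evaluation would call
    # an LLM API and a proper safety judge instead.
    def always_refuse(prompt: str) -> str:
        return "I can't help with that."

    def keyword_judge(prompt: str, response: str) -> bool:
        return "can't" not in response  # crude placeholder heuristic

    demo_sets = {(1, "en"): ["<prompt 1>", "<prompt 2>"], (3, "zh"): ["<prompt 3>"]}
    print(run_benchmark(demo_sets, always_refuse, keyword_judge))
```

Keeping the model and judge as plain callables makes it easy to swap in different APIs or judging strategies without touching the loop.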

📚 Citation

If you use Strata-Sword in your research, please cite the following paper:

@article{Strata-Sword,
  title={Strata-Sword: A Hierarchical Safety Evaluation towards LLMs based on Reasoning Complexity of Jailbreak Instructions},
  author={Zhao, Shiji and Duan, Ranjie and Liu, Jiexi and Jia, Xiaojun and Wang, Fengxiang and Wei, Cheng and Cheng, Ruoxi and Xie, Yong and Liu, Chang and Guo, Qing and Tao, Jialing and Chen, YueFeng and Xue, Hui and Wei, Xingxing},
  year={2025},
  url={https://github.com/Alibaba-AAIG/Strata-Sword}
}

🤝 Contribution

We welcome collaboration and discussion on safety evaluation and alignment. Red-teaming is an ongoing effort, and Strata-Sword will continue to release new versions. We invite red-team developers working on large models to propose new jailbreak attack methods for inclusion in future Strata-Sword evaluation sets. Feel free to open Issues to report problems and join Discussions to share ideas.

📄 License

This project is licensed under the Apache 2.0 License.

🙏 Acknowledgments

We thank the open-source community and the researchers advancing AI safety.

Strata is part of Alibaba AAIG's commitment to responsible AI.

“The LLM is my oyster, which I with Strata-Sword will open.” (The Chinese version reads: "The large model is my oyster; I will open it with the Six Meridians Divine Sword (六脉神剑).")