A port of the RWKV v7 language model (100% attention-free), implemented in Rust with the Burn deep learning framework.

- 🔥 Pure Burn: built entirely with Burn
- 🔤 Tokenizer support via `rwkv-tokenizer`
- 🧠 Stateful sequential, parallel, or mixed generation
- 🧪 Sampling with top-k, top-p, and temperature
- ⚙️ Loads RWKV v7 models from `safetensors`
Currently, the best results are achieved with the RWKV v7 World 0.1B model. Any help investigating the performance drop observed with larger models (0.4B, 1.5B, ...) is appreciated.
```bash
git clone https://github.com/dymat/rwkv-burn.git
cd rwkv-burn
cargo build --release
```
Download the model weights from Hugging Face: https://huggingface.co/BlinkDL/rwkv-7-world

Convert them into SafeTensors format:

```bash
pip3 install torch --index-url https://download.pytorch.org/whl/cpu
pip3 install safetensors
python weights_to_safetensors.py path/to/model_weights.pth
```
For example, to fetch and convert the 0.1B World model:

```bash
wget https://huggingface.co/BlinkDL/rwkv-7-world/resolve/main/RWKV-x070-World-0.1B-v2.8-20241210-ctx4096.pth
python weights_to_safetensors.py RWKV-x070-World-0.1B-v2.8-20241210-ctx4096.pth
```
```bash
cargo run --release -- \
    --weights /path/to/model_weights.pth.safetensors \
    --top_p 0.6 \
    --temperature 0.8
```
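Once running, the program drops into an interactive REPL loop (see `main.rs`) where you can enter prompts and read the generated continuations.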
| Flag | Description | Default |
|---|---|---|
| `-l`, `--n_layer` | Number of RWKV layers | 12 |
| `-d`, `--d_model` | Embedding (hidden) size | 768 |
| `-H`, `--n_heads` | Number of attention heads | 12 |
| `-v`, `--vocab_size` | Vocabulary size | 65536 |
| `-w`, `--weights` | Path to `.safetensors` weight file | optional |
| `-t`, `--temperature` | Temperature for token sampling | 0.6 |
| `-p`, `--top_p` | Top-p sampling parameter | 0.6 |
| `-k`, `--top_k` | Top-k sampling parameter | 50 |
| `--inference_mode` | `Parallel`, `RNN`, or `Mixed` | `Mixed` |
| `--tokenizer_vocab_file` | Path to tokenizer vocab file | `rwkv_vocab_v20230424.txt` |
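The defaults match the 0.1B World model (12 layers, hidden size 768); larger checkpoints require matching `--n_layer`, `--d_model`, and `--n_heads` values (check the model card for the exact figures). The `--inference_mode` flag selects how input is processed: `Parallel` runs the whole prompt through the model in one forward pass, `RNN` feeds tokens one at a time while carrying the recurrent state forward, and `Mixed` presumably combines the two, processing the prompt in parallel and then generating token by token.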
This project implements a minimal RWKV-v7 model inference pipeline using the Burn deep learning framework and rwkv-tokenizer for tokenization.
- `main.rs`: Entry point with CLI parsing and REPL loop.
- `model`: Implements the RWKV v7 model architecture using `burn` modules.
- `generator.rs`: Handles text generation (sequential and parallel modes; a simplified sketch of the sequential loop follows below).
- `rwkv_vocab_v20230424.txt`: Vocabulary file used by the tokenizer.
- `weights_to_safetensors.py`: Optional script to convert pre-trained weights into `.safetensors` format.
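To make the generation flow concrete, here is a minimal sketch of the RNN-style decode loop that `generator.rs` implements. The names (`RwkvState`, `step`) are hypothetical stand-ins, not this crate's actual API, and the model step is stubbed out:

```rust
// Hypothetical sketch of an RNN-style decode loop; `RwkvState` and `step`
// are illustrative stand-ins for the real types in `model`/`generator.rs`.

struct RwkvState; // per-layer recurrent state would live here

// Stub: one recurrent step consumes a token and returns logits.
fn step(_state: &mut RwkvState, _token: u32, vocab: usize) -> Vec<f32> {
    vec![0.0; vocab]
}

fn generate(prompt: &[u32], max_new_tokens: usize, vocab: usize) -> Vec<u32> {
    let mut state = RwkvState;
    let mut logits = vec![0.0; vocab];

    // Feed the prompt token by token, carrying the state forward.
    for &t in prompt {
        logits = step(&mut state, t, vocab);
    }

    let mut out = Vec::new();
    for _ in 0..max_new_tokens {
        // Greedy argmax here for brevity; the real generator samples
        // with temperature, top-k, and top-p instead.
        let next = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i as u32)
            .unwrap();
        out.push(next);
        logits = step(&mut state, next, vocab);
    }
    out
}
```

Because the state is carried explicitly, it can also be cached and reused across generations, which is what the state-caching feature below refers to.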
- 🔄 RNN-style token-by-token generation
- 🔥 Top-k and top-p sampling (a sampling sketch follows this list)
- 🧠 Inference state caching across generations
- 🎛 CLI configuration for model size, heads, tokenizer, weights
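For illustration, here is a dependency-free sketch of combining temperature, top-k, and top-p as listed above. It assumes `temperature > 0`, `top_k >= 1`, and a caller-supplied uniform random draw `u` in [0, 1); the repo's actual sampler may differ:

```rust
// Minimal sketch of temperature + top-k + top-p sampling.
// `u` is a uniform random value in [0, 1) supplied by the caller.
fn sample(logits: &[f32], temperature: f32, top_k: usize, top_p: f32, u: f32) -> usize {
    // Temperature-scaled, numerically stable softmax.
    let scaled: Vec<f32> = logits.iter().map(|&l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut probs: Vec<(usize, f32)> =
        exps.iter().map(|&e| e / sum).enumerate().collect();

    // Keep only the top_k most likely tokens...
    probs.sort_by(|a, b| b.1.total_cmp(&a.1));
    probs.truncate(top_k);

    // ...then cut the tail once cumulative probability exceeds top_p.
    let mut cum = 0.0;
    let mut kept = Vec::new();
    for (i, p) in probs {
        kept.push((i, p));
        cum += p;
        if cum >= top_p {
            break;
        }
    }

    // Renormalize the truncated distribution and draw from it.
    let total: f32 = kept.iter().map(|(_, p)| p).sum();
    let mut acc = 0.0;
    for (i, p) in &kept {
        acc += p / total;
        if u < acc {
            return *i;
        }
    }
    kept.last().unwrap().0
}
```

In practice `u` would come from a PRNG such as the `rand` crate.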
This project is under active development. Below are the features implemented so far, followed by planned improvements:
- Basic RWKV v7 model implementation with the `burn` framework
- Sequential token-by-token text generation
- CLI interface with configurable parameters
- Integration with `rwkv-tokenizer`
- Investigate poor performance on larger models (e.g. 0.4B, 1.5B)
- Implement parallel generation mode
- Improve support for additional backends (e.g. WGPU); inference with WGPU currently appears numerically unstable and produces poor results
- Improve inference speed
- Save and load model states (checkpointing)
- Model quantization for faster inference and smaller memory footprint
- Implement batch generation support
- Improve sampling strategies (temperature annealing, beam search)
- Full training support (fine-tuning on custom datasets, weight initialization)
- Multi-GPU and distributed inference support
- Model export to ONNX and interoperability with other frameworks
- Web-based interface and API service for model inference
- RWKV original repository: https://github.com/BlinkDL/RWKV-LM
- RWKV project website with many more resources: https://www.rwkv.com/
- RWKV v7 paper: https://arxiv.org/abs/2503.14456
- A valuable re-implementation of RWKV models that focuses on readability rather than performance: https://github.com/SmerkyG/RWKV_Explained
Contributions and feedback are welcome!
Feel free to open issues or submit pull requests for any feature requests or bug fixes.
Last updated: 2025-06-01