A port of the RWKV v7 language model (100% attention-free), implemented in Rust with the Burn deep learning framework.

- 🔥 Pure Burn: built entirely with Burn
- 🔤 Tokenizer support via `rwkv-tokenizer`
- 🧠 Stateful sequential, parallel, or mixed generation
- 🧪 Sampling with top-k, top-p, and temperature
- ⚙️ Loads RWKV v7 models from `safetensors`
Currently, the best results are achieved with the RWKV v7 World 0.1B model. Any help investigating the performance drop observed with larger models (0.4B, 1.5B, ...) is appreciated.
```bash
git clone https://github.com/dymat/rwkv-burn.git
cd rwkv-burn
cargo build --release
```
Download the model weights from Hugging Face: https://huggingface.co/BlinkDL/rwkv-7-world

Convert them into SafeTensors format:

```bash
pip3 install torch --index-url https://download.pytorch.org/whl/cpu
pip3 install safetensors
python weights_to_safetensors.py path/to/model_weights.pth
```
For example, to fetch and convert the 0.1B World model:

```bash
wget https://huggingface.co/BlinkDL/rwkv-7-world/resolve/main/RWKV-x070-World-0.1B-v2.8-20241210-ctx4096.pth
python weights_to_safetensors.py RWKV-x070-World-0.1B-v2.8-20241210-ctx4096.pth
```
```bash
cargo run --release -- \
    --weights /path/to/model_weights.pth.safetensors \
    --top_p 0.6 \
    --temperature 0.8
```
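Once running, the program drops into an interactive REPL loop (see `main.rs`) where you can enter prompts and read the generated continuations.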
| Flag | Description | Default |
|---|---|---|
| `-l`, `--n_layer` | Number of RWKV layers | 12 |
| `-d`, `--d_model` | Embedding (hidden) size | 768 |
| `-H`, `--n_heads` | Number of attention heads | 12 |
| `-v`, `--vocab_size` | Vocabulary size | 65536 |
| `-w`, `--weights` | Path to `.safetensors` weight file | optional |
| `-t`, `--temperature` | Temperature for token sampling | 0.6 |
| `-p`, `--top_p` | Top-p sampling parameter | 0.6 |
| `-k`, `--top_k` | Top-k sampling parameter | 50 |
| `--inference_mode` | `Parallel`, `RNN`, or `Mixed` | `Mixed` |
| `--tokenizer_vocab_file` | Path to tokenizer vocab file | `rwkv_vocab_v20230424.txt` |
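The defaults match the 0.1B World model (12 layers, hidden size 768); larger checkpoints require matching `--n_layer`, `--d_model`, and `--n_heads` values (check the model card for the exact figures). The `--inference_mode` flag selects how input is processed: `Parallel` runs the whole prompt through the model in one forward pass, `RNN` feeds tokens one at a time while carrying the recurrent state forward, and `Mixed` presumably combines the two, processing the prompt in parallel and then generating token by token.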
This project implements a minimal RWKV-v7 model inference pipeline using the Burn deep learning framework and rwkv-tokenizer for tokenization.
- `main.rs`: Entry point with CLI parsing and REPL loop.
- `model`: Implements the RWKV v7 model architecture using `burn` modules.
- `generator.rs`: Handles text generation (sequential and parallel modes; a simplified sketch of the sequential loop follows below).
- `rwkv_vocab_v20230424.txt`: Vocabulary file used by the tokenizer.
- `weights_to_safetensors.py`: Optional script to convert pre-trained weights into `.safetensors` format.
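To make the generation flow concrete, here is a minimal sketch of the RNN-style decode loop that `generator.rs` implements. The names (`RwkvState`, `step`) are hypothetical stand-ins, not this crate's actual API, and the model step is stubbed out:

```rust
// Hypothetical sketch of an RNN-style decode loop; `RwkvState` and `step`
// are illustrative stand-ins for the real types in `model`/`generator.rs`.

struct RwkvState; // per-layer recurrent state would live here

// Stub: one recurrent step consumes a token and returns logits.
fn step(_state: &mut RwkvState, _token: u32, vocab: usize) -> Vec<f32> {
    vec![0.0; vocab]
}

fn generate(prompt: &[u32], max_new_tokens: usize, vocab: usize) -> Vec<u32> {
    let mut state = RwkvState;
    let mut logits = vec![0.0; vocab];

    // Feed the prompt token by token, carrying the state forward.
    for &t in prompt {
        logits = step(&mut state, t, vocab);
    }

    let mut out = Vec::new();
    for _ in 0..max_new_tokens {
        // Greedy argmax here for brevity; the real generator samples
        // with temperature, top-k, and top-p instead.
        let next = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i as u32)
            .unwrap();
        out.push(next);
        logits = step(&mut state, next, vocab);
    }
    out
}
```

Because the state is carried explicitly, it can also be cached and reused across generations, which is what the state-caching feature below refers to.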
- 🔄 RNN-style token-by-token generation
- 🔥 Top-k and top-p sampling (a sampling sketch follows this list)
- 🧠 Inference state caching across generations
- 🎛 CLI configuration for model size, heads, tokenizer, weights
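For illustration, here is a dependency-free sketch of combining temperature, top-k, and top-p as listed above. It assumes `temperature > 0`, `top_k >= 1`, and a caller-supplied uniform random draw `u` in [0, 1); the repo's actual sampler may differ:

```rust
// Minimal sketch of temperature + top-k + top-p sampling.
// `u` is a uniform random value in [0, 1) supplied by the caller.
fn sample(logits: &[f32], temperature: f32, top_k: usize, top_p: f32, u: f32) -> usize {
    // Temperature-scaled, numerically stable softmax.
    let scaled: Vec<f32> = logits.iter().map(|&l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut probs: Vec<(usize, f32)> =
        exps.iter().map(|&e| e / sum).enumerate().collect();

    // Keep only the top_k most likely tokens...
    probs.sort_by(|a, b| b.1.total_cmp(&a.1));
    probs.truncate(top_k);

    // ...then cut the tail once cumulative probability exceeds top_p.
    let mut cum = 0.0;
    let mut kept = Vec::new();
    for (i, p) in probs {
        kept.push((i, p));
        cum += p;
        if cum >= top_p {
            break;
        }
    }

    // Renormalize the truncated distribution and draw from it.
    let total: f32 = kept.iter().map(|(_, p)| p).sum();
    let mut acc = 0.0;
    for (i, p) in &kept {
        acc += p / total;
        if u < acc {
            return *i;
        }
    }
    kept.last().unwrap().0
}
```

In practice `u` would come from a PRNG such as the `rand` crate.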
This project is under active development. Below are the features implemented so far, followed by planned improvements:
- Basic RWKV v7 model implementation with the `burn` framework
- Sequential token-by-token text generation
- CLI interface with configurable parameters
- Integration with `rwkv-tokenizer`
- Investigate poor performance on larger models (e.g. 0.4B, 1.5B)
- Implement parallel generation mode
- Improve support for additional backends (e.g. WGPU); inference with WGPU currently appears numerically unstable and produces poor results
- Improve inference speed
- Save and load model states (checkpointing)
- Model quantization for faster inference and smaller memory footprint
- Implement batch generation support
- Improve sampling strategies (temperature annealing, beam search)
- Full training support (fine-tuning on custom datasets, weight initialization)
- Multi-GPU and distributed inference support
- Model export to ONNX and interoperability with other frameworks
- Web-based interface and API service for model inference
- RWKV original repository: https://github.com/BlinkDL/RWKV-LM
- RWKV project website with many more resources: https://www.rwkv.com/
- RWKV v7 paper: https://arxiv.org/abs/2503.14456
- A valuable re-implementation of RWKV models that focuses on readability rather than performance: https://github.com/SmerkyG/RWKV_Explained
Contributions and feedback are welcome!
Feel free to open issues or submit pull requests for any feature requests or bug fixes.
Last updated: 2025-06-01