✏️ DreamOn: Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas


Overview

DreamOn is a novel discrete diffusion algorithm designed to address the variable-length generation challenge in code infilling. Unlike current discrete diffusion language models, our approach enables dynamic expansion and contraction of mask tokens during inference, providing flexible length control without requiring predetermined canvas sizes.

This work was done as part of a collaboration between the HKU NLP Group and the Klear Team at Kuaishou Technology.
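
For intuition, here is a toy sketch of variable-length mask editing. The token names (MASK, EXPAND, DELETE) and the update rule are hypothetical simplifications for illustration only, not the released DreamOn algorithm:

# Toy illustration (not the actual implementation): a mask position can be
# filled with a token, expanded into two masks, or deleted outright, so the
# canvas length changes during inference.
MASK, EXPAND, DELETE = "<mask>", "<expand>", "<delete>"

def apply_step(canvas, position, prediction):
    # Replace the mask at `position` according to the model's prediction.
    if prediction == EXPAND:    # one mask becomes two: the canvas grows
        return canvas[:position] + [MASK, MASK] + canvas[position + 1:]
    if prediction == DELETE:    # the mask is removed: the canvas shrinks
        return canvas[:position] + canvas[position + 1:]
    return canvas[:position] + [prediction] + canvas[position + 1:]

canvas = ["def", MASK, MASK, ":", "pass"]
canvas = apply_step(canvas, 1, EXPAND)  # grow the infill region
canvas = apply_step(canvas, 2, "f")     # fill one mask with an ordinary token
print(canvas)  # ['def', '<mask>', 'f', '<mask>', ':', 'pass']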

News

[2025/7/25] We open-source our code for training and evaluation. We also release a demo for DreamOn on Hugging Face Spaces. You can try it out here.

[2025/7/15] We release our model DreamOn and its accompanying model DreamCoder.

Installation

Our implementation follows our previous work Dream. Please install transformers==4.46.2 (pip install transformers==4.46.2) and torch==2.5.1, as Dream relies on the SdpaAttention implementation built into torch.

Quickstart

import torch
import time
from transformers import AutoModel, AutoTokenizer

def process_infilling_prompt(prefix, suffix, tokenizer, number_of_mask):
    # Build the infilling prompt: [BOS] + prefix tokens + mask tokens + suffix tokens + [EOS].
    # The mask span is the initial canvas that the model fills in (and may expand or contract).
    prefix = [tokenizer.bos_token_id] + tokenizer.encode(prefix, add_special_tokens=False)
    middle = [tokenizer.mask_token_id] * number_of_mask
    suffix = tokenizer.encode(suffix, add_special_tokens=False) + [tokenizer.eos_token_id]
    return prefix + middle + suffix

prefix = '''from typing import List
def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
'''

suffix = '''        for idx2, elem2 in enumerate(numbers):
            if idx != idx2:
                distance = abs(elem - elem2)
                if distance < threshold:
                    return True

    return False
'''
model_path = 'Dream-org/DreamOn-v0-7B'
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Set the initial mask token length when processing the prompt
input_ids = process_infilling_prompt(prefix, suffix, tokenizer, 4)
input_ids = torch.LongTensor([input_ids]).to("cuda")

model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True)
model = model.to("cuda").eval()


output = model.diffusion_generate(
    input_ids,
    temperature=0.2,
    alg='entropy',
    alg_temp=0,
    top_p=0.9,
    max_new_tokens=64,  # Set the maximum number of new tokens for infilling
    return_dict_in_generate=True,
    output_history=True,
    number_transfer_tokens=1,
)


history = output.history
for i, h in enumerate(history):
    print("########################")
    time.sleep(0.2)
    print(tokenizer.decode(h.tolist()), end="\n\n")
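
To inspect just the final result rather than the full denoising history, the completed sequence can be decoded directly. A minimal sketch, assuming diffusion_generate returns the final token ids in output.sequences (mirroring the convention of the companion Dream models; verify against the released model code):

# Assumption: `output.sequences` holds the final denoised token ids,
# as in the Dream API; decode and print the completed program.
print(tokenizer.decode(output.sequences[0].tolist()))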

Parameters

  • input_ids: The input token ids.
  • max_new_tokens: The maximum number of tokens to generate. Note that the context length (input + output) of Dream is currently 2048, and the mask tokens added to the prompt count toward the new-token budget. Therefore, max_new_tokens cannot be set to a value smaller than the number of mask tokens in the prompt; for example, if you set number_of_mask to 4, then max_new_tokens must be at least 4.
  • output_history: Whether to return the output at each intermediate step.
  • return_dict_in_generate: The output format, mostly set to True.
  • number_transfer_tokens: The number of tokens to predict at each denoising step. We mainly test our model with number_transfer_tokens set to 1; other settings are not fully tested.
  • temperature: The value used to modulate the next-token probabilities. By default 0.0. Smaller values give more accurate results (e.g., in math or coding); larger values give more diverse results (e.g., in general conversation). If you notice repeated results, consider increasing the temperature.
  • top_p: If set to a float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher is kept for generation. By default None. Controls the diversity of generation.
  • top_k: The number of highest-probability vocabulary tokens to keep for top-k filtering. By default None. Controls the diversity of generation.
  • alg: The remasking strategy in diffusion sampling, controlling the token generation order. Supports one random strategy (origin) and three confidence-based strategies (entropy, maskgit_plus, topk_margin), following Dream.
  • alg_temp: Adds randomness to alg when using confidence-based strategies. By default None.

Note: We currently do not support a user-supplied attention mask, as the attention mask is recomputed at each denoising step to support variable-length generation.
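
Because the canvas can expand and contract, the intermediate sequences in output.history need not all have the same length. A quick sketch for observing this, reusing the output object from the Quickstart above:

# Print the sequence length at each denoising step; with variable-length
# generation the length can grow (mask expansion) or shrink (mask deletion).
for step, h in enumerate(output.history):
    print(f"step {step}: {len(h)} tokens")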

Evaluation

Use the following commands to replicate our results.

git clone https://github.com/openai/human-eval-infilling
pip install -e human-eval-infilling
pip install omegaconf
bash eval/eval_humaneval_infilling.sh
bash eval/eval_santa_fim.sh

Training

Our training implementation is built upon the SFT trainer from verl. To train DreamOn, please install verl first, and then execute the following commands:

python data/prepare_data.py
bash run_dreamon.sh

Citation

@misc{Dreamon2025,
    title = {DreamOn: Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas},
    url = {https://hkunlp.github.io/blog/2025/dreamon},
    author = {Wu, Zirui and Zheng, Lin and Xie, Zhihui and Ye, Jiacheng and Gao, Jiahui and Feng, Yansong and Li, Zhenguo and W., Victoria and Zhou, Guorui and Kong, Lingpeng},
    year = {2025}
}
