Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models
- 05/30/2025: 🎊 Initial repo created 🎉
This repository will contain the source code for our paper Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models, to be presented at ACL 2025.
The generation of toxic content by large language models (LLMs) remains a critical challenge for the safe deployment of language technology. We propose a novel framework for implicit knowledge editing and controlled text generation by fine-tuning LLMs with a prototype-based contrastive perplexity objective. Central to our method is the construction of hard negatives—toxic outputs that are generated through adversarial paraphrasing to be semantically similar and closely matched in length and model probability to their non-toxic counterparts. By training on these challenging and realistic pairs, our approach ensures robust and stable contrastive optimization. Experimental results in the domain of detoxification demonstrate that our method significantly reduces toxic generation while maintaining strong performance on downstream tasks such as commonsense reasoning and reading comprehension. Our findings highlight the effectiveness of leveraging hard negatives for attribute-aware language model fine-tuning.
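To make the objective concrete, below is a minimal, self-contained PyTorch sketch of a contrastive perplexity-style loss over a non-toxic sequence and a length-matched toxic hard negative. It is illustrative only: the model name, the two-way softmax over sequence log-likelihoods, and the temperature are assumptions, not the implementation released in this repository. The sketch also assumes that positive/hard-negative pairs (produced in the paper via adversarial paraphrasing) are already available.

```python
# Minimal sketch of a contrastive perplexity-style objective (illustrative only;
# not the implementation released in this repository). The idea: the fine-tuned
# LM should assign lower perplexity (higher average log-likelihood) to the
# non-toxic sequence than to its length-matched toxic hard negative.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the paper fine-tunes larger LLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


def avg_log_likelihood(text: str) -> torch.Tensor:
    """Average per-token log-likelihood of `text` (i.e. negative log-perplexity)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(ids).logits
    # Predict token t from tokens < t.
    log_probs = F.log_softmax(logits[:, :-1, :], dim=-1)
    targets = ids[:, 1:]
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_ll.mean()


def contrastive_perplexity_loss(non_toxic: str, toxic_negative: str,
                                temperature: float = 1.0) -> torch.Tensor:
    """Two-way softmax over sequence likelihoods: prefer the non-toxic sequence."""
    scores = torch.stack([avg_log_likelihood(non_toxic),
                          avg_log_likelihood(toxic_negative)]) / temperature
    return -F.log_softmax(scores, dim=0)[0]


loss = contrastive_perplexity_loss(
    non_toxic="Thanks for the feedback, I will look into the issue.",
    toxic_negative="<hard-negative toxic paraphrase of the sentence above>",
)
loss.backward()  # gradients flow into the LM during fine-tuning
```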
- Python (version 3.6 or later)
- PyTorch
- HuggingFace Transformers
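A quick sanity check of the environment is sketched below; the version bounds are assumptions based on the list above, since exact pins are not provided here.

```python
# Quick environment sanity check (assumed minimal versions, not official pins).
import sys
import torch
import transformers

assert sys.version_info >= (3, 6), "Python 3.6 or later is required"
print(f"torch {torch.__version__}, transformers {transformers.__version__}")
```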
If you use this code in your research or want to refer to our work, please cite:
```bibtex
@inproceedings{klein-nabi-2025-contrastive-perplexity,
    title = "Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models",
    author = "Klein, Tassilo  and
      Nabi, Moin",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    abstract = "The generation of toxic content by large language models (LLMs) remains a critical challenge for the safe deployment of language technology. We propose a novel framework for implicit knowledge editing and controlled text generation by fine-tuning LLMs with a prototype-based contrastive perplexity objective. Central to our method is the construction of hard negatives—toxic outputs that are generated through adversarial paraphrasing to be semantically similar and closely matched in length and model probability to their non-toxic counterparts. By training on these challenging and realistic pairs, our approach ensures robust and stable contrastive optimization. Experimental results in the domain of detoxification demonstrate that our method significantly reduces toxic generation while maintaining strong performance on downstream tasks such as commonsense reasoning and reading comprehension. Our findings highlight the effectiveness of leveraging hard negatives for attribute-aware language model fine-tuning.",
}
```
Create an issue in this repository if you find a bug or have questions about the content.
For additional support, ask a question in SAP Community.
If you wish to contribute code, or offer fixes or improvements, please send a pull request. For legal reasons, contributors will be asked to accept a Developer Certificate of Origin (DCO) when they create their first pull request to this project. This happens in an automated fashion during the submission process. SAP uses the standard DCO text of the Linux Foundation.
Copyright (c) 2025 SAP SE or an SAP affiliate company. All rights reserved. This project is licensed under the Apache Software License, version 2.0 except as noted otherwise in the LICENSE file.