Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models

made-with-python License arXiv REUSE status

News

  • 05/30/2025: 🎊 Initial repo created 🎉

Description

This repository will contain the source code for our paper Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models, to be presented at ACL 2025.

Abstract

The generation of toxic content by large language models (LLMs) remains a critical challenge for the safe deployment of language technology. We propose a novel framework for implicit knowledge editing and controlled text generation by fine-tuning LLMs with a prototype-based contrastive perplexity objective. Central to our method is the construction of hard negatives—toxic outputs that are generated through adversarial paraphrasing to be semantically similar and closely matched in length and model probability to their non-toxic counterparts. By training on these challenging and realistic pairs, our approach ensures robust and stable contrastive optimization. Experimental results in the domain of detoxification demonstrate that our method significantly reduces toxic generation while maintaining strong performance on downstream tasks such as commonsense reasoning and reading comprehension. Our findings highlight the effectiveness of leveraging hard negatives for attribute-aware language model fine-tuning.
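The code itself is not yet available in this repository, but as a rough illustration of the kind of objective the abstract describes, below is a minimal PyTorch sketch (not the paper's exact method): an InfoNCE-style contrastive loss over length-normalized sequence negative log-likelihoods, which pushes the model to assign lower perplexity to the non-toxic target than to its toxic hard negatives. The function names, the temperature `tau`, and the HuggingFace-style causal LM interface are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def sequence_nll(model, input_ids, attention_mask):
    """Length-normalized negative log-likelihood (log-perplexity) per sequence."""
    labels = input_ids.masked_fill(attention_mask == 0, -100)
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    # Shift so that tokens < n predict token n.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    token_nll = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
        reduction="none",
    ).view(shift_labels.size())
    mask = (shift_labels != -100).float()
    return (token_nll * mask).sum(dim=1) / mask.sum(dim=1)

def contrastive_perplexity_loss(model, pos_batch, neg_batch, tau=1.0):
    """pos_batch holds one non-toxic target; neg_batch holds K toxic hard negatives."""
    nll_pos = sequence_nll(model, **pos_batch)  # shape (1,)
    nll_neg = sequence_nll(model, **neg_batch)  # shape (K,)
    # Lower perplexity => higher score; softmax over [positive, negatives].
    scores = torch.cat([-nll_pos, -nll_neg]) / tau
    target = torch.zeros(1, dtype=torch.long, device=scores.device)  # positive at index 0
    return F.cross_entropy(scores.unsqueeze(0), target)
```

In the paper's setting, the negatives would be the adversarially paraphrased toxic outputs matched in length and model probability to the non-toxic target; in this sketch they are simply a batch of toxic sequences.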

Authors: Tassilo Klein, Moin Nabi

Requirements

Citations

If you use this code in your research or want to refer to our work, please cite:

@inproceedings{klein-nabi-2025-contrastive-perplexity,
    title = "Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models",
    author = "Klein, Tassilo  and
      Nabi, Moin",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    abstract = "The generation of toxic content by large language models (LLMs) remains a critical challenge for the safe deployment of language technology. We propose a novel framework for implicit knowledge editing and controlled text generation by fine-tuning LLMs with a prototype-based contrastive perplexity objective. Central to our method is the construction of hard negatives—toxic outputs that are generated through adversarial paraphrasing to be semantically similar and closely matched in length and model probability to their non-toxic counterparts. By training on these challenging and realistic pairs, our approach ensures robust and stable contrastive optimization. Experimental results in the domain of detoxification demonstrate that our method significantly reduces toxic generation while maintaining strong performance on downstream tasks such as commonsense reasoning and reading comprehension. Our findings highlight the effectiveness of leveraging hard negatives for attribute-aware language model fine-tuning.",
}

How to obtain support

Create an issue in this repository if you find a bug or have questions about the content.

For additional support, ask a question in SAP Community.

Contributing

If you wish to contribute code, or offer fixes or improvements, please send a pull request. For legal reasons, contributors will be asked to accept a DCO when they create their first pull request to this project. This happens automatically during the submission process. SAP uses the standard DCO text of the Linux Foundation.

License

Copyright (c) 2025 SAP SE or an SAP affiliate company. All rights reserved. This project is licensed under the Apache Software License, version 2.0 except as noted otherwise in the LICENSE file.
