Description
Feature request
Filtering out the easiest and hardest samples can increase the diversity of examples, which is beneficial for fine-tuning performance.
TRL could provide a hook for a custom function between generation and computing per_token_logps, so that users can evaluate the quality of each prompt and determine whether the current sample should be used in the next step.
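A minimal sketch of what such a user-supplied filter might look like, assuming the hook receives one list of rewards per prompt (one reward per sampled completion). The names `keep_group` and `filter_prompts` are hypothetical and not part of TRL's API; the rule shown (drop groups whose rewards are all identical, i.e. prompts that are always solved or never solved) is one possible criterion in the spirit of dynamic sampling.

```python
from statistics import pstdev
from typing import Callable, List, Sequence, Tuple


def keep_group(rewards: Sequence[float], min_std: float = 1e-6) -> bool:
    """Hypothetical filter: keep a prompt only if its completions' rewards differ.

    A group whose rewards are all equal (all correct or all wrong) has zero
    spread and is treated here as uninformative.
    """
    return pstdev(rewards) > min_std


def filter_prompts(
    prompts: Sequence[str],
    group_rewards: Sequence[Sequence[float]],
    keep_fn: Callable[[Sequence[float]], bool] = keep_group,
) -> Tuple[List[str], List[Sequence[float]]]:
    """Apply keep_fn to each prompt's reward group and return the survivors."""
    kept = [(p, r) for p, r in zip(prompts, group_rewards) if keep_fn(r)]
    return [p for p, _ in kept], [r for _, r in kept]


# Illustrative data: three prompts, four sampled completions each.
prompts = ["too easy", "too hard", "informative"]
group_rewards = [
    [1.0, 1.0, 1.0, 1.0],  # always solved
    [0.0, 0.0, 0.0, 0.0],  # never solved
    [1.0, 0.0, 1.0, 0.0],  # mixed outcomes
]
kept_prompts, kept_rewards = filter_prompts(prompts, group_rewards)
print(kept_prompts)
```

Passing the filter as a callable (`keep_fn`) keeps the selection rule in user code, so a different criterion could be swapped in without changing the trainer.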
Motivation
- Dynamic sampling from DAPO (Section 3.2, page 5).
- Rollout Rescue Mechanism and Intra-Batch Informative Substitution in Polaris.
Your contribution
I would be happy to contribute this feature.