[Feature Request] Support dynamic sampling for GRPO trainer #3708

### Feature request

Filtering out the easiest and hardest samples can increase the diversity of training examples, which is beneficial for fine-tuning performance.

TRL could provide a hook for a user-defined function, called between generation and the computation of `per_token_logps`, that evaluates the quality of each prompt and determines whether the current sample is used in the next step.
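
To make the request concrete, here is a minimal sketch of what such a user-supplied function might look like. Everything in it is an assumption for illustration: `GroupFilter`, `has_reward_variance`, and `filter_groups` are hypothetical names, not existing TRL APIs, and the sketch assumes rewards are already available as a flat list ordered prompt by prompt, with `num_generations` completions per prompt. The default criterion drops prompts whose completions all received the same reward (all correct or all wrong), which is one possible reading of "too easy" or "too hard".

```python
from statistics import pstdev
from typing import Callable, List, Sequence

# Hypothetical callback type: receives the rewards of all completions for one
# prompt and returns True if that prompt should be kept for the next step.
GroupFilter = Callable[[Sequence[float]], bool]


def has_reward_variance(group_rewards: Sequence[float], eps: float = 1e-6) -> bool:
    """Keep a prompt only if its completions did not all get the same reward."""
    return pstdev(group_rewards) > eps


def filter_groups(
    rewards: Sequence[float],
    num_generations: int,
    keep_fn: GroupFilter = has_reward_variance,
) -> List[int]:
    """Return the indices of the prompt groups that pass `keep_fn`.

    `rewards` is assumed to be flat and grouped by prompt: the first
    `num_generations` entries belong to prompt 0, the next to prompt 1, etc.
    """
    if num_generations <= 0 or len(rewards) % num_generations != 0:
        raise ValueError("len(rewards) must be a positive multiple of num_generations")
    kept = []
    for start in range(0, len(rewards), num_generations):
        group = rewards[start : start + num_generations]
        if keep_fn(group):
            kept.append(start // num_generations)
    return kept


# Three prompts, four completions each: all correct, mixed, all wrong.
rewards = [1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(filter_groups(rewards, num_generations=4))  # [1]
```

A trainer-side hook could call something like `filter_groups` right after rewards are computed and skip or resample the dropped prompts; how the batch is refilled is left open here.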


### Motivation

- Dynamic sampling from [DAPO](https://arxiv.org/abs/2503.14476), Section 3.2 (page 5).
- Rollout Rescue Mechanism and Intra-Batch Informative Substitution in [Polaris](https://hkunlp.github.io/blog/2025/Polaris/)

### Your contribution

I would be happy to contribute this feature.
