
Conversation

@Kyu3224 commented Jun 25, 2025

Summary

This PR improves the mini-batch generator in the reinforcement learning training pipeline by introducing the reshuffle_each_epoch parameter. This parameter controls whether data indices are reshuffled at each epoch or kept fixed.

Motivation

In reinforcement learning, especially in PPO-style policy optimization, shuffling the training data indices at each epoch can improve generalization and reduce correlation between samples. However, some use cases require fixed mini-batch ordering across epochs to enable reproducible experiments and debugging. This PR introduces an explicit toggle to support both workflows.

Details

  • The reshuffle_each_epoch flag defaults to False, keeping the mini-batch order fixed across epochs so that iteration remains deterministic.
  • When reshuffle_each_epoch=True, the indices are re-permuted at the start of every epoch, so each epoch sees a different mini-batch ordering, which reduces correlation between consecutive samples and can improve generalization (a minimal sketch of the mechanism follows this list).
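
For illustration, here is a minimal sketch of how such a flag could be wired into an index-based generator; the function name and arguments are illustrative, not the actual signature in this codebase:

```python
import torch

def mini_batch_generator(num_samples, num_mini_batches, num_epochs,
                         reshuffle_each_epoch=False, device="cpu"):
    # Size of each mini-batch; any remainder samples are dropped in this sketch.
    mini_batch_size = num_samples // num_mini_batches
    # Initial shuffle of the sample indices (done once in both modes).
    indices = torch.randperm(num_mini_batches * mini_batch_size, device=device)
    for epoch in range(num_epochs):
        if reshuffle_each_epoch and epoch > 0:
            # Draw a fresh permutation at the start of every subsequent epoch.
            indices = torch.randperm(num_mini_batches * mini_batch_size, device=device)
        for i in range(num_mini_batches):
            # Yield the index slice for one mini-batch of this epoch.
            yield indices[i * mini_batch_size : (i + 1) * mini_batch_size]
```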

References

The reshuffle_each_epoch argument implemented here serves a role analogous to the shuffle parameter in PyTorch's torch.utils.data.DataLoader. Setting shuffle=True causes the data sampler to reshuffle the dataset indices at the start of each epoch, which helps reduce overfitting and improve generalization.
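
For instance, with the PyTorch DataLoader the behavior looks like this (toy data and batch size for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(8, dtype=torch.float32))
# shuffle=True makes the sampler draw a new permutation of the indices each epoch.
loader = DataLoader(dataset, batch_size=4, shuffle=True)

for epoch in range(2):
    for (batch,) in loader:
        print(epoch, batch.tolist())  # batch contents differ between the two epochs
```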

Similarly, this PR's reshuffle_each_epoch flag controls whether mini-batch indices are reshuffled every epoch (True), or fixed after the initial shuffle (False), providing flexibility in how training data is fed during reinforcement learning updates.

Testing

  • Verified that reshuffle_each_epoch=True produces new mini-batch orders per epoch.
  • Verified that reshuffle_each_epoch=False preserves the mini-batch order across epochs (a rough sketch of both checks follows this list).
  • All existing unit tests and pre-commit hooks pass successfully.
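
Using the generator sketch above, the two behaviors can be checked roughly as follows; this is a hypothetical helper, not the actual test code:

```python
import torch

def epoch_orders(reshuffle):
    # Collect the mini-batch index order of two consecutive epochs.
    torch.manual_seed(0)
    gen = mini_batch_generator(num_samples=32, num_mini_batches=4,
                               num_epochs=2, reshuffle_each_epoch=reshuffle)
    batches = [b.tolist() for b in gen]
    return batches[:4], batches[4:]

first, second = epoch_orders(reshuffle=False)
assert first == second   # fixed order is preserved across epochs
first, second = epoch_orders(reshuffle=True)
assert first != second   # order is re-drawn each epoch (holds with high probability)
```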

Please review and advise if further adjustments are necessary.

@ClemensSchwarke (Collaborator)

Hi @Kyu3224,
Thanks a lot for your PR! Do you have evidence that this improves performance by any chance?

@Kyu3224 (Author) commented Jul 18, 2025

Thank you for your feedback. I do not have strong supporting evidence ready at the moment, but I plan to prepare and share the relevant data within the next two weeks.

@ClemensSchwarke (Collaborator)

Awesome, looking forward to that!
