Replies: 1 comment
-
Hey, great points/questions! Let me break it down for both.
Question 1
As you mentioned, the current approach separates the concerns: configuration (min_lr, max_lr, num_iters) is treated as a set of immutable hyperparameters, while state (current_iter) is the part that actually evolves during training. I can see the benefit of a self-contained record, but I also think it could lead to conflicting state if the configs differ between training runs and the checkpointed values are then restored. How should we handle a discrepancy when the config does not match the loaded values? It seems like a similar problem either way, and given that trade-off I would lean toward the current separation of concerns. Definitely open to suggestions!
Question 2
The |
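To make the trade-off concrete, here is a rough sketch of that split together with one possible mismatch policy at load time. The struct, the plain usize record, and the num_iters bound check are all illustrative, not the actual Burn implementation.

```rust
/// Illustrative sketch: configuration lives on the scheduler as plain
/// hyperparameters, and only the evolving state goes into the record.
struct LinearLrScheduler {
    // Configuration: rebuilt from the training config on every run.
    min_lr: f64,
    max_lr: f64,
    num_iters: usize,
    // State: the only value persisted in the checkpoint record.
    current_iter: usize,
}

impl LinearLrScheduler {
    /// Linear decay from max_lr to min_lr over num_iters steps.
    fn step(&mut self) -> f64 {
        let progress = (self.current_iter as f64 / self.num_iters as f64).min(1.0);
        self.current_iter += 1;
        self.max_lr + (self.min_lr - self.max_lr) * progress
    }

    fn to_record(&self) -> usize {
        self.current_iter
    }

    /// One possible policy for a config/state mismatch: refuse to restore a
    /// checkpoint whose iteration count the new config cannot represent.
    fn load_record(mut self, current_iter: usize) -> Result<Self, String> {
        if current_iter > self.num_iters {
            return Err(format!(
                "checkpointed current_iter ({current_iter}) exceeds num_iters ({})",
                self.num_iters
            ));
        }
        self.current_iter = current_iter;
        Ok(self)
    }
}

fn main() {
    // Run 1: build from config, train a step, checkpoint the state.
    let mut run1 = LinearLrScheduler { min_lr: 1e-5, max_lr: 1e-3, num_iters: 1_000, current_iter: 0 };
    let _lr = run1.step();
    let record = run1.to_record();

    // Run 2: rebuild the scheduler from its config, then restore the state.
    let run2 = LinearLrScheduler { min_lr: 1e-5, max_lr: 1e-3, num_iters: 1_000, current_iter: 0 };
    let restored = run2.load_record(record).expect("checkpoint compatible with config");
    assert_eq!(restored.current_iter, 1);
}
```

With this layout the record only has to stay consistent with whatever config the new run was built from, which is exactly where the discrepancy question shows up.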
-
Hi,
Question 1
I noticed that when saving and loading learning rate schedulers, only the mutable state (e.g., current_iter) is persisted. For example, the current implementation of to_record and load_record only serializes current_iter.
This works, but I wonder if it might be more robust and self-contained to serialize the full scheduler state, including configuration fields like min_lr, max_lr, and num_iters (a rough sketch follows the list below). This would:
- Make the checkpoint self-descriptive
- Avoid a dependency on the original configuration when restoring
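To make the suggestion concrete, here is a rough sketch of what a self-contained record could look like; the types and field names are hypothetical, not the current API:

```rust
/// Hypothetical self-contained record: a checkpoint alone would be enough
/// to reconstruct the scheduler, without consulting the training config.
struct LinearLrSchedulerRecord {
    min_lr: f64,
    max_lr: f64,
    num_iters: usize,
    current_iter: usize,
}

struct LinearLrScheduler {
    min_lr: f64,
    max_lr: f64,
    num_iters: usize,
    current_iter: usize,
}

impl LinearLrScheduler {
    fn to_record(&self) -> LinearLrSchedulerRecord {
        LinearLrSchedulerRecord {
            min_lr: self.min_lr,
            max_lr: self.max_lr,
            num_iters: self.num_iters,
            current_iter: self.current_iter,
        }
    }

    fn load_record(self, record: LinearLrSchedulerRecord) -> Self {
        // Everything comes from the record, so the restored scheduler is
        // exactly the one that produced the checkpoint.
        Self {
            min_lr: record.min_lr,
            max_lr: record.max_lr,
            num_iters: record.num_iters,
            current_iter: record.current_iter,
        }
    }
}

fn main() {
    // Run 1: scheduler built from config, checkpointed mid-training.
    let trained = LinearLrScheduler { min_lr: 1e-5, max_lr: 1e-3, num_iters: 1_000, current_iter: 42 };
    let record = trained.to_record();

    // Run 2: even if the config changed, the record restores the exact scheduler.
    let from_config = LinearLrScheduler { min_lr: 1e-6, max_lr: 1e-2, num_iters: 500, current_iter: 0 };
    let restored = from_config.load_record(record);
    assert_eq!(restored.num_iters, 1_000);
}
```

In this sketch the record simply takes precedence over whatever the new run's config said.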
Question 2
Would it be possible (or desirable) to use the serde library directly for saving and loading scheduler states, instead of manually defining Record types and implementing to_record / load_record?
This could significantly reduce boilerplate and improve flexibility.
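As a rough sketch of the idea, assuming the scheduler state can live in a plain struct and using serde_json purely as an example format (the struct is hypothetical, and the snippet needs the serde crate with the derive feature plus serde_json as dependencies):

```rust
use serde::{Deserialize, Serialize};

// Hypothetical scheduler state serialized directly with serde derives,
// instead of a hand-written Record type plus to_record / load_record.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct LinearLrSchedulerState {
    min_lr: f64,
    max_lr: f64,
    num_iters: usize,
    current_iter: usize,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let state = LinearLrSchedulerState {
        min_lr: 1e-5,
        max_lr: 1e-3,
        num_iters: 10_000,
        current_iter: 2_500,
    };

    // Any serde-compatible format works; JSON is used here for readability.
    let saved = serde_json::to_string(&state)?;
    let restored: LinearLrSchedulerState = serde_json::from_str(&saved)?;
    assert_eq!(restored, state);
    Ok(())
}
```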
Thanks!
-Lucian