Possible normalization issue with optimizer.critic in commit d0d9d14 #5583

@ShirelJosef

Description

Describe the bug
I have a custom environment in which some observations can be much larger than others (~1000x), for example 1000 meters vs. 1 radian, so I use the observation normalization feature. After switching from release 14 to release 15, PPO stopped learning entirely.
I pinpointed the commit where learning stopped working: d0d9d14, "Move the Critic into the Optimizer".
Before that commit, both the actor AND the critic are normalized, as can be seen here: https://github.com/Unity-Technologies/ml-agents/blob/5b8cbd2f902aea8550221a64592bfa016fe67bc8/ml-agents/mlagents/trainers/torch/networks.py#L600
After moving the critic into the optimizer, however, when the trainer processes a trajectory and updates normalization (https://github.com/Unity-Technologies/ml-agents/blob/65c1550cfaee89c980a7b9f722e8925363507834/ml-agents/mlagents/trainers/ppo/trainer.py#L76), only the actor is normalized, as can be seen here: https://github.com/Unity-Technologies/ml-agents/blob/65c1550cfaee89c980a7b9f722e8925363507834/ml-agents/mlagents/trainers/policy/torch_policy.py#L123.
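To illustrate why this breaks learning, here is a minimal sketch (hypothetical `RunningNormalizer` class, not ml-agents' actual implementation) of two modules that each hold their own running mean/std statistics. If only one of them is updated with new data, the same raw observation normalizes to wildly different values in the two modules, so the critic sees huge, unscaled inputs:

```python
import numpy as np

class RunningNormalizer:
    """Normalize inputs with running mean/variance statistics."""
    def __init__(self, size):
        self.mean = np.zeros(size)
        self.var = np.ones(size)
        self.count = 1e-4  # avoid division by zero before the first update

    def update(self, batch):
        # Merge batch statistics into the running statistics.
        b_mean, b_var, b_count = batch.mean(0), batch.var(0), len(batch)
        delta = b_mean - self.mean
        total = self.count + b_count
        self.mean = self.mean + delta * b_count / total
        m_a = self.var * self.count
        m_b = b_var * b_count
        self.var = (m_a + m_b + delta**2 * self.count * b_count / total) / total
        self.count = total

    def __call__(self, obs):
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)

actor_norm, critic_norm = RunningNormalizer(2), RunningNormalizer(2)
# Observations with very different scales, e.g. meters vs. radians:
batch = np.array([[1000.0, 1.0], [2000.0, 2.0], [1500.0, 1.5]])
actor_norm.update(batch)      # actor statistics are updated...
# critic_norm.update(batch)   # ...but the critic's never are (the bug)
obs = np.array([1500.0, 1.5])
print(actor_norm(obs))        # small, roughly zero-centered values
print(critic_norm(obs))       # still the raw scale: stats never updated
```

With stale statistics, the critic's value estimates are computed from unnormalized inputs, which matches the observed collapse to zero reward.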

After I changed the code block at https://github.com/Unity-Technologies/ml-agents/blob/65c1550cfaee89c980a7b9f722e8925363507834/ml-agents/mlagents/trainers/ppo/trainer.py#L76 from:

if self.is_training:
    self.policy.update_normalization(agent_buffer_trajectory)

to

if self.is_training:
    self.policy.update_normalization(agent_buffer_trajectory)
    self.optimizer.critic.update_normalization(agent_buffer_trajectory)

the environment started learning and no longer collapsed to zero reward.
I think this fix should be applied in each trainer file!
It might also be worth adding a test environment where some observations are much larger than others (x1000); however, since this is also a theoretical issue, I don't know if that is necessary.
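Since the same trajectory-processing step exists in each trainer, the suggested fix can be sketched as a small method that keeps both sets of statistics in sync. The class and stub names below are hypothetical (only `policy.update_normalization` and `optimizer.critic.update_normalization` come from the issue; the rest is illustrative scaffolding):

```python
class TrainerFixSketch:
    """Hypothetical stand-in for a trainer's trajectory-processing step."""
    def __init__(self, policy, optimizer):
        self.policy = policy
        self.optimizer = optimizer
        self.is_training = True

    def _process_trajectory(self, agent_buffer_trajectory):
        if self.is_training:
            # Update BOTH normalizers so actor and critic statistics agree.
            self.policy.update_normalization(agent_buffer_trajectory)
            self.optimizer.critic.update_normalization(agent_buffer_trajectory)

# Minimal stubs to verify that both normalizers receive the trajectory:
class _Recorder:
    def __init__(self):
        self.seen = []
    def update_normalization(self, traj):
        self.seen.append(traj)

class _Optimizer:
    def __init__(self):
        self.critic = _Recorder()

policy, optimizer = _Recorder(), _Optimizer()
TrainerFixSketch(policy, optimizer)._process_trajectory("traj0")
print(policy.seen, optimizer.critic.seen)  # both ['traj0']
```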

To Reproduce
Not applicable

Console logs / stack traces
Not applicable

Screenshots
Not applicable

Environment (please complete the following information):

  • Unity Version: Unity 2019.4.31f
  • OS + version: Ubuntu 18
  • Version information:
    ml-agents: 0.25.0.dev0,
    ml-agents-envs: 0.25.0.dev0,
    Communicator API: 1.5.0,
    PyTorch: 1.7.1+cu110
  • Environment: custom environment (but I believe this can be reproduced in any environment if you intentionally scale the observations: multiply some by 1000, some by 50, some by 20000, and so on)

Metadata

Labels

  • bug — Issue describes a potential bug in ml-agents.
  • stale — Issues that have been idle for a while. Automatically closed by a bot if idle for too long.
