Description
Describe the bug
I have a custom environment operating in a large world where some observations can be much larger than others (by a factor of ~1000), for example 1000 meters vs. 1 radian. Therefore, I use the observation normalization feature. When switching from release 14 to release 15, PPO stopped learning entirely.
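For context, observation normalization keeps running mean/variance statistics and rescales inputs to roughly unit scale. The sketch below (hypothetical names, not the actual ml-agents implementation) illustrates why features with a x1000 scale gap need this:

```python
import numpy as np

class RunningNormalizer:
    """Tracks running mean/variance and normalizes observations to ~unit scale."""
    def __init__(self, size: int):
        self.mean = np.zeros(size)
        self.var = np.ones(size)
        self.count = 1e-4  # avoids division by zero before the first update

    def update(self, batch: np.ndarray) -> None:
        # Parallel update of running statistics (Chan et al. formula)
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta ** 2 * self.count * batch_count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, obs: np.ndarray) -> np.ndarray:
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)

# Observations with a x1000 scale gap: distance in meters vs. angle in radians
rng = np.random.default_rng(0)
obs = np.column_stack([rng.uniform(0, 1000, 256),
                       rng.uniform(0, 1.0, 256)])
norm = RunningNormalizer(2)
norm.update(obs)
scaled = norm.normalize(obs)
print(scaled.std(axis=0))  # both features end up at comparable ~unit scale
```

If only the actor's copy of these statistics is updated while the critic's is not, the critic sees effectively unnormalized inputs, which matches the learning collapse described here.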
I pinpointed the commit in which learning stopped working: d0d9d14, "Move the Critic into the Optimizer".
I noticed that before the commit both the actor AND the critic are normalized, as can be seen here: https://github.com/Unity-Technologies/ml-agents/blob/5b8cbd2f902aea8550221a64592bfa016fe67bc8/ml-agents/mlagents/trainers/torch/networks.py#L600
However, after moving the critic into the optimizer, when normalization is updated during the update (https://github.com/Unity-Technologies/ml-agents/blob/65c1550cfaee89c980a7b9f722e8925363507834/ml-agents/mlagents/trainers/ppo/trainer.py#L76), only the actor is normalized, as can be seen here: https://github.com/Unity-Technologies/ml-agents/blob/65c1550cfaee89c980a7b9f722e8925363507834/ml-agents/mlagents/trainers/policy/torch_policy.py#L123.
After I changed the code block in https://github.com/Unity-Technologies/ml-agents/blob/65c1550cfaee89c980a7b9f722e8925363507834/ml-agents/mlagents/trainers/ppo/trainer.py#L76 from:

```python
if self.is_training:
    self.policy.update_normalization(agent_buffer_trajectory)
```

to:

```python
if self.is_training:
    self.policy.update_normalization(agent_buffer_trajectory)
    self.optimizer.critic.update_normalization(agent_buffer_trajectory)
```
The environment started to learn and no longer collapsed to zero reward.
I think this fix should be applied in each trainer file!
I think there should be a test environment where some observations are much larger than others (x1000) to catch regressions like this; however, since this is also a theoretical issue, I don't know whether that is strictly necessary.
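Such a test environment could be approximated by wrapping any existing environment and rescaling each observation component by a different factor. The sketch below uses hypothetical names (`DummyEnv`, `ObservationScaler` are not ml-agents APIs):

```python
import numpy as np

class DummyEnv:
    """Stand-in environment emitting unit-scale observations."""
    def reset(self):
        return np.array([0.5, 0.5, 0.5])

    def step(self, action):
        return np.array([0.5, 0.5, 0.5]), 0.0, False, {}

class ObservationScaler:
    """Multiplies each observation component by a fixed factor to simulate
    wildly mixed units (e.g. meters vs. radians)."""
    def __init__(self, env, factors):
        self.env = env
        self.factors = np.asarray(factors, dtype=np.float64)

    def reset(self):
        return self.env.reset() * self.factors

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs * self.factors, reward, done, info

# Intentionally scale components by 1000, 50, and 20000
env = ObservationScaler(DummyEnv(), factors=[1000.0, 50.0, 20000.0])
print(env.reset())  # observations now span three orders of magnitude
```

With a wrapper like this, a trainer whose critic never sees updated normalization statistics should fail to learn, reproducing the bug without a custom Unity scene.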
To Reproduce
Not applicable
Console logs / stack traces
Not applicable
Screenshots
Not applicable
Environment (please complete the following information):
- Unity Version: Unity 2019.4.31f
- OS + version: Ubuntu 18
- Version information:
  ml-agents: 0.25.0.dev0,
  ml-agents-envs: 0.25.0.dev0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.1+cu110
- Environment: custom environment (but I believe this can be reproduced with any environment if you intentionally scale the observations: multiply some by 1000, some by 50, some by 20000, and so on).