Description
Describe the bug
I have a custom environment operating in a large world where some observations can be much larger than others (by a factor of ~1000), for example 1000 meters vs. 1 radian. Therefore, I use the observation normalization feature. When switching from release 14 to release 15, PPO stopped learning entirely.
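For context, observation normalization keeps running mean/variance statistics and rescales inputs to roughly unit scale. The sketch below (hypothetical names, not the actual ml-agents implementation) illustrates why features with a x1000 scale gap need this:

```python
import numpy as np

class RunningNormalizer:
    """Tracks running mean/variance and normalizes observations to ~unit scale."""
    def __init__(self, size: int):
        self.mean = np.zeros(size)
        self.var = np.ones(size)
        self.count = 1e-4  # avoids division by zero before the first update

    def update(self, batch: np.ndarray) -> None:
        # Parallel update of running statistics (Chan et al. formula)
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta ** 2 * self.count * batch_count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, obs: np.ndarray) -> np.ndarray:
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)

# Observations with a x1000 scale gap: distance in meters vs. angle in radians
rng = np.random.default_rng(0)
obs = np.column_stack([rng.uniform(0, 1000, 256),
                       rng.uniform(0, 1.0, 256)])
norm = RunningNormalizer(2)
norm.update(obs)
scaled = norm.normalize(obs)
print(scaled.std(axis=0))  # both features end up at comparable ~unit scale
```

If only the actor's copy of these statistics is updated while the critic's is not, the critic sees effectively unnormalized inputs, which matches the learning collapse described here.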
I pinpointed the commit in which learning stopped working: d0d9d14, "Move the Critic into the Optimizer".
I noticed that before the commit both the actor AND the critic are normalized, as can be seen here: https://github.com/Unity-Technologies/ml-agents/blob/5b8cbd2f902aea8550221a64592bfa016fe67bc8/ml-agents/mlagents/trainers/torch/networks.py#L600
However, after moving the critic into the optimizer, when normalization is updated during the update (https://github.com/Unity-Technologies/ml-agents/blob/65c1550cfaee89c980a7b9f722e8925363507834/ml-agents/mlagents/trainers/ppo/trainer.py#L76), only the actor is normalized, as can be seen here: https://github.com/Unity-Technologies/ml-agents/blob/65c1550cfaee89c980a7b9f722e8925363507834/ml-agents/mlagents/trainers/policy/torch_policy.py#L123.
After I changed the code block in https://github.com/Unity-Technologies/ml-agents/blob/65c1550cfaee89c980a7b9f722e8925363507834/ml-agents/mlagents/trainers/ppo/trainer.py#L76 from:

```python
if self.is_training:
    self.policy.update_normalization(agent_buffer_trajectory)
```

to:

```python
if self.is_training:
    self.policy.update_normalization(agent_buffer_trajectory)
    self.optimizer.critic.update_normalization(agent_buffer_trajectory)
```
The environment started to learn and no longer collapsed to zero reward.
I think this fix should be applied in each trainer file!
I think there should be a test environment where some observations are much larger than others (x1000) to catch regressions like this; however, since this is also a theoretical issue, I don't know whether that is strictly necessary.
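Such a test environment could be approximated by wrapping any existing environment and rescaling each observation component by a different factor. The sketch below uses hypothetical names (`DummyEnv`, `ObservationScaler` are not ml-agents APIs):

```python
import numpy as np

class DummyEnv:
    """Stand-in environment emitting unit-scale observations."""
    def reset(self):
        return np.array([0.5, 0.5, 0.5])

    def step(self, action):
        return np.array([0.5, 0.5, 0.5]), 0.0, False, {}

class ObservationScaler:
    """Multiplies each observation component by a fixed factor to simulate
    wildly mixed units (e.g. meters vs. radians)."""
    def __init__(self, env, factors):
        self.env = env
        self.factors = np.asarray(factors, dtype=np.float64)

    def reset(self):
        return self.env.reset() * self.factors

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs * self.factors, reward, done, info

# Intentionally scale components by 1000, 50, and 20000
env = ObservationScaler(DummyEnv(), factors=[1000.0, 50.0, 20000.0])
print(env.reset())  # observations now span three orders of magnitude
```

With a wrapper like this, a trainer whose critic never sees updated normalization statistics should fail to learn, reproducing the bug without a custom Unity scene.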
To Reproduce
Not applicable
Console logs / stack traces
Not applicable
Screenshots
Not applicable
Environment (please complete the following information):
- Unity Version: Unity 2019.4.31f
- OS + version: Ubuntu 18
- Version information:
  ml-agents: 0.25.0.dev0,
  ml-agents-envs: 0.25.0.dev0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.1+cu110
- Environment: custom environment (but I believe this can be reproduced with any environment if you intentionally scale the observations: multiply some by 1000, some by 50, some by 20000, and so on).