Commit ad2680e

Set ignore done=False in GAIL (#4971)
1 parent d2c5697 commit ad2680e

17 files changed, with 116 additions and 59 deletions

com.unity.ml-agents/CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -14,11 +14,11 @@ and this project adheres to
 ### Minor Changes
 #### com.unity.ml-agents / com.unity.ml-agents.extensions (C#)
 #### ml-agents / ml-agents-envs / gym-unity (Python)
-
+- The `encoding_size` setting for RewardSignals has been deprecated. Please use `network_settings` instead. (#4982)
 ### Bug Fixes
 #### com.unity.ml-agents (C#)
 #### ml-agents / ml-agents-envs / gym-unity (Python)
-
+- An issue that caused `GAIL` to fail for environments where agents can terminate episodes by self-sacrifice has been fixed. (#4971)
 
 ## [1.8.0-preview] - 2021-02-17
 ### Major Changes

config/imitation/CrawlerStatic.yaml

Lines changed: 5 additions & 1 deletion
@@ -19,7 +19,11 @@ behaviors:
       gail:
         gamma: 0.99
         strength: 1.0
-        encoding_size: 128
+        network_settings:
+          normalize: true
+          hidden_units: 128
+          num_layers: 2
+          vis_encode_type: simple
         learning_rate: 0.0003
         use_actions: false
         use_vail: false

config/imitation/FoodCollector.yaml

Lines changed: 5 additions & 1 deletion
@@ -19,7 +19,11 @@ behaviors:
       gail:
         gamma: 0.99
         strength: 0.1
-        encoding_size: 128
+        network_settings:
+          normalize: false
+          hidden_units: 128
+          num_layers: 2
+          vis_encode_type: simple
         learning_rate: 0.0003
         use_actions: false
         use_vail: false

config/imitation/Hallway.yaml

Lines changed: 1 addition & 2 deletions
@@ -24,8 +24,7 @@ behaviors:
         strength: 1.0
       gail:
         gamma: 0.99
-        strength: 0.1
-        encoding_size: 128
+        strength: 0.01
         learning_rate: 0.0003
         use_actions: false
         use_vail: false

config/imitation/PushBlock.yaml

Lines changed: 15 additions & 3 deletions
@@ -16,16 +16,28 @@ behaviors:
       num_layers: 2
       vis_encode_type: simple
     reward_signals:
-      gail:
+      extrinsic:
         gamma: 0.99
         strength: 1.0
-        encoding_size: 128
+      gail:
+        gamma: 0.99
+        strength: 0.01
+        network_settings:
+          normalize: false
+          hidden_units: 128
+          num_layers: 2
+          vis_encode_type: simple
         learning_rate: 0.0003
         use_actions: false
         use_vail: false
         demo_path: Project/Assets/ML-Agents/Examples/PushBlock/Demos/ExpertPush.demo
     keep_checkpoints: 5
-    max_steps: 15000000
+    max_steps: 1000000
     time_horizon: 64
     summary_freq: 60000
     threaded: true
+    behavioral_cloning:
+      demo_path: Project/Assets/ML-Agents/Examples/PushBlock/Demos/ExpertPush.demo
+      steps: 50000
+      strength: 1.0
+      samples_per_update: 0

config/imitation/Pyramids.yaml

Lines changed: 2 additions & 2 deletions
@@ -22,11 +22,11 @@ behaviors:
       curiosity:
         strength: 0.02
         gamma: 0.99
-        encoding_size: 256
+        network_settings:
+          hidden_units: 256
       gail:
         strength: 0.01
         gamma: 0.99
-        encoding_size: 128
         demo_path: Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo
     behavioral_cloning:
       demo_path: Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo

config/ppo/Pyramids.yaml

Lines changed: 2 additions & 1 deletion
@@ -22,7 +22,8 @@ behaviors:
       curiosity:
         gamma: 0.99
         strength: 0.02
-        encoding_size: 256
+        network_settings:
+          hidden_units: 256
         learning_rate: 0.0003
     keep_checkpoints: 5
     max_steps: 10000000

config/ppo/PyramidsRND.yaml

Lines changed: 2 additions & 2 deletions
@@ -22,11 +22,11 @@ behaviors:
       rnd:
         gamma: 0.99
         strength: 0.01
-        encoding_size: 64
+        network_settings:
+          hidden_units: 64
         learning_rate: 0.0001
     keep_checkpoints: 5
     max_steps: 3000000
     time_horizon: 128
     summary_freq: 30000
-    framework: pytorch
     threaded: true

config/ppo/VisualPyramids.yaml

Lines changed: 2 additions & 1 deletion
@@ -22,7 +22,8 @@ behaviors:
       curiosity:
         gamma: 0.99
         strength: 0.01
-        encoding_size: 256
+        network_settings:
+          hidden_units: 256
         learning_rate: 0.0003
     keep_checkpoints: 5
     max_steps: 10000000

config/sac/Pyramids.yaml

Lines changed: 0 additions & 1 deletion
@@ -24,7 +24,6 @@ behaviors:
       gail:
         gamma: 0.99
         strength: 0.01
-        encoding_size: 128
         learning_rate: 0.0003
         use_actions: true
         use_vail: false
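
Several of the config hunks in this commit replace a reward signal's deprecated `encoding_size` key with a `network_settings` block whose `hidden_units` carries the same value (for example `encoding_size: 256` becomes `hidden_units: 256` in the Pyramids configs). The following is a minimal sketch of that rewrite applied to one already-parsed reward-signal mapping. The helper `migrate_reward_signal` is hypothetical and not part of ML-Agents, and letting an existing `hidden_units` value take precedence is an assumption rather than documented behavior.

```python
from copy import deepcopy


def migrate_reward_signal(settings: dict) -> dict:
    """Return a copy with `encoding_size` moved to `network_settings.hidden_units`.

    Hypothetical helper for illustration only; it mirrors the pattern seen in
    the config diffs of this commit, not an ML-Agents API.
    """
    migrated = deepcopy(settings)  # leave the caller's mapping untouched
    if "encoding_size" not in migrated:
        return migrated
    size = migrated.pop("encoding_size")
    network = migrated.setdefault("network_settings", {})
    # Assumption: an explicitly configured hidden_units wins over the old key.
    network.setdefault("hidden_units", size)
    return migrated


# Curiosity settings as they appear on the removed side of the Pyramids hunk.
old_curiosity = {"gamma": 0.99, "strength": 0.02, "encoding_size": 256}
new_curiosity = migrate_reward_signal(old_curiosity)
print(new_curiosity)
```

Note that some hunks (Hallway and the SAC Pyramids config) simply drop `encoding_size` without adding `network_settings`, so a real migration may want to choose between the two approaches per file.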
