- Environment usage
- Environment Parameters
- Agents
- State and Action Spaces
- Reward Function
- Termination Rules
- Environment Dynamics
- State Space and Combinations
- Launched application
To run the simulation, use the bash script as follows:

`./run.sh`

Be careful: this command does everything for you, but execution can take a very long time (up to several days on some PC configurations).

The command must be run in a Git Bash terminal. In Visual Studio Code, you can open Git Bash from the terminal bar.
To quickly reproduce the results of the numerical experiments, follow these steps:

- Install Python 3.12.
- Create a virtual environment: `python -m venv venv`
- Activate the virtual environment.
  On Windows, go to the `venv/Scripts` folder and run the activation script in the PowerShell console: `./activate`
  If you use the built-in Windows command console (cmd) instead of PowerShell, the activation looks like this: `.\activate`
  On Linux or macOS, the procedure is similar, but the directory you need is `venv/bin`. After activation, go back to the directory where the files `requirements.txt` and `ppo-scheduler.py` are located.
- Install the necessary dependencies: `python -m pip install -r requirements.txt`
- Number of Agents (N): 28 agents participate in this environment.
- Number of Days (N_DAYS): The setup includes 14 days during which agents interact.
- Number of Iterations (NUM_ITERS): Calculated by the formula $\frac{N^2}{N_{DAYS} \cdot C}$, where $C = 4$. This determines the total number of steps in an episode (see the sketch after this list).
- Moves:
  - Move forward in time: $0$;
  - Move back in time: $1$;
  - Hold the position: $2$;
  - Possible agent movements are encoded as $\{0 \rightarrow +1,\ 1 \rightarrow -1,\ 2 \rightarrow 0\}$.
- Base Reward Parameter (b): Set to $0.2$.
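For concreteness, here is a minimal sketch of these constants in Python. The names mirror the list above; the actual definitions in `ppo-scheduler.py` may differ.

```python
# Minimal sketch of the environment constants described above; the names
# mirror the text, but the actual definitions in ppo-scheduler.py may differ.
N = 28        # number of agents
N_DAYS = 14   # number of days in the schedule
C = 4         # constant from the NUM_ITERS formula
B = 0.2       # base reward parameter b

# NUM_ITERS = N^2 / (N_DAYS * C) = 784 / 56 = 14 steps per episode
NUM_ITERS = N ** 2 // (N_DAYS * C)

# Move encoding: 0 -> forward (+1 day), 1 -> back (-1 day), 2 -> hold (0)
MOVES = {0: +1, 1: -1, 2: 0}

print(NUM_ITERS)  # 14
```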
Each agent is characterized by the following parameters:
- Name: Randomly generated from a list of popular names and surnames.
- Urgency: Takes values from the set $\{1, 2, 3\}$.
- Completeness: Takes values from the set $\{0, 1\}$.
- Complexity: Takes values from the set $\{0, 1\}$.
- Position: Agent's position within the range $\{0, 1, \ldots, 13\}$.
- Coefficient (k): Calculated as $k = (\text{complexity} + (1 - \text{completeness})) \times \text{urgency}$ (illustrated in the sketch after this list).
- Mutation Rate: Ranges from 0 to 1.
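As an illustration, an agent with these parameters could be represented by a small dataclass like the one below. This is a sketch only: the field names follow the list above, while the example values (including the name "Alex Smith") are made up.

```python
from dataclasses import dataclass

# Illustrative sketch only; the real agent representation in the
# simulation code may use different names or structures.
@dataclass
class Agent:
    name: str             # randomly generated name and surname
    urgency: int          # from {1, 2, 3}
    completeness: int     # from {0, 1}
    complexity: int       # from {0, 1}
    position: int         # day index in {0, ..., 13}
    mutation_rate: float  # in [0, 1]

    @property
    def k(self) -> int:
        # k = (complexity + (1 - completeness)) * urgency
        return (self.complexity + (1 - self.completeness)) * self.urgency

agent = Agent("Alex Smith", urgency=3, completeness=0, complexity=1,
              position=5, mutation_rate=0.4)
print(agent.k)  # (1 + (1 - 0)) * 3 = 6
```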
- Observation space: Discrete space represented by a set of size 7 ($\mathbb{O} = \text{Discrete}(7)$).
- Action space: Also discrete, containing 3 possible actions ($\mathbb{A} = \text{Discrete}(3)$).
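If the environment exposes these spaces through the Gymnasium `spaces` API (an assumption; the document does not state which library is used), they could be declared as follows:

```python
from gymnasium import spaces

# Sketch of the per-agent spaces described above; the actual
# construction in the simulation code may differ.
observation_space = spaces.Discrete(7)  # O = Discrete(7)
action_space = spaces.Discrete(3)       # A = Discrete(3): actions 0, 1, 2
```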
Actions of each agent are denoted by an element of $\{0, 1, 2\}$, corresponding to the moves described above.
An agent's reward is determined by its position and the chosen action; see the reward computation in the simulation code for the exact formula.
Environment termination occurs under the following conditions:
- If the number of iterations $\text{NUM\_MOVES}$ reaches $\text{NUM\_ITERS} - 1$ and more than 80% of agents choose action $2$;
- If the number of iterations reaches $2 \times \text{NUM\_ITERS} - 1$.
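A schematic check of these conditions might look like the following; the function name, variable names, and return convention are assumptions, not the actual implementation.

```python
def check_termination(num_moves: int, actions: list[int],
                      num_iters: int = 14) -> tuple[bool, bool]:
    """Schematic check of the termination rules above (names are assumed).

    Returns (terminated, truncated).
    """
    # Early termination: last planned step and more than 80% of agents hold (action 2)
    share_holding = actions.count(2) / len(actions)
    terminated = num_moves >= num_iters - 1 and share_holding > 0.8

    # Hard cutoff after 2 * NUM_ITERS - 1 iterations
    truncated = num_moves >= 2 * num_iters - 1
    return terminated, truncated

# Example: step 13 of 14, 24 of 28 agents choose "hold"
print(check_termination(13, [2] * 24 + [0] * 4))  # (True, False)
```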
- Position Update: The agent's position changes according to the chosen action and is bounded within the range $[0, N_{DAYS}-1]$ (see the sketch after this list).
- Mutation Level: If an agent's position exceeds half of the days ($N_{DAYS}/2$):
  - The mutation level increases if the action is 0.
  - The mutation level decreases if the action is 1.
- Agent Parameter Changes: Depending on the mutation level, the urgency, completeness, and complexity parameters of agents may change.
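The per-agent dynamics could be sketched as follows. This is illustrative only: the clipping and the increase/decrease rule follow the list above, while the mutation step size `delta` is an assumption, since the text does not specify the increment.

```python
MOVES = {0: +1, 1: -1, 2: 0}  # repeated here for self-containment
N_DAYS = 14

def step_agent(position: int, mutation_level: float, action: int,
               delta: float = 0.1) -> tuple[int, float]:
    """Sketch of one agent's dynamics; the step size `delta` is an assumption."""
    # Position update, clipped to [0, N_DAYS - 1]
    position = max(0, min(N_DAYS - 1, position + MOVES[action]))

    # Mutation level changes only when the position exceeds half of the days
    if position > N_DAYS / 2:
        if action == 0:
            mutation_level += delta
        elif action == 1:
            mutation_level -= delta
    return position, mutation_level

print(step_agent(10, 0.5, 0))  # (11, 0.6)
```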
Environment Initialization
Upon environment reset, agent parameters and positions are initialized randomly within specified ranges. Initial observations and information are updated according to the current environment state.
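A hypothetical reset routine following this description might look like the sketch below; the agent names are placeholders, and the sampling ranges are taken from the parameter lists above.

```python
import random

def reset(n_agents: int = 28, n_days: int = 14) -> list[dict]:
    """Sketch of random (re)initialization of agent parameters and positions."""
    agents = []
    for i in range(n_agents):
        agents.append({
            "name": f"agent_{i}",                  # placeholder for a generated name
            "urgency": random.choice([1, 2, 3]),
            "completeness": random.choice([0, 1]),
            "complexity": random.choice([0, 1]),
            "position": random.randrange(n_days),
            "mutation_rate": random.random(),
        })
    return agents

agents = reset()
print(len(agents))  # 28
```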
The environment state space is discrete and defined by the set of parameters of all agents. Each agent can be assigned to one of the 14 days, resulting in a large number of possible system configurations.
Considering the possible ways of selecting 4 of the 28 agents, the number of such combinations can be expressed using the binomial coefficient $\binom{28}{4}$.
Therefore, in this environment there are $\binom{28}{4} = 20475$ different combinations of 4 agents out of 28. These combinations create a rich state space, allowing the modeling of diverse scenarios and strategies.
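The quoted count can be verified directly:

```python
import math

# Number of ways to choose 4 of the 28 agents
print(math.comb(28, 4))  # 20475
```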
The result of launching the environment in 'human' render mode is shown below:
Read the detailed documentation of how the simulation code works here.