Simulator Interface#
The simulator takes the form of a cooperative multi-agent system [11] implemented using the OpenAI Gym and Petting Zoo interfaces. The environment is advanced by calling the environment's step method:
obs, rewards, dones, infos = env.step(actions)
This method receives actions for all agents from the algorithm, and updates the internal state of the environment.
It then returns an observation of the environment (which is customizable by the participants) and a reward.
The variables obs, rewards, dones, infos, and actions are all dictionaries indexed by agent ID.
A list of agent IDs can be obtained from env.agents.
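As a brief illustration, the snippet below sketches a single call to step and shows how the returned dictionaries are indexed by agent ID. Here env is assumed to be an already constructed environment instance, and build_action is a placeholder for whatever logic the participant's policy uses to produce an action.

```python
# Sketch only: `env` is an already-constructed environment instance and
# `build_action` is a placeholder for the participant's policy logic.
obs = env.reset()

# One action per agent, keyed by agent ID.
actions = {agent_id: build_action(obs[agent_id]) for agent_id in env.agents}

obs, rewards, dones, infos = env.step(actions)

# Every returned value is a dictionary indexed by agent ID.
for agent_id in env.agents:
    print(agent_id, rewards[agent_id], dones[agent_id])
```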
Action Space#
The simulator incorporates aspects of the OpenAI Gym and Petting Zoo “Parallel Environment” reinforcement learning interfaces.
The action space is defined in Table 4. This space adheres to the OpenAI Gym space standard, and incorporates a custom-defined “List” space.
The actions are designed such that the agent does not need to provide fine-grained control of the airplane, but rather can provide a list of “orders” for the airplane to follow at each airport.
For example, by issuing a single action, the agent can order an airplane to (1) load cargo, (2) unload cargo, (3) set priority and (4) take off for a destination when done processing.
At subsequent steps, the airplane state machine will execute the action automatically, and the agent need not issue a new action until the airplane reaches its next destination (or a change in the environment requires an updated action plan).
As the state machine proceeds, it will automatically update the current action (e.g. emptying the cargo_to_load list once the cargo has started loading).
The airplane's priority affects the processing queue. The priority can be updated at every time step, but cannot be updated after an agent enters the queue for processing; once processing is complete, the priority can be updated again. The number of priorities allowed in the action space is equal to the total number of agents within the scenario.
| Key | Model Definition | Space Type | Description | Valid values from observation |
|---|---|---|---|---|
| priority | \(\priority{p}\) | | Integer indicating the priority of the airplane. Affects the ordering of the queue for processing. | 0 = Do not process. 1 = Highest priority. num_agents in scenario = Lowest priority. |
| cargo_to_load | \(\cargotoload{p}\) | | Cargo IDs to load onto the plane when processing. | Choose any subset of the cargo at the current airport (see the observation). |
| cargo_to_unload | \(\cargotounload{p}\) | | Cargo IDs to unload from the plane when processing. | Choose any subset of the cargo onboard the airplane (see the observation). |
| destination | \(destination\) | | ID of the airport the airplane will travel to next. | Choose a single airport ID from the airports reachable from the current airport (see the observation). |
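For example, a single order of the kind described above might be assembled as a dictionary along the following lines. This is a sketch: the key names mirror the model definitions in Table 4 but should be verified against the environment code, and the cargo and airport IDs are placeholders.

```python
# Hypothetical action for one airplane: load cargo 3 and 4, unload cargo 7,
# process with priority 1, then fly to airport 2 once processing is done.
# Key names and IDs are illustrative assumptions based on Table 4.
action = {
    "priority": 1,          # 1 = highest priority, 0 = do not process
    "cargo_to_load": [3, 4],
    "cargo_to_unload": [7],
    "destination": 2,
}

# Actions for all agents are submitted together, keyed by agent ID.
actions = {agent_id: action for agent_id in env.agents}
obs, rewards, dones, infos = env.step(actions)
```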
Observation Space#
Table 5 shows the observation space itself.
The observation contains information specific to an agent, whereas the “globalstate” entry contains information global to all agents (including the observations of all other agents).
Note that the “globalstate” value is the same as that returned by the environment's state() method.
Table 6 and Table 7 describe subspaces that are used in the observation, and we use a custom space named DiGraph which represents a NetworkX directed graph.
| Key | Model Definition | Space Type | Description |
|---|---|---|---|
| | \(\currentairport{p}{t}\) | | ID of the airport where the airplane is currently located, or an indicator value if the airplane is in flight. |
| | \(\CargoOnPlane{p}{t}\) | | IDs of cargo that are onboard the airplane. |
| | \(\weightcapacity{p}\) | | Maximum cargo weight that can be carried by the airplane. |
| | \(\sum_{c \in \CargoOnPlane{p}{t}} \cargoweight{c}\) | | Current cargo weight carried by the airplane. |
| | \(\CargoAtAirport{\currentairport{p}{t}}{t}\) | | IDs of cargo stored at the current airport. |
| | N/A | | Contains the agent state. See: list of airplane states and state machine definition. |
| | \(\{ a_2 \in \Airports \mid (\currentairport{p}{t}, a_2) \in \AvailableRoutes{t} \}\) | | IDs of airports that can be reached from the current airport. Routes which are disabled are not included in this list. |
| globalstate | N/A | State Space | The global state of the environment. |
| | N/A | Action Space (see Table 4) | Contains the last action issued by the agent. Entries will be modified by the state machine as transitions occur. |
| | N/A | | Contains information about the scenario. At the moment this only includes processing time. |
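The relationship between the per-agent "globalstate" entry and the environment's state() method can be sketched as follows, assuming obs is the observation dictionary returned by reset or step:

```python
# Sketch: each agent's "globalstate" observation entry carries the same
# information as the value returned by the environment's state() method.
global_state = env.state()

agent_id = env.agents[0]
agent_view = obs[agent_id]                 # observation specific to this agent
shared_view = agent_view["globalstate"]    # global information shared by all agents
```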
The state space uses the following named tuple to describe cargo:
| Key | Model Definition | Space Type | Description |
|---|---|---|---|
| | N/A | | ID of the cargo. |
| | \(\cargoloc{c}\) | | ID of the airport where the cargo is located, or an indicator value if the cargo is in transit. |
| | \(\cargodest{c}\) | | ID of the destination airport where the cargo is to be delivered. |
| | \(\cargoweight{c}\) | | Weight of the cargo. |
| | \(\softdeadline{c}\) | | Soft deadline for delivery. This is the target delivery time. |
| | \(\harddeadline{c}\) | | Hard deadline to deliver the cargo by. Considered missed if not met. |
| Key | Space Type | Description |
|---|---|---|
| | | List of active cargo (cargo not delivered or missed). |
| | | List of new cargo which was generated and added to the active cargo list in the last step. |
| | Dictionary of agent observations | Contains the observation for each agent. |
| | | Route map which contains information about airports and routes. This is a NetworkX DiGraph class. |
| Key | Space Type | Description |
|---|---|---|
| | | Processing time for all airports. |
Reward#
The reward signal is shown in Table 9, and follows directly from the objective function defined in (4).
| Condition | Reward Value |
|---|---|
| When cargo item \(c \in \Cargos\) becomes missed | \(-\alpha\) |
| For each time step where cargo item \(c \in \Cargos\) is late | \(-\beta\) |
| Each time step during which an airplane is in flight | \(-\gamma\) |
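Putting Table 9 together, the penalty accrued over a single time step can be sketched as below. This is an illustrative reconstruction, not the simulator's implementation; the predicate helpers and the way penalties are attributed to agents are assumptions.

```python
# Illustrative reconstruction of the per-step reward implied by Table 9.
# `became_missed`, `is_late`, and `in_flight` are assumed helper predicates.
def step_reward(cargo_items, airplane, alpha, beta, gamma):
    penalty = 0.0
    for cargo in cargo_items:
        if became_missed(cargo):   # hard deadline passed during this step
            penalty += alpha
        elif is_late(cargo):       # past the soft deadline but not yet missed
            penalty += beta
    if in_flight(airplane):        # cost of each step spent flying
        penalty += gamma
    return -penalty                # reward is the negative accumulated penalty
```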
Info Dictionary#
The step method returns an info dictionary with an entry for each agent.
The value for each agent is a dictionary with the following entries:
| Key | Space Type | Description |
|---|---|---|
| | | List of warnings resulting from the action (for example, trying to load cargo that is not at the current airport). |
| | | (Only returned by the local evaluator env_step method) Indicates whether there was a timeout during this step. This will occur when the solution policy takes too long to issue the next action. |
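As an illustration, the warnings might be surfaced after each step roughly as in the sketch below. The "warnings" key name is an assumption (the exact key is not listed above) and should be checked against the environment source.

```python
# Sketch of inspecting the per-agent info dictionaries after a step.
# The "warnings" key name is an assumption, not confirmed by the table above.
obs, rewards, dones, infos = env.step(actions)

for agent_id, info in infos.items():
    for warning in info.get("warnings", []):
        print(f"Agent {agent_id}: {warning}")
```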
Stopping criteria#
The episode ends when all cargo is either (1) delivered or (2) missed, and when there is no more cargo to be generated by the dynamic cargo generator.
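In terms of the interface, a rollout can simply continue until every agent reports done, as in the sketch below (again treating build_action as a placeholder for the policy, and assuming the episode end is signalled through the per-agent dones flags):

```python
# Sketch of running a full episode until the stopping criteria are met.
# `build_action` is a placeholder; episode end is assumed to be signalled
# through the per-agent dones flags returned by step.
obs = env.reset()
episode_over = False
while not episode_over:
    actions = {agent_id: build_action(obs[agent_id]) for agent_id in env.agents}
    obs, rewards, dones, infos = env.step(actions)
    episode_over = all(dones.values())
```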