Vignette: The need for cooperation
In a multi-agent environment, the optimal policy of an agent depends not only on the environment but also on the policies of the other agents. It is therefore important that agents are capable of coordinating to find a good solution. As an illustrative example, consider the scenario shown in Fig. 6(a). Cargo must be picked up from airports 1 and 2 and delivered to airport 3. An optimal policy may route the cargo between airports 1 and 3 via airport 4 (where the airplane can be refueled). Likewise, it may route cargo between airports 2 and 3 via airport 0. Now, suppose that the route between airports 4 and 3 becomes unavailable for a long duration due to a disruption, as shown in Fig. 6(b). In this case, since airport 4 no longer provides a viable route to the destination, a new policy for airplanes at airport 1 may route them through airport 0. However, this introduces a bottleneck at airport 0. If we instead re-calculate the optimal policy for all planes, we may find a new policy as shown in Fig. 6(c). This policy takes into account the interaction of the planes at airport 0, and opts to re-route the cargo from airport 2 through airport 5 instead of airport 0.
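The bottleneck in this vignette can be reproduced with a small sketch. The route network and edge costs below are hypothetical: they are chosen only to be consistent with the airports and routes named in the text, not taken from Fig. 6. Each origin is re-planned independently with Dijkstra's algorithm, and the sketch then checks which stopover airports the two resulting routes share.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the nodes on a cheapest path, or None."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None

def make_graph(edges):
    """Build an undirected weighted graph from (a, b, cost) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph

def shared_stopovers(path_a, path_b):
    """Intermediate airports (excluding origin and destination) used by both paths."""
    return set(path_a[1:-1]) & set(path_b[1:-1])

# Hypothetical costs (assumption): only the topology follows the text.
EDGES = [(1, 4, 1), (4, 3, 1), (1, 0, 2), (0, 3, 1), (2, 0, 1), (2, 5, 2), (5, 3, 2)]

before = make_graph(EDGES)
# Disruption: the route between airports 4 and 3 becomes unavailable.
after = make_graph([e for e in EDGES if {e[0], e[1]} != {3, 4}])

route_1 = shortest_path(after, 1, 3)  # re-planned in isolation
route_2 = shortest_path(after, 2, 3)  # unchanged by the disruption
print(route_1, route_2, shared_stopovers(route_1, route_2))
```

Under these assumed costs, both independently planned routes pass through airport 0 after the disruption. Resolving that conflict (for example by sending the cargo from airport 2 via airport 5) requires planning for all airplanes jointly, which this sketch does not attempt.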
We include the following two baselines:
Random Agent. This agent performs random actions, restricted to actions that are valid. For example, it only loads cargo that is stored at the current airport.
Shortest Path Agent. The algorithm assigns a single random cargo item to each airplane, and routes the airplane to pick up and deliver the cargo by following the shortest path. If the airplane cannot bring the cargo all the way to its destination, the cargo will be dropped off as close to the destination as possible, where it can be assigned to a different airplane.
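The core decisions of the two baselines can be sketched in a few functions. This is a minimal illustration, not the baselines' actual implementation: the function names, the dictionary-based data layout, the fewest-hop notion of "shortest", and the choice to give each airplane a distinct cargo item are all assumptions made here. The fallback of dropping cargo as close to the destination as possible is omitted.

```python
import random
from collections import deque

def bfs_path(routes, source, target):
    """Fewest-hop path over `routes` (dict: airport -> set of airports), or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(routes.get(node, ())):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

def random_valid_load(plane_airport, cargo_locations, rng):
    """Random-agent style choice: pick only among cargo stored at the current airport."""
    candidates = sorted(c for c, loc in cargo_locations.items() if loc == plane_airport)
    return rng.choice(candidates) if candidates else None

def assign_cargo(planes, cargo_ids, rng):
    """Give each airplane one randomly chosen cargo item (distinct items: an assumption)."""
    shuffled = list(cargo_ids)
    rng.shuffle(shuffled)
    return dict(zip(planes, shuffled))

def plan_shortest_path(plane_airport, cargo, routes):
    """Shortest-path-agent style plan: fly to the pickup airport, then to the destination."""
    to_pickup = bfs_path(routes, plane_airport, cargo["location"])
    to_dest = bfs_path(routes, cargo["location"], cargo["destination"])
    if to_pickup is None or to_dest is None:
        return None  # the real baseline would drop the cargo as close as possible instead
    return to_pickup + to_dest[1:]

# Hypothetical three-airport line network: 0 - 1 - 2.
routes = {0: {1}, 1: {0, 2}, 2: {1}}
rng = random.Random(0)
print(random_valid_load(1, {"a": 1, "b": 2}, rng))
print(assign_cargo(["plane_0", "plane_1"], ["a", "b"], rng))
print(plan_shortest_path(0, {"location": 1, "destination": 2}, routes))
```

Passing an explicit `random.Random` instance keeps the random choices reproducible, which is convenient when comparing a baseline against another agent on the same scenario.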