next up previous
Next: Learning Up: Work Plan Previous: Perception

Deliberative and Reactive Planning

Reactive planning, consisting of pre-defined sensor-action rules, is well suited to effectively respond to dynamic changes in real-time environments. However, it is in general challenging to strategically reason about long or short-term objectives using reactive planning. Therefore, ideally, deliberative and reactive planning should be integrated.

We propose to develop an adaptive interleaving of deliberative and reactive planning as an extension to our initial approach for dealing with real-time dynamic environments [20].

Two main aspects are responsible for the success of our approach. First, the deliberative planner uses depth-bounded forward chaining guided by goal-based heuristics. Second, the real-time state space is discretized as a function of the average time that the deliberative planner needs to generate a plan. This ensures that the state, as seen by the deliberative planner, does not change in average while the plan is being generated. When a plan fails or a new plan is needed, the reactive planner takes over. We propose to extend our approach to fully autonomous multi-agent real-time domains, where the need for collaborative deliberative planning is particularly needed.

We propose to implement and demonstrate our integration using a deliberative planner (e.g. Prodigy [38] and the RoboCup soccer simulator server [24].

Several approaches have been pursued for the integration of planning and execution, which is related to the integration of deliberative and reactive planning. Our discussion of some approaches in this section is clearly not exhaustive, but we try to identify close work.

Most existing architectures for integrating deliberative planning in dynamic environments consist of a plan constructor and a plan executer interleaving passive planning phases and active plan execution phases. Recent systems in this category is the ROGUE architecture [15,16] that integrates and learns from dynamic changes in the environment, and a system developed by Nourbakhsh [25] that uses conditional planning for dealing with environmental uncertainty. These approaches encounter problems in domains with a high frequency of dynamic changes, such as robotic soccer. The agents must be able to respond instantly to changes in the environment during plan execution.

Architectures for addressing these problems have been proposed by Lansky's Procedural Reasoning System (PRS) [13] and Brooks' subsumption architecture [7], both with significant success in reactive systems. The subsumption approach lacks of high level problem-solving and lookahead which we are looking for.

Recently PRS has been applied to the RoboSoccer simulator domain [4]. Here KAs takes the form of simple plays among several agents, to allow agents to behave deliberatively and perform actions not just reactively optimal, but efficient in the overall game. A limitation of this approach though is the hand-coded nature of KAs. Agents should be able to develop new plays exploring the current situation using the available actions. In our approach the purpose of the depth bounded forward chaining planning is just to do that.

The Agenda Based Control for Agents Behaviors Coordination model (ABC2) [23] also addresses the problem of combining classical and reactive planning. This model resembles PRS by having a central control process selecting acts from an agenda containing a set of acts currently under consideration. Unfortunately ABC2like PRS does not reason about future states. When applied to the RoboSoccer simulator domain this approach thus has much in common with elaborate reactive approaches, where collaboration is handled by role based formations [31].

An important difference between our proposed approach and PRS and ABC2 is that we base reactive and deliberative planning on different representations of the domain, allowing a long lookahead of the deliberative planning while keeping a low response time of the reactive planning.

We further propose to develop techniques for multi-agent deliberative and reactive planning. Current research in this field mainly deals with how to represent and share multi-agent plans in order to make distributed plan generation and negotiate which plan to follow [30,12,40]. We use one central planning system, which all agents connect to. The resulting architecture is shown in Figure 1. The multi-agent plan generation consists of three steps:

1.
Fusion of possibly inconsistent and inaccurate world state observations from each agent to a consistent global world state defining the initial state of the planner.
2.
Generation of a multi-agent plan.
3.
Decomposition of the generated multi-agent plan to a coordinated single-agent plan for each agent.

In order to reduce the problem of coordinating the agents, their actions are synchronized by assuming all operators to take a fixed and equal amount of time. Further, to allow the planning system to handle an arbitrary number of agents, an operator only changes the state of a single agent.


  
Figure: Multi-agent deliberative and reactive planning architecture.

Nothing prevents the planning it-self to be asynchronous. But because operators for interactions between agents require synchronous state knowledge about the involved agents, we use synchronous planning. Thus, all agents must at least have reached time t before any operator, that leads an agent to time t+1 can be applied. Consequently the depth bounded forward chaining planning takes the form of a simulation of the combined multi-agent state some number of steps ahead in time. The reactive planning used in the multi-agent extension of the architecture is similar to the reactive planning used in the single-agent case. As argued by [31] roles are assigned to agents to enable efficient collaboration.

The timing diagram in Figure 2 illustrates how plan generation and plan execution changes the low level action generation of an agent. In the diagram it is assumed, that the system consist of two agents. Thus, statei denotes the state send by agent i to Prodigy, while plani denotes the plan received by agent i from prodigy.


  
Figure: Action generation timing diagram ( nstep = 3 and nagt = 2).

In summary, we propose to fully investigate the integration of deliberative planning in real-time dynamic environments. We believe it is feasible and efficient given an abstract representation of the domain.

We propose to introduce a planner-dependent approach for constructing a state abstraction used for the deliberative planning. Further, we will develop algorithms for adaptive interleaving of reactive and deliberative planning, that in contrast to other previous approaches (e.g. [13,4,23]) build upon different representations of the domain. This work will contribute to the open question on how to generate plans in dynamic domains by proposing frequent depth bounded re-planning as a means for integrating dynamic changes. Our approach will apply to highly dynamic real-time domains like the soccer domain or a flight control domain.


next up previous
Next: Learning Up: Work Plan Previous: Perception