
7. Planning

Introduction [1]

  • Planning introduces the idea that an agent must be able to reason about the future.
  • In the study of AI, planning is the decision-making process performed by intelligent agents like robots, or computer programs when trying to achieve a goal state.
  • Planning determines which actions are necessary to accomplish the goal and WHEN those actions must be performed.

Planning with Certainty

  • It happens in a closed world where knowledge is complete and certain.
  • Within a closed world where an agent has complete knowledge, the result of a particular action can be known with certainty.
  • In this situation, the deterministic activity or activities required to achieve a goal are known.
  • Planning is simply a process of organizing the actions to achieve intermediary or final states that lead to the goals of the agent.
  • When there is certainty there are a number of different ways to represent planning.  These include:
    • Explicit State-space representations:
      • Actions are represented in terms of a graph that covers every possible state of the world.
      • Example: Tic-tac-toe game (unit 3).
    • Feature-based representations of actions:
      • The problem is represented by a set of rules that specify the preconditions and effects of actions.
      • The preconditions limit the states to be considered to those that satisfy them; hence, there is no need to enumerate all possible states.
      • Propositional logic is used to represent the preconditions and effects of actions.
    • The STRIPS representation:
      • STRIPS (Stanford Research Institute Problem Solver) was the planner developed for Shakey, one of the first robots controlled using AI.
      • Similar to feature-based representations, but it uses a more compact representation and cannot directly represent conditional effects (a minimal sketch follows this list).
  • In some cases, a goal can be achieved with a single action.
  • In many cases, many actions must be performed to complete a goal.
  • Developing these lists of actions requires a planning strategy such as:
    • Forward planning.
    • Regression planning.
    • Partial-order planning.
    • Planning as CSPs (Constraint Satisfaction Problems).
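
  • A minimal Python sketch of the STRIPS idea above; the Action class, the propositions, and the "move_office_to_lab" action are hypothetical illustrations, not from the course material. A STRIPS action is just a set of preconditions, an add list, and a delete list over propositional features.

    # STRIPS-style action sketch (hypothetical delivery-robot domain).
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Action:
        name: str
        preconditions: frozenset   # propositions that must hold before the action
        add_effects: frozenset     # propositions the action makes true
        delete_effects: frozenset  # propositions the action makes false

    def applicable(state, action):
        """An action is applicable when all its preconditions hold in the state."""
        return action.preconditions <= state

    def apply(state, action):
        """Deterministic result of a STRIPS action: drop deletes, add adds."""
        return (state - action.delete_effects) | action.add_effects

    move = Action(
        name="move_office_to_lab",
        preconditions=frozenset({"robot_in_office"}),
        add_effects=frozenset({"robot_in_lab"}),
        delete_effects=frozenset({"robot_in_office"}),
    )

    state = frozenset({"robot_in_office", "coffee_ready"})
    if applicable(state, move):
        state = apply(state, move)
    print(sorted(state))  # ['coffee_ready', 'robot_in_lab']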

Forward Planning:

  • Simply searching the state space for a solution.
  • The agent starts from the initial state and works forward to the goal state.
  • As a new action is added to the plan, the set of states may change, and the agent must search the state space again from the beginning.
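  • A sketch of forward planning as plain breadth-first search over states, reusing the hypothetical applicable/apply helpers from the STRIPS sketch above (assuming deterministic STRIPS-style actions):

    from collections import deque

    def forward_plan(initial, goal, actions):
        """Search forward from the initial state until every goal proposition holds."""
        frontier = deque([(initial, [])])    # (state, plan so far)
        visited = {initial}
        while frontier:
            state, plan = frontier.popleft()
            if goal <= state:                # all goal propositions satisfied
                return plan
            for a in actions:
                if applicable(state, a):
                    nxt = apply(state, a)
                    if nxt not in visited:
                        visited.add(nxt)
                        frontier.append((nxt, plan + [a.name]))
        return None                          # no plan exists

    # Example: forward_plan(frozenset({"robot_in_office"}), frozenset({"robot_in_lab"}), [move])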

Regression Planning:

  • The state-space graph is replaced with a graph of goals, where each goal represents a set of assignments to some set of features.
  • The agent searches backward from the final goal: regressing a goal through an action gives the subgoal that must hold before that action is performed.
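  • A sketch of the regression step under the same hypothetical STRIPS-style actions: a goal can be regressed through an action only if the action achieves part of the goal and deletes none of it; the resulting subgoal is the remaining goal conditions plus the action's preconditions.

    def regress(goal, action):
        """Return the subgoal that must hold before `action` so that `goal` holds after,
        or None if the action cannot be used to achieve this goal."""
        if not (action.add_effects & goal):   # action must achieve part of the goal
            return None
        if action.delete_effects & goal:      # action must not undo part of the goal
            return None
        return (goal - action.add_effects) | action.preconditions

    # Regression planning then searches backward from the final goal, regressing
    # through actions until a subgoal is satisfied by the initial state.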

Planning as CSPs (Constraint Satisfaction Problems):

  • The planning problem is encoded over a bounded horizon as variables (the action and the state features at each time step) and constraints; any assignment that satisfies all the constraints is a plan that achieves the goal.
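  • A rough sketch of the CSP view for a fixed horizon k: the variables are the action chosen at each step (and, implicitly, the state features at each step), and the constraints are the initial state, the action preconditions and effects, and the goal at step k. A brute-force check over action sequences stands in for a real CSP solver here, just to make the constraints explicit (same hypothetical helpers as above).

    from itertools import product

    def csp_plan(initial, goal, actions, horizon):
        """Try every assignment to the action variables A_0..A_{k-1}."""
        for seq in product(actions, repeat=horizon):
            state = initial
            consistent = True
            for a in seq:
                if not applicable(state, a):   # precondition constraint violated
                    consistent = False
                    break
                state = apply(state, a)        # effect/frame constraints
            if consistent and goal <= state:   # goal constraint at step k
                return [a.name for a in seq]
        return None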

Planning with Uncertainty

  • Sometimes the best we have is a strong likelihood of an outcome from an action.
  • Example: stock market planner agent.
  • In such uncertain situations we act based upon two factors:
    • The first is our beliefs and the second is our preferences.
    • The belief is an estimate of the probability that an event will occur.
    • The preference is the value of the goal that we are trying to attain.
  • Utility can be seen as a measure of this relationship between belief and preference.
  • Decision making under uncertainty is essentially a process of searching for a goal using the utility of any action as a cost factor in the process. In this process, the agent would select the action that results in the greatest expected utility.
  • Decision networks are belief networks that have been extended to include decision (action) variables and the utility of outcomes. Belief networks themselves are graphs that represent events and outcomes together with the probability (belief) that each will occur.
  • These probabilities can then be combined with the utilities (summing utility * probability over the possible outcomes) to determine the best course of action: the one that produces the greatest expected utility.
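  • A small illustration of choosing an action by expected utility (the probability-weighted utility described above); the actions, probabilities, and utilities for the stock-market example are invented purely for illustration.

    actions = {
        # action: list of (probability of outcome, utility of outcome)
        "buy":  [(0.6, 100), (0.4, -80)],   # belief: 0.6 chance the stock rises
        "hold": [(1.0, 0)],
    }

    def expected_utility(outcomes):
        return sum(p * u for p, u in outcomes)

    best = max(actions, key=lambda a: expected_utility(actions[a]))
    print(best, expected_utility(actions[best]))   # buy 28.0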

CH9: Planning with Uncertainty [2]

9.3 Sequential Decisions

  • A more typical scenario is that the agent makes an observation, decides on an action, carries out that action, makes observations in the resulting world, then makes another decision conditioned on the observations, and so on.
  • Subsequent actions can depend on what is observed, and what is observed can depend on previous actions.
  • Information-seeking actions are actions that are taken to acquire more information and hence reduce uncertainty about the world.
  • Decision Networks:
    • It is a graphical representation of a finite sequential decision problem.
    • It extends belief networks to include decision variables and utilities.
    • It also extends the single-state decision network to allow for sequential decisions, and allows both chance nodes and decision nodes to be parents of decision nodes.
    • It is a directed acyclic graph (DAG) with chance nodes (drawn as ovals), decision nodes (drawn as rectangles), and a utility node (drawn as a diamond).
    • Arcs coming into decision nodes represent the information that will be available when the decision is made.
    • Arcs coming into chance nodes represent probabilistic dependence.
    • Arcs coming into the utility node represent what the utility depends on.
  • No-forgetting agent:
    • It is an agent whose decisions are totally ordered in time, and the agent remembers its previous decisions and any information that was available to a previous decision.
    • In the corresponding no-forgetting decision network, each earlier decision node is a parent of every later decision node.
    • Any information available to an earlier decision (its parents) is also available to every later decision.
  • Policies:
    • A policy specifies what the agent should do under all contingencies.
    • A policy consists of a decision function for each decision variable.
    • A decision function for a decision variable is a function that specifies a value for the decision variable for each assignment of values to its parents.
    • Thus, a policy specifies, for each decision variable, what the agent will do for each of the possible observations.
    • Examples:
      • Always bring the umbrella.
      • Bring the umbrella only if the forecast is “rain”.
      • Bring the umbrella if the forecast is “rain” or “cloudy”.
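  • A sketch of evaluating policies for an umbrella-style decision network: a policy is a decision function mapping each forecast value to a decision, so with three forecasts and two decisions there are 2^3 = 8 possible policies, and the best one maximizes expected utility. The probabilities and utilities below are illustrative assumptions, not necessarily the textbook's numbers.

    from itertools import product

    weather_prior = {"rain": 0.3, "norain": 0.7}
    forecast_given_weather = {
        "rain":   {"sunny": 0.15, "cloudy": 0.25, "rainy": 0.60},
        "norain": {"sunny": 0.70, "cloudy": 0.20, "rainy": 0.10},
    }
    utility = {("rain", "take"): 70, ("rain", "leave"): 0,
               ("norain", "take"): 20, ("norain", "leave"): 100}

    forecasts = ["sunny", "cloudy", "rainy"]
    decisions = ["take", "leave"]

    def expected_utility(policy):
        """policy: dict mapping each forecast to a decision ("take"/"leave")."""
        return sum(pw * forecast_given_weather[w][f] * utility[(w, policy[f])]
                   for w, pw in weather_prior.items() for f in forecasts)

    # Enumerate every decision function and keep the best one.
    best = max((dict(zip(forecasts, ds)) for ds in product(decisions, repeat=3)),
               key=expected_utility)
    print(best, round(expected_utility(best), 2))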

9.5 Decision Processes

  • For ongoing processes, it may not make sense to consider only the utility at the end, because the agent may never get to the end. Instead, an agent can receive a sequence of rewards.
  • A Markov decision process can be seen as a Markov chain augmented with actions and rewards or as a decision network extended in time.
  • At each stage, the agent decides which action to perform; the reward and the resulting state depend on both the previous state and the action performed.
  • A Markov decision process, or MDP, consists of:
    • A set of states.
    • A set of actions.
    • A transition model, which specifies the probability of each state given the previous state and the action performed.
    • A reward function, which specifies the expected reward for each state and action.
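  • A minimal MDP sketch with value iteration, loosely modeled on a two-state "healthy/sick" example; the transition probabilities, rewards, and discount factor are illustrative assumptions.

    states = ["healthy", "sick"]
    actions = ["relax", "party"]

    # P[s][a] = list of (next_state, probability); R[s][a] = expected reward.
    P = {"healthy": {"relax": [("healthy", 0.95), ("sick", 0.05)],
                     "party": [("healthy", 0.70), ("sick", 0.30)]},
         "sick":    {"relax": [("healthy", 0.50), ("sick", 0.50)],
                     "party": [("healthy", 0.10), ("sick", 0.90)]}}
    R = {"healthy": {"relax": 7, "party": 10},
         "sick":    {"relax": 0, "party": 2}}

    def value_iteration(gamma=0.9, iters=100):
        """Iterate V(s) = max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) * V(s') ]."""
        V = {s: 0.0 for s in states}
        for _ in range(iters):
            V = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                        for a in actions)
                 for s in states}
        # Greedy policy with respect to the final value function.
        policy = {s: max(actions,
                         key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
                  for s in states}
        return V, policy

    print(value_iteration())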

References


  1. Learning Guide Unit 7: Introduction. (2025). Uopeople.edu. https://my.uopeople.edu/mod/book/view.php?id=454716&chapterid=555065

  2. Poole, D. L., & Mackworth, A. K. (2017). Artificial Intelligence: Foundations of Computational Agents (2nd ed.). Cambridge University Press. Chapter 9: Planning with Uncertainty. https://artint.info/2e/html/ArtInt2e.html