Class sim.mdp.GridWorld
java.lang.Object
   |
   +----sim.mdp.MDP
           |
           +----sim.mdp.GridWorld
- public class GridWorld
- extends MDP
A Markov Decision Process or Markov Game that takes a state and action
and returns a new state and a reinforcement. It can be either
deterministic or nondeterministic. If the next state is
fed back in as the state, it can run a simulation. If the
state is repeatedly randomized, it can be used for learning
with random transitions.
This code is (c) 1996 Leemon Baird and Mance Harmon
<leemon@cs.cmu.edu>,
http://www.cs.cmu.edu/~baird
The source and object code may be redistributed freely.
If the code is modified, please state so in the comments.
- Version:
- 1.05, 22 Aug 97
- Author:
- Mance Harmon
-
GridWorld()
-
-
actionSize()
- Return the number of elements in the action vector.
-
findValAct(Matrix, Matrix, FunApp, Matrix, PBoolean)
- Find the value and best action of this state.
-
findValue(Matrix, Matrix, PDouble, FunApp, PDouble, Matrix, PDouble, PBoolean, NumExp, Random)
- Find the max over actions of R + gamma*V(x'), where V(x') is the value of the successor state x'
given state x, R is the reinforcement, and gamma is the discount factor.
-
getAction(Matrix, Matrix, Random)
- Return the next action possible in a state given the last action performed.
-
getParameters(int)
- Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.
-
getState(Matrix, PDouble, Random)
- Return the next state to be used for training in an epoch-wise system.
-
initialAction(Matrix, Matrix, Random)
- Return the initial action possible in a state.
-
initialState(Matrix, Random)
- Return an initial state used for the start of epoch-wise training or for
training on trajectories.
-
nextState(Matrix, Matrix, Matrix, PDouble, PBoolean, Random)
- Find a next state given a state and action, and return the reinforcement received.
-
numActions(Matrix)
- Return the number of actions in a given state.
-
numPairs(PDouble)
- Return the number of state/action pairs in the MDP for a given dt.
-
numStates(PDouble)
- The number of states for this MDP is determined by the granularity factor that is passed in as
a parameter.
-
randomAction(Matrix, Matrix, Random)
- Generates a random action from those possible.
-
randomState(Matrix, Random)
- Generates a random state from those possible and returns it in the vector passed in.
-
stateSize()
- Return the number of elements in the state vector.
GridWorld
public GridWorld()
getParameters
public Object[][] getParameters(int lang)
- Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.
- Overrides:
- getParameters in class MDP
- See Also:
- getParameters
numStates
public int numStates(PDouble dt)
- The number of states for this MDP is determined by the granularity factor that is passed in as
a parameter: the state space contains (granularity+1)^2 states. For example, a granularity
of 10 produces 11^2 = 121 states.
- Overrides:
- numStates in class MDP
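The state count above can be checked with a short sketch. GridStateCount is a hypothetical helper, not part of sim.mdp, and it takes the granularity directly as an int; how GridWorld derives the granularity from the PDouble dt argument is not stated here and is not modeled.

```java
// Sketch only: GridStateCount is a hypothetical helper, not part of sim.mdp.
final class GridStateCount {
    // Number of grid points when each axis has granularity+1 evenly spaced values.
    static int numStates(int granularity) {
        int pointsPerAxis = granularity + 1;
        return pointsPerAxis * pointsPerAxis;
    }

    public static void main(String[] args) {
        // Granularity 10 gives 11 points per axis.
        System.out.println(numStates(10));
    }
}
```

A granularity of 0 gives a single state under this formula, since each axis then has only one point.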
stateSize
public int stateSize()
- Return the number of elements in the state vector. In this case the state is a point
(x,y) in a 2D Euclidean space.
- Overrides:
- stateSize in class MDP
initialState
public void initialState(Matrix state,
Random random) throws MatrixException
- Return an initial state used for the start of epoch-wise training or for
training on trajectories. The start state for this MDP is the lower left
corner of the 2D grid (0,0).
- Throws: MatrixException
- Vector passed in was wrong length.
- Overrides:
- initialState in class MDP
getState
public void getState(Matrix state,
PDouble dt,
Random random) throws MatrixException
- Return the next state to be used for training in an epoch-wise system.
This method differs from nextState(): nextState() returns the state
transitioned to as a function of the dynamics of the system, whereas this method simply
returns another state to be trained upon when performing epoch-wise training.
- Throws: MatrixException
- Vector passed in was wrong length.
- Overrides:
- getState in class MDP
actionSize
public int actionSize()
- Return the number of elements in the action vector. The action vector is of length 1 and
has 4 possible values: 0 - East, 0.25 - North, 0.5 - West, 0.75 - South.
- Overrides:
- actionSize in class MDP
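The action encoding can be illustrated with a small decoder. This is a sketch under stated assumptions: GridActionSketch and its members are hypothetical names, not part of sim.mdp, and the mapping of East to +x and North to +y is assumed from the grid description rather than confirmed by the source.

```java
// Sketch only: GridActionSketch is a hypothetical helper, not part of sim.mdp.
final class GridActionSketch {
    // Direction names in the order given by the documentation.
    static final String[] NAMES = {"East", "North", "West", "South"};
    // Assumed unit offsets: East = +x, North = +y (not confirmed by the source).
    static final int[] DX = {1, 0, -1, 0};
    static final int[] DY = {0, 1, 0, -1};

    // Map an action value of 0, 0.25, 0.5, or 0.75 to an index 0..3.
    // Assumes the value is non-negative and close to one of the four codes.
    static int index(double action) {
        return (int) Math.round(action * 4.0) % 4;
    }

    public static void main(String[] args) {
        double[] codes = {0.0, 0.25, 0.5, 0.75};
        for (double code : codes) {
            int i = index(code);
            System.out.println(code + " -> " + NAMES[i] + " (" + DX[i] + "," + DY[i] + ")");
        }
    }
}
```

Rounding rather than truncating keeps the decoder tolerant of small floating-point error in the stored action value.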
initialAction
public void initialAction(Matrix state,
Matrix action,
Random random) throws MatrixException
- Return the first possible action in a state. This method is used when one has to iterate
over all possible actions in a given state: it starts the iteration, and getAction()
supplies each subsequent action.
- Throws: MatrixException
- Vector passed in was wrong length.
- Overrides:
- initialAction in class MDP
getAction
public void getAction(Matrix state,
Matrix action,
Random random) throws MatrixException
- Return the next possible action in a state, given the last action performed.
This plays the same role for actions that getState() plays for states: it serves
as an iterator over actions instead of states.
- Throws: MatrixException
- Vector passed in was wrong length.
- Overrides:
- getAction in class MDP
numActions
public int numActions(Matrix state)
- Return the number of actions in a given state. For this MDP this number is constant for
all states. There are 4 actions possible in each state:
0 - East, 0.25 - North, 0.5 - West, 0.75 - South.
- Overrides:
- numActions in class MDP
numPairs
public int numPairs(PDouble dt)
- Return the number of state/action pairs in the MDP for a given dt. This is used for epoch-wise
training. An epoch would consist of all state/action pairs for a given MDP and is a function
of the step size dt. For this MDP we have a continuum of state/action pairs because we have
a continuum of states. The value returned from this method is therefore the pseudo-epoch size
passed in to this MDP as the parameter called epochSize.
- Overrides:
- numPairs in class MDP
randomAction
public void randomAction(Matrix state,
Matrix action,
Random random) throws MatrixException
- Generates a random action from those possible. Accepts a state and passes back an action.
- Throws: MatrixException
- Vector passed in was wrong length.
- Overrides:
- randomAction in class MDP
randomState
public void randomState(Matrix state,
Random random) throws MatrixException
- Generates a random state from those possible and returns it in the vector passed in.
This returns a vector of length 2. Each element is in the range [0,1].
- Throws: MatrixException
- Vector passed in was wrong length.
- Overrides:
- randomState in class MDP
nextState
public double nextState(Matrix state,
Matrix action,
Matrix newState,
PDouble dt,
PBoolean valueKnown,
Random random) throws MatrixException
- Find a next state given a state and action, and return the reinforcement received.
The state, action, and newState arguments should all be vectors (single-column matrices).
The duration of the time step is also passed back in dt. Most MDPs
make this a constant, given in the parsed string.
The goal state is the upper right corner of the grid world (x>1-dt, y>1-dt).
- Throws: MatrixException
- if sizes aren't right.
- Overrides:
- nextState in class MDP
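The goal test above, together with one plausible transition rule, can be sketched as follows. Only the goal condition (x>1-dt, y>1-dt) comes from this documentation. The step length of dt per move, the clipping of the state to [0,1], and the names GridStepSketch, step, clamp, and isGoal are assumptions made for illustration. The reinforcement is left out because its values are not stated here.

```java
// Sketch only: GridStepSketch is a hypothetical stand-in, not the sim.mdp implementation.
final class GridStepSketch {
    // Assumed unit offsets for East, North, West, South.
    private static final int[] DX = {1, 0, -1, 0};
    private static final int[] DY = {0, 1, 0, -1};

    // Goal region as documented: the upper right corner of the unit square.
    static boolean isGoal(double x, double y, double dt) {
        return x > 1.0 - dt && y > 1.0 - dt;
    }

    // Keep a coordinate inside [0,1] (assumed wall behavior).
    static double clamp(double v) {
        return Math.max(0.0, Math.min(1.0, v));
    }

    // Move a distance dt in the direction encoded by action (assumed dynamics),
    // writing the result into newState and leaving state untouched.
    static void step(double[] state, double action, double dt, double[] newState) {
        int dir = (int) Math.round(action * 4.0) % 4;
        newState[0] = clamp(state[0] + DX[dir] * dt);
        newState[1] = clamp(state[1] + DY[dir] * dt);
    }

    public static void main(String[] args) {
        double[] state = {0.0, 0.0};
        double[] next = new double[2];
        step(state, 0.25, 0.1, next); // one step North from the start state
        System.out.println(next[0] + ", " + next[1] + " goal=" + isGoal(next[0], next[1], 0.1));
    }
}
```

Writing into a caller-supplied newState array mirrors the way the documented method passes results back through its arguments.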
findValAct
public double findValAct(Matrix state,
Matrix action,
FunApp f,
Matrix outputs,
PBoolean valueKnown) throws MatrixException
- Find the value and best action of this state. This returns the value of a given state as a double.
This also destroys the action that is passed in by replacing it with the best action. This
method always returns a value that is a function of state/action pairs. The value associated with
these state/action pairs might be Q-values or advantages, but it is not important to know which
learning algorithm is being used. This method should simply find the min or max value as a function
of the state/action pairs in the given state. For example, if Q-learning is the learning algorithm,
then one would find the max Q-value for the given state and return that value.
The action associated with that Q-value would be passed back. The state/action pair with the
max Q-value should be evaluated last so that findGradients() can be called from within
the learning algorithm without having to call function.evaluate().
- Throws: MatrixException
- column vectors are wrong size or shape
- Overrides:
- findValAct in class MDP
findValue
public double findValue(Matrix state,
Matrix action,
PDouble gamma,
FunApp f,
PDouble dt,
Matrix outputs,
PDouble reinforcement,
PBoolean valueKnown,
NumExp explorationFactor,
Random random) throws MatrixException
- Find the max over actions of R + gamma*V(x'), where V(x') is the value of the successor state x'
given state x, R is the reinforcement, and gamma is the discount factor. This method is used in
the object ValIteration (value iteration). The max value over actions is returned.
The state associated with the optimal action is returned 1-explorationFactor percent of the time.
Otherwise, a random next state is returned. The next state is passed back in state.
- Throws: MatrixException
- column vectors are wrong size or shape
- Overrides:
- findValue in class MDP
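The backup described for this method can be sketched in isolation, assuming the quantity being maximized is R + gamma*V(x'). BackupSketch and its methods are hypothetical names, not part of sim.mdp. The per-action reinforcements and successor values are supplied as plain arrays instead of being computed through FunApp, and explorationFactor is treated as a probability in [0,1], which is an assumption.

```java
import java.util.Random;

// Sketch only: BackupSketch is a hypothetical helper, not part of sim.mdp.
final class BackupSketch {
    // Index of the action maximizing r[a] + gamma * vNext[a].
    static int bestAction(double[] r, double[] vNext, double gamma) {
        int best = 0;
        for (int a = 1; a < r.length; a++) {
            if (r[a] + gamma * vNext[a] > r[best] + gamma * vNext[best]) {
                best = a;
            }
        }
        return best;
    }

    // The max over actions of r[a] + gamma * vNext[a].
    static double backupValue(double[] r, double[] vNext, double gamma) {
        int best = bestAction(r, vNext, gamma);
        return r[best] + gamma * vNext[best];
    }

    // With probability explorationFactor (assumed to lie in [0,1]) pick a random
    // action; otherwise pick the greedy one.
    static int chooseAction(double[] r, double[] vNext, double gamma,
                            double explorationFactor, Random random) {
        if (random.nextDouble() < explorationFactor) {
            return random.nextInt(r.length);
        }
        return bestAction(r, vNext, gamma);
    }

    public static void main(String[] args) {
        double[] r = {0.0, 1.0, 0.0, 0.0};     // reinforcement per action
        double[] vNext = {1.0, 0.0, 3.0, 2.0}; // value of each successor state
        System.out.println(backupValue(r, vNext, 0.5));
    }
}
```

The returned value is always the greedy maximum, while the exploration choice only affects which successor is followed, matching the split between the returned value and the state passed back.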