Class sim.mdp.MDP
java.lang.Object
|
+----sim.mdp.MDP
- public abstract class MDP
- extends Object
- implements Watchable, Parsable
A Markov Decision Process (MDP) or Markov game that takes a state and an action
and returns a new state and a reinforcement. It can be either
deterministic or nondeterministic. If the next state is
fed back in as the state, it can run a simulation. If the
state is repeatedly randomized, it can be used for learning
with random transitions. If an MDP class is written for which
an optimal policy and value function are known, then
findAction() and findValue() will return them; otherwise,
they return null and zero, respectively.
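As a rough illustration of running a simulation by feeding the next state back in as the state, the sketch below uses a hypothetical deterministic five-state chain. Plain `double[]` vectors stand in for `Matrix` and a one-element array stands in for `PDouble`; `ChainMDP` and `SimulationDemo` are illustrative names, not part of the sim.mdp package.

```java
import java.util.Random;

// Hypothetical toy MDP: states 0..4 on a line, action 0 = left, 1 = right.
// double[] vectors stand in for sim's Matrix class (an assumption).
class ChainMDP {
    static final int N = 5;

    // Writes the successor of (state, action) into newState and returns
    // the reinforcement; dt[0] receives the duration of the step.
    double nextState(double[] state, double[] action, double[] newState,
                     double[] dt, Random random) {
        int s = (int) state[0];
        int move = action[0] == 1 ? 1 : -1;
        int next = Math.max(0, Math.min(N - 1, s + move));
        newState[0] = next;
        dt[0] = 1.0;                          // constant time step
        return next == N - 1 ? 1.0 : 0.0;     // reward for being at the right end
    }
}

class SimulationDemo {
    public static void main(String[] args) {
        ChainMDP mdp = new ChainMDP();
        Random random = new Random(0);
        double[] state = {0};
        double[] action = {1};                // always move right
        double[] newState = new double[1];
        double[] dt = new double[1];
        double total = 0;
        for (int t = 0; t < 6; t++) {
            total += mdp.nextState(state, action, newState, dt, random);
            state[0] = newState[0];           // feed the next state back in: a simulation
        }
        System.out.println("final state " + state[0] + ", total reward " + total);
    }
}
```

Replacing `state[0] = newState[0]` with a freshly drawn random state on every step turns the same loop into the random-transition learning setup described above.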
Revision 1.01 added the state parameter to the findValAct method.
This code is (c) 1996 Leemon Baird and Mance Harmon
<leemon@cs.cmu.edu>,
http://www.cs.cmu.edu/~baird
The source and object code may be redistributed freely.
If the code is modified, please state so in the comments.
- Version:
- 1.12, 23 Aug 97
- Author:
- Leemon Baird, Mance Harmon

- action: an action vector (created in parse())
- nextState: the state vector resulting from doing action in state (created in parse())
- state: a state vector (created in parse())
- watchManager: the WatchManager that variables here may be registered with
- wmName: the prefix string for the name of every watched variable (passed in to setWatchManager)

- MDP()

- actionSize(): Return the number of elements in the action vector.
- BNF(int)
- findValAct(Matrix, Matrix, FunApp, Matrix, PBoolean): Find the value and best action of this state.
- findValue(Matrix, Matrix, PDouble, FunApp, PDouble, Matrix, PDouble, PBoolean, NumExp, Random): Find the optimum over actions of $\langle R + \gamma V(x') \rangle$, where $V(x')$ is the value of the successor state given state $x$, $R$ is the reinforcement, and $\gamma$ is the discount factor.
- getAction(Matrix, Matrix, Random): Return the next action possible in a state given the last action performed.
- getName(): Return the variable "name" that was passed into setWatchManager.
- getParameters(int): Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.
- getState(Matrix, PDouble, Random): Return the next state to be used for training in an epoch-wise system.
- getWatchManager(): Return the WatchManager set by setWatchManager().
- initialAction(Matrix, Matrix, Random): Return the initial action possible in a state.
- initialize(int): Initialize, either partially or completely.
- initialState(Matrix, Random): Return an initial state used for the start of an epoch (for epoch-wise training) or for the start of a trial (when training on trajectories).
- nextState(Matrix, Matrix, Matrix, PDouble, PBoolean, Random): Find a (possibly stochastic) next state given a state and action, and return the (possibly stochastic) reinforcement received.
- numActions(Matrix): Return the number of actions in a given state.
- numPairs(PDouble): Return the number of state/action pairs in the MDP for a given dt.
- numStates(PDouble): Return the number of states in the given MDP.
- parse(Parser, int): Parse the input file to get the parameters for this object.
- randomAction(Matrix, Matrix, Random): Generates a random action from those possible.
- randomState(Matrix, Random): Generates a random state from those possible and returns it in the vector passed in.
- setWatchManager(WatchManager, String): Register all variables with this WatchManager.
- stateSize(): Return the number of elements in the state vector.
- unparse(Unparser, int): Output a description of this object that can be parsed with parse().

watchManager
protected WatchManager watchManager
- the WatchManager that variables here may be registered with
wmName
protected String wmName
- the prefix string for the name of every watched variable (passed in to setWatchManager)
state
protected Matrix state
- a state vector (created in parse())
action
protected Matrix action
- an action vector (created in parse())
nextState
protected Matrix nextState
- the state vector resulting from doing action in state (created in parse())
MDP
public MDP()
setWatchManager
public void setWatchManager(WatchManager wm,
String name)
- Register all variables with this WatchManager.
Override this if there are internal variables that
should be registered here.
getName
public String getName()
- Return the variable "name" that was passed into setWatchManager
getWatchManager
public WatchManager getWatchManager()
- Return the WatchManager set by setWatchManager().
numStates
public int numStates(PDouble dt)
- Return the number of states in the given MDP. If the true number of states is infinite, then
this defines the sample size of a pseudo-epoch. If the number of states is finite, it
might be a function of the time step size dt, which is why dt is passed
into this method. There is no need to override this if epoch-wise training will never be done.
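To make the dependence on dt concrete, here is a minimal sketch assuming a hypothetical one-dimensional state space on [0, 1] sampled on a grid of spacing dt, with a one-element `double[]` standing in for `PDouble`. `GridStates` is an illustrative name only.

```java
// Hypothetical: a 1-D state space on [0, 1] sampled on a grid of spacing dt.
// A one-element double[] stands in for sim's PDouble holder (assumption).
class GridStates {
    static int numStates(double[] dt) {
        // Grid points 0, dt, 2*dt, ..., 1 (assumes 1/dt is an integer).
        return (int) Math.round(1.0 / dt[0]) + 1;
    }

    public static void main(String[] args) {
        System.out.println(numStates(new double[]{0.1}));   // 11
        System.out.println(numStates(new double[]{0.25}));  // 5
    }
}
```

Halving dt roughly doubles the number of states, so the size of an epoch changes with the step size.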
stateSize
public abstract int stateSize()
- Return the number of elements in the state vector.
initialState
public abstract void initialState(Matrix state,
Random random) throws MatrixException
- Return an initial state used for the start of an epoch (for epoch-wise training) or for
the start of a trial (when training on trajectories).
It need not always be the same state;
it could randomly return one of a set of legal starting states.
- Throws: MatrixException
- Vector passed in was wrong length.
getState
public void getState(Matrix state,
PDouble dt,
Random random) throws MatrixException
- Return the next state to be used for training in an epoch-wise system.
This method differs from nextState(): nextState() returns the state
transitioned to according to the dynamics of the system, whereas getState() simply
returns another state to be trained on during epoch-wise training. This
method should return unique states one at a time until all states for the epoch have
been used for training. For example, if the state space consists of 20 unique states, then
this method returns a unique state on each call until all 20 states have been returned, and
then starts over with a new series of the same 20 states. The parameters are the last
state used and a time step size. In short, this is an iterator over all states in the state space.
If the state space is infinite, this method should not be used and is not meaningful,
so there is no need to override it for infinite state spaces.
- Throws: MatrixException
- Vector passed in was wrong length.
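The iterator behaviour described above can be sketched for a hypothetical 20-state MDP whose states are encoded as the integers 0 to 19 in a one-element vector. `double[]` stands in for `Matrix`, and the dt parameter is omitted because this toy state space does not depend on it; `EpochIterator` is an illustrative name.

```java
// Hypothetical finite MDP with states encoded as 0..NUM_STATES-1 in a
// one-element vector; double[] stands in for sim's Matrix (assumption).
class EpochIterator {
    static final int NUM_STATES = 20;

    // Overwrites state with the next state of the epoch, wrapping to 0
    // after the last one so a new pass over the same states begins.
    static void getState(double[] state) {
        state[0] = ((int) state[0] + 1) % NUM_STATES;
    }

    public static void main(String[] args) {
        double[] state = {0};                 // e.g. the value from initialState()
        for (int i = 0; i < 25; i++) {
            getState(state);
        }
        System.out.println(state[0]);         // 25 steps from 0 wraps to 5.0
    }
}
```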
actionSize
public abstract int actionSize()
- Return the number of elements in the action vector.
initialAction
public void initialAction(Matrix state,
Matrix action,
Random random) throws MatrixException
- Return the initial action possible in a state. This method is used when one has to iterate
over all possible actions in a given state. Given a state, this method should return the
initial action possible in that state. There is no need to override this if the action is
a scalar ranging from 0 to some maximum value.
- Throws: MatrixException
- Vector passed in was wrong length.
getAction
public void getAction(Matrix state,
Matrix action,
Random random) throws MatrixException
- Return the next action possible in a state given the last action performed.
This plays the same role as getState(), except that it iterates over actions instead of
states. There is no need to override this if the legal actions are some range of
contiguous integers.
- Throws: MatrixException
- Vector passed in was wrong length.
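A minimal sketch of iterating over every action of a state with initialAction() and getAction(), assuming scalar integer actions 0, 1, 2 (the contiguous-integer case in which, per the descriptions above, no override is needed). `double[]` stands in for `Matrix`, the Random parameter is dropped, and `ActionIteration` is a hypothetical name.

```java
// Hypothetical scalar integer actions: initialAction() writes 0 and
// getAction() writes the next integer. double[] stands in for Matrix.
class ActionIteration {
    static int numActions(double[] state) { return 3; }   // same in every state

    static void initialAction(double[] state, double[] action) { action[0] = 0; }

    static void getAction(double[] state, double[] action) { action[0] += 1; }

    public static void main(String[] args) {
        double[] state = {0};
        double[] action = new double[1];
        initialAction(state, action);
        for (int i = 0; i < numActions(state); i++) {
            System.out.println("action " + (int) action[0]);   // 0, 1, 2
            getAction(state, action);
        }
    }
}
```

The loop is bounded by numActions(state) rather than by the iterator itself, since the descriptions above do not say what getAction() does after the last action.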
numActions
public abstract int numActions(Matrix state)
- Return the number of actions in a given state. For simplicity, this should be the same
for all states. However, the state is passed in so that future code can take advantage
of it if necessary.
numPairs
public int numPairs(PDouble dt)
- Return the number of state/action pairs in the MDP for a given dt. This is used for epoch-wise
training. An epoch consists of all state/action pairs of a given MDP, and their number may be a function
of the step size dt. There is no need to override this if the state space is infinite,
so that no "epoch" is defined.
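One way to count the pairs in an epoch is to sum numActions(state) over every state of the epoch. The sketch below assumes a hypothetical grid on [0, 1] with spacing dt and two actions per state; arrays stand in for `Matrix` and `PDouble`, and `PairCount` is an illustrative name.

```java
// Hypothetical: numPairs() as the sum of numActions(state) over every
// state of the epoch. Arrays stand in for sim's Matrix/PDouble (assumption).
class PairCount {
    static int numStates(double[] dt) { return (int) Math.round(1.0 / dt[0]) + 1; }

    static int numActions(double[] state) { return 2; }

    static int numPairs(double[] dt) {
        int pairs = 0;
        int n = numStates(dt);
        for (int s = 0; s < n; s++) {
            pairs += numActions(new double[]{s * dt[0]});   // state value on the grid
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(numPairs(new double[]{0.1}));  // 11 states * 2 actions = 22
    }
}
```

When numActions() is the same in every state, as recommended above, this reduces to numStates(dt) * numActions.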
randomAction
public void randomAction(Matrix state,
Matrix action,
Random random) throws MatrixException
- Generates a random action from those possible. Accepts a state and passes back an action.
Each action variable should be on a separate row; action should be a vector (single-column matrix) of size Nx1.
There is no need to override this if the legal actions are the integers from 0 to numActions(state)-1.
- Throws: MatrixException
- Vector passed in was wrong length.
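For the integer-action case, a random action can be drawn uniformly with `java.util.Random.nextInt`. This is a minimal sketch assuming four actions in every state, with `double[]` standing in for the Nx1 `Matrix` (here N = 1) and `IllegalArgumentException` standing in for `MatrixException`; `RandomActionSketch` is a hypothetical name.

```java
import java.util.Random;

// Hypothetical: a scalar integer action drawn uniformly from
// 0..numActions(state)-1 and written into row 0 of the action vector.
class RandomActionSketch {
    static int numActions(double[] state) { return 4; }

    static void randomAction(double[] state, double[] action, Random random) {
        if (action.length != 1) {
            // Stand-in for the MatrixException the real method throws.
            throw new IllegalArgumentException("action vector has wrong length");
        }
        action[0] = random.nextInt(numActions(state));  // uniform over 0..numActions-1
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        double[] state = {0};
        double[] action = new double[1];
        for (int i = 0; i < 5; i++) {
            randomAction(state, action, random);
            System.out.println((int) action[0]);
        }
    }
}
```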
randomState
public void randomState(Matrix state,
Random random) throws MatrixException
- Generates a random state from those possible and returns it in the vector passed in.
This should NOT include terminal states, where the value is known. There is no need to
override this if the MDP is such that random states cannot be jumped to and all
training will be on trajectories starting from legal start states.
- Throws: MatrixException
- Vector passed in was wrong length.
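The rule that terminal states must be excluded can be sketched for a hypothetical random walk over states 0 to 6, where 0 and 6 are terminal states of known value. `double[]` stands in for `Matrix`, `IllegalArgumentException` for `MatrixException`, and `RandomStateSketch` is an illustrative name.

```java
import java.util.Random;

// Hypothetical random walk with states 0..6, where 0 and 6 are terminal
// states of known value. randomState() draws uniformly from the
// non-terminal states 1..5 only. double[] stands in for sim's Matrix.
class RandomStateSketch {
    static final int LAST = 6;

    static void randomState(double[] state, Random random) {
        if (state.length != 1) {
            throw new IllegalArgumentException("state vector has wrong length");
        }
        state[0] = 1 + random.nextInt(LAST - 1);   // 1..LAST-1, never 0 or LAST
    }

    public static void main(String[] args) {
        Random random = new Random(3);
        double[] state = new double[1];
        for (int i = 0; i < 5; i++) {
            randomState(state, random);
            System.out.println((int) state[0]);
        }
    }
}
```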
nextState
public abstract double nextState(Matrix state,
Matrix action,
Matrix newState,
PDouble dt,
PBoolean valueKnown,
Random random) throws MatrixException
- Find a (possibly stochastic) next state given a state and action,
and return the (possibly stochastic) reinforcement received.
state, action, and newState should all be vectors (single-column matrices).
The duration of the time step is also passed back in dt. Most MDPs
will make this dt a constant given in the parsed string,
but a semi-Markov decision process could return a different dt every time.
If the resulting state's value is known exactly, the flag valueKnown should be
set to true.
- Throws: MatrixException
- if sizes aren't right.
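The sketch below illustrates the nextState() contract on a hypothetical stochastic random walk over states 0 to 6: it writes newState, passes dt and valueKnown back through one-element arrays (stand-ins for `PDouble` and `PBoolean`), and returns the reinforcement. The 80% success probability, the reward, and the name `RandomWalkMDP` are all illustrative assumptions.

```java
import java.util.Random;

// Hypothetical 1-D random walk on states 0..6; 0 and 6 are terminal.
class RandomWalkMDP {
    static final int LAST = 6;
    final double stepSize = 1.0;   // in sim this would come from the parsed string

    double nextState(double[] state, double[] action, double[] newState,
                     double[] dt, boolean[] valueKnown, Random random) {
        if (state.length != 1 || action.length != 1 || newState.length != 1) {
            throw new IllegalArgumentException("vectors must have length 1");
        }
        int s = (int) state[0];
        int dir = action[0] == 1 ? 1 : -1;         // action 0 = left, 1 = right
        if (random.nextDouble() >= 0.8) {
            dir = -dir;                            // 20% of the time the move is reversed
        }
        int next = Math.max(0, Math.min(LAST, s + dir));
        newState[0] = next;
        dt[0] = stepSize;                          // constant dt: an ordinary MDP
        valueKnown[0] = next == 0 || next == LAST; // terminal value is known exactly
        return next == LAST ? 1.0 : 0.0;           // reward only on reaching the right end
    }

    public static void main(String[] args) {
        RandomWalkMDP mdp = new RandomWalkMDP();
        Random random = new Random(1);
        double[] state = {3}, action = {1}, newState = new double[1], dt = new double[1];
        boolean[] valueKnown = {false};
        double ret = 0;
        int steps = 0;
        while (!valueKnown[0] && steps < 1000) {   // run one trial until a terminal state
            ret += mdp.nextState(state, action, newState, dt, valueKnown, random);
            state[0] = newState[0];
            steps++;
        }
        System.out.println("steps=" + steps + " return=" + ret);
    }
}
```

A semi-Markov variant would set dt[0] to a different duration on each call instead of the constant stepSize.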
findValAct
public abstract double findValAct(Matrix state,
Matrix action,
FunApp f,
Matrix outputs,
PBoolean valueKnown) throws MatrixException
- Find the value and best action of this state. This returns the value of the given state as a double
and overwrites the action that is passed in with the best action. The value this method works with is
always a function of state/action pairs: it might be a Q-value or an advantage, but the method does not
need to know which learning algorithm is being used. It should simply find the minimum or maximum of
that value over the state/action pairs of the given state. For example, if Q-learning is the learning
algorithm, it should find the maximum Q-value for the given state and return that value, passing back
the associated action. The state/action pair with the maximum Q-value should be evaluated last so that
findGradients() can be called from within the learning algorithm without having to call function.evaluate().
- Throws: MatrixException
- column vectors are wrong size or shape
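A minimal sketch of findValAct() for Q-learning over a small discrete action set, assuming a `ToDoubleBiFunction` in place of sim's `FunApp` and `double[]` in place of `Matrix`. To honour the "evaluate the best pair last" rule, it simply re-evaluates the winning pair once after the search; that extra call is a design choice of this sketch, not necessarily how the library does it. `GreedyValAct` is a hypothetical name.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical greedy findValAct(): maximize Q over actions 0..NUM_ACTIONS-1,
// overwrite the caller's action with the argmax, and return the max value.
class GreedyValAct {
    static final int NUM_ACTIONS = 3;

    static double findValAct(double[] state, double[] action,
                             ToDoubleBiFunction<double[], double[]> q) {
        int best = 0;
        double bestValue = Double.NEGATIVE_INFINITY;
        double[] candidate = new double[1];
        for (int a = 0; a < NUM_ACTIONS; a++) {
            candidate[0] = a;
            double v = q.applyAsDouble(state, candidate);
            if (v > bestValue) { bestValue = v; best = a; }
        }
        action[0] = best;                        // destroys the action passed in
        // Evaluate the winning pair once more so it is the last one evaluated,
        // which a stateful function approximator would need before findGradients().
        return q.applyAsDouble(state, action);
    }

    public static void main(String[] args) {
        double[] action = new double[1];
        double value = findValAct(new double[]{0}, action,
                (s, a) -> 1.0 - (a[0] - 2) * (a[0] - 2));
        System.out.println("best action " + (int) action[0] + ", value " + value);
    }
}
```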
findValue
public abstract double findValue(Matrix state,
Matrix action,
PDouble gamma,
FunApp f,
PDouble dt,
Matrix outputs,
PDouble reinforcement,
PBoolean valueKnown,
NumExp explorationFactor,
Random random) throws MatrixException
- Find the optimum over actions of $\langle R + \gamma V(x') \rangle$, where $V(x')$ is the value of the successor state
given state $x$, $R$ is the reinforcement, and $\gamma$ is the discount factor. This method is used in
the object ValIteration (value iteration). The maximum over actions of $\langle R + \gamma V(x') \rangle$ is returned.
With probability explorationFactor, the state reached by performing the optimal action should be passed back
in the parameter state; otherwise (with probability 1 - explorationFactor), the state resulting from a random
action should be passed back. The case explorationFactor == null must be handled.
The action parameter must be checked for null before it is used, because the learning
object 'ValueIteration' passes in null for action.
- Throws: MatrixException
- column vectors are wrong size or shape
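The backup described above can be sketched on a hypothetical deterministic five-state chain, where the expectation $\langle R + \gamma V(x') \rangle$ reduces to a single successor. Arrays stand in for `Matrix`, a `ToDoubleFunction` for `FunApp`, and a nullable `Double` for `NumExp`; the dt and reinforcement outputs are omitted. Treating a null explorationFactor as "always take the optimal action" is an assumption of this sketch, as is the name `ChainBackup`.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Hypothetical value-iteration backup on a deterministic 5-state chain:
// returns max over actions of R + gamma * V(x').
class ChainBackup {
    static final int N = 5, NUM_ACTIONS = 2;

    // Deterministic successor: action 0 = left, 1 = right, clamped to 0..N-1.
    static int successor(int s, int a) {
        return Math.max(0, Math.min(N - 1, s + (a == 1 ? 1 : -1)));
    }

    static double reward(int next) { return next == N - 1 ? 1.0 : 0.0; }

    static double findValue(double[] state, double[] action, double gamma,
                            ToDoubleFunction<double[]> v, Double explorationFactor,
                            Random random) {
        int s = (int) state[0];
        int best = 0;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < NUM_ACTIONS; a++) {
            int next = successor(s, a);
            double backup = reward(next) + gamma * v.applyAsDouble(new double[]{next});
            if (backup > bestValue) { bestValue = backup; best = a; }
        }
        if (action != null) action[0] = best;   // value iteration may pass null here
        // Optimal successor with probability explorationFactor, else a random
        // action's successor; null is treated as "always optimal" (assumption).
        boolean optimal = explorationFactor == null
                || random.nextDouble() < explorationFactor;
        int chosen = optimal ? best : random.nextInt(NUM_ACTIONS);
        state[0] = successor(s, chosen);
        return bestValue;
    }

    public static void main(String[] args) {
        double[] table = new double[N];          // tabular V, swept in place
        Random random = new Random(0);
        for (int sweep = 0; sweep < 50; sweep++) {
            for (int s = 0; s < N; s++) {
                double[] x = {s};
                table[s] = findValue(x, null, 0.9, y -> table[(int) y[0]], null, random);
            }
        }
        System.out.println(java.util.Arrays.toString(table));
    }
}
```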
getParameters
public Object[][] getParameters(int lang)
- Return a parameter array if BNF(), parse(), and unparse() are to be automated, null otherwise.
- See Also:
- getParameters
initialize
public void initialize(int level)
- Initialize, either partially or completely.
- See Also:
- initialize
BNF
public String BNF(int lang)
unparse
public void unparse(Unparser u,
int lang)
- Output a description of this object that can be parsed with parse().
parse
public Object parse(Parser p,
int lang) throws ParserException
- Parse the input file to get the parameters for this object.
- Throws: ParserException
- parser didn't find the required token